Jun Da's WebCentral


 

Chinese text computing

(This is the 1998 version. An updated 2004 version is now available)

 

Welcome! This website provides character frequency lists generated from a 110-megabyte corpus of Chinese text. It also provides bigram lists and mutual information statistics generated from two sub-corpora. Work is underway on the concordance and collocation (e.g., bigrams and trigrams) of individual Chinese characters, using statistics such as mutual information, likelihood ratio, chi-square measures, and HMM probabilities. Statistical results will be posted as they become available. The materials presented here relate to my research on automatic word/phrase identification and acquisition.
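As a rough illustration of the kind of statistics described above, the following sketch counts character frequencies and scores adjacent character pairs by pointwise mutual information, log2(P(xy) / (P(x) P(y))). It is not the procedure used to build the lists on this site: the formula, the function names (`is_hanzi`, `bigram_statistics`), and the restriction to the basic CJK Unified Ideographs block are all assumptions made for the example.

```python
import math
from collections import Counter


def is_hanzi(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as hanzi.
    return "\u4e00" <= ch <= "\u9fff"


def bigram_statistics(text):
    """Return (character counts, bigram counts, bigram PMI scores) for text."""
    # Character frequency list: every hanzi occurrence in the text.
    unigrams = Counter(ch for ch in text if is_hanzi(ch))

    # Bigrams are pairs of adjacent hanzi; a pair broken by punctuation,
    # whitespace, or Latin text is not counted.
    bigrams = Counter(
        a + b for a, b in zip(text, text[1:]) if is_hanzi(a) and is_hanzi(b)
    )

    total_chars = sum(unigrams.values())
    total_pairs = sum(bigrams.values())

    # Pointwise mutual information, using relative frequencies as
    # probability estimates (an assumed, textbook-style definition).
    scores = {}
    for pair, count in bigrams.items():
        p_xy = count / total_pairs
        p_x = unigrams[pair[0]] / total_chars
        p_y = unigrams[pair[1]] / total_chars
        scores[pair] = math.log2(p_xy / (p_x * p_y))

    return unigrams, bigrams, scores


if __name__ == "__main__":
    sample = "中国人民中国"
    counts, pairs, mi = bigram_statistics(sample)
    # List bigrams from highest to lowest score.
    for pair, score in sorted(mi.items(), key=lambda kv: -kv[1]):
        print(pair, pairs[pair], round(score, 3))
```

On a corpus this small the scores mean little; with real data, pairs whose characters rarely occur apart tend to score high, which is why such measures are useful for word and phrase identification.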

Browse hanzi frequency lists
Query individual hanzi information

Browse mutual information stats
Query mutual information scores

 


Copyright. 1998-2000. Jun Da. jda@mtsu.edu