Jun Da's WebCentral


Chinese text computing

(This is the 1998 version. An updated 2004 version is now available)


Welcome! This website provides character frequency lists generated from a 110-megabyte Chinese text corpus. It also provides bigram lists and mutual information statistics generated from two sub-corpora. More work is underway on the concordance and collocation (e.g., bigrams and trigrams) of individual Chinese characters, using statistics such as mutual information, likelihood ratio, chi-square measures, and HMM probabilities. Statistical results will be posted as they become available. The materials presented here are related to my research on automatic word/phrase identification and acquisition.
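
As a rough illustration of the kinds of statistics described above, the following Python sketch counts hanzi, counts adjacent hanzi bigrams, and scores each bigram with pointwise mutual information. It is not the procedure used to build the lists on this site, which is not specified on this page. The base-2 PMI definition, the restriction to the CJK Unified Ideographs range U+4E00 to U+9FFF, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def is_hanzi(ch):
    # Assumption: treat only the basic CJK Unified Ideographs block as hanzi.
    return "\u4e00" <= ch <= "\u9fff"


def hanzi_counts(text):
    """Count how often each hanzi occurs in the text."""
    return Counter(ch for ch in text if is_hanzi(ch))


def bigram_counts(text):
    """Count adjacent hanzi pairs; pairs broken by punctuation are skipped."""
    return Counter(
        a + b for a, b in zip(text, text[1:]) if is_hanzi(a) and is_hanzi(b)
    )


def pmi_scores(text):
    """Return log2(P(xy) / (P(x) * P(y))) for every hanzi bigram in the text."""
    unigrams = hanzi_counts(text)
    bigrams = bigram_counts(text)
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for pair, count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    sample = "中国人民，中国文化"  # toy input, not taken from the corpus
    for pair, score in sorted(pmi_scores(sample).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

On a real corpus, low-frequency bigrams tend to receive inflated PMI scores, so a minimum-count threshold would probably be needed before ranking.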

Browse hanzi frequency lists
Query individual hanzi information

Browse mutual information stats
Query mutual information scores


Chinese Text Computing Sitemap
Title page
Technical notes
Chinese computing FAQ
Relevant links
What's new
Copyright notice
My homepage

Copyright. 1998-2000. Jun Da. jda@mtsu.edu