Chinese text computing

Frequency statistics 频率统计

Bigram frequencies and mutual information in Modern Chinese
现代汉语双字组频率和相互信息分值

Note: A bigram may be a nonsense combination of characters.
Pick a corpus: 请选择语料库

Bigram frequency is equal to or greater than:
(Enter a number between 1 and 60,000. For example: 50)

频率最小为(介于1和60,000之间)

Mutual information value is equal to or greater than:
(Enter a number between 0 and 30. For example: 3.5)

相互信息分值最小为(介于0和30之间)
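
The two thresholds above can be illustrated with a minimal sketch. This page does not state how its mutual information values are computed, so the formula here is an assumption: pointwise mutual information with a base-2 logarithm, MI(xy) = log2(P(xy) / (P(x) P(y))), estimated from raw counts of adjacent character pairs. The function names `bigram_table` and `filter_bigrams` and the sample string are hypothetical and are not part of this site.

```python
import math
from collections import Counter


def bigram_table(text):
    """Map each adjacent character pair to (frequency, mutual information).

    Assumption: MI is pointwise mutual information in bits,
    log2(P(xy) / (P(x) * P(y))), with probabilities estimated from counts.
    The input is assumed to contain only the characters to be counted.
    """
    char_counts = Counter(text)
    bigram_counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    n_chars = len(text)
    n_bigrams = n_chars - 1
    table = {}
    for bigram, freq in bigram_counts.items():
        p_xy = freq / n_bigrams
        p_x = char_counts[bigram[0]] / n_chars
        p_y = char_counts[bigram[1]] / n_chars
        table[bigram] = (freq, math.log2(p_xy / (p_x * p_y)))
    return table


def filter_bigrams(table, min_freq=1, min_mi=0.0):
    """Keep bigrams whose frequency and MI both meet the minimum values."""
    return {
        bigram: (freq, mi)
        for bigram, (freq, mi) in table.items()
        if freq >= min_freq and mi >= min_mi
    }


# Toy input for illustration only; a real corpus is far larger.
sample = "中国人民中国"
table = bigram_table(sample)
for bigram, (freq, mi) in filter_bigrams(table, min_freq=2).items():
    print(bigram, freq, round(mi, 2))
```

A bigram that passes both thresholds occurs often and also occurs more often than its two characters would by chance, which is why the two filters are typically used together.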
       

Notes

  • JavaScript must be enabled in your browser to download the data. The download button on this page relies on JavaScript to redirect the browser to the download page.
  • At present, bigram frequency information is available for only two sub-corpora: the general fiction sub-corpus and the news sub-corpus.
  • There are 973,338 bigrams in the general fiction sub-corpus and 730,067 in the news sub-corpus. Results may take some time to display, depending on the criteria you set. Please be patient after clicking the Submit button.
  • To search for individual bigram information, click here!

 

 
Copyright. 1998-2014. Jun Da. jda@mtsu.edu. Page last updated: 2010-09-16