Chinese text computing 
Jun Da at lingua.mtsu.edu

Computing mutual information statistics for collocation

How are mutual information scores computed?
Please refer to http://www.umiacs.umd.edu/users/resnik/nlstat_tutorial_summer1998/Lab_ngrams.html for the computing procedure and formula.

How to interpret mutual information scores?
The following guidelines can be used:
If the scores are high or medium, the collocation strength is strong.
If MI is below 1, it is less likely that the two tokens are related.
MI scores between 1 and 3 are in the gray area.
My intuitive judgment is that the bigrams with MI scores larger than 2.5 appear to be bisyllabic words in Chinese, though this intuition needs to be verified.

Other statistical measures of collocation
Other statistical measures such as t-score, likelihood ratio, chi-square and Yule's Y are often used to measure collocation strength. For an introduction and comparison of those measures, please refer to, among others:
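As an illustration of the mutual information computation discussed in the first section, a minimal Python sketch for scoring bigrams is given here. It assumes the standard pointwise mutual information formula, MI(x, y) = log2(P(x, y) / (P(x) P(y))), with probabilities estimated from raw counts; the tutorial referenced above may differ in details such as smoothing or frequency cutoffs. The function name bigram_mi and the sample text are hypothetical and serve only as an example.

```python
import math
from collections import Counter


def bigram_mi(tokens):
    """Return {(x, y): MI score} for every adjacent token pair in `tokens`.

    Assumes the pointwise mutual information definition
    MI(x, y) = log2(P(x, y) / (P(x) * P(y))), estimated from raw counts.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1

    scores = {}
    for (x, y), c_xy in bigram_counts.items():
        p_xy = c_xy / n_bigrams                 # joint probability of the pair
        p_x = unigram_counts[x] / n_unigrams    # marginal probability of x
        p_y = unigram_counts[y] / n_unigrams    # marginal probability of y
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical sample: each Chinese character is treated as one token.
sample_tokens = list("中国人民中国")
sample_scores = bigram_mi(sample_tokens)
for pair, score in sorted(sample_scores.items(), key=lambda kv: -kv[1]):
    print("".join(pair), round(score, 3))
```

On a sample this small the scores are not meaningful; MI is known to overrate pairs built from rare tokens, so in practice a minimum frequency cutoff is often applied before ranking.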
