This website provides character frequency lists generated from a large corpus of Chinese texts collected from online sources. It also provides bigram frequency lists, as well as individual mutual information scores, generated from two sub-corpora.
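To illustrate what these lists contain, the sketch below counts characters and adjacent-character bigrams in a small text. It then scores each bigram with pointwise mutual information, log2(P(xy) / (P(x) P(y))), which is one common definition. This is an assumption made for illustration only. The technical report describes the procedure actually used for this site, and that procedure may differ, for example in how punctuation, non-Chinese text, or sentence boundaries are handled. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a "Chinese character" is anything in the CJK Unified Ideographs
# block (U+4E00..U+9FFF); any other character ends a run of Chinese text.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def han_runs(text):
    """Return the maximal runs of Chinese characters in text."""
    return HAN_RUN.findall(text)


def char_frequencies(text):
    """Count individual Chinese characters (hypothetical helper)."""
    counts = Counter()
    for run in han_runs(text):
        counts.update(run)
    return counts


def bigram_frequencies(text):
    """Count adjacent character pairs inside each run, never across punctuation."""
    counts = Counter()
    for run in han_runs(text):
        counts.update(run[i:i + 2] for i in range(len(run) - 1))
    return counts


def mutual_information(text):
    """Pointwise mutual information log2(P(xy) / (P(x) * P(y))) for each bigram."""
    chars = char_frequencies(text)
    bigrams = bigram_frequencies(text)
    n_chars = sum(chars.values())
    n_bigrams = sum(bigrams.values())
    scores = {}
    for bigram, count in bigrams.items():
        p_xy = count / n_bigrams
        p_x = chars[bigram[0]] / n_chars
        p_y = chars[bigram[1]] / n_chars
        scores[bigram] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    sample = "中国人民银行。中国人民。"
    for char, count in char_frequencies(sample).most_common(3):
        print(char, count)
    for bigram, score in sorted(mutual_information(sample).items()):
        print(bigram, round(score, 3))
```

In this scheme, a high score means the two characters occur next to each other much more often than their individual frequencies would predict. Scores computed from very rare bigrams are unreliable, which is one reason real corpus statistics often apply a minimum frequency cutoff.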
Documentation: Technical report on corpus data collection, the computing procedure, and tools;
Chinese computing FAQ: Tutorials on how to display and edit Chinese on Windows and Mac platforms. Chinese character segmenting scripts. Notes on Chinese encoding standards;
Other online resources: Annotated links to websites that provide information on statistical concepts and methods, Chinese encoding standards, etc.;
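The character segmenting scripts and encoding notes listed under the Chinese computing FAQ are closely related. Before a text can be split into characters, its bytes have to be decoded from whatever encoding they use, such as GB2312, GBK, or Big5. The following is a minimal, hypothetical sketch of such a script, not the scripts distributed on this site. It decodes a file in a given encoding and prints one Chinese character per line. It defaults to GB18030, which is a superset of GB2312 and GBK. As a simplifying assumption, only the CJK Unified Ideographs block (U+4E00..U+9FFF) counts as Chinese. The script name `segment_chars.py` is likewise hypothetical.

```python
import sys


def segment_characters(text):
    """Yield each Chinese character in text, skipping everything else."""
    for ch in text:
        # Assumption: only the basic CJK Unified Ideographs block counts;
        # the extension blocks and compatibility ideographs are ignored.
        if "\u4e00" <= ch <= "\u9fff":
            yield ch


def segment_bytes(data, encoding="gb18030"):
    """Decode raw bytes in a Chinese encoding, then split into characters."""
    return list(segment_characters(data.decode(encoding)))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python segment_chars.py FILE [ENCODING]
    path = sys.argv[1]
    encoding = sys.argv[2] if len(sys.argv) > 2 else "gb18030"
    with open(path, "rb") as f:
        for ch in segment_bytes(f.read(), encoding):
            print(ch)
```

For example, `python segment_chars.py news.txt big5` would be used for a traditional-character file encoded in Big5. Decoding with the wrong codec usually raises `UnicodeDecodeError` or produces garbled characters, so the encoding has to match the source file.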