Chinese text computing

Technical report

Under preparation!

Page last updated: 2010-09-16

In the meantime, please refer to my paper (pdf format) for detailed information about this project:

Note that the PDF file contains Simplified Chinese characters. To view it properly, you need the Acrobat Chinese language support package installed on your computer. For help, see http://www.adobe.com/products/acrobat/acrrasianfontpack.html.

1. Data collection

1.1 Overview

1.2 Sources of Chinese texts

1.3 Sampling method

2. Data processing

2.1 The data set

2.2 Procedure

2.2.1 Pre-processing
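As an illustration only, pre-processing might involve decoding the raw files and normalizing the text. The sketch below assumes GB-encoded input (Python's `gb18030` codec, a superset of GB2312 and GBK), NFKC normalization, and removal of all whitespace. The function name `preprocess` is hypothetical, and this is not the procedure used in this project.

```python
import unicodedata


def preprocess(raw: bytes, encoding: str = "gb18030") -> str:
    """Decode raw bytes and normalize them into a clean Unicode string.

    Hypothetical sketch. Assumptions: the source files are GB-encoded,
    NFKC normalization is wanted, and whitespace carries no information.
    """
    text = raw.decode(encoding, errors="replace")  # undecodable bytes become U+FFFD
    text = unicodedata.normalize("NFKC", text)     # fold full-width letters/digits to ASCII
    return "".join(text.split())                   # drop spaces, tabs and line breaks
```

For example, `preprocess("中文 ２０１０\n".encode("gb18030"))` returns `"中文2010"`: the full-width digits are folded to ASCII, and the space and line break are dropped.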

2.2.2 Segmenting characters
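Segmenting text into characters is simpler than segmenting it into words, because each Han character is a single Unicode code point. Here is a minimal sketch. It assumes Unicode input and counts only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF). The helper names `is_han` and `segment_characters` are hypothetical.

```python
def is_han(ch: str) -> bool:
    """True if ch lies in the basic CJK Unified Ideographs block.

    Assumption: extension blocks and compatibility ideographs are ignored.
    """
    return "\u4e00" <= ch <= "\u9fff"


def segment_characters(text: str) -> list[str]:
    """Split text into its Chinese characters, dropping everything else."""
    return [ch for ch in text if is_han(ch)]


print(segment_characters("我们学习中文，2010年。"))
```

Punctuation, digits, and Latin letters are discarded. Counting rare characters would require extending `is_han` to cover the extension blocks.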

2.2.3 Making bigrams
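One way to form character bigrams is to take every pair of adjacent characters. The sketch below assumes that bigrams should not cross punctuation or non-Chinese material, so it first splits the text into runs of Han characters. The names `HAN_RUN` and `make_bigrams` are hypothetical.

```python
import re

# Assumption: a "run" is an unbroken sequence of basic-block Han characters;
# punctuation, digits and Latin letters act as bigram boundaries.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def make_bigrams(text: str) -> list[str]:
    """Return the adjacent character pairs within each run of Han characters."""
    bigrams = []
    for run in HAN_RUN.findall(text):
        bigrams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return bigrams


print(make_bigrams("我们学习中文，很好。"))
```

Here the result is `['我们', '们学', '学习', '习中', '中文', '很好']`. No bigram spans the comma. Letting bigrams cross punctuation would add pairs such as `文很`, which rarely mean anything.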

3. Results

3.1 Character frequencies
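Character frequencies can be tallied with `collections.Counter`. This sketch ranks characters by count and adds a cumulative percentage, which shows how much of the text the top-ranked characters cover. The function name `char_frequencies` and its output layout are assumptions; they are not the format of this project's frequency lists.

```python
from collections import Counter


def char_frequencies(text: str) -> list[tuple[str, int, float]]:
    """Rank Han characters by count; each row is (char, count, cumulative %)."""
    # Assumption: only the basic CJK Unified Ideographs block is counted.
    counts = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    total = sum(counts.values())
    rows, running = [], 0
    for ch, n in counts.most_common():  # most frequent first
        running += n
        rows.append((ch, n, 100.0 * running / total))
    return rows


for ch, n, cum in char_frequencies("的的的是是我"):
    print(f"{ch}\t{n}\t{cum:.2f}")
```

On this toy input the first row is `('的', 3, 50.0)`: the single most frequent character already covers half of the text, and the last row reaches 100.0.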

3.2 Bigram frequencies and other statistics
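The heading does not state which other statistics are computed. As one example, the sketch below counts bigrams and computes pointwise mutual information, PMI(xy) = log2(P(xy) / (P(x)P(y))). P(xy) is estimated over bigram tokens, and P(x) and P(y) over character tokens. The choice of PMI, the run-based bigram boundaries, and the name `bigram_stats` are all assumptions.

```python
import math
import re
from collections import Counter

# Assumption: bigrams are formed only inside runs of basic-block Han characters.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def bigram_stats(text: str) -> dict[str, tuple[int, float]]:
    """Map each bigram to (count, pointwise mutual information in bits)."""
    runs = HAN_RUN.findall(text)
    chars = Counter(ch for run in runs for ch in run)
    bigrams = Counter(run[i:i + 2] for run in runs for i in range(len(run) - 1))
    n_chars = sum(chars.values())
    n_bigrams = sum(bigrams.values())
    stats = {}
    for bg, n in bigrams.items():
        p_xy = n / n_bigrams            # probability of the bigram
        p_x = chars[bg[0]] / n_chars    # probability of its first character
        p_y = chars[bg[1]] / n_chars    # probability of its second character
        stats[bg] = (n, math.log2(p_xy / (p_x * p_y)))
    return stats


for bg, (n, pmi) in sorted(bigram_stats("中文中文").items()):
    print(bg, n, round(pmi, 2))
```

In `"中文中文"`, the bigram `中文` occurs twice and gets a PMI of log2(8/3), about 1.42 bits. PMI is known to overrate rare pairs, so in practice it is usually reported alongside raw counts or restricted to bigrams above a frequency threshold.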

4. Discussion

5. Further information

 
Copyright. 1998-2024. Jun Da. jda@mtsu.edu.