Chinese text computing

Consolidated word list based on 6 online sources

( Page last updated: 2010-09-16 )


Character/words/phrases      HSK   CEDICT   Robert   Word85   ICTCLAS   Richwin   Consolidated
Single character            1866     6851        -        -         -         -              -
Two-character               6373    12944    23167    11014     43164     73396          82532
Three-character              306     2686     3692      636     17877     19411          31965
Four-character               188     1983     2651      682      9287     25868          30806
More than four characters     10      563      487       14       465      1654           2529
Subtotal                    8743    25027    29997    12346     70793    120329         147832
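The counts above can be reproduced in outline with a short script. The following is a minimal sketch, assuming that "consolidated" means a deduplicated union of the source lists and that entry length is counted in Unicode code points; the function names (`consolidate`, `length_bucket`, `tally_by_length`) are hypothetical and are not taken from the original page.

```python
from collections import Counter


def length_bucket(entry: str) -> str:
    """Map an entry to the row label used in the table, by character count."""
    n = len(entry)  # len() counts Unicode code points in Python 3
    if n == 1:
        return "Single character"
    if n == 2:
        return "Two-character"
    if n == 3:
        return "Three-character"
    if n == 4:
        return "Four-character"
    return "More than four characters"


def consolidate(*word_lists):
    """Assumed consolidation step: union of all lists, duplicates removed."""
    merged = set()
    for words in word_lists:
        # Strip surrounding whitespace and drop empty lines.
        merged.update(w.strip() for w in words if w.strip())
    return merged


def tally_by_length(entries):
    """Count entries per length bucket and add a Subtotal."""
    counts = Counter(length_bucket(e) for e in entries)
    counts["Subtotal"] = sum(counts.values())
    return counts


# Toy data only; the real source lists are far larger.
list_a = ["中国", "人"]
list_b = ["中国", "越来越好", "中华人民共和国"]
counts = tally_by_length(consolidate(list_a, list_b))
```

Because the union removes duplicates, the Consolidated column is smaller than the sum of the six source columns, which is consistent with the figures in the table.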

  1. HSK: An electronic version of the HSK vocabulary list was retrieved from http://www.chinese-forums.com/vocabulary/ on 2005-05-25. While the official count is 8,822, the online version contains 8,743 entries once the following 10 pattern entries are excluded: _极了, _之间, 从_起, 从_出发, 从不/没, 非_不可, 越来越_, 对_来说, 拿_来说 and 从_看来. With these 10 entries included, the total is 8,753.
  2. CEDICT: CEDICT was created by Paul Denisowski and is currently maintained by Erik Peterson. Data from CEDICT was retrieved from http://www.mandarintools.com/cedict.html on 2005-05-20.
  3. Robert: Robert's consolidated list was retrieved from http://kamares.ucsd.edu/~arobert/chinese_f.html on 2005-05-25.
  4. Word85: Word frequency list from Chinese Word Frequency Statistics and Analysis (《汉语词频的统计和分析》) by Beijing Language and Culture University (formerly Beijing Institute of Languages) was retrieved from the Chinese Pinyin and Input Method Forum (〖汉语拼音与输入法论坛〗) at http://sh.netsh.com/bbs/1951/. The list was posted by Fengzi (冯子) on 2002-01-11.
  5. ICTCLAS: Information about ICTCLAS (中科院计算所汉语词法分析系统) can be found at http://mtgroup.ict.ac.cn/~zhp/ICTCLAS/index.html. The vocabulary data incorporated in our consolidated list were retrieved from http://download.pchome.net/php/dl.php?sid=12405 on 2005-05-20. The original vocabulary data are intended for automatic segmentation of Chinese words and phrases in running text and may therefore contain some entries that are only portions of a word or phrase.
  6. Richwin: Word and phrase list from Richwin was retrieved from http://technology.chtsai.org/wordlist/duoyuanpinyin.zip on 2005-04-30. The list is intended for Chinese input used in the Richwin system and hence may contain entries that are portions of words or phrases.
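The 10 entries set aside in the HSK note are patterns rather than fixed words: nine contain a `_` placeholder and one contains a `/` alternative. The sketch below shows one way such entries could be separated from ordinary words. The rule (any entry containing `_` or `/`) and the names `is_pattern_entry` and `split_patterns` are assumptions made for illustration, not the procedure actually used to build the list.

```python
def is_pattern_entry(entry: str) -> bool:
    """Assumed rule: a pattern entry contains a '_' slot or a '/' alternative."""
    return "_" in entry or "/" in entry


def split_patterns(entries):
    """Return (ordinary words, pattern entries), preserving input order."""
    words, patterns = [], []
    for entry in entries:
        (patterns if is_pattern_entry(entry) else words).append(entry)
    return words, patterns


# The 10 pattern entries listed in the HSK note.
excluded = [
    "_极了", "_之间", "从_起", "从_出发", "从不/没",
    "非_不可", "越来越_", "对_来说", "拿_来说", "从_看来",
]

# Two ordinary words added as toy data for contrast.
sample = ["中国", "学习"] + excluded
words, patterns = split_patterns(sample)

# Arithmetic from the note: 8,743 entries plus the 10 patterns.
total_with_patterns = 8743 + len(patterns)
```

A rule this simple would also catch any ordinary entry that happens to contain a slash, so the pattern list is worth inspecting by hand before anything is dropped.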