
Automatic thesaurus for enhanced Chinese text retrieval

Schubert Foo (Schubert Foo is the Head and Associate Professor of the Division of Information Studies at Nanyang Technological University, Singapore.)
Siu Cheung Hui (Siu Cheung Hui is an Associate Professor in the Division of Software Systems at the Nanyang Technological University, Singapore.)
Hong Koon Lim (Hong Koon Lim is a Computer Engineering Graduate from the Nanyang Technological University, Singapore.)
Li Hui (Li Hui is a Library and Information Science Graduate from Peking University.)

Library Review

ISSN: 0024-2535

Article publication date: 1 July 2000



Asian languages such as Japanese, Korean and, in particular, Chinese are beginning to gain popularity in the information retrieval (IR) domain. The quality of IR systems has traditionally been judged by retrieval effectiveness, which in turn is commonly measured by recall and precision. This paper proposes and describes a process for generating an automatic Chinese thesaurus that can supply terms related to a user's query in order to enhance retrieval effectiveness. In the absence of existing automatic Chinese thesauri, techniques used in English thesaurus generation have been evaluated and adapted to generate a Chinese equivalent. The automatic thesaurus is generated by computing the co‐occurrence values between domain‐specific terms found in a document collection. These co‐occurrence values are in turn derived from the term and document frequencies of the terms. A set of experiments was subsequently carried out on a document test set to evaluate the applicability of the thesaurus. Results obtained from these experiments confirmed that such an automatically generated thesaurus is able to improve the retrieval effectiveness of a Chinese IR system.
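The abstract does not give the exact weighting used, so the following is only a minimal sketch of how co‐occurrence values might be derived from term and document frequencies. It assumes the documents are already segmented into terms, weights each term by a smoothed tf × idf, and scores term pairs with an asymmetric min-weight ratio, which is one common formulation and not necessarily the authors'. The function name `build_thesaurus`, the parameter `top_k` and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_thesaurus(docs, top_k=3):
    """docs: list of documents, each a list of already-segmented terms.

    Returns {term: [(related_term, score), ...]} with the top_k related
    terms per term. The weighting scheme is an assumption for illustration.
    """
    n_docs = len(docs)
    tf = [Counter(doc) for doc in docs]                # term frequency per document
    df = Counter(t for counts in tf for t in counts)   # document frequency per term

    def weight(i, t):
        # Smoothed tf x idf; the "+ 1.0" keeps terms found in every document non-zero.
        return tf[i][t] * (math.log(n_docs / df[t]) + 1.0)

    total = defaultdict(float)    # sum of a term's weights over all documents
    shared = defaultdict(float)   # sum of min(weight_a, weight_b) over shared documents
    for i, counts in enumerate(tf):
        for a in counts:
            wa = weight(i, a)
            total[a] += wa
            for b in counts:
                if a != b:
                    shared[(a, b)] += min(wa, weight(i, b))

    related = defaultdict(list)
    for (a, b), s in shared.items():
        related[a].append((b, s / total[a]))   # asymmetric: normalised by term a only

    # Highest score first; ties broken by term for a stable ordering.
    return {a: sorted(pairs, key=lambda p: (-p[1], p[0]))[:top_k]
            for a, pairs in related.items()}

# Toy collection of pre-segmented documents (hypothetical data).
docs = [
    ["检索", "系统", "中文"],   # retrieval, system, Chinese
    ["检索", "系统"],           # retrieval, system
    ["中文", "分词"],           # Chinese, word segmentation
]
thesaurus = build_thesaurus(docs)
print(thesaurus["检索"])
```

In a query-expansion setting, the terms returned for each query term could be offered to the user as suggestions or appended to the query; how the paper integrates them is not stated in the abstract.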



Foo, S., Cheung Hui, S., Koon Lim, H. and Hui, L. (2000), "Automatic thesaurus for enhanced Chinese text retrieval", Library Review, Vol. 49 No. 5, pp. 230-240.




Copyright © 2000, MCB UP Limited
