
Exploration and study of multilingual thesauri automation construction for digital libraries in China

Wen Zeng (IT Support Center, Institute of Scientific and Technical Information of China, Beijing, China)

The Electronic Library

ISSN: 0264-0473

Article publication date: 6 April 2012


Abstract

Purpose

The paper explores the automated construction of multilingual thesauri from freely available digital library resources and presents the key methods and results of the study. It also proposes a method for automatically extracting terms from a multilingual parallel corpus.

Design/methodology/approach

The study used natural language processing techniques to analyze the linguistic characteristics of terms, and combined this analysis with statistical methods to extract terms from scientific and technical documents. The methods comprise automatically extracting and filtering terms, identifying and building relationships among terms, building a multilingual parallel corpus, and extracting Chinese-foreign term pairs by calculating their association probabilities. The experiments were run on a Java test platform.
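The abstract does not give the formula behind the association probability, so the following is only a minimal sketch of the general idea under stated assumptions. It substitutes the Dice coefficient over sentence-level co-occurrence counts as the association score, and assumes each aligned sentence pair has already been reduced to lists of candidate terms by the extraction and filtering step. The function names `align_term_pairs` and `best_translations` and the thresholds `min_cooc` and `min_score` are hypothetical.

```python
from collections import Counter
from itertools import product


def align_term_pairs(aligned_pairs, min_cooc=2, min_score=0.5):
    """Score candidate (Chinese, foreign) term pairs with the Dice coefficient.

    aligned_pairs: list of (zh_terms, foreign_terms) for each aligned sentence pair.
    Hypothetical stand-in for the paper's association probability.
    """
    zh_count, fo_count, pair_count = Counter(), Counter(), Counter()
    for zh_terms, fo_terms in aligned_pairs:
        # Count each term at most once per sentence pair.
        zh_set, fo_set = set(zh_terms), set(fo_terms)
        zh_count.update(zh_set)
        fo_count.update(fo_set)
        pair_count.update(product(zh_set, fo_set))
    scores = {}
    for (zh, fo), cooc in pair_count.items():
        if cooc < min_cooc:
            continue  # too rare to trust
        dice = 2 * cooc / (zh_count[zh] + fo_count[fo])
        if dice >= min_score:
            scores[(zh, fo)] = dice
    return scores


def best_translations(scores):
    """Keep the highest-scoring foreign term for each Chinese term."""
    best = {}
    for (zh, fo), score in scores.items():
        if zh not in best or score > best[zh][1]:
            best[zh] = (fo, score)
    return best


# Toy, hand-made corpus for illustration only (not data from the paper).
corpus = [
    (["数字图书馆", "叙词表"], ["digital library", "thesaurus"]),
    (["数字图书馆", "语料库"], ["digital library", "corpus"]),
    (["叙词表", "语料库"], ["thesaurus", "corpus"]),
]
# Each Chinese term maps to its English counterpart with a Dice score of 1.0.
print(best_translations(align_term_pairs(corpus)))
```

Counting each term once per sentence pair keeps a term repeated within one sentence from inflating its co-occurrence. Because Dice is symmetric, `best_translations` makes the choice of a single foreign equivalent per Chinese term explicit, which suits a thesaurus that uses Chinese as the pivot.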

Findings

The study reaches the following conclusions. It identifies the similarities and differences between the Chinese thesaurus standard and the international thesaurus standard, and presents methods for automatically extracting terms and building relationships among them. Multilingual term translation sets are then generated from real corpora. The results show that the proposed methods perform well, and that the automatic term translation alignment method outperforms the traditional IBM model method.
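The abstract names the traditional IBM model method as the baseline without saying which IBM model was used. For orientation only, here is a minimal sketch of IBM Model 1, the simplest of the IBM models, which estimates lexical translation probabilities t(f | e) by expectation maximization. It assumes word-segmented sentence pairs, omits the NULL source word for brevity, and uses a hypothetical function name `train_ibm_model1` and a toy Chinese-English corpus rather than the paper's data.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate t(f | e) with EM (simplified IBM Model 1, no NULL word).

    bitext: non-empty list of (f_tokens, e_tokens) sentence pairs.
    """
    f_vocab = {f for f_tokens, _ in bitext for f in f_tokens}
    # Initialise t(f | e) uniformly over the f vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links from e
        # E-step: each f token spreads one unit of alignment mass over the e tokens.
        for f_tokens, e_tokens in bitext:
            for f in f_tokens:
                norm = sum(t[(f, e)] for e in e_tokens)
                for e in e_tokens:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts so that sum_f t(f | e) = 1.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Toy corpus for illustration only.
bitext = [
    (["数字", "图书馆"], ["digital", "library"]),
    (["数字", "资源"], ["digital", "resource"]),
    (["开放", "资源"], ["open", "resource"]),
]
t = train_ibm_model1(bitext, iterations=10)
print(round(t[("图书馆", "library")], 3), round(t[("数字", "library")], 3))
```

Model 1 aligns individual words and ignores word order, which is one plausible reason a method that aligns whole multi-word terms could compare favourably on term translation.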

Practical implications

The results can serve as a reference for further research on, and application of, the automated construction of multilingual thesauri using Chinese as the pivot language.

Originality/value

The paper proposes new ideas for automated thesaurus construction in the digital age. The proposed method, which combines linguistic and statistical analysis, is a new approach, and the experimental results suggest that the work is innovative and valuable. These ideas and methods also provide a good starting point for improving the information services of the PRC's National Science and Technology Digital Library.

Citation

Zeng, W. (2012), "Exploration and study of multilingual thesauri automation construction for digital libraries in China", The Electronic Library, Vol. 30 No. 2, pp. 233-247. https://doi.org/10.1108/02640471211221359

Publisher

Emerald Group Publishing Limited

Copyright © 2012, Emerald Group Publishing Limited