The paper explores the automatic construction of multilingual thesauri from freely available digital library resources. The key methods and results are presented, and the paper also proposes a method for automatically extracting terms from a multilingual parallel corpus.
The study used natural language processing to analyze the linguistic characteristics of terms, and combined this with statistical analysis to extract terms from technological documents. The method consists of automatically extracting and filtering terms, identifying and building relationships among terms, building a multilingual parallel corpus, and extracting term pairs between Chinese and foreign languages by calculating their association probability. The experiments were run on a Java test platform.
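The abstract does not say how the association probability between Chinese and foreign terms is calculated. As a hedged illustration of the general idea only, the Python sketch below scores candidate term pairs with the Dice coefficient over a sentence-aligned parallel corpus. Dice is a common co-occurrence measure swapped in here for illustration; it is not the authors' method, and the function names (`dice_term_pairs`, `best_translations`) and the toy corpus are hypothetical. It assumes terms have already been extracted and filtered on each side of the corpus.

```python
from collections import Counter
from itertools import product


def dice_term_pairs(parallel_corpus, min_score=0.5):
    """Score (Chinese term, foreign term) pairs with the Dice coefficient.

    parallel_corpus: list of (zh_terms, foreign_terms) tuples, one per
    aligned sentence pair; terms are assumed to be pre-extracted.
    Returns (zh, foreign, score) tuples sorted by descending score.
    """
    zh_freq, fx_freq, joint = Counter(), Counter(), Counter()
    for zh_terms, fx_terms in parallel_corpus:
        zh_set, fx_set = set(zh_terms), set(fx_terms)
        zh_freq.update(zh_set)                 # sentence pairs containing each zh term
        fx_freq.update(fx_set)                 # sentence pairs containing each foreign term
        joint.update(product(zh_set, fx_set))  # sentence pairs containing both terms
    scored = []
    for (zh, fx), n_both in joint.items():
        score = 2 * n_both / (zh_freq[zh] + fx_freq[fx])
        if score >= min_score:
            scored.append((zh, fx, score))
    # Highest score first; ties broken by the terms themselves for stable output.
    scored.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scored


def best_translations(scored):
    """Keep the highest-scoring foreign term for each Chinese term."""
    best = {}
    for zh, fx, _score in scored:
        best.setdefault(zh, fx)  # list is sorted, so the first match is the best
    return best


if __name__ == "__main__":
    corpus = [  # hypothetical toy corpus of aligned sentence pairs
        (["数据库", "检索"], ["database", "retrieval"]),
        (["数据库"], ["database"]),
        (["检索", "系统"], ["retrieval", "system"]),
    ]
    print(best_translations(dice_term_pairs(corpus)))
```

On the toy corpus each Chinese term is paired with its intended English counterpart. Keeping only the single best match per term is a simplification; a fuller alignment would also have to deal with multiword terms and many-to-many correspondences.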
The study reached the following conclusions. It identifies the similarities and differences between the Chinese thesaurus standard and the international thesaurus standard. It presents methods for automatically extracting terms and building relationships among them. Finally, it generates multilingual term translation sets from real corpora. The results show that the proposed methods perform well; in particular, the automatic term translation alignment method outperforms the traditional IBM model method.
The results can serve as a reference for further research on, and application of, the automatic construction of multilingual thesauri using Chinese as a pivot language.
The paper proposes new ideas for automatic thesaurus construction in the digital age. The proposed method, which combines linguistic and statistical techniques, is a new approach, and the experimental results suggest that this work is innovative and valuable. These ideas and methods also provide a good starting point for improving the information services of the PRC's National Science and Technology Digital Library.
The paper introduces bibliometrics to the research area of knowledge organization, more precisely to the construction and maintenance of thesauri. It reviews related work that inspired the assembly of a semi-automatic, bibliometric-based approach to construction and maintenance, and discusses the methodological considerations behind that approach. Finally, the semi-automatic approach is used to verify the applicability of bibliometric methods as a supplement to the construction and maintenance of thesauri. In the context of knowledge organization, the paper outlines two fundamental approaches: the manual intellectual approach and the automatic algorithmic approach. Bibliometric methods belong to the automatic algorithmic approach, although they have special characteristics that distinguish them substantially from other methods within this approach.
With the growing recognition that thesauri aid information retrieval, organisations are beginning to adopt and, in many cases, create thesauri. This paper offers some guidance on the construction process.
An opinion piece with a practical focus, based on recent experiences gleaned from consultancy work.
A number of steps can be taken to ensure that any thesaurus under construction is fit for purpose. Due consideration is therefore given to aspects such as term selection, structure and notation, thesaurus standards, software and Web display issues, and thesaurus evaluation and maintenance. The paper also notes that creating new subject schemes from scratch, however attractive, adds to the plethora of terminologies already in existence and can limit user searching within particular contexts. The decision to create a “new” thesaurus should therefore be taken carefully, and observance of standards is paramount.
This paper offers advice to assist practitioners in the development of thesauri.
Useful guidance for those practitioners new to the area of thesaurus construction is provided, together with an overview of selected key processes involved in the construction of a thesaurus.
The BSI ROOT Thesaurus has been developed first of all as a comprehensive indexing and searching tool for technological applications and secondly as a labour‐saving device in the construction of further thesauri. The classified section of the thesaurus was prepared first and input to the computer, where it is held in subject order, and from the subject file the alphabetical section was generated entirely automatically. In future thesaurus projects it will be possible to make use of the system developed and/or all or part of the existing subject schedules.
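The abstract does not describe how the alphabetical section was generated from the subject file. A minimal sketch of the general idea follows. It assumes, hypothetically, that each class in the subject file carries a hierarchical notation in which a broader class's notation is a prefix of the notations of its narrower classes. Broader (BT) and narrower (NT) relations are inferred from the notation, and the entries are then sorted alphabetically. This is not the BSI ROOT software. The record format, function names and sample schedule are illustrative only, and related terms (RT) and non-preferred terms are omitted.

```python
def alphabetical_display(classified):
    """Derive an alphabetical display from a classified (systematic) file.

    classified: list of (notation, term) tuples in subject order. As a
    hypothetical convention, a class's broader class is the longest proper
    prefix of its notation that also appears in the file.
    Returns (term, entry) tuples sorted alphabetically by term.
    """
    term_of = dict(classified)  # notation -> term
    entries = {term: {"notation": notation, "BT": [], "NT": []}
               for notation, term in classified}
    for notation, term in classified:
        # Walk back from the longest proper prefix to find the nearest broader class.
        for cut in range(len(notation) - 1, 0, -1):
            parent = term_of.get(notation[:cut])
            if parent is not None:
                entries[term]["BT"].append(parent)
                entries[parent]["NT"].append(term)  # reciprocal relation
                break
    return sorted(entries.items(), key=lambda item: item[0].lower())


def format_entry(term, entry):
    """Render one entry in a conventional thesaurus layout."""
    lines = [f"{term} [{entry['notation']}]"]
    lines += [f"  BT {t}" for t in entry["BT"]]
    lines += [f"  NT {t}" for t in entry["NT"]]
    return "\n".join(lines)


if __name__ == "__main__":
    schedule = [  # hypothetical notation and terms, in subject order
        ("A", "Engineering"),
        ("AB", "Civil engineering"),
        ("ABC", "Bridges"),
        ("AD", "Electrical engineering"),
    ]
    for term, entry in alphabetical_display(schedule):
        print(format_entry(term, entry))
```

Because every relation is derived from the systematic file, a change made there propagates to the alphabetical display on the next run. That property underlies the labour-saving reuse the abstract describes.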
The second edition of the Bibliographic Classification of H. E. Bliss (BC2), being prepared under the editorship of Jack Mills, Vanda Broughton and others, is a rich source of structure and terminology for thesauri covering different subject fields. The new edition employs facet analysis and is thesaurus‐compatible. A number of facet‐based thesauri have drawn upon Bliss for terms and relationships. In two of these thesauri the Bliss Classification was the source of both systematic and alphabetical displays. The DHSS‐DATA thesaurus, published by the United Kingdom Department of Health and Social Security, provides controlled terms and Bliss class numbers for indexing and searching the DHSS‐DATA database. The ECOT thesaurus (Educational courses and occupations thesaurus) prepared for the Department of Education and Science, uses the software designed for the British Standards Institution ROOT thesaurus to generate an alphabetical display from the systematic display derived from the Bliss schedules. Problems, benefits, and future prospects of Bliss‐based thesaurus construction are discussed.
This review has been sponsored by the Office for Scientific and Technical Information (OSTI), and the end product of the complete research will be a thesaurus of management terms. Parallel research in business management, also supported by OSTI, is being conducted by David Dews, Librarian of the Manchester Business School, and K. D. C. Vernon, Librarian of the London Graduate School of Business Studies. As Mr Vernon is currently engaged in constructing a faceted classification scheme for management, this investigation has concentrated on the possibility of using faceted techniques to construct such a thesaurus.
A recent Aslib Research Department Project which investigated problems relating to the construction of thesauri for indexing and retrieval ended with two publications, to be published shortly by Aslib. During the project, extensive use was made of the thesauri held in the Aslib Library, and information about them was tabulated. Information concerning openly available thesauri is displayed below.
Asian languages such as Japanese, Korean and, in particular, Chinese are beginning to gain popularity in the information retrieval (IR) domain. The quality of IR systems has traditionally been judged by retrieval effectiveness, which in turn is commonly measured by recall and precision. This paper proposes and describes a process for automatically generating a Chinese thesaurus that can supply related terms for a user's queries to enhance retrieval effectiveness. In the absence of existing automatic Chinese thesauri, techniques used in English thesaurus generation have been evaluated and adapted to generate a Chinese equivalent. The thesaurus is generated by computing co-occurrence values between domain-specific terms found in a document collection; these values are in turn derived from the term and document frequencies of the terms. A set of experiments was then carried out on a document test set to evaluate the applicability of the thesaurus. The results confirmed that such an automatically generated thesaurus can improve the retrieval effectiveness of a Chinese IR system.
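The abstract says only that co-occurrence values are derived from term and document frequencies. The sketch below shows one common way to do this. It weights each term in each document by tf-idf and scores term pairs by the cosine similarity of their term-by-document vectors. The paper's actual formula may differ, and the function names (`cooccurrence_thesaurus`, `expand_query`) and toy documents are hypothetical. It assumes the Chinese text has already been segmented into domain terms.

```python
import math
from collections import Counter


def cooccurrence_thesaurus(docs, top_k=3):
    """Build {term: [(related_term, score), ...]} from segmented documents.

    docs: list of lists of domain terms. Scores are cosine similarities of
    tf-idf weighted term-by-document vectors (an assumed weighting).
    """
    n_docs = len(docs)
    tf = [Counter(doc) for doc in docs]                      # term frequency per document
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    # weight[t][d] = tf * idf; a term found in every document gets idf 0 and drops out.
    weight = {t: {} for t in df}
    for d, counts in enumerate(tf):
        for t, f in counts.items():
            w = f * math.log(n_docs / df[t])
            if w > 0:
                weight[t][d] = w
    norm = {t: math.sqrt(sum(w * w for w in ws.values())) for t, ws in weight.items()}
    related = {}
    for a in weight:
        scores = []
        for b in weight:
            if a == b or norm[a] == 0 or norm[b] == 0:
                continue
            dot = sum(w * weight[b].get(d, 0.0) for d, w in weight[a].items())
            if dot > 0:  # keep only terms that actually co-occur
                scores.append((b, dot / (norm[a] * norm[b])))
        scores.sort(key=lambda x: (-x[1], x[0]))
        related[a] = scores[:top_k]
    return related


def expand_query(query_terms, thesaurus, per_term=1):
    """Append the strongest related terms for each query term."""
    expanded = list(query_terms)
    for t in query_terms:
        for rel, _score in thesaurus.get(t, [])[:per_term]:
            if rel not in expanded:
                expanded.append(rel)
    return expanded


if __name__ == "__main__":
    docs = [  # hypothetical pre-segmented documents
        ["信息", "检索", "检索"],
        ["信息", "检索", "查询"],
        ["数据库", "查询"],
        ["数据库", "索引"],
    ]
    thesaurus = cooccurrence_thesaurus(docs)
    print(expand_query(["信息"], thesaurus))
```

Cosine similarity is symmetric. Some automatic thesaurus methods instead use asymmetric weights, so that a general term does not pull in every specific term it co-occurs with.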
The Sport Database is one of the most deservedly popular information tools in the field of physical activity and sport. One reason for its popularity is that the documents it contains are received from throughout the world. But, as often happens, our deficiencies are the consequence of our merits. This wide geographical scope and the database's constant growth, combined with the isolation of indexers and the weak coordination of their work, can create problems for the database's compilers as well as its users. Under such circumstances the quality of its main indexing and searching instrument, the Sport Thesaurus, acquires great significance. It should be noted that this tool exists both in printed form (Sport Thesaurus 1994 Edition) and on optical disc (Sport Discus 1975-June 1995), and the differences between these two versions of the same tool are often substantial. One would like to hope that their constant improvement is the main reason for this situation, but some examples cast doubt on this. From now on, the printed version will be called the ‘edition’ and the CD-ROM version the ‘disc’. The incorporation of the large SIRLS database into the Sport Database, which took place some time ago without changing the database's specific indexing, was taken into account in all calculations. In this paper I want not only to analyse some basic deficiencies of this thesaurus and trace their manifestations in the database, but also to propose some ways in which it could be improved. I hope they will be helpful to users of the Sport Database as well as of other databases on optical disc.
Within the framework of a research project into alternative ways of representing documentation languages and into their flexibility, an attempt is made to draw up a list of performance criteria that an ‘ideal’ thesaurus graphic display should meet. However, a study of the main bibliographies listing thesauri shows that fewer than 6 per cent of them contain graphic displays, even though a concurrent literature survey reveals that such displays offer many potential advantages. Until now, the use of such displays has probably been limited by technology and by the scarcity of studies of the cognitive processes of users of automated systems. Current research in several disciplines (computer graphics, ergonomic psychology and spatial representation) should contribute to the emergence of new types of document retrieval tools, well adapted to a broader and more diverse clientele.