Search results

1 – 10 of over 13000
Article
Publication date: 1 April 1975

B.J. FIELD

Abstract

A number of techniques have been studied for the automatic assignment of controlled subject headings and classifications from free indexing. These techniques involve the automatic manipulation and truncation of the free‐index phrases assigned to a document and the use of a manually‐constructed thesaurus and automatically‐generated dictionaries together with statistical ranking and weighting methods. These are based on the use of a statistically‐generated ‘adhesion coefficient’ which reflects the degree of association between the free‐indexing terms, the controlled subject headings, and the classifications. By the analysis of a large sample of manually‐indexed documents, the system generates dictionaries of free‐language and controlled‐language terms together with their associated classifications and adhesion coefficients. Having learnt from the manually‐indexed documents, the system uses these dictionaries in the subsequent automatic classification procedure. The accuracy and cost‐effectiveness of the automatically‐assigned subject headings and classifications have been compared with those of the manual system. The results were encouraging and the costs comparable to those of a manual system.
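
The learn-then-assign procedure described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the adhesion coefficient, so the code below assumes a simple stand-in: the fraction of training documents containing a free-index term that were also given a particular controlled heading. The function names `learn_dictionary` and `assign_headings` and the sample data are hypothetical, not taken from the paper.

```python
from collections import Counter, defaultdict


def learn_dictionary(training_docs):
    """Build {free_term: {heading: coefficient}} from manually indexed documents.

    ASSUMPTION: the coefficient is the share of documents containing the
    free term that also carry the heading. This is a stand-in, not the
    adhesion coefficient defined in the paper.
    """
    term_count = Counter()
    pair_count = defaultdict(Counter)
    for free_terms, headings in training_docs:
        for term in set(free_terms):
            term_count[term] += 1
            for heading in set(headings):
                pair_count[term][heading] += 1
    return {
        term: {h: n / term_count[term] for h, n in heads.items()}
        for term, heads in pair_count.items()
    }


def assign_headings(free_terms, dictionary, top_n=3):
    """Rank controlled headings by the summed coefficients of the free terms."""
    scores = Counter()
    for term in set(free_terms):
        for heading, coeff in dictionary.get(term, {}).items():
            scores[heading] += coeff
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [heading for heading, _ in ranked[:top_n]]


# Hypothetical training sample: (free-index terms, controlled headings).
training = [
    (["laser", "diode"], ["Optoelectronics"]),
    (["laser", "welding"], ["Manufacturing"]),
    (["laser", "diode", "fibre"], ["Optoelectronics"]),
]
dictionary = learn_dictionary(training)
suggested = assign_headings(["laser", "diode"], dictionary)
```

Summing coefficients is only one possible ranking rule; the abstract also mentions truncation of phrases and a manually constructed thesaurus, which this sketch leaves out.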

Details

Journal of Documentation, vol. 31 no. 4
Type: Research Article
ISSN: 0022-0418

Details

Automated Information Retrieval: Theory and Methods
Type: Book
ISBN: 978-0-12266-170-9

Article
Publication date: 1 April 1974

KAREN SPARCK JONES

Abstract

This article reviews the state of the art in automatic indexing, that is, automatic techniques for analysing and characterising documents, for manipulating their descriptions in searching, and for generating the index language used for these purposes. It concentrates on the literature from 1968 to 1973. Section I defines the topic and its context. Sections II and III consider work in syntax and semantics respectively in detail. Section IV comments on ‘indirect’ indexing. Section V briefly surveys operating mechanized systems. In Section VI major experiments in automatic indexing are reviewed, and Section VII attempts an overall conclusion on the current state of automatic indexing techniques.

Details

Journal of Documentation, vol. 30 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 6 May 2014

Hollie White, Craig Willis and Jane Greenberg

Abstract

Purpose

The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information professionals when assigning keywords to a scientific abstract. This study examined first, the inter-indexer consistency of potential HIVE users; second, the impact HIVE had on consistency; and third, challenges associated with using HIVE.

Design/methodology/approach

A within-subjects quasi-experimental research design was used for this study. Data were collected using a task-scenario based questionnaire. Analysis was performed on consistency results using Hooper's and Rolling's inter-indexer consistency measures. A series of t-tests was used to judge the significance of differences between consistency measure results.
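
The two consistency measures named above are commonly stated as follows: Hooper's measure is the number of terms two indexers share divided by the number of distinct terms either of them used, and Rolling's measure is twice the number of shared terms divided by the total number of terms the two assigned. A minimal sketch under those definitions is given below; the function names, the sample term lists, and the choice to return 1.0 when both indexers assign nothing are assumptions made for illustration, not details from the paper.

```python
def hooper_consistency(terms_a, terms_b):
    """Hooper: shared / (shared + unique to A + unique to B)."""
    a, b = set(terms_a), set(terms_b)
    union = len(a | b)
    # ASSUMPTION: two empty term sets are treated as fully consistent.
    return len(a & b) / union if union else 1.0


def rolling_consistency(terms_a, terms_b):
    """Rolling: 2 * shared / (terms used by A + terms used by B)."""
    a, b = set(terms_a), set(terms_b)
    total = len(a) + len(b)
    # ASSUMPTION: two empty term sets are treated as fully consistent.
    return 2 * len(a & b) / total if total else 1.0


# Hypothetical keyword assignments by two indexers for the same abstract.
indexer_a = ["ecology", "genetics", "birds"]
indexer_b = ["ecology", "birds", "migration", "climate"]

hooper = hooper_consistency(indexer_a, indexer_b)    # 2 shared / 5 distinct
rolling = rolling_consistency(indexer_a, indexer_b)  # 2 * 2 / (3 + 4)
```

For any pair of term sets Rolling's value is at least as large as Hooper's, which is one reason results are often reported for both.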

Findings

Results suggest that HIVE improves inter-indexing consistency. Working with HIVE increased consistency rates by 22 percent (Rolling's) and 25 percent (Hooper's) when selecting relevant terms from all vocabularies. A statistically significant difference exists between the assignment of free-text keywords and machine-aided keywords. Issues with homographs, disambiguation, vocabulary choice, and document structure were all identified as potential challenges.

Research limitations/implications

Research limitations for this study can be found in the small number of vocabularies used for the study. Future research will include implementing HIVE into the Dryad Repository and studying its application in a repository system.

Originality/value

This paper showcases several features used in the HIVE system. By using traditional consistency measures to evaluate a semantic web technology, this paper emphasizes the link between traditional indexing and next generation machine-aided indexing (MAI) tools.

Details

Journal of Documentation, vol. 70 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 15 June 2015

Mari Vállez, Rafael Pedraza-Jiménez, Lluís Codina, Saúl Blanco and Cristòfol Rovira

Abstract

Purpose

The purpose of this paper is to describe and evaluate the tool DigiDoc MetaEdit which allows the semi-automatic indexing of HTML documents. The tool works by identifying and suggesting keywords from a thesaurus according to the embedded information in HTML documents. This enables the parameterization of keyword assignment based on how frequently the terms appear in the document, the relevance of their position, and the combination of both.
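
The idea of scoring thesaurus terms by how often and where they appear in an HTML document can be sketched roughly as below. This is not the DigiDoc MetaEdit implementation: the zone weights, the class and function names, and the sample document are all hypothetical, and matching is by plain substring count for brevity.

```python
from html.parser import HTMLParser

# ASSUMPTION: illustrative position weights; the abstract gives no values.
POSITION_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}


class ZoneText(HTMLParser):
    """Collect lower-cased text per zone: title, h1, or everything else."""

    def __init__(self):
        super().__init__()
        self.zone_stack = []
        self.text = {zone: [] for zone in POSITION_WEIGHTS}

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self.zone_stack.append(tag)

    def handle_endtag(self, tag):
        if self.zone_stack and self.zone_stack[-1] == tag:
            self.zone_stack.pop()

    def handle_data(self, data):
        zone = self.zone_stack[-1] if self.zone_stack else "body"
        self.text[zone].append(data.lower())


def suggest_keywords(html, thesaurus_terms, top_n=5):
    """Rank thesaurus terms by frequency weighted by position in the page."""
    parser = ZoneText()
    parser.feed(html)
    parser.close()
    scores = {}
    for term in thesaurus_terms:
        score = 0.0
        for zone, weight in POSITION_WEIGHTS.items():
            zone_text = " ".join(parser.text[zone])
            # Substring counting is a simplification: "index" would also
            # match inside "indexing".
            score += weight * zone_text.count(term.lower())
        if score > 0:
            scores[term] = score
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


page = (
    "<html><head><title>Automatic indexing</title></head>"
    "<body><h1>Indexing with a thesaurus</h1>"
    "<p>A thesaurus supports indexing. Retrieval benefits.</p></body></html>"
)
keywords = suggest_keywords(page, ["indexing", "thesaurus", "retrieval", "ontology"])
```

Setting every weight to 1.0 reduces the score to pure frequency, while using 0/1 presence instead of counts would leave position only, which mirrors the three parameterizations the abstract mentions.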

Design/methodology/approach

In order to evaluate the efficiency of the indexing tool, the descriptors/keywords suggested by the indexing tool are compared to the keywords which have been indexed manually by human experts. To make this comparison, a corpus of HTML documents is randomly selected from a journal devoted to Library and Information Science.

Findings

The results of the evaluation show that: first, there is close to a 50 per cent match or overlap between the two indexing systems, although if the related terms and the narrow terms are taken into consideration the matches can reach 73 per cent; and second, the first terms identified by the tool are the most relevant.

Originality/value

The tool presented identifies the most important keywords in an HTML document based on the embedded information in HTML documents. Nowadays, representing the contents of documents with keywords is an essential practice in areas such as information retrieval and e-commerce.

Details

Library Hi Tech, vol. 33 no. 2
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 March 1969

SUSAN ARTANDI

Abstract

An automatic indexing method is described in which index tags for documents are generated by the computer. The computer scans the text of periodical articles and automatically assigns to them index terms with their respective weights on the basis of explicitly defined text characteristics. A machine file of document references with their associated index terms is automatically produced which can be searched on a co‐ordinate basis for the retrieval of specified drug‐related information.

Details

Journal of Documentation, vol. 25 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 October 2004

Jesper W. Schneider and Pia Borlund

Abstract

The paper introduces bibliometrics to the research area of knowledge organization – more precisely in relation to construction and maintenance of thesauri. As such, the paper reviews related work that has been of inspiration for the assembly of a semi‐automatic, bibliometric‐based, approach for construction and maintenance. Similarly, the paper discusses the methodical considerations behind the approach. Eventually, the semi‐automatic approach is used to verify the applicability of bibliometric methods as a supplement to construction and maintenance of thesauri. In the context of knowledge organization, the paper outlines two fundamental approaches to knowledge organization, that is, the manual intellectual approach and the automatic algorithmic approach. Bibliometric methods belong to the automatic algorithmic approach, though bibliometrics do have special characteristics that are substantially different from other methods within this approach.

Details

Journal of Documentation, vol. 60 no. 5
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 January 1976

K. SPARCK JONES and C.J. VAN RIJSBERGEN

Abstract

Many retrieval experiments have been based on inadequate test collections, and current research is hampered by the lack of proper collections. This short review does not attempt a fully documented survey of all the collections used in the past decade: hopefully representative examples have been studied to throw light on the requirements test collections should meet, to show how past collections have been defective, and to suggest guidelines for a future ‘ideal’ test collection. The specifications for this collection can be taken as an indirect comment on our present state of knowledge of major retrieval system variables, and experience in conducting experiments.

Details

Journal of Documentation, vol. 32 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 March 1998

David Ellis, Nigel Ford and Jonathan Furner

Abstract

For the purposes of this article, the indexing of information is interpreted as the pre‐processing of information in order to enable its retrieval. This definition thus spans a dimension extending from classification‐based approaches (pre‐co‐ordinate) to keyword searching (post‐co‐ordinate). In the first section we clarify our use of terminology, by briefly describing a framework for modelling IR systems in terms of sets of objects, relationships and functions. In the following three sections, we discuss the application of indexing functions to document collections of three specific types: (1) ‘conventional’ text databases; (2) hypertext databases; and (3) the World Wide Web, globally distributed across the Internet.

Details

Journal of Documentation, vol. 54 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 April 1967

W.J. HUTCHINS

Abstract

Given the demonstrable deficiencies of indexing and indexes as means of document analysis and selection, a system is proposed which matches uncondensed and unanalysed texts with search requests and semantically equivalent transformations derived from them. The method utilizes the results of machine translation and structural linguistics in syntactic analysis and in semantic classification with adaptations to the requirements of a document selection system.

Details

Journal of Documentation, vol. 23 no. 4
Type: Research Article
ISSN: 0022-0418
