Search results

1 – 10 of over 13000

Details

Automated Information Retrieval: Theory and Methods
Type: Book
ISBN: 978-0-12-266170-9

Article
Publication date: 1 April 1974

KAREN SPARCK JONES

Abstract

This article reviews the state of the art in automatic indexing, that is, automatic techniques for analysing and characterising documents, for manipulating their descriptions in searching, and for generating the index language used for these purposes. It concentrates on the literature from 1968 to 1973. Section I defines the topic and its context. Sections II and III consider work in syntax and semantics respectively in detail. Section IV comments on ‘indirect’ indexing. Section V briefly surveys operating mechanized systems. In Section VI major experiments in automatic indexing are reviewed, and Section VII attempts an overall conclusion on the current state of automatic indexing techniques.

Details

Journal of Documentation, vol. 30 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 August 1997

A. Macfarlane, S.E. Robertson and J.A. McCann

Abstract

The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular we stress the importance of the motivation in using parallel computing for text retrieval. We analyse parallel IR systems using a classification defined by Rasmussen and describe some parallel IR systems. We give a description of the retrieval models used in parallel information processing. We describe areas of research which we believe are needed.

Details

Journal of Documentation, vol. 53 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 April 1993

Niël van der Merwe

Abstract

This paper will discuss the integration of document image processing and text retrieval principles in order to process and load existing paper documents automatically into an electronic document database that broadens the user's capability to retrieve relevant information more accurately, without going through costly processes to get paper documents into electronic text. The principles of document image processing systems, as well as the problems and shortcomings of most of today's document image processing systems, will be discussed. Then concept retrieval, as the latest development in text retrieval, will be discussed, with specific reference to the ability of the TOPIC intelligent text retrieval system to allow users to build up a knowledge base of search objects or concepts that can be used at any point in time by all users of the system. This paper will further look specifically at the automatic processing of paper documents by converting the scanned document image pages through to electronic text. The use of optical character recognition technology, the indexing and loading of the documents in a text database, the automatic linking of the documents to the related document images and the retrieval technology available in TOPIC will be discussed, specifically the TYPO operator, which was developed to handle so‐called dirty data such as common misspellings, character transpositions and ‘dirty’ text received as output from the OCR process. A possible solution for loading paper documents quickly and cost‐effectively into an electronic document database will be discussed and demonstrated in detail. The advantages and disadvantages of this approach will be discussed with specific reference to an electronic news clipping service application.
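
The abstract does not say how the TYPO operator works internally. As a rough illustration of tolerant matching against dirty OCR text, the Python sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance, in which a substitution, insertion, deletion or transposition of two adjacent characters each counts as one edit. This is an assumed technique, not TOPIC's implementation; the names `osa_distance` and `typo_match` and the one-edit threshold are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertion, deletion, substitution
    and transposition of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[-1][-1]


def typo_match(term: str, text: str, max_edits: int = 1) -> list[str]:
    """Hypothetical tolerant matcher: return the whitespace-separated tokens
    of `text` within `max_edits` edits of `term`, ignoring case."""
    term = term.lower()
    return [tok for tok in text.lower().split() if osa_distance(term, tok) <= max_edits]


print(typo_match("retrieval", "Text retreival from OCR retrievai output"))
# prints ['retreival', 'retrievai']
```

Both the transposed spelling and the OCR substitution of "i" for "l" fall within one edit of the query term; a production system would also need to handle punctuation and word splits introduced by OCR.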

Details

The Electronic Library, vol. 11 no. 4/5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 April 2001

D.C. Veal Doverton

Abstract

Present and possible future developments in the techniques of document management are reviewed, the major ones being text retrieval and scanning and OCR. Acquisition, indexing and thesauri, publishing and dissemination and the document management industry are also addressed. The emerging standards are reviewed and the impact of the Internet is analysed.

Details

Journal of Documentation, vol. 57 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 March 1991

Suliman Al‐Hawamdeh, Rachel de Vere, Geoff Smith and Peter Willett

Abstract

Full‐text documents are usually searched by means of a Boolean retrieval algorithm that requires the user to specify the logical relationships between the terms of a query. In this paper, we summarise the results to date of a continuing programme of research at the University of Sheffield to investigate the use of nearest‐neighbour retrieval algorithms for full‐text searching. Given a natural‐language query statement, our methods result in a ranking of the paragraphs comprising a full‐text document in order of decreasing similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has in common with the query. A full‐text document test collection has been created to allow systematic tests of retrieval effectiveness to be carried out. Experiments with this collection demonstrate that nearest‐neighbour searching provides a means for paragraph‐based access to full‐text documents that is of comparable effectiveness to both Boolean and hypertext searching and that index term weighting schemes which have been developed for the searching of bibliographical databases can also be used to improve the effectiveness of retrieval from full‐text databases. A current project is investigating the extent to which a paragraph‐based full‐text retrieval system can be used to augment the explication facilities of an expert system on welding.
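
The paragraph ranking described in this abstract can be sketched in Python as follows. The scoring follows the abstract (the number of keyword stems a paragraph shares with the query, ranked in decreasing order), but the stopword list and the crude suffix-stripping `stem` function are hypothetical stand-ins for whatever stemmer the Sheffield work actually used, and no index term weighting is applied.

```python
import re

# Hypothetical stand-ins: a tiny stopword list and crude suffix stripping.
# A real system would use a full stopword list and a proper stemmer (e.g. Porter).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "on", "by", "with"}
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keyword_stems(text: str) -> set[str]:
    """Lower-case, split into alphabetic words, drop stopwords, stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOPWORDS}


def rank_paragraphs(query: str, paragraphs: list[str]) -> list[tuple[int, int]]:
    """Return (paragraph index, shared-stem count) in decreasing similarity.
    Ties keep the paragraphs' original order in the document."""
    q = keyword_stems(query)
    scores = [(len(q & keyword_stems(p)), i) for i, p in enumerate(paragraphs)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scores]


paragraphs = [
    "Welding joints are inspected visually.",
    "Arc welding of steel plates requires shielding gas.",
    "The report covers budgets.",
]
print(rank_paragraphs("welding steel plates", paragraphs))  # [(1, 3), (0, 1), (2, 0)]
```

Counting shared stems treats every term equally; the term weighting schemes mentioned in the abstract would replace the plain count with a weighted sum.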

Details

Online Review, vol. 15 no. 3/4
Type: Research Article
ISSN: 0309-314X

Article
Publication date: 1 February 1999

Jennifer Rowley

Abstract

Document publishing systems are systems that support the creation, storage and subsequent retrieval and dissemination of documents and/or document representation or metadata. They are widely used in information retrieval applications, and in particular, are important in supporting the publication of documents on CD‐ROM or the Web. The publication process involves the following stages: identify content, database set‐up, populate database, publish, process search requests and view/download original. Document publishing systems fall into two categories: those that have developed from text management systems, and those that had their origins in document creation; this gives rise to systems with different ranges of facilities in areas such as data entry and document creation, information retrieval and security. Special issues associated, respectively, with publication on CD‐ROM and through the Web are considered. Future issues for document publishing systems include workgroup publishing, hybrid publication, globalisation, integration and seamless document publishing and management, and further integration with Web server technology.

Details

Online and CD-ROM Review, vol. 23 no. 1
Type: Research Article
ISSN: 1353-2642

Article
Publication date: 1 March 1981

Jon Bing

Abstract

After discussing the characteristics of text retrieval systems, the article outlines their development. It explains the information requirements of lawyers in relation to legal databases. The work undertaken by the Norwegian Research Centre for Computers and Law (NRCCL) is described. Details are given of the use made of STATUS and of its redesigned version, subsequently called NOVA*STATUS, which has been used to run various services as part of the NORIS (previously JURIS) research programme for text retrieval systems. Various research results from the NORIS work are described. Details are given of the controlled experiments evaluating text retrieval system performance and the ways of formulating searches. Results are discussed in terms of precision and recall. A particular feature of the investigations relates to the use of ranked output and the criteria for ranking. In the light of the experience with STATUS and the NORIS research results, a new software package called SIFT is now being developed for use on a NOR minicomputer. 17 refs.
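
Since the NORIS results are reported in terms of precision and recall over ranked output, a minimal Python sketch of both measures at a rank cutoff k may help. The ranked list and relevance judgements below are hypothetical and are not data from the NORIS experiments.

```python
def precision_recall_at_k(ranked: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision and recall of the top-k documents of a ranked list."""
    retrieved = ranked[:k]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0  # share of retrieved that are relevant
    recall = hits / len(relevant) if relevant else 0.0       # share of relevant that are retrieved
    return precision, recall


# Hypothetical ranked output and relevance judgements for one search.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
for k in (1, 2, 4, 5):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

Moving the cutoff down the ranking trades precision for recall; here recall never exceeds 0.67 because the relevant document d5 is not retrieved at all.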

Details

Program, vol. 15 no. 3
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 1 January 1993

Ankie Visschedijk and Forbes Gibb

Abstract

This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional retrieval by using either innovative software or hardware to increase retrieval speed or functionality, precision or recall. The software systems reviewed are: AIDA, CLARIT, Metamorph, SIMPR, STATUS/IQ, TCS, TINA and TOPIC. The hardware systems reviewed are: CAFS‐ISP, the Connection Machine, GESCAN, HSTS, MPP, TEXTRACT, TRW‐FDF and URSA.

Details

Online and CD-ROM Review, vol. 17 no. 1
Type: Research Article
ISSN: 1353-2642

Article
Publication date: 1 January 1981

T. RADECKI

Abstract

A new method of document retrieval is presented on the basis of fundamental fuzzy set theory operations and the notion of a semantic disjunctive normal form. Two semantic normal forms are defined, the semantic disjunctive normal form and the semantic conjunctive normal form, and their elementary properties are presented. The syntax and the semantics of the proposed document retrieval language are given and an algorithm for allocating documents to particular queries is described. The document retrieval strategy based on the concept of a semantic disjunctive normal form is illustrated with an example. A basic advantage of using fuzzy set theory to describe the document retrieval system is that it takes into consideration, in a simple way, the differing importance of descriptors, document search patterns and the differing grades of formal relevance of individual documents to a given query. In such an information system, the documents with the highest grades of formal relevance to a given query are retrieved by applying simple fuzzy set operations.
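
As a general illustration of fuzzy-set retrieval with a query in disjunctive normal form, the Python sketch below assumes Zadeh's standard operators (min for AND, max for OR, 1 − g for NOT) and per-document descriptor grades in [0, 1]. It shows only the broad idea of grading documents against a DNF query; it is not Radecki's semantic normal form or allocation algorithm, and the names, grades and 0.5 threshold are hypothetical.

```python
# A literal is (descriptor, negated); a conjunction is a list of literals;
# a DNF query is a list of conjunctions joined by OR.
Literal = tuple[str, bool]
Conjunction = list[Literal]
DNFQuery = list[Conjunction]


def literal_grade(doc: dict[str, float], lit: Literal) -> float:
    """Membership grade of a descriptor in the document; NOT is 1 - g."""
    term, negated = lit
    g = doc.get(term, 0.0)  # descriptor absent from the document -> grade 0
    return 1.0 - g if negated else g


def query_grade(doc: dict[str, float], query: DNFQuery) -> float:
    """AND is min over literals, OR is max over conjunctions."""
    return max(
        (min((literal_grade(doc, lit) for lit in conj), default=1.0) for conj in query),
        default=0.0,
    )


def retrieve(docs: dict[str, dict[str, float]], query: DNFQuery, threshold: float = 0.5):
    """Documents whose grade is at least `threshold`, highest grade first."""
    graded = [(name, query_grade(d, query)) for name, d in docs.items()]
    hits = [(name, g) for name, g in graded if g >= threshold]
    return sorted(hits, key=lambda x: -x[1])


docs = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.7},
    "d2": {"fuzzy": 0.4, "boolean": 0.8},
    "d3": {"retrieval": 0.6, "boolean": 0.2},
}
# (fuzzy AND retrieval) OR (retrieval AND NOT boolean)
query = [[("fuzzy", False), ("retrieval", False)], [("retrieval", False), ("boolean", True)]]
print(retrieve(docs, query))  # [('d1', 0.7), ('d3', 0.6)]
```

Because grades are graded rather than binary, d3 is retrieved through the second conjunction even though it is weakly indexed under "boolean", while d2 scores 0 and is excluded.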

Details

Kybernetes, vol. 10 no. 1
Type: Research Article
ISSN: 0368-492X
