Search results

1 – 10 of over 10000
Article
Publication date: 1 March 1991

Suliman Al‐Hawamdeh, Rachel de Vere, Geoff Smith and Peter Willett

Abstract

Fulltext documents are usually searched by means of a Boolean retrieval algorithm that requires the user to specify the logical relationships between the terms of a query. In this paper, we summarise the results to date of a continuing programme of research at the University of Sheffield to investigate the use of nearest‐neighbour retrieval algorithms for fulltext searching. Given a natural‐language query statement, our methods result in a ranking of the paragraphs comprising a fulltext document in order of decreasing similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has in common with the query. A fulltext document test collection has been created to allow systematic tests of retrieval effectiveness to be carried out. Experiments with this collection demonstrate that nearest‐neighbour searching provides a means for paragraph‐based access to fulltext documents that is of comparable effectiveness to both Boolean and hypertext searching and that index term weighting schemes which have been developed for the searching of bibliographical databases can also be used to improve the effectiveness of retrieval from fulltext databases. A current project is investigating the extent to which a paragraph‐based fulltext retrieval system can be used to augment the explication facilities of an expert system on welding.
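The ranking idea described in this abstract can be sketched in a few lines. This is a minimal illustration, not the Sheffield system: the stopword list, the crude suffix-stripping `stem` function, and all function names are assumptions introduced here, and the score is the plain count of shared stems, without the term weighting the abstract also mentions.

```python
import re

# Assumed, deliberately tiny stopword list (not from the paper).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "the", "to", "with"}
# Crude stand-in for a real stemmer: strip one common suffix.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stems(text):
    """Return the set of keyword stems in a piece of text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOPWORDS}

def rank_paragraphs(query, paragraphs):
    """Rank paragraphs by the number of stems shared with the query.

    Returns (score, paragraph_index) pairs in order of decreasing score;
    paragraphs with no stem in common are omitted.
    """
    query_stems = stems(query)
    scored = [(len(query_stems & stems(p)), i) for i, p in enumerate(paragraphs)]
    scored = [(score, i) for score, i in scored if score > 0]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

paragraphs = [
    "Welding joints requires heat.",
    "Boolean searching of databases.",
    "Searching welded joints in databases.",
]
print(rank_paragraphs("searching welded joints", paragraphs))
# → [(3, 2), (2, 0), (1, 1)]
```

The user supplies only a natural-language query; no Boolean operators are needed, and every paragraph sharing at least one stem receives a position in the ranking.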

Details

Online Review, vol. 15 no. 3/4
Type: Research Article
ISSN: 0309-314X

Article
Publication date: 1 February 1985

Carol Tenopir

Abstract

Complete texts of many journals are now available for online searching. Most of these full text databases have been made available on the same or similar search systems that provide access to bibliographic information. The systems use inverted files that retain limited context information (e.g., paragraphs and location of words within paragraphs). The retrieval techniques used are simply those that were developed earlier for bibliographic databases. Retrieval relies on Boolean logic, word stem searching with truncation, and word proximity specification. Minor adjustments have been made for the display of full text databases, allowing words resulting in retrieval to be displayed in context; but changes have not been made in retrieval techniques. This is due to the reliance on search systems that provide access to many types of databases, all of which are by‐products of improved techniques for creating printed publications.
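As a rough illustration of the retrieval model this abstract describes, the sketch below builds an inverted file that keeps the paragraph number and word position of each occurrence, then supports Boolean AND, right-hand truncation, and word proximity. The function names, the `*` truncation symbol, and the distance parameter are assumptions made for this example and do not reproduce any particular online search system.

```python
import re
from collections import defaultdict

def build_index(paragraphs):
    """Map each word to a list of (paragraph_number, word_position) postings."""
    index = defaultdict(list)
    for para_no, text in enumerate(paragraphs):
        for pos, word in enumerate(re.findall(r"[a-z]+", text.lower())):
            index[word].append((para_no, pos))
    return index

def postings(index, term):
    """Postings for a term; a trailing '*' requests right-hand truncation."""
    if term.endswith("*"):
        prefix = term[:-1]
        return [p for word, plist in index.items()
                if word.startswith(prefix) for p in plist]
    return index.get(term, [])

def boolean_and(index, a, b):
    """Paragraphs that contain both terms."""
    paras_a = {para for para, _ in postings(index, a)}
    paras_b = {para for para, _ in postings(index, b)}
    return sorted(paras_a & paras_b)

def within(index, a, b, distance):
    """Paragraphs where the two terms occur at most `distance` words apart."""
    hits = set()
    for para_a, pos_a in postings(index, a):
        for para_b, pos_b in postings(index, b):
            if para_a == para_b and abs(pos_a - pos_b) <= distance:
                hits.add(para_a)
    return sorted(hits)

paragraphs = [
    "full text databases are searched online",
    "online systems index the text of journals",
    "databases store bibliographic records",
]
index = build_index(paragraphs)
print(boolean_and(index, "databas*", "text"))  # → [0]
print(within(index, "full", "text", 1))        # → [0]
print(within(index, "text", "online", 2))      # → []
```

Because the stored positions are relative to a paragraph, proximity is only evaluated within a paragraph, which matches the limited context information the abstract mentions.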

Details

Online Review, vol. 9 no. 2
Type: Research Article
ISSN: 0309-314X

Article
Publication date: 1 June 1991

Moid A. Siddiqui

Abstract

This article describes the fastest growing category of machine‐readable databases: fulltext databases. A selection of articles from the literature on fulltext databases was explored, and this provides a basis for the information presented here on search strategy, performance measurement, and the benefits and limitations of fulltext databases. Various use studies and uses of fulltext databases are also listed.

Details

Online Review, vol. 15 no. 6
Type: Research Article
ISSN: 0309-314X

Article
Publication date: 1 January 1996

Peter Ingwersen

Abstract

The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a cognitive framework, the paper outlines the concept of polyrepresentation applied to both the user's cognitive space and the information space of IR systems. The concept seeks to represent the current user's information need, problem state, and domain work task or interest in a structure of causality. Further, it implies that we should apply different methods of representation and a variety of IR techniques of different cognitive and functional origin simultaneously to each semantic fulltext entity in the information space. The cognitive differences imply that by applying cognitive overlaps of information objects, originating from different interpretations of such objects through time and by type, the degree of uncertainty inherent in IR is decreased. Polyrepresentation and the use of cognitive overlaps are associated with, but not identical to, data fusion in IR. By explicitly incorporating all the cognitive structures participating in the interactive communication processes during IR, the cognitive theory provides a comprehensive view of these processes. It encompasses the ad hoc theories of text retrieval and IR techniques hitherto developed in mainstream retrieval research. It has elements in common with van Rijsbergen and Lalmas' logical uncertainty theory and may be regarded as compatible with that conception of IR. Epistemologically speaking, the theory views IR interaction as processes of cognition, potentially occurring in all the information processing components of IR, that may be applied, in particular, to the user in a situational context. The theory draws upon basic empirical results from information seeking investigations in the operational online environment, and from mainstream IR research on partial matching techniques and relevance feedback. 
By viewing users, source systems, intermediary mechanisms and information in a global context, the cognitive perspective attempts a comprehensive understanding of essential IR phenomena and concepts, such as the nature of information needs, cognitive inconsistency and retrieval overlaps, logical uncertainty, the concept of ‘document’, relevance measures and experimental settings. An inescapable consequence of this approach is to rely more on sociological and psychological investigative methods when evaluating systems and to view relevance in IR as situational, relative, partial, differentiated and non‐linear. The lack of consistency among authors, indexers, evaluators or users is of an identical cognitive nature. It is unavoidable, and indeed favourable to IR. In particular, for fulltext retrieval, alternative semantic entities, including Salton et al.'s ‘passage retrieval’, are proposed to replace the traditional document record as the basic retrieval entity. These empirically observed phenomena of inconsistency and of semantic entities and values associated with data interpretation support strongly a cognitive approach to IR and the logical use of polyrepresentation, cognitive overlaps, and both data fusion and data diffusion.

Details

Journal of Documentation, vol. 52 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 May 1985

C.R. Watters, M.A. Shepherd, E.W. Grundke and P. Bodorik

Abstract

Although the Boolean combination of keywords and/or subject codes is the predominant access method for the retrieval of passages from fulltext databases, menu access is an attractive alternative. The selection of an access method and the ensuing satisfaction with the results is based on the type of query and on the experience and knowledge of the user. This paper describes a prototype system which has integrated Boolean, menu, and direct access methods for the retrieval of passages from fulltext databases. The integration is based on the hierarchical structure inherent in such databases as legal statutes and regulations and engineering standards. The user may switch freely among access methods in order to develop the most appropriate search strategy. The retrieved passages are presented to the user within the context of the hierarchical structure.

Details

Online Review, vol. 9 no. 5
Type: Research Article
ISSN: 0309-314X

Article
Publication date: 1 April 1992

E.G. Sieverts, M. Hofstede and B. Oude Groeniger

Abstract

In this article, the fourth in a series on microcomputer software for information storage and retrieval, test results of six indexing and fulltext retrieval programs are presented and various properties and qualities of these programs are discussed. The common feature of programs in these categories is that they are primarily meant to retrieve words (or combinations of them) in large text files. To do this they either simply index existing text files in one or more formats (indexing programs), or they store and index them in their own database format (fulltext retrieval programs). The programs reviewed in this issue are the indexing programs Ask‐It, Texplore and ZYindex and the fulltext retrieval programs KAware, TextMaster and WordCruncher. All programs run under MS‐DOS. In addition ZYindex has a Windows and a Unix version and TextMaster is also available for Unix. For each of the six programs almost 100 facts and test results are tabulated. The programs are also discussed individually.

Details

The Electronic Library, vol. 10 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 February 1991

W. Tauchert, J. Hospodarsky, J. Krause, C. Schneider and C. Womser‐Hacker

Abstract

This paper reports the results of the information retrieval project PADOK‐II. This project, which began in November 1987, is being carried out by the Linguistic Information Science Group of the University of Regensburg (LIR) in cooperation with the German Patent Office (GPO) and is sponsored by the German Ministry for Research and Technology. The long‐term aim is to integrate artificial intelligence into information retrieval research without neglecting traditional information retrieval methodology. In PADOK‐II an information retrieval system is considered which indexes documents rather shallowly using free‐text or morphological components. A large‐scale retrieval test has been carried out, based on the German Patent Information System. Answers have been obtained to some 400 queries made by 10 users in simulated real‐life situations. These results have been used to attempt to answer the question: ‘How do the linguistically‐based functions of an indexing system contribute to its performance?’ As a spinoff of this test, the influence of document size and structure was studied with a view to identifying the most reasonable basic content for a German Patent Information System.

Details

Online Review, vol. 15 no. 2
Type: Research Article
ISSN: 0309-314X

Article
Publication date: 1 February 1991

Suliman Al‐Hawamdeh, Geoff Smith and Peter Willett

Abstract

This paper considers the use of a hypertext system, GUIDE, for paragraph‐based searching in fulltext documents. Searching can be effected in GUIDE using both a conventional, word‐based approach and the inter‐textual linkage facilities. The effectiveness of these retrieval techniques is evaluated by means of searches of three fulltext documents for which relevance data are available. The results of the searches are compared with those obtained from use of a nearest neighbour retrieval system that has been developed for the ranking of paragraphs within fulltext documents. The comparison suggests that the linkage facilities in hypertext do not provide a very cost‐effective mechanism for paragraph‐based retrieval.

Details

Program, vol. 25 no. 2
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 1 April 1993

Niël van der Merwe

Abstract

This paper will discuss the integration of document image processing and text retrieval principles in order to process and load existing paper documents automatically in an electronic document database that broadens the user's capability to retrieve relevant information more accurately, without going through costly processes to get paper documents into electronic text. The principles of document image processing systems, as well as the problems and shortcomings of most of today's document image processing systems, will be discussed. Then concept retrieval as the latest development in text retrieval will be discussed, with specific reference to the ability of the TOPIC intelligent text retrieval system to allow users to build up a knowledge base of search objects or concepts that can be used at any point in time by all users for the system. This paper will further specifically look at the automatic processing of paper documents by converting the scanned document image pages through to electronic text. The use of optical character recognition technology, the indexing and loading of the documents in a text database, the automatic linking of the documents to the related document images and the retrieval technology available in TOPIC, specifically the TYPO operator that was developed to handle so‐called dirty data such as the common misspellings, character transpositions and ‘dirty’ text received as output from the OCR process, will be discussed. A possible solution to load paper documents quickly and cost‐effectively into an electronic document database will be discussed and demonstrated in detail. The advantages and disadvantages of this approach will be discussed with specific reference to an electronic news clipping service application.
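The abstract does not say how the TYPO operator works internally. As a generic illustration of tolerant matching against 'dirty' OCR text, the sketch below swaps in a standard technique, Levenshtein edit distance, and returns the vocabulary words within a small distance of the search term. The function names and the `max_distance` default are assumptions for this example and are not part of TOPIC.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]

def fuzzy_match(term, vocabulary, max_distance=1):
    """Vocabulary words within max_distance edits of the search term."""
    return sorted(w for w in vocabulary if edit_distance(term, w) <= max_distance)

vocabulary = {"patent", "patcnt", "parent", "potato"}
print(fuzzy_match("patent", vocabulary))
# → ['parent', 'patcnt', 'patent']
```

Note that plain Levenshtein distance counts a transposition of two adjacent characters as two edits, so catching transpositions at `max_distance=1` would need a variant such as Damerau-Levenshtein. The example also shows the usual cost of tolerant matching: the OCR error 'patcnt' is recovered, but the unrelated word 'parent' is returned as well.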

Details

The Electronic Library, vol. 11 no. 4/5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 January 1993

Ankie Visschedijk and Forbes Gibb

Abstract

This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional retrieval by using either innovative software or hardware to increase retrieval speed or functionality, precision or recall. The software systems reviewed are: AIDA, CLARIT, Metamorph, SIMPR, STATUS/IQ, TCS, TINA and TOPIC. The hardware systems reviewed are: CAFS‐ISP, the Connection Machine, GESCAN, HSTS, MPP, TEXTRACT, TRW‐FDF and URSA.

Details

Online and CD-Rom Review, vol. 17 no. 1
Type: Research Article
ISSN: 1353-2642
