Full‐text documents are usually searched by means of a Boolean retrieval algorithm that requires the user to specify the logical relationships between the terms of a query. In this paper, we summarise the results to date of a continuing programme of research at the University of Sheffield to investigate the use of nearest‐neighbour retrieval algorithms for full‐text searching. Given a natural‐language query statement, our methods result in a ranking of the paragraphs comprising a full‐text document in order of decreasing similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has in common with the query. A full‐text document test collection has been created to allow systematic tests of retrieval effectiveness to be carried out. Experiments with this collection demonstrate that nearest‐neighbour searching provides a means for paragraph‐based access to full‐text documents that is of comparable effectiveness to both Boolean and hypertext searching and that index term weighting schemes which have been developed for the searching of bibliographical databases can also be used to improve the effectiveness of retrieval from full‐text databases. A current project is investigating the extent to which a paragraph‐based full‐text retrieval system can be used to augment the explication facilities of an expert system on welding.
Al‐Hawamdeh, S., de Vere, R., Smith, G. and Willett, P. (1991), "Using nearest‐neighbour searching techniques to access full‐text documents", Online Review, Vol. 15 No. 3/4, pp. 173-191. https://doi.org/10.1108/eb024372
MCB UP Ltd
Copyright © 1991, MCB UP Limited