A THEORETICAL BASIS FOR THE USE OF CO‐OCCURRENCE DATA IN INFORMATION RETRIEVAL

C.J. VAN RIJSBERGEN (Computer Laboratory, University of Cambridge)

Journal of Documentation

ISSN: 0022-0418

Publication date: 1 February 1977

Abstract

This paper provides a foundation for a practical way of improving the effectiveness of an automatic retrieval system. Its main concern is with the weighting of index terms as a device for increasing retrieval effectiveness. Previously index terms have been assumed to be independent for the good reason that then a very simple weighting scheme can be used. In reality index terms are most unlikely to be independent. This paper explores one way of removing the independence assumption. Instead the extent of the dependence between index terms is measured and used to construct a non‐linear weighting function. In a practical situation the values of some of the parameters of such a function must be estimated from small samples of documents. So a number of estimation rules are discussed and one in particular is recommended. Finally the feasibility of the computations required for a non‐linear weighting scheme is examined.
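As a rough illustration of what measuring the dependence between index terms from co-occurrence data can look like, the sketch below scores each pair of terms by the mutual information of their presence or absence across a small sample of documents. The abstract does not name the dependence measure or the estimation rule, so both choices here are assumptions: mutual information stands in for the measure, and the additive constant `alpha` stands in for a small-sample estimation rule. The function name `term_dependence` and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def term_dependence(docs, vocab, alpha=0.5):
    """Score each term pair by smoothed mutual information.

    docs  : list of sets of index terms, one set per document
    vocab : list of terms to compare pairwise
    alpha : additive smoothing constant (an assumed estimation rule,
            used so that zero counts in small samples stay finite)
    """
    n = len(docs)
    total = n + 4 * alpha  # four cells in each 2x2 contingency table
    scores = {}
    for a, b in combinations(vocab, 2):
        # Count documents by presence (1) or absence (0) of each term.
        counts = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
        for d in docs:
            counts[(int(a in d), int(b in d))] += 1

        mi = 0.0
        for (x, y), c in counts.items():
            p_xy = (c + alpha) / total
            # Marginals are sums of the smoothed joint cells.
            p_x = (counts[(x, 0)] + counts[(x, 1)] + 2 * alpha) / total
            p_y = (counts[(0, y)] + counts[(1, y)] + 2 * alpha) / total
            mi += p_xy * math.log(p_xy / (p_x * p_y))
        scores[(a, b)] = mi
    return scores


# Toy sample: "information" and "retrieval" always occur together.
docs = [
    {"information", "retrieval"},
    {"information", "retrieval"},
    {"index"},
    {"index", "term"},
    {"information", "retrieval", "term"},
    {"term"},
]
vocab = ["information", "retrieval", "index", "term"]
scores = term_dependence(docs, vocab)

# List pairs from most to least dependent.
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 4))
```

Under the independence assumption every pair would score close to zero; pairs that score well above zero are the ones a dependence-aware weighting function would need to account for.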

Citation

VAN RIJSBERGEN, C. (1977), "A THEORETICAL BASIS FOR THE USE OF CO‐OCCURRENCE DATA IN INFORMATION RETRIEVAL", Journal of Documentation, Vol. 33 No. 2, pp. 106-119. https://doi.org/10.1108/eb026637

Publisher

MCB UP Ltd

Copyright © 1977, MCB UP Limited
