STOCHASTIC MODELS FOR THE DISTRIBUTION OF INDEX TERMS
Abstract
Distributions of index terms have been used in modelling information retrieval systems and databases. Most previous models used some form of the Zipf distribution. This work uses a probability model of the occurrence of index terms to derive discrete distributions which are mixtures of Poisson and negative binomial distributions. These distributions, the generalised inverse Gaussian-Poisson and the generalised Waring, give better fits than the simpler Zipf distribution, particularly in the tails of the distribution, where the high-frequency terms are found. They have the advantage of being more explanatory and can incorporate a time parameter if necessary.
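As a rough illustration of what a Poisson mixture is, the sketch below uses the simplest textbook case rather than the paper's own models: if a term's occurrence rate is gamma-distributed and its count is Poisson given that rate, the resulting count distribution is negative binomial. The function names, the gamma mixing law, and the parameter values (shape 2, scale 1.5) are assumptions chosen for illustration and are not taken from the paper, whose distributions rely on other mixing laws that are not reproduced here.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for a Poisson count with rate lam > 0 (computed in log space)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def gamma_pdf(lam, shape, scale):
    """Gamma density at lam > 0; used here as an illustrative mixing law."""
    return math.exp((shape - 1) * math.log(lam) - lam / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def mixed_poisson_pmf(k, shape, scale, upper=80.0, steps=40000):
    """Integrate Poisson(k | lam) * Gamma(lam) over lam with the midpoint rule.

    upper and steps are hypothetical tuning values that suit the
    parameters used in this sketch; they are not general-purpose defaults.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += poisson_pmf(k, lam) * gamma_pdf(lam, shape, scale)
    return total * h


def negative_binomial_pmf(k, shape, scale):
    """Closed form of the gamma-mixed Poisson, with p = 1 / (1 + scale)."""
    p = 1.0 / (1.0 + scale)
    return math.exp(math.lgamma(k + shape) - math.lgamma(k + 1)
                    - math.lgamma(shape)
                    + shape * math.log(p) + k * math.log(1.0 - p))


if __name__ == "__main__":
    shape, scale = 2.0, 1.5  # illustrative values, not fitted to any data
    for k in range(6):
        numeric = mixed_poisson_pmf(k, shape, scale)
        exact = negative_binomial_pmf(k, shape, scale)
        print(k, round(numeric, 6), round(exact, 6))
```

The numerically integrated mixture and the closed-form negative binomial should agree closely for each count, which is the sense in which a mixing distribution over rates yields a heavier-tailed count distribution than a single Poisson.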
Citation
NELSON, M.J. (1989), "STOCHASTIC MODELS FOR THE DISTRIBUTION OF INDEX TERMS", Journal of Documentation, Vol. 45 No. 3, pp. 227-237. https://doi.org/10.1108/eb026845
Publisher
MCB UP Ltd
Copyright © 1989, MCB UP Limited