This study examines how controlled variations in indexing vocabulary size affect retrieval performance, using the Cranfield 200 and 1400 test collections. The vocabularies considered are sets of variable-length character strings, chosen from the fronts of document and query terms so that they occur with approximately equal frequency. Sets containing between 120 and 720 members were tested both in an application of the Cluster Hypothesis and in a series of linear associative retrieval experiments. The effectiveness of the smaller sets is low, but the larger ones exhibit retrieval characteristics comparable to those of words.
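The idea of choosing term-front strings that occur with roughly equal frequency can be illustrated with a small sketch. This is not the selection procedure used in the paper, which the abstract does not describe; it is a simple greedy heuristic, assumed here for illustration, that starts from one-character prefixes and repeatedly lengthens the most frequent one until a target set size is reached. The function names `equifrequent_prefixes` and `assign_symbol` and the sample terms are hypothetical.

```python
def equifrequent_prefixes(terms, target_size):
    """Greedy sketch (an assumption, not the paper's method): split the most
    frequent prefix into one-character-longer prefixes until at least
    target_size symbols exist or no prefix can be split further."""
    groups = {}  # prefix -> list of term occurrences it currently covers
    for term in terms:
        if term:
            groups.setdefault(term[:1], []).append(term)

    frozen = set()  # prefixes whose terms are all exactly the prefix itself
    while len(groups) < target_size:
        candidates = [p for p in groups if p not in frozen]
        if not candidates:
            break
        # Pick the most frequent splittable prefix (ties broken alphabetically).
        prefix = max(candidates, key=lambda p: (len(groups[p]), p))
        members = groups.pop(prefix)

        children = {}
        for term in members:
            # Terms equal to the prefix stay under the prefix itself.
            children.setdefault(term[:len(prefix) + 1], []).append(term)
        if list(children) == [prefix]:
            frozen.add(prefix)
        groups.update(children)

    return {prefix: len(members) for prefix, members in groups.items()}


def assign_symbol(term, symbols):
    """Map a term to the longest symbol that is a prefix of it, if any."""
    for n in range(len(term), 0, -1):
        if term[:n] in symbols:
            return term[:n]
    return None


# Hypothetical sample of term occurrences from documents and queries.
sample_terms = ["retrieval", "retrieval", "recall", "relevance",
                "index", "indexing", "cluster", "cranfield", "query"]
symbol_counts = equifrequent_prefixes(sample_terms, target_size=5)
```

In this sketch the frequent prefix `r` is lengthened until its occurrences are spread over several symbols, while rare prefixes such as `q` stay short. A document or query would then be indexed by applying `assign_symbol` to each of its terms.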
Burnett, J.E., Cooper, D., Lynch, M.F., Willett, P. and Wycherley, M. (1979), "Document retrieval experiments using indexing vocabularies of varying size. I. Variety generation symbols assigned to the fronts of index terms", Journal of Documentation, Vol. 35 No. 3, pp. 197-206. https://doi.org/10.1108/eb026680
MCB UP Ltd
Copyright © 1979, MCB UP Limited