The term discrimination value of an index term has been proposed as a quantitative measure of the extent to which that term can discriminate between documents in bibliographic databases. Previous work has suggested that the most discriminating terms are those with medium frequencies of occurrence. This paper discusses the effect of including relevance data on the calculation of term discrimination values. Two algorithms are described that calculate the ability of index terms to discriminate between relevant documents, between non‐relevant documents or between relevant and non‐relevant documents. The application of these algorithms to several standard document test collections demonstrates that the exact form of the relationship between term frequency and term discrimination depends upon the particular type of discrimination which is being measured; in particular, medium frequency terms are not necessarily the best discriminators when relevance data is available. These results are compared with the discriminatory ability of terms as measured by their relevance weights, where the most discriminating terms are those with low frequencies of occurrence.
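For orientation, the following is a minimal sketch of the classic centroid-based discrimination value from the earlier term discrimination literature. It is not one of the two relevance-based algorithms described in this paper. The function names, the cosine similarity measure and the toy term weights are all illustrative assumptions. A term's value is taken as the change in average document-to-centroid similarity when that term is removed from the collection.

```python
import math

# Hypothetical helper names; this is the classic centroid formulation,
# not the relevance-based algorithms the paper describes.

def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def density(docs):
    """Average similarity of each document to the collection centroid."""
    n = len(docs)
    centroid = [sum(column) / n for column in zip(*docs)]
    return sum(cosine(doc, centroid) for doc in docs) / n

def discrimination_values(docs):
    """For each term k, density without term k minus density with it.

    A positive value means removing the term packs the documents closer
    together, so the term was helping to tell them apart.
    """
    base = density(docs)
    values = []
    for k in range(len(docs[0])):
        reduced = [doc[:k] + doc[k + 1:] for doc in docs]
        values.append(density(reduced) - base)
    return values

# Toy collection (assumed data): rows are documents, columns are terms.
# Term 0 occurs in every document; terms 1 to 3 occur in one document each.
toy_docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
toy_values = discrimination_values(toy_docs)
```

In this toy collection the term that occurs in every document receives a negative value, while the terms that each occur in a single document receive positive values. The collection is too small to show the medium-frequency effect reported in previous work, and the sketch uses no relevance data.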
BIRU, T., EL‐HAMDOUCHI, A., REES, R. and WILLETT, P. (1989), "INCLUSION OF RELEVANCE INFORMATION IN THE TERM DISCRIMINATION MODEL", Journal of Documentation, Vol. 45 No. 2, pp. 85-109. https://doi.org/10.1108/eb026840
MCB UP Ltd
Copyright © 1989, MCB UP Limited