This work aims to investigate the sensitivity of ranking performance with respect to the topic distribution of queries selected for ranking evaluation.
The authors reweight the queries used in two TREC tasks so that they match three real background topic distributions, and show that the performance rankings of retrieval systems change substantially under this reweighting.
Search engines tend to perform similarly on queries about the same topic, and their measured performance is sensitive to the topic distribution of the queries used in evaluation.
Using experiments with multiple real‐world query logs, the paper demonstrates weaknesses in the current evaluation model of retrieval systems.
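The reweighting idea described above can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so the code below assumes one simple option: each query receives a weight equal to its topic's share in the target distribution divided by the number of evaluation queries on that topic. The function names, topic labels, and scores are all hypothetical and are not taken from the paper or from TREC data.

```python
from collections import Counter


def topic_weights(query_topics, target_dist):
    """Return per-query weights whose per-topic totals match target_dist.

    Assumption (not from the paper): queries within a topic share that
    topic's target mass equally, and the target distribution is
    renormalised over the topics that actually occur in the query set.
    """
    counts = Counter(query_topics.values())
    covered = sum(target_dist.get(topic, 0.0) for topic in counts)
    if covered <= 0:
        raise ValueError("target distribution has no mass on the evaluated topics")
    return {
        query: target_dist.get(topic, 0.0) / (covered * counts[topic])
        for query, topic in query_topics.items()
    }


def reweighted_mean(per_query_scores, weights):
    """Weighted mean of per-query effectiveness scores (weights sum to 1)."""
    return sum(weights[query] * score for query, score in per_query_scores.items())


# Hypothetical toy data: three queries, two topics, two systems.
query_topics = {"q1": "sports", "q2": "sports", "q3": "health"}
system_a = {"q1": 0.8, "q2": 0.6, "q3": 0.2}
system_b = {"q1": 0.4, "q2": 0.4, "q3": 0.7}

# Unweighted evaluation: every query counts equally.
uniform = {query: 1 / len(query_topics) for query in query_topics}

# Hypothetical background distribution in which health queries dominate.
target = {"sports": 0.2, "health": 0.8}
weights = topic_weights(query_topics, target)

for name, scores in (("A", system_a), ("B", system_b)):
    print(name, reweighted_mean(scores, uniform), reweighted_mean(scores, weights))
```

With this toy data, system A scores higher under the unweighted mean, while system B scores higher once the queries are reweighted toward the health-heavy distribution, which is the kind of ranking change the abstract reports.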
Emerald Group Publishing Limited
Copyright © 2011, Emerald Group Publishing Limited