This paper proposes a method for comparing the performance of paired information retrieval (IR) systems more accurately than the current method, which is based on the systems' mean effectiveness scores across a set of identified topics/queries.
In the proposed approach, document-level scores are used as the evaluation unit instead of the classic set of topic scores. These document scores are defined document weights, which take the place of the systems' mean average precision (MAP) scores as the test statistic in significance testing. The experiments were conducted using the TREC 9 Web track collection.
The p-values generated by two types of significance tests, namely the Student’s t-test and the Mann-Whitney test, show that when document-level scores are used as the evaluation unit, the difference between IR systems is more significant than when topic scores are used.
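The two tests named above can be sketched in a few lines of standard-library Python. This is only an illustration, not the authors' implementation: it assumes an unpaired, pooled-variance form of the Student’s t-test and a normal approximation (without tie correction) for the Mann-Whitney p-value, and the function names and the score lists are hypothetical placeholders rather than data from the study. The same functions would be applied once to per-topic scores and once to per-document scores of the two systems being compared.

```python
import math
from statistics import NormalDist, mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a and a two-sided p-value (normal approximation)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# Hypothetical scores for two systems; made up for illustration only.
system_a = [0.21, 0.35, 0.18, 0.40, 0.27]
system_b = [0.30, 0.42, 0.25, 0.47, 0.33]

t_stat, dof = student_t(system_a, system_b)
u_stat, p_value = mann_whitney_u(system_a, system_b)
print(f"t = {t_stat:.3f} (df = {dof}), U = {u_stat:.1f}, p ~ {p_value:.3f}")
```

Turning the t statistic into a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would normally be used for that step.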
A suitable test collection is a primary prerequisite for the comparative evaluation of IR systems. However, in addition to reusable test collections, accurate statistical testing is a necessity for these evaluations. The findings of this study will help IR researchers evaluate their retrieval systems and algorithms more accurately.
This research was supported by UMRG Program RP028E-14AET and by the Exploratory Research Grant Scheme (ERGS) ER027-2013A.
Ravana, S., Taheri, M. and Rajagopal, P. (2015), "Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems", Aslib Journal of Information Management, Vol. 67 No. 4, pp. 408-421. https://doi.org/10.1108/AJIM-12-2014-0171
Emerald Group Publishing Limited
Copyright © 2015, Emerald Group Publishing Limited