A study of user profile representation for personalized cross-language information retrieval
Abstract
Purpose
As the amount of multilingual content on the World Wide Web grows, users often need to access information written in a language of which they are not native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and to investigate their use in personalized cross-language information retrieval (CLIR) systems by means of personalized query expansion.
Design/methodology/approach
The user profiles consist of weighted terms, with weights computed using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and on cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing user profile generation techniques and query expansion methods.
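As a rough illustration of the frequency-based case, the sketch below builds a tf-idf weighted term profile from a user's document history and appends the highest-weighted profile terms to a query. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `tfidf_profile` and `expand_query`, the smoothed idf formula, the pre-tokenized input, and the cutoff `k` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def tfidf_profile(user_docs, background_docs):
    """Return {term: weight} for terms in the user's history.

    Assumption: documents are already tokenized into lists of terms.
    Assumption: idf is smoothed as log((N + 1) / (df + 1)), one of
    several common variants; the paper may use a different one.
    """
    n = len(background_docs)
    # Document frequency: number of background documents containing each term.
    df = Counter()
    for doc in background_docs:
        df.update(set(doc))
    # Term frequency accumulated over the user's whole history.
    tf = Counter(term for doc in user_docs for term in doc)
    return {t: f * math.log((n + 1) / (df[t] + 1)) for t, f in tf.items()}


def expand_query(query_terms, profile, k=2):
    """Append the k highest-weighted profile terms not already in the query."""
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [t for t, _ in ranked if t not in query_terms][:k]
    return list(query_terms) + extra


if __name__ == "__main__":
    background = [
        ["jaguar", "car", "engine"],
        ["jaguar", "cat", "jungle"],
        ["car", "engine", "price"],
        ["news", "price"],
    ]
    history = [["jaguar", "cat", "jungle"], ["cat", "jungle", "habitat"]]
    profile = tfidf_profile(history, background)
    print(expand_query(["jaguar"], profile, k=2))
```

In a cross-language setting the expanded query would still need to be translated, or the profile built in the target language; that step is outside this sketch. A BM25 or latent semantic weighting could be swapped in by replacing only the weight computation in `tfidf_profile`.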
Findings
Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirms that user profiles represented by latent semantic models trained on a cross-lingual level outperform those represented by models trained on a monolingual level.
Originality/value
Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills this gap with a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that allows repeatable and controlled experiments to be conducted.
Acknowledgements
The work described in this paper was supported by the National Natural Science Foundation of China under Project No. 61300129, No. 61572187 and No. 61272063, and a project sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, China under Grant Number [2013] 1792. This work is also supported by the ADAPT Centre for Digital Content Technology, which is funded under the Science Foundation Ireland Research Centres Programme (Grant No. 13/RC/2106) and is co-funded under the European Regional Development Fund. The authors would like to thank the anonymous reviewers who significantly improved the quality of this manuscript during preparation.
Citation
Zhou, D., Lawless, S., Wu, X., Zhao, W. and Liu, J. (2016), "A study of user profile representation for personalized cross-language information retrieval", Aslib Journal of Information Management, Vol. 68 No. 4, pp. 448-477. https://doi.org/10.1108/AJIM-06-2015-0091
Publisher
Emerald Group Publishing Limited
Copyright © 2016, Emerald Group Publishing Limited