A study of user profile representation for personalized cross-language information retrieval

Dong Zhou (School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China)
Séamus Lawless (School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland)
Xuan Wu (School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China)
Wenyu Zhao (School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China)
Jianxun Liu (School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China)

Aslib Journal of Information Management

ISSN: 2050-3806

Article publication date: 18 July 2016

Abstract

Purpose

With the growing amount of multilingual content on the World Wide Web, users often need to access information in a language they do not speak natively. The purpose of this paper is to present a comprehensive study of user profile representation techniques and to investigate their use in personalized cross-language information retrieval (CLIR) systems by means of personalized query expansion.

Design/methodology/approach

The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.
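
As a rough illustration of the frequency-based profiles mentioned above, the following minimal sketch weights the terms in a user's history with tf-idf and with BM25, then appends the top-weighted profile terms to a query. It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothed idf formula, the choice of a background collection for document frequencies, and the parameters k1 and b are all assumptions made for illustration, and the translation step and the latent semantic models are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


def doc_freqs(background):
    # Number of background documents that contain each term.
    df = Counter()
    for doc in background:
        df.update(set(tokenize(doc)))
    return df


def tfidf_profile(history, background):
    # Weight = term count in the user's history * smoothed idf.
    df, n = doc_freqs(background), len(background)
    tf = Counter(t for doc in history for t in tokenize(doc))
    return {t: c * (math.log((n + 1) / (df[t] + 1)) + 1) for t, c in tf.items()}


def bm25_profile(history, background, k1=1.2, b=0.75):
    # Weight = sum of per-document BM25 term scores over the user's history.
    df, n = doc_freqs(background), len(background)
    docs = [tokenize(doc) for doc in history]
    avg_len = sum(len(d) for d in docs) / len(docs)
    profile = Counter()
    for doc in docs:
        for term, f in Counter(doc).items():
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = f + k1 * (1 - b + b * len(doc) / avg_len)
            profile[term] += idf * f * (k1 + 1) / norm
    return dict(profile)


def expand_query(query, profile, k=2):
    # Append the k highest-weighted profile terms not already in the query.
    q_terms = tokenize(query)
    ranked = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [t for t, _ in ranked if t not in q_terms][:k]
    return q_terms + extra


# Hypothetical toy data: two documents from one user's history, plus a
# small background collection used only for document frequencies.
history = ["jaguar speed cat", "cat habitat jungle"]
background = history + ["car engine speed", "car price"]

tfidf = tfidf_profile(history, background)
bm25 = bm25_profile(history, background)
expanded = expand_query("jaguar", tfidf, k=2)
```

In this toy setting both weighting schemes rank "cat" highest, so an ambiguous query such as "jaguar" is pushed towards the animal sense that this user's history favours. In a cross-language setting the expansion terms would additionally have to be translated or drawn from a cross-lingual model, which this sketch does not attempt.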

Findings

Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained at a cross-lingual level performed better than those represented by models trained at a monolingual level.

Originality/value

Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies at a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills this gap with a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that enables repeatable, controlled experiments.

Acknowledgements

The work described in this paper was supported by the National Natural Science Foundation of China under Project No. 61300129, No. 61572187 and No. 61272063, and a project sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, China under Grant Number [2013] 1792. This work is also supported by the ADAPT Centre for Digital Content Technology, which is funded under the Science Foundation Ireland Research Centres Programme (Grant No. 13/RC/2106) and is co-funded under the European Regional Development Fund. The authors would like to thank the anonymous reviewers who significantly improved the quality of this manuscript during preparation.

Citation

Zhou, D., Lawless, S., Wu, X., Zhao, W. and Liu, J. (2016), "A study of user profile representation for personalized cross-language information retrieval", Aslib Journal of Information Management, Vol. 68 No. 4, pp. 448-477. https://doi.org/10.1108/AJIM-06-2015-0091

Publisher

Emerald Group Publishing Limited

Copyright © 2016, Emerald Group Publishing Limited