Selecting a text similarity measure for a content-based recommender system: A comparison in two corpora

Manjula Wijewickrema (Berlin School of Library and Information Science, Humboldt University of Berlin, Germany)

Vivien Petras (Berlin School of Library and Information Science, Humboldt University of Berlin, Germany)

Naomal Dias (Department of Computer Systems Engineering, University of Kelaniya, Kelaniya, Sri Lanka)

The Electronic Library

ISSN: 0264-0473

Article publication date: 11 July 2019

Issue publication date: 14 August 2019

Abstract

Purpose

The purpose of this paper is to develop a journal recommender system, which compares the content similarities between a manuscript and the existing journal articles in two subject corpora (covering the social sciences and medicine). The study examines the appropriateness of three text similarity measures and the impact of numerous aspects of corpus documents on system performance.

Design/methodology/approach

The study implemented three similarity measures, one at a time, in a journal recommender system with two separate journal corpora. Two distinct samples of test abstracts were classified, and the results were evaluated using the normalized discounted cumulative gain.
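As an illustration of the evaluation metric named above, the following is a minimal sketch of normalized discounted cumulative gain (NDCG) in its common log2-discount form. The abstract does not state which NDCG variant, cutoff or relevance scale the study used, so the function names, the optional cutoff `k` and the graded relevance values here are assumptions.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, k=None):
    """DCG of the ranking as given, divided by the DCG of the ideal ordering.

    `gains` holds the relevance grade of each recommended item, in ranked order.
    `k` is an optional cutoff (hypothetical parameter; None uses the full list).
    """
    ideal = sorted(gains, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all
    return dcg(gains[:k]) / ideal_dcg

# Hypothetical relevance grades for five recommended journals, in ranked order.
print(ndcg([3, 2, 0, 1, 0]))
```

A ranking that already lists the most relevant journals first scores 1.0; the score drops as relevant journals are pushed further down the list.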

Findings

The BM25 similarity measure outperforms both the cosine and unigram language similarity measures overall. The unigram language measure shows the lowest performance. The performance results are significantly different between each pair of similarity measures, while the BM25 and cosine similarity measures are moderately correlated. The cosine similarity achieves better performance for subjects with a higher density of technical vocabulary and shorter corpus documents. Moreover, increasing the number of corpus journals in the social sciences domain improved performance for both cosine similarity and BM25.
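For readers unfamiliar with the three measures compared above, the sketch below shows common textbook formulations of each. It is not the authors' implementation: the abstract gives no formulas, so the raw term-frequency weighting for cosine similarity, the BM25 parameters `k1` and `b`, the Jelinek-Mercer smoothing weight `lam` for the unigram language measure and the whitespace tokenizer are all assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()

def cosine_tf(query, doc):
    """Cosine similarity between raw term-frequency vectors."""
    q, d = Counter(query), Counter(doc)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def bm25(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of `doc` for `query`; k1 and b are assumed defaults."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc)
    score = 0.0
    for t in set(query):
        df = sum(1 for d in corpus if t in d)  # document frequency of t
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
        denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[t] * (k1 + 1) / denom
    return score

def unigram_lm(query, doc, corpus, lam=0.5):
    """Query log-likelihood under a unigram model of `doc`, smoothed with
    the collection model (Jelinek-Mercer; lam is an assumed weight)."""
    tf = Counter(doc)
    coll = Counter(t for d in corpus for t in d)
    coll_len = sum(coll.values())
    score = 0.0
    for t in query:
        if coll[t] == 0:
            continue  # skip terms that never occur in the collection
        p = (1 - lam) * tf[t] / len(doc) + lam * coll[t] / coll_len
        score += math.log(p)
    return score

# Hypothetical toy corpus standing in for journal article abstracts.
corpus = [tokenize(s) for s in (
    "clinical trial patient outcome",
    "survey social attitudes policy",
    "clinical trial drug dose",
)]
query = tokenize("social policy survey")
for name, fn in (("cosine", lambda d: cosine_tf(query, d)),
                 ("bm25", lambda d: bm25(query, d, corpus)),
                 ("unigram", lambda d: unigram_lm(query, d, corpus))):
    best = max(range(len(corpus)), key=lambda i: fn(corpus[i]))
    print(name, "ranks document", best, "first")
```

In a recommender of the kind described, each corpus document would be a journal article, and the manuscript abstract would play the role of the query; how scores are aggregated per journal is not specified in the abstract and is left out here.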

Originality/value

This is the first work related to comparing the suitability of a number of string-based similarity measures with distinct corpora for journal recommender systems.

Acknowledgements

The authors would like to thank the National Centre for Advanced Studies in Humanities and Social Sciences, Colombo, Sri Lanka for providing partial funding to conduct this research under the reference number 16/NCAS/SUSL/Lib/08.

Citation

Wijewickrema, M., Petras, V. and Dias, N. (2019), "Selecting a text similarity measure for a content-based recommender system: A comparison in two corpora", The Electronic Library, Vol. 37 No. 3, pp. 506-527. https://doi.org/10.1108/EL-08-2018-0165

Publisher

Emerald Publishing Limited
