Semantic relatedness measurement based on Wikipedia link co‐occurrence analysis

Masahiro Ito (Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan)
Kotaro Nakayama (Center for Knowledge Structuring, The University of Tokyo, Tokyo, Japan)
Takahiro Hara (Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan)
Shojiro Nishio (Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University, Osaka, Japan)

International Journal of Web Information Systems

ISSN: 1744-0084

Publication date: 5 April 2011

Abstract

Purpose

Recently, several studies have shown the importance and effectiveness of Wikipedia Mining. One popular research area in Wikipedia Mining focuses on semantic relatedness measurement, and work in this area has shown that Wikipedia can be used for this purpose. However, previous methods face two problems: accuracy and scalability. To solve these problems, this paper proposes an efficient semantic relatedness measurement method that leverages global statistical information of Wikipedia. Furthermore, a new test collection based on Wikipedia concepts is constructed for evaluating semantic relatedness measurement methods.

Design/methodology/approach

The authors' approach leverages global statistical information from the whole of Wikipedia to compute semantic relatedness among concepts (disambiguated terms) by analyzing co‐occurrences of link pairs across all Wikipedia articles. In Wikipedia, an article represents a concept, and a link to another article represents a semantic relation between the two concepts. Thus, the co‐occurrence of a link pair indicates the relatedness of a concept pair. The authors also propose an improved method that integrates tf-idf to additionally leverage local information within an article. To construct the new test collection, the authors select a large number of concepts from Wikipedia, and the relatedness of these concepts is judged by human test subjects.
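As an illustration of the idea of link co‐occurrence, the following is a minimal sketch, not the authors' implementation. The abstract does not state the scoring formula, so the cosine-style normalization used here is an assumption, as are the function names `link_cooccurrence` and `relatedness` and the toy article data. Each article is reduced to the set of concepts it links to, and two concepts count as co‐occurring when the same article links to both.

```python
from collections import Counter
from itertools import combinations
import math


def link_cooccurrence(articles):
    """Count link and link-pair frequencies over all articles.

    `articles` maps an article title to the list of concepts it links to
    (a hypothetical, simplified stand-in for a parsed Wikipedia dump).
    """
    link_freq = Counter()   # number of articles that link to each concept
    pair_freq = Counter()   # number of articles that link to both concepts
    for links in articles.values():
        unique = sorted(set(links))          # ignore repeated links in one article
        link_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return link_freq, pair_freq


def relatedness(a, b, link_freq, pair_freq):
    """Cosine-style score in [0, 1]; this normalization is an assumption."""
    if a == b:
        return 1.0
    key = (a, b) if a < b else (b, a)        # pairs are stored in sorted order
    denom = math.sqrt(link_freq[a] * link_freq[b])
    return pair_freq[key] / denom if denom else 0.0


# Toy data, invented for illustration only.
articles = {
    "Page1": ["Apple Inc.", "Steve Jobs", "IPhone"],
    "Page2": ["Apple Inc.", "Steve Jobs"],
    "Page3": ["Apple Inc.", "Microsoft"],
    "Page4": ["Banana", "Fruit"],
}
link_freq, pair_freq = link_cooccurrence(articles)
score_related = relatedness("Apple Inc.", "Steve Jobs", link_freq, pair_freq)
score_unrelated = relatedness("Apple Inc.", "Banana", link_freq, pair_freq)
```

In this toy data, "Apple Inc." and "Steve Jobs" are linked together from two articles and therefore score higher than "Apple Inc." and "Banana", which never share an article. Because the counts are gathered in a single pass over all articles, the statistics are global in the sense the abstract describes.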

Findings

An experiment was conducted to evaluate the calculation cost and accuracy of each method. The experimental results show that the proposed approach has a very low calculation cost compared to one of the previous methods and is more accurate than all previous methods for computing semantic relatedness.

Originality/value

This is the first proposal of co‐occurrence analysis of Wikipedia links for semantic relatedness measurement. The authors show that this approach measures semantic relatedness among concepts effectively in terms of both calculation cost and accuracy. The findings may be useful to researchers interested in knowledge extraction and in ontology research.

Citation

Ito, M., Nakayama, K., Hara, T. and Nishio, S. (2011), "Semantic relatedness measurement based on Wikipedia link co‐occurrence analysis", International Journal of Web Information Systems, Vol. 7 No. 1, pp. 44-61. https://doi.org/10.1108/17440081111125653

Publisher: Emerald Group Publishing Limited

Copyright © 2011, Emerald Group Publishing Limited
