
What is your tweet worldview? Mapping the topic structure of tweets on the Wikipedia

Yu Suzuki (Nara Institute of Science and Technology, Ikoma, Nara, Japan)
Hiromitsu Ohara (Konan University, Higashi-Nada, Kobe, Hyogo)
Akiyo Nadamoto (Konan University, Higashi-Nada, Kobe, Hyogo)

International Journal of Pervasive Computing and Communications

ISSN: 1742-7371

Publication date: 3 April 2018



This paper aims to propose a method for summarizing the topics of tweets using the Wikipedia category structure as common knowledge, to help in understanding a Twitter user’s interests. Tweets cover many topics, and these topics can be organized as a tree structure. However, when the topic hierarchy is constructed using an existing hierarchical clustering approach, the granularity of the tweet groups differs for each user. Summarizing the topics requires identifying which topics are heterogeneous and which are not, because the structure is easier to understand when several groups are categorized under parent groups. However, if the group units differ for each user, the interests of multiple users cannot be summarized together. For example, if some tweets are grouped under “presidential election” and others under “Donald Trump”, it is impossible to count how many users are interested in Donald Trump.


One solution to this issue is to construct the topic structures by mapping tweets onto one common tree structure. In this paper, a method is proposed for constructing the topic structure using the Wikipedia category tree as this common tree structure. The tweets are categorized, mapped to the titles of articles in the Wikipedia category tree and then visualized to the users as a hierarchical structure.
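The mapping step can be illustrated with a minimal Python sketch. Everything in it is a hypothetical simplification, not the authors' implementation: `PARENT` is a hand-made toy fragment of a category tree (real Wikipedia categories form a graph in which a node can have several parents), and `map_tweet` matches article titles by plain substring search as a stand-in for the paper's categorization step. It shows why a shared tree addresses the counting problem described above: a user mapped to “Donald Trump” is also counted under every ancestor category, so users grouped at different granularities still meet at common nodes.

```python
from collections import defaultdict

# Hypothetical toy fragment of a category tree (child -> parent).
# Assumption: one parent per node; real Wikipedia categories allow several.
PARENT = {
    "Politics": None,
    "Elections": "Politics",
    "Presidential elections": "Elections",
    "Donald Trump": "Presidential elections",
    "Sports": None,
    "Baseball": "Sports",
}


def map_tweet(text: str) -> list[str]:
    """Stand-in mapping (assumption): return every node whose title
    occurs in the tweet, ignoring case."""
    lowered = text.lower()
    return [title for title in PARENT if title.lower() in lowered]


def ancestors(node):
    """Yield the node itself, then each ancestor up to the root."""
    while node is not None:
        yield node
        node = PARENT[node]


def users_per_topic(tweets_by_user: dict[str, list[str]]) -> dict[str, int]:
    """Count distinct users per node; a user mapped to a node is also
    counted for all ancestors of that node."""
    users = defaultdict(set)
    for user, tweets in tweets_by_user.items():
        for text in tweets:
            for title in map_tweet(text):
                for node in ancestors(title):
                    users[node].add(user)
    return {node: len(members) for node, members in users.items()}


# Invented example data.
tweets = {
    "alice": ["Watching the debate on Donald Trump tonight"],
    "bob": ["Who will win the presidential elections?"],
    "carol": ["Baseball season starts"],
}
counts = users_per_topic(tweets)
print(counts)
```

Here both alice and bob are counted under “Presidential elections”, while only alice is counted under “Donald Trump”. Counting distinct users rather than tweets keeps one prolific user from inflating a node.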


The effectiveness of the proposed hierarchical topic structure is confirmed. For the theme “politics”, the proposed method works well. The main reason is that the Wikipedia categories and articles contain many technical terms about politics, and many of these political terms were found to be unambiguous, i.e. they do not have multiple meanings. However, for the theme “sports”, the proposed method does not perform well. The main reason in this case is that many of the topic names are names of people.


One important feature of the proposed method is that it makes it easy not only to grasp which topics are heterogeneous or homogeneous with each other but also to consider the missing time when extracting topics. Another feature is that the topic structures of multiple users are easy to compare with each other.
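Because every user's topics are expressed as nodes of the same tree, comparing two users can reduce to set operations on node titles. The Python sketch below assumes each user's topic structure is represented as a set of category-tree node titles (including ancestors); the topic sets are invented toy data, and the Jaccard similarity is an illustrative choice, not a measure taken from the paper.

```python
def compare_users(topics_a: set[str], topics_b: set[str]) -> dict:
    """Compare two users' topic sets drawn from one shared category tree.

    Assumption: each set holds node titles from the same tree, so equal
    titles denote the same topic for both users.
    """
    shared = topics_a & topics_b
    union = topics_a | topics_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(topics_a - topics_b),
        "only_b": sorted(topics_b - topics_a),
        # Jaccard similarity (illustrative choice); two empty sets count as identical.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Invented example: both users follow the election; only user A tweets about Trump.
user_a = {"Politics", "Elections", "Presidential elections", "Donald Trump"}
user_b = {"Politics", "Elections", "Presidential elections"}
result = compare_users(user_a, user_b)
print(result)
```

With per-user clustering, the two users' group labels would not line up; on a shared tree, identical node titles make the overlap directly countable.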



Suzuki, Y., Ohara, H. and Nadamoto, A. (2018), "What is your tweet worldview? Mapping the topic structure of tweets on the Wikipedia", International Journal of Pervasive Computing and Communications, Vol. 14 No. 1, pp. 2-14.



Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited