Two phase estimation method for multi-classifying real life tweets

Shuhei Yamamoto (Graduate School of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Japan)
Tetsuji Satoh (Faculty of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Japan)

International Journal of Web Information Systems

ISSN: 1744-0084

Publication date: 11 November 2014

Abstract

Purpose

This paper aims to propose a multi-label classification method that estimates appropriate aspects for unknown tweets using a two-phase estimation approach. Many Twitter users share daily events and opinions, and some of their posts carry useful comments on real-life aspects such as eating, traffic and weather. A post such as “The train is not coming” belongs to the Traffic aspect, whereas a tweet such as “The train is delayed by heavy rain” belongs to both the Traffic and Weather aspects.

Design/methodology/approach

The proposed method consists of two phases. In the first, many topics are extracted from a large collection of tweets using Latent Dirichlet Allocation (LDA). In the second, associations between the many topics and the fewer aspects are built using a small set of labeled tweets. Aspect scores for a tweet are calculated from these associations, based on the terms extracted from the tweet. Appropriate aspects are then assigned to unknown tweets by averaging the aspect scores.
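The second phase and the final scoring step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the normalization, the decision threshold and all names here (TERM_TOPICS, LABELED, build_associations, aspect_scores, label) are assumptions made for illustration. The first phase is assumed to have already run, and its output is stood in for by a small hand-written table of per-term topic weights such as an LDA model might provide.

```python
from collections import defaultdict

# Hypothetical stand-in for phase 1: per-term topic weights, as an LDA model
# fitted on a large unlabeled tweet collection might provide. Values are made up.
TERM_TOPICS = {
    "train":   {0: 0.9, 1: 0.1},
    "delayed": {0: 0.8, 1: 0.2},
    "rain":    {0: 0.1, 1: 0.9},
    "heavy":   {0: 0.3, 1: 0.7},
}

# Small labeled set: (terms of the tweet, aspects assigned by annotators).
LABELED = [
    (["train", "delayed"], {"Traffic"}),
    (["heavy", "rain"], {"Weather"}),
]


def build_associations(labeled, term_topics):
    """Phase 2 (assumed form): sum topic weight per aspect over the labeled
    tweets, then normalize so each topic's aspect weights sum to 1."""
    raw = defaultdict(lambda: defaultdict(float))
    for terms, aspects in labeled:
        for term in terms:
            for topic, weight in term_topics.get(term, {}).items():
                for aspect in aspects:
                    raw[topic][aspect] += weight
    assoc = {}
    for topic, by_aspect in raw.items():
        total = sum(by_aspect.values())
        assoc[topic] = {a: v / total for a, v in by_aspect.items()}
    return assoc


def aspect_scores(terms, term_topics, assoc):
    """Score each aspect per term via the topic-aspect associations, then
    average the per-term scores over the terms known to the topic model."""
    known = [t for t in terms if t in term_topics]
    scores = defaultdict(float)
    for term in known:
        for topic, weight in term_topics[term].items():
            for aspect, strength in assoc.get(topic, {}).items():
                scores[aspect] += weight * strength / len(known)
    return dict(scores)


def label(terms, term_topics, assoc, threshold=0.4):
    """Assign every aspect whose averaged score reaches the threshold.
    The threshold value is an assumption, not taken from the paper."""
    scores = aspect_scores(terms, term_topics, assoc)
    return {a for a, s in scores.items() if s >= threshold}


ASSOC = build_associations(LABELED, TERM_TOPICS)
print(label(["train", "delayed", "heavy", "rain"], TERM_TOPICS, ASSOC))
print(label(["train"], TERM_TOPICS, ASSOC))
```

With these toy numbers, a tweet containing both traffic and weather terms receives both aspects, while a tweet containing only "train" receives Traffic alone. Because each topic's associations are normalized, a topic that is tied to many aspects contributes only a small weight to each of them.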

Findings

Experimental evaluations on a large number of actual tweets demonstrate the high efficiency of the proposed multi-label classification method. Aspects with a high F-measure are strongly associated with topics that have high relevance, whereas aspects with a low F-measure are associated with topics that are also connected to many other aspects.
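For reference, a per-aspect F-measure in a multi-label setting can be computed as sketched below. The abstract does not state which F-measure variant was used, so the balanced form (the harmonic mean of precision and recall) is assumed here, and the function name and toy data are hypothetical.

```python
def f_measure_per_aspect(gold, predicted, aspects):
    """Balanced F-measure for each aspect, treating every aspect as its own
    binary decision over the tweets. gold and predicted are lists of sets."""
    result = {}
    for aspect in aspects:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if aspect in g and aspect in p)
        fp = sum(1 for g, p in pairs if aspect not in g and aspect in p)
        fn = sum(1 for g, p in pairs if aspect in g and aspect not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        result[aspect] = 2 * precision * recall / denom if denom else 0.0
    return result


# Toy data, made up for illustration only.
gold = [{"Traffic"}, {"Traffic", "Weather"}, {"Weather"}]
pred = [{"Traffic"}, {"Traffic"}, {"Traffic", "Weather"}]
print(f_measure_per_aspect(gold, pred, ["Traffic", "Weather"]))
```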

Originality/value

The proposed method features two-phase semi-supervised learning. Many topics are first extracted using LDA, an unsupervised learning model. Associations between the many topics and the fewer aspects are then built using labeled tweets.

Acknowledgements

This work was supported by a Grant-in-Aid for Scientific Research No. 25280110.

Citation

Yamamoto, S. and Satoh, T. (2014), "Two phase estimation method for multi-classifying real life tweets", International Journal of Web Information Systems, Vol. 10 No. 4, pp. 378-393. https://doi.org/10.1108/IJWIS-04-2014-0013

Publisher

Emerald Group Publishing Limited

Copyright © 2014, Emerald Group Publishing Limited
