This paper proposes a multi-label classification method that estimates the appropriate aspects of unseen tweets using a two-phase estimation approach. Many Twitter users share daily events and opinions, and some of their posts carry useful information about real-life aspects such as eating, traffic and weather. A post such as “The train is not coming” belongs to the Traffic aspect, whereas a tweet such as “The train is delayed by heavy rain” belongs to both the Traffic and Weather aspects.
The proposed method consists of two phases. In the first, many topics are extracted from a large collection of tweets using Latent Dirichlet Allocation (LDA). In the second, associations between these many topics and a smaller number of aspects are built using a small set of labeled tweets. Aspect scores for each tweet are calculated from these associations based on the terms extracted from the tweet. An unknown tweet is then labeled with the appropriate aspects by averaging the aspect scores.
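The second phase can be illustrated with a minimal sketch. It is not the authors' implementation: the term-to-topic probabilities stand in for the output of an LDA model and are made up, and the names `TERM_TOPICS`, `LABELED`, `build_associations`, `aspect_scores` and `label`, the per-topic normalization and the 0.4 threshold are all assumptions introduced for illustration.

```python
from collections import defaultdict

# Hypothetical phase-one output: P(topic | term), as an LDA model might give.
# The values are invented for illustration only.
TERM_TOPICS = {
    "train":   {0: 0.9, 1: 0.1},
    "delayed": {0: 0.8, 1: 0.2},
    "rain":    {0: 0.1, 1: 0.9},
    "heavy":   {0: 0.3, 1: 0.7},
}

# Small labeled set: (terms of a tweet, aspects assigned to it).
LABELED = [
    (["train", "delayed"], {"Traffic"}),
    (["heavy", "rain"], {"Weather"}),
]


def tweet_topics(terms):
    """Average P(topic | term) over the terms that the topic model knows."""
    known = [TERM_TOPICS[t] for t in terms if t in TERM_TOPICS]
    mix = defaultdict(float)
    for dist in known:
        for topic, p in dist.items():
            mix[topic] += p / len(known)
    return dict(mix)


def build_associations(labeled):
    """Phase two: accumulate topic-to-aspect weights from labeled tweets,
    then normalize the weights of each topic so they sum to 1."""
    raw = defaultdict(lambda: defaultdict(float))
    for terms, aspects in labeled:
        for topic, p in tweet_topics(terms).items():
            for aspect in aspects:
                raw[topic][aspect] += p
    assoc = {}
    for topic, weights in raw.items():
        total = sum(weights.values())
        assoc[topic] = {a: w / total for a, w in weights.items()}
    return assoc


def aspect_scores(terms, assoc):
    """Score each aspect per term via the topic associations, then average
    the per-term scores over the known terms of the tweet."""
    known = [t for t in terms if t in TERM_TOPICS]
    scores = defaultdict(float)
    for t in known:
        for topic, p in TERM_TOPICS[t].items():
            for aspect, w in assoc.get(topic, {}).items():
                scores[aspect] += p * w / len(known)
    return dict(scores)


def label(terms, assoc, threshold=0.4):
    """Assign every aspect whose averaged score reaches the threshold."""
    return {a for a, s in aspect_scores(terms, assoc).items() if s >= threshold}
```

With these toy numbers, `label(["train", "delayed", "heavy", "rain"], build_associations(LABELED))` assigns both Traffic and Weather, while `label(["train"], ...)` assigns only Traffic. A tweet with no known terms receives no aspect.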
Experimental evaluations on a large number of real tweets demonstrate the high efficiency of the proposed multi-label classification method. Aspects with a high F-measure are strongly associated with highly relevant topics, whereas aspects with a low F-measure are associated with topics that are connected to many other aspects.
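For readers unfamiliar with a per-aspect F-measure in a multi-label setting, the following sketch shows one common way to compute it. It assumes the balanced F1 variant (the harmonic mean of precision and recall), which the abstract does not specify. The function name `per_aspect_f1` and the sample labels are hypothetical.

```python
def per_aspect_f1(gold, predicted):
    """Compute F1 separately for each aspect.

    gold and predicted are parallel lists of aspect sets, one set per tweet.
    """
    aspects = set().union(*gold, *predicted)
    result = {}
    for a in aspects:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if a in g and a in p)      # true positives
        fp = sum(1 for g, p in pairs if a not in g and a in p)  # false positives
        fn = sum(1 for g, p in pairs if a in g and a not in p)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the aspect never occurs.
        result[a] = 2 * tp / denom if denom else 0.0
    return result


# Made-up example: three tweets with gold and predicted aspect sets.
gold = [{"Traffic"}, {"Traffic", "Weather"}, {"Weather"}]
predicted = [{"Traffic"}, {"Traffic"}, {"Traffic", "Weather"}]
f1 = per_aspect_f1(gold, predicted)
```

In this made-up example, Traffic has two true positives and one false positive, giving an F1 of 0.8. Weather has one true positive and one false negative, giving an F1 of 2/3.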
The proposed method features two-phase semi-supervised learning: many topics are extracted using LDA, an unsupervised learning model, and associations between these topics and a smaller number of aspects are then built using labeled tweets.
This work was supported by a Grant-in-Aid for Scientific Research No. 25280110.
Yamamoto, S. and Satoh, T. (2014), "Two phase estimation method for multi-classifying real life tweets", International Journal of Web Information Systems, Vol. 10 No. 4, pp. 378-393. https://doi.org/10.1108/IJWIS-04-2014-0013
Emerald Group Publishing Limited
Copyright © 2014, Emerald Group Publishing Limited