An enhanced cosine-based visual technique for the robust tweets data clustering

Narasimhulu K (Annamalai University, Chidambaram, India)

Meena Abarna KT (Annamalai University, Chidambaram, India)

Sivakumar B (Rajeev Gandhi Memorial College of Engineering and Technology, Nandyal, India)

International Journal of Intelligent Computing and Cybernetics

ISSN: 1756-378X

Article publication date: 1 February 2021

Issue publication date: 23 April 2021

Abstract

Purpose

The purpose of this paper is to study the multiple viewpoints needed to obtain more informative similarity features among tweet documents, which are useful for achieving robust tweet data clustering results.

Design/methodology/approach

Let “N” be the number of tweet documents used for topic extraction. In the initial pre-processing step, unwanted text, punctuation and other symbols are removed, and tokenization and stemming are performed. A bag-of-features is determined for the tweets; the tweets are then modelled with this bag-of-features during topic extraction. An approximation of the topic features is extracted for every tweet document. The resulting set of topic features of the N documents is treated as the multi-viewpoints. The key idea of the proposed work is to use these multi-viewpoints in the computation of similarity features. A figure in the paper illustrates the multi-viewpoint-based cosine similarity computation for five tweet documents (here N = 5); the corresponding documents are defined in a projected space with five viewpoints, say v₁, v₂, v₃, v₄ and v₅. For example, the similarity features between two documents (viewpoints v₁ and v₂) are computed with respect to the other three viewpoints (v₃, v₄ and v₅), unlike the single viewpoint used in the traditional cosine metric.
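The abstract does not give the similarity formula, so the following is only a minimal sketch of one common way to realise the idea. It assumes that the similarity between documents i and j is the average, over every other document h taken as a viewpoint, of the cosine between the difference vectors (dᵢ − dₕ) and (dⱼ − dₕ). The function name `multi_viewpoint_cosine`, the input matrix `X` of topic features and the random example data are hypothetical and are not taken from the paper.

```python
import numpy as np


def multi_viewpoint_cosine(X):
    """Pairwise similarity averaged over all other documents as viewpoints.

    X is an (N, k) array with one row of topic features per tweet document.
    Assumed formulation (not confirmed by the paper): for documents i and j,
    average the cosine between (X[i] - X[h]) and (X[j] - X[h]) over all
    viewpoints h other than i and j.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 3:
        raise ValueError("at least three documents are needed")

    S = np.eye(n)  # self-similarity is set to 1 by convention
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for h in range(n):
                if h == i or h == j:
                    continue
                a = X[i] - X[h]
                b = X[j] - X[h]
                denom = np.linalg.norm(a) * np.linalg.norm(b)
                # A viewpoint that coincides with a document contributes 0.
                if denom > 0:
                    total += float(a @ b) / denom
            S[i, j] = S[j, i] = total / (n - 2)
    return S


if __name__ == "__main__":
    # Five hypothetical documents with four topic features each (N = 5).
    rng = np.random.default_rng(0)
    docs = rng.random((5, 4))
    print(np.round(multi_viewpoint_cosine(docs), 3))
```

With N = 5, each pair such as (v₁, v₂) is compared from the three remaining viewpoints (v₃, v₄, v₅), which matches the example in the abstract. The traditional cosine metric corresponds to using the origin as the only viewpoint.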

Findings

The work addresses healthcare problems with tweet data. Topic models play a crucial role in the classification of health-related tweets by finding topics (or health clusters), instead of computing term frequency and inverse document frequency (TF–IDF), for unlabelled tweets.

Originality/value

Topic models play a crucial role in the classification of health-related tweets by finding topics (or health clusters), instead of computing TF–IDF, for unlabelled tweets.

Citation

K, N., KT, M.A. and B, S. (2021), "An enhanced cosine-based visual technique for the robust tweets data clustering", International Journal of Intelligent Computing and Cybernetics, Vol. 14 No. 2, pp. 170-184. https://doi.org/10.1108/IJICC-10-2020-0151

Publisher

Emerald Publishing Limited
