A scalable eigenspace-based fuzzy c-means for topic detection

Hendri Murfi (Department of Mathematics, Universitas Indonesia, Depok, Indonesia)

Data Technologies and Applications

ISSN: 2514-9288

Article publication date: 23 March 2021

Issue publication date: 5 August 2021

Abstract

Purpose

The aim of this research is to develop an eigenspace-based fuzzy c-means method for scalable topic detection.

Design/methodology/approach

The eigenspace-based fuzzy c-means (EFCM) method combines representation learning and clustering. The textual data are transformed into a lower-dimensional eigenspace using truncated singular value decomposition. Fuzzy c-means is then performed in the eigenspace to identify the centroid of each cluster. The topics are obtained by transforming the centroids back into the nonnegative subspace of the original space. In this paper, we extend the EFCM method for scalability using two approaches, i.e. single-pass and online. We call the resulting topic detection methods spEFCM and oEFCM, respectively.
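
The three steps described above (truncated SVD, fuzzy c-means in the eigenspace, back-transformation of the centroids) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `fuzzy_c_means` and `efcm_topics`, the fuzzifier `m`, the stopping rule and the dense SVD are all assumptions made for illustration. In particular, clipping negative entries to zero is only one plausible way to land in the nonnegative subspace, and the single-pass and online variants are not covered.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on the rows of X (hypothetical helper).

    Returns the centroids (c x d) and the membership matrix (n x c).
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so that each row sums to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the data points.
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance of every point to every centroid (n x c).
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


def efcm_topics(A, n_topics, n_components):
    """Sketch of the EFCM pipeline on a documents-by-terms matrix A.

    Assumption: negative entries of the back-transformed centroids
    are clipped to zero to obtain nonnegative topic vectors.
    """
    # Step 1: truncated SVD; keep the leading right singular vectors.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vt_k = Vt[:n_components]
    X = A @ Vt_k.T  # documents expressed in the eigenspace
    # Step 2: fuzzy c-means in the lower-dimensional eigenspace.
    centroids, memberships = fuzzy_c_means(X, n_topics)
    # Step 3: map the centroids back to term space and clip negatives.
    topics = np.maximum(centroids @ Vt_k, 0.0)
    return topics, memberships
```

Each row of `topics` then weights the terms of one topic, so sorting a row in descending order gives that topic's most representative terms. For large sparse corpora, a sparse truncated SVD routine would replace the dense `np.linalg.svd` call used here.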

Findings

Our simulation shows that both the oEFCM and spEFCM methods provide faster running times than EFCM for data sets that do not fit in memory. However, there is a decrease in the average coherence score. For data sets that fit in memory as well as those that do not, the oEFCM method provides a better tradeoff between running time and coherence score than spEFCM.

Originality/value

This research produces a scalable topic detection method. Besides scalability, the developed method also provides a faster running time for data sets that fit in memory.

Acknowledgements

This work was supported by Universitas Indonesia under the PDUPT 2019 grant. Any opinions, findings, conclusions or recommendations are the author's and do not necessarily reflect those of the sponsor.

Citation

Murfi, H. (2021), "A scalable eigenspace-based fuzzy c-means for topic detection", Data Technologies and Applications, Vol. 55 No. 4, pp. 527-541. https://doi.org/10.1108/DTA-11-2020-0262

Publisher: Emerald Publishing Limited

Copyright © 2021, Emerald Publishing Limited
