Latent topics identification from the articles of Sri Lankan authors using LDA

S. Ravikumar (Department of Library and Information Science, North-Eastern Hill University, Shillong, India)
Bidyut Bikash Boruah (Department of Library and Information Science, North-Eastern Hill University, Shillong, India)
Fullstar Lamin Gayang (Department of Library and Information Science, North-Eastern Hill University, Shillong, India)

Global Knowledge, Memory and Communication

ISSN: 2514-9342

Article publication date: 16 February 2023

Abstract

Purpose

This study identifies latent topics in 9,102 Web of Science (WoS)-indexed research articles by Sri Lankan authors, published in 2,645 journals from 1989 to 2021, by applying Latent Dirichlet Allocation (LDA) to the abstracts. The article discusses the dominant topics in the corpus, the posterior probabilities of terms within the topics and the publication proportion of each topic.

Design/methodology/approach

The authors collect the abstracts and other details of the studied articles from the WoS database and preprocess the data before analysis. The R package “ldatuning” is then applied to the preprocessed text to decide the number of topics on the basis of statistical measures. Guided by four metrics, 20 latent topics are selected for extraction.
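The workflow described above (preprocess the abstracts, then fit an LDA model with a fixed number of topics) can be illustrated with a minimal sketch. This is not the authors' R code and does not reproduce the “ldatuning” topic-number selection: it is a small pure-Python collapsed Gibbs sampler, and the stopword list, the hyperparameters `alpha` and `beta`, the function names and the four toy abstracts are all assumptions made for illustration.

```python
import random

# Assumed, deliberately tiny stopword list; a real study would use a fuller one.
STOPWORDS = {"the", "of", "and", "in", "a", "to", "is", "for", "on", "with"}

def preprocess(text):
    """Lowercase, strip punctuation, keep alphabetic tokens longer than 2 chars."""
    tokens = [t.strip(".,;:()\"'") for t in text.lower().split()]
    return [t for t in tokens if t.isalpha() and len(t) > 2 and t not in STOPWORDS]

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (vocab, topic_term, doc_topic)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]            # document-topic counts
    n_kw = [[0] * n_words for _ in range(n_topics)]  # topic-term counts
    n_k = [0] * n_topics                             # tokens assigned to each topic
    z = []                                           # topic of every token

    # Random initial topic assignment for each token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = word_id[w], z[d][i]
                # Remove the token, then resample its topic from the conditional.
                n_dk[d][k] -= 1
                n_kw[k][wi] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wi] + beta) / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wi] += 1
                n_k[k] += 1

    # Smoothed estimates: term probabilities per topic, topic shares per document.
    topic_term = [
        [(n_kw[k][w] + beta) / (n_k[k] + n_words * beta) for w in range(n_words)]
        for k in range(n_topics)
    ]
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(docs[d]) + n_topics * alpha) for k in range(n_topics)]
        for d in range(len(docs))
    ]
    return vocab, topic_term, doc_topic

# Toy abstracts invented for this sketch; they are not data from the study.
abstracts = [
    "Antioxidant activity and acid content of tea extracts.",
    "Acid hydrolysis improves antioxidant activity in extracts.",
    "Irrigation water and rice yield in dry zone farming.",
    "Rice yield under irrigation water stress in farming systems.",
]
docs = [preprocess(a) for a in abstracts]
vocab, topic_term, doc_topic = fit_lda(docs, n_topics=2)

# Show the three highest-probability terms of each topic.
for k, row in enumerate(topic_term):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(k, [(w, round(p, 3)) for p, w in top])
```

Each row of `topic_term` is a probability distribution over the vocabulary, which is the kind of per-topic term probability the abstract refers to. The number of topics is fixed at 2 only because the toy corpus is tiny; the study selects 20.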

Findings

Topics related to medical science, agriculture, research and development, and chemistry dominate the subject categories overall. The publication proportions of “Irrigation” and “mortality and health care” grow significantly from 2019 to 2021. In the most frequently occurring latent topics, terms such as “activity” and “acid” carry higher posterior probabilities.
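As a rough illustration of how a per-year publication proportion might be computed, the sketch below takes one (year, topic) pair per article and returns each topic's share of that year's articles. The abstract does not define the measure, so treating it as "share of a year's articles whose dominant topic is the given topic" is an assumption, as are the function name and the records, which are made-up toy values and not results from the study.

```python
from collections import Counter, defaultdict

def publication_proportions(records):
    """records: iterable of (year, topic) pairs; returns {year: {topic: share}}."""
    by_year = defaultdict(Counter)
    for year, topic in records:
        by_year[year][topic] += 1
    return {
        year: {topic: count / sum(counts.values()) for topic, count in counts.items()}
        for year, counts in sorted(by_year.items())
    }

# Hypothetical toy records: one (year, dominant topic) pair per article.
records = [
    (2019, "irrigation"), (2019, "chemistry"), (2019, "chemistry"), (2019, "chemistry"),
    (2021, "irrigation"), (2021, "irrigation"), (2021, "chemistry"),
    (2021, "mortality and health care"),
]
shares = publication_proportions(records)
for year, topics in shares.items():
    print(year, topics)
```

Comparing a topic's share across years, rather than its raw article count, separates growth in that topic from growth in overall output.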

Practical implications

Topic models make it possible to answer high-level questions about a corpus rapidly and efficiently without human intervention, and they are also useful for information retrieval and document clustering. A distinctive feature of this study is that it shows how the growth of the universe of knowledge for a specific country can be studied using the LDA topic model.

Originality/value

This study may stimulate further research in text analysis and information retrieval. The results show how the publication output of Sri Lankan authors has developed across subject areas and over time. Trends in, and the intensity of, publications by Sri Lankan authors on different latent topics help to trace their interests and the most actively pursued areas in different domains.

Acknowledgements

The authors received no financial support for the research, authorship and/or publication of this article. The article is the authors’ original work and has not been previously published.

Statement and declaration: This is to certify that the authors have no affiliations with or involvement in any organization or entity with any financial or nonfinancial interest in the subject matter or materials discussed in this manuscript.

Citation

Ravikumar, S., Boruah, B.B. and Gayang, F.L. (2023), "Latent topics identification from the articles of Sri Lankan authors using LDA", Global Knowledge, Memory and Communication, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/GKMC-08-2022-0206

Publisher: Emerald Publishing Limited

Copyright © 2023, Emerald Publishing Limited
