Search results

1 – 10 of over 1000

Article

Publication date: 22 July 2024

MD-LDA: a supervised LDA topic model for identifying mechanism of disease in TCM

Meiwen Li, Liye Xia, Qingtao Wu, Lin Wang, Junlong Zhu and Mingchuan Zhang

Abstract

Purpose

In traditional Chinese medicine (TCM), the mechanism of disease (MD) constitutes an essential element of syndrome differentiation and treatment, elucidating the mechanisms underlying the occurrence, progression, alterations and outcomes of diseases. However, there is a dearth of research in the field of intelligent diagnosis concerning the analysis of MD.

Design/methodology/approach

In this paper, we propose a supervised Latent Dirichlet Allocation (LDA) topic model, termed MD-LDA, which elucidates the process of MD identification. We leverage the label information inherent in the data as prior knowledge and incorporate it into the model’s training. Additionally, we devise two parallel parameter estimation algorithms for efficient training. Furthermore, we introduce a benchmark MD identification dataset, named TMD, for training MD-LDA. Finally, we validate the performance of MD-LDA through comprehensive experiments.

Findings

The results show that MD-LDA is effective and efficient. Moreover, MD-LDA outperforms state-of-the-art topic models on perplexity, Kullback–Leibler (KL) divergence and classification performance.
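For readers unfamiliar with the metric, perplexity is the exponential of the negative average per-token log-likelihood on held-out text, and lower values are better. The sketch below is a generic illustration under stated assumptions, not the MD-LDA evaluation code: the function name `perplexity`, the argument layout and the toy numbers are all hypothetical.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of tokenised documents under a fitted topic model.

    docs:  list of documents, each a list of integer word ids
    theta: theta[d][k] = P(topic k | document d)
    phi:   phi[k][w]   = P(word w | topic k)
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginalise over topics to get P(word | document).
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


# Toy example (hypothetical numbers): 2 topics, 3-word vocabulary.
phi = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
theta = [[0.9, 0.1], [0.2, 0.8]]
docs = [[0, 0, 1], [2, 2, 0]]
print(perplexity(docs, theta, phi))
```

A useful sanity check is that a model with uniform word probabilities has perplexity equal to the vocabulary size.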

Originality/value

The proposed MD-LDA can be applied to MD discovery and analysis in TCM clinical diagnosis, so as to improve the interpretability and reliability of intelligent diagnosis and treatment.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9288

Article

Publication date: 4 June 2021

Citation context-based topic models: discovering cited and citing topics from full text

Lixue Zou, Xiwen Liu, Wray Buntine and Yanli Liu

Downloads

392

Abstract

Purpose

Full text of a document is a rich source of information that can be used to provide meaningful topics. The purpose of this paper is to demonstrate how to use citation context (CC) in the full text to identify the cited topics and citing topics efficiently and effectively by employing automatic text analysis algorithms.

Design/methodology/approach

The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is leveraged to extract the citing text from the full text, which makes it possible to discover topics with accuracy. CC-LDA incorporates CC, citing text, and their latent relationship, while CCRef-LDA incorporates CC, citing text, their latent relationship and reference information in CC. Collapsed Gibbs sampling is used to achieve an approximate estimation. The capacity of CC-LDA to simultaneously learn cited topics and citing topics together with their links is investigated. Moreover, a topic influence measure method based on CC-LDA is proposed and applied to create links between the two-level topics. In addition, the capacity of CCRef-LDA to discover topic influential references is also investigated.
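To make the estimation step concrete, the following is a minimal sketch of collapsed Gibbs sampling for plain LDA. It is not CC-LDA or CCRef-LDA, which add citation context and reference variables that are not described in enough detail here to reproduce. The function name `gibbs_lda`, the hyperparameter defaults and the toy corpus are assumptions chosen for illustration.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for standard LDA on lists of integer word ids."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                                # tokens assigned to each topic

    def update(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token's current assignment
                # Full conditional P(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, w, z[d][i], +1)

    # Point estimates of the document-topic and topic-word distributions.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return theta, phi


# Toy corpus (hypothetical): word ids 0-1 and 2-3 tend to co-occur.
toy_docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 0]]
theta, phi = gibbs_lda(toy_docs, n_topics=2, vocab_size=4, n_iter=50)
print(theta)
```

The sampler is "collapsed" because the topic and word distributions are integrated out and only the token-level topic assignments are sampled; the distributions are recovered from the counts at the end.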

Findings

The results indicate CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering the cited topics and citing topics with topic influence, and CCRef-LDA is able to find the cited topic influential references.
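As an illustration of the second metric, the sketch below computes a symmetric Kullback–Leibler divergence between two discrete distributions, for example two topic-word distributions. It assumes the common definition KL(p||q) + KL(q||p); some authors use half of that sum, and the abstract does not say which variant the paper uses. The function names and example vectors are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetric KL divergence, taken here as the sum of both directions."""
    return kl(p, q) + kl(q, p)


# Two hypothetical topic-word distributions over a 3-word vocabulary.
topic_a = [0.7, 0.2, 0.1]
topic_b = [0.1, 0.3, 0.6]
print(symmetric_kl(topic_a, topic_b))
```

Smoothed distributions, such as those produced by LDA with a positive Dirichlet prior, have no zero entries, so the division in `kl` is safe for them.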

Originality/value

The automatic method provides novel knowledge for cited topics and citing topics discovery. Topic influence learnt by our model can link two-level topics and create a semantic topic network. The method can also use topic specificity as a feature to rank references.

Details

Library Hi Tech, vol. 39 no. 4

Type: Research Article

ISSN: 0737-8831

Article

Publication date: 29 September 2021

Mining numerical measure of consumers’ product evaluation expressed in words based on latent Dirichlet allocation

Ziang Wang and Feng Yang

Downloads

231

Abstract

Purpose

It has always been a hot topic for online retailers to obtain consumers’ product evaluations from massive online reviews. In the process of online shopping, there is no face-to-face interaction between online retailers and customers. After collecting online reviews left by customers, online retailers are eager to acquire answers to some questions. For example, which product attributes will attract consumers? Or which step brings a better experience to consumers during the process of shopping? This paper aims to associate the latent Dirichlet allocation (LDA) model with the consumers’ attitude and provides a method to calculate the numerical measure of consumers’ product evaluation expressed in each word.

Design/methodology/approach

First, all possible pairs of reviews are organized as documents to build the corpus. After that, the latent topics of the traditional LDA model, referred to as the standard LDA model, are separated into shared and differential topics. Then, the authors associate the model with consumers’ attitudes toward each review, which is classified as either positive or non-positive. The product evaluation reflected in consumers’ binary attitude is extended to each word that appears in the corpus. Finally, a variational optimization is introduced to estimate the parameters of the expanded LDA model.

Findings

The experimental results show that the LDA model proposed in this research, referred to as the expanded LDA model, can successfully assign sufficient probability to words related to product attributes or consumers’ product evaluation. Compared with the standard LDA model, the expanded model tends to assign higher probability to words that rank higher within each topic. In addition, the expanded model has higher precision on the prediction set, which shows that breaking the topics down into two categories fits the data set better than the standard LDA model does. The product evaluation of each word is calculated by the expanded model and presented at the end of the experiment.

Originality/value

This research provides a new method to calculate consumers’ product evaluation from reviews at the word level. Words may be used to describe product attributes or consumers’ experiences in reviews. Assigning numerical measures to words makes it possible to analyze consumers’ product evaluation quantitatively. In addition, because the words themselves act as labels, they can be ranked once a numerical measure is given. Online retailers can benefit from the results for label choosing, advertising or product recommendation.

Details

Journal of Modelling in Management, vol. 18 no. 1

Type: Research Article

ISSN: 1746-5664

Article

Publication date: 5 September 2017

MFS-LDA: a multi-feature space tag recommendation model for cold start problem

Muhammad Ali Masood, Rabeeh Ayaz Abbasi, Onaiza Maqbool, Mubashar Mushtaq, Naif R. Aljohani, Ali Daud, Muhammad Ahtisham Aslam and Jalal S. Alowibdi

Downloads

489

Abstract

Purpose

Tags are used to annotate resources on social media platforms. Most tag recommendation methods use popular tags, but in the case of new resources that are as yet untagged (the cold start problem), popularity-based tag recommendation methods fail to work. The purpose of this paper is to propose a novel model for tag recommendation, called multi-feature space latent Dirichlet allocation (MFS-LDA), for the cold start problem.

Design/methodology/approach

MFS-LDA is a novel latent Dirichlet allocation (LDA)-based model which exploits multiple feature spaces (title, contents, and tags) for recommending tags. Exploiting multiple feature spaces allows MFS-LDA to recommend tags even if data from a feature space is missing (the cold start problem).

Findings

Evaluation on a publicly available data set consisting of around 20,000 Wikipedia articles that are tagged on a social bookmarking website shows a significant improvement over existing LDA-based tag recommendation methods.

Originality/value

The originality of MFS-LDA lies in the segregation of features to remove bias toward dominant features and in the synchronization of multiple feature spaces for tag recommendation.

Details

Program, vol. 51 no. 3

Type: Research Article

ISSN: 0033-0337

Article

Publication date: 14 October 2021

Identifying hidden semantic structures in Instagram data: a topic modelling comparison

Roman Egger and Joanne Yu

Downloads

788

Abstract

Purpose

Intrigued by the methodological challenges emerging from text complexity, the purpose of this study is to evaluate the effectiveness of different topic modelling algorithms based on Instagram textual data.

Design/methodology/approach

By taking Instagram posts captioned with #darktourism as the study context, this research applies latent Dirichlet allocation (LDA), correlation explanation (CorEx), and non-negative matrix factorisation (NMF) to uncover tourist experiences.

Findings

CorEx outperforms LDA and NMF by classifying emerging dark sites and activities into 17 distinct topics. The results of LDA appear homogeneous and overlapping, whereas the extracted topics of NMF are not specific enough to gain deep insights.

Originality/value

This study assesses different topic modelling algorithms for knowledge extraction in the highly heterogeneous tourism industry. The findings unfold the complexity of analysing short-text social media data and strengthen the use of CorEx in analysing Instagram content.

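Purpose (Chinese abstract)

Motivated by an interest in text complexity, this study uses Instagram textual data as a basis for comparing the effectiveness of different topic modelling algorithms.

Design/methodology/approach (Chinese abstract)

Taking Instagram posts tagged #darktourism as the study context, this research evaluates the usefulness of latent Dirichlet allocation (LDA), correlation explanation (CorEx) and non-negative matrix factorisation (NMF) for analysing posts related to dark tourism.

Findings (Chinese abstract)

CorEx identifies 17 emerging dark sites and activities and outperforms LDA and NMF. Although LDA can produce a larger number of topics, their content is largely repetitive. Similarly, although NMF is suited to short-text data, the topics it extracts are rather general and not specific enough.

Originality/value (Chinese abstract)

By combining the disciplines of marketing and data science, this study lays a foundation for analysing unstructured text and confirms the benefit of CorEx in analysing short-text social media data such as Instagram data.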

Purpose (Spanish abstract)

Intrigued by the methodological challenges arising from text complexity, this study evaluates the effectiveness of different topic modelling algorithms based on Instagram textual data.

Design/methodology/approach (Spanish abstract)

Taking Instagram posts tagged #darktourism as the study context, this research applies latent Dirichlet allocation (LDA), correlation explanation (CorEx) and non-negative matrix factorisation (NMF) to uncover tourist experiences.

Findings (Spanish abstract)

CorEx outperforms LDA and NMF by classifying emerging dark sites and activities into 17 distinct topics. The results of LDA are homogeneous and overlapping, whereas the topics extracted by NMF are not specific enough to yield deep insights.

Originality/value (Spanish abstract)

This study assesses different topic modelling algorithms for knowledge extraction in the tourism industry. The findings reveal the complexity of analysing short-text social media data and strengthen the use of CorEx for analysing Instagram content.

Details

Tourism Review, vol. 77 no. 4

Type: Research Article

ISSN: 1660-5373

Article

Publication date: 16 February 2023

Latent topics identification from the articles of Sri Lankan authors using LDA

S. Ravikumar, Bidyut Bikash Boruah and Fullstar Lamin Gayang

Downloads

108

Abstract

Purpose

The purpose of the study is to identify the latent topics in 9,102 Web of Science (WoS) indexed research articles by Sri Lankan authors, published in 2,645 journals from 1989 to 2021, by applying Latent Dirichlet Allocation to the abstracts. The article discusses the dominant topics in the corpus, the posterior probability of different terms in the topics and the publication proportions of the topics.

Design/methodology/approach

Abstracts and other details of the studied articles were collected from the WoS database by the authors. Data preprocessing was performed before the analysis. After preprocessing, the R package “ldatuning” was applied to choose the number of topics on statistical grounds. Based on four metrics, twenty latent topics were selected for extraction.

Findings

It is observed that medical science, agriculture, research and development and chemistry-related topics dominate the subject categories as a whole. “Irrigation” and “mortality and health care” have a significant growth in the publication proportion from 2019 to 2021. For the most occurring latent topics, it is seen that terms like “activity” and “acid” carry higher posterior probability.

Practical implications

Topic models make it possible to address high-level questions rapidly and efficiently without human intervention, and they are also helpful in information retrieval and document clustering. A distinctive feature of this study is that it shows how the growth of the universe of knowledge for a specific country can be studied using the LDA topic model.

Originality/value

This study will create an incentive for research in text analysis and information retrieval. The results give an understanding of how the publications of Sri Lankan authors have developed across subject areas and over time. Trends in and intensity of publications by Sri Lankan authors on different latent topics help to trace their interests and the most practiced areas in different domains.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9342

Open Access

Article

Publication date: 19 June 2020

Information flows and topic modeling in corporate governance

Jeffrey D. Kushkowski, Charles B. Shrader, Marc H. Anderson and Robert E. White

Downloads

5161

Abstract

Purpose

Multiple disciplines such as finance, management and economics have contributed to governance research over time. However, the full intellectual structure of the governance “field” including the exchange of knowledge across disciplines and the large variety of governance topics remains to be uncovered. To appreciate the breadth of corporate governance research, it is necessary to understand the disciplinary sources from which the research stems. This manuscript focuses on the interdisciplinary underpinnings of corporate governance research.

Design/methodology/approach

This paper employs bibliometric analysis to trace the evolution of corporate governance using articles included in the ISI Web of Science database between 1990 and 2015. Journals included in these categories encompass a full range of business disciplines and provide evidence of the multi-disciplinary nature of corporate governance. It also uncovers the topics treated by disciplines under the governance umbrella using a machine learning method called latent Dirichlet allocation (LDA).

Findings

Corporate governance research deals with a number of strategy-related topics. Unlike strategy topics that reside in a single discipline, corporate governance crosses disciplinary boundaries and includes contributions from accounting, finance, economics, law and management. Our analysis shows that over 80% of corporate governance articles come from outside the field of management. Our LDA solution indicates that the major topics in governance research include corporate governance theory, control of family firms, executive compensation and audit committees.

Originality/value

The results illustrate that corporate governance is far more interdisciplinary than previously thought. This is an important insight for corporate governance academics and may lead to collaborative research. More importantly, this research illustrates the usefulness of LDA for investigating interdisciplinary fields. This method is easily transferable to other interdisciplinary fields and it provides a powerful alternative to existing bibliometric methods. We suggest a number of topic areas within library and information science where this method may be applied, including collection development, support for interdisciplinary faculty and basic research into emerging interdisciplinary areas.

Details

Journal of Documentation, vol. 76 no. 6

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 14 June 2021

Incorporating LDA with LSTM for followee recommendation on Twitter network

Brahim Dib, Fahd Kalloubi, El Habib Nfaoui and Abdelhak Boulaalam

Downloads

130

Abstract

Purpose

The purpose of this study is to facilitate the task of finding appropriate information to read and of searching for people who share the same field of interest, given that more and more people keep up with new streaming information on the Twitter micro-blogging service. With the immense number of micro-posts shared via the follower/followee network graph, Twitter users find themselves in front of millions of tweets, which makes the task crucial.

Design/methodology/approach

This paper proposes a long short-term memory (LSTM) model for followee recommendation that relies on the output vector of latent Dirichlet allocation (LDA), which is applied as the topic modeling strategy.

Findings

This study trains the model using a real-life data set extracted based on the Twitter follower/followee architecture. It confirms the effectiveness and scalability of the proposed approach. The approach improves on the state-of-the-art models average-LSTM and time-LSTM.

Research limitations/implications

This study mainly improves on existing followee recommendation systems because, unlike previous studies, it applies a non-hand-crafted method: an LSTM neural network combined with an LDA model for topic extraction. The main limitations of this study are that cold-start users cannot be handled and that some active fake accounts may not be detected.

Practical implications

The aim of this approach is to assist users seeking appropriate information to read about, by choosing appropriate profiles to follow.

Social implications

This approach consolidates the social relationship between users in a microblogging platform by suggesting like-minded people to each other. Thus, finding users with the same interests will be easy without spending a lot of time seeking relevant users.

Originality/value

Instead of classic recommendation models, the paper provides an efficient neural network search method that makes it easier to find appropriate users to follow, thereby affording an effective followee recommendation system.

Details

International Journal of Web Information Systems, vol. 17 no. 3

Type: Research Article

ISSN: 1744-0084

Article

Publication date: 19 February 2018

Discovering research topics from library electronic references using latent Dirichlet allocation

Debin Fang, Haixia Yang, Baojun Gao and Xiaojun Li

Downloads

1195

Abstract

Purpose

Discovering the research topics and trends from a large quantity of library electronic references is essential for scientific research. Current research of this kind mainly depends on human judgment. The purpose of this paper is to demonstrate how to identify research topics and the evolution of trends from library electronic references efficiently and effectively by employing automatic text analysis algorithms.

Design/methodology/approach

The authors used latent Dirichlet allocation (LDA), a probabilistic generative topic model, to extract the latent topics from a large quantity of research abstracts. Then, the authors conducted a regression analysis on the document-topic distributions generated by LDA to identify hot and cold topics.
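One way to picture the regression step is sketched below. It assumes that each topic's mean yearly proportion is regressed on publication year by ordinary least squares and that the sign of the slope decides the label; the abstract does not give the exact specification, and a real analysis would normally also test the slope for statistical significance. The names `ols_slope` and `topic_trends`, the `threshold` parameter and the toy data are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def topic_trends(years, theta, threshold=0.0):
    """Label each topic by the trend of its mean yearly proportion.

    years[d] is the publication year of document d and theta[d][k] is the
    proportion of topic k in document d (for example, from a fitted LDA model).
    """
    by_year = {}
    for year, row in zip(years, theta):
        by_year.setdefault(year, []).append(row)
    ordered_years = sorted(by_year)

    trends = []
    for k in range(len(theta[0])):
        # Mean proportion of topic k among the documents of each year.
        means = [
            sum(row[k] for row in by_year[y]) / len(by_year[y]) for y in ordered_years
        ]
        slope = ols_slope(ordered_years, means)
        if slope > threshold:
            label = "hot"
        elif slope < -threshold:
            label = "cold"
        else:
            label = "flat"
        trends.append((k, slope, label))
    return trends


# Toy data (hypothetical): topic 0 grows over time, topic 1 declines.
years = [2000, 2000, 2001, 2001, 2002, 2002]
theta = [[0.2, 0.8], [0.2, 0.8], [0.5, 0.5], [0.5, 0.5], [0.8, 0.2], [0.8, 0.2]]
for topic, slope, label in topic_trends(years, theta):
    print(topic, round(slope, 3), label)
```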

Findings

First, this paper discovers 32 significant research topics from the abstracts of 3,737 articles published in the six top accounting journals during the period of 1992-2014. Second, based on the document-topic distributions generated by LDA, the authors identified seven hot topics and six cold topics from the 32 topics.

Originality/value

The topics discovered by LDA are highly consistent with the topics identified by human experts, indicating the validity and effectiveness of the methodology. Therefore, this paper provides novel knowledge to the accounting literature and demonstrates a methodology and process for topic discovery with lower cost and higher efficiency than the current methods.

Details

Library Hi Tech, vol. 36 no. 3

Type: Research Article

ISSN: 0737-8831

Article

Publication date: 12 June 2017

Coauthorship network-based literature recommendation with topic model

San-Yih Hwang, Chih-Ping Wei, Chien-Hsiang Lee and Yu-Siang Chen

Downloads

630

Abstract

Purpose

The information needs of the users of literature database systems often come from the task at hand, which is short term and can be represented as a small number of articles. Previous works on recommending articles to satisfy users’ short-term interests have utilized article content, usage logs, and more recently, coauthorship networks. The usefulness of coauthorship has been demonstrated by some research works, which, however, tend to adopt a simple coauthorship network that records only the strength of coauthorships. The purpose of this paper is to enhance the effectiveness of coauthorship-based recommendation by incorporating scholars’ collaboration topics into the coauthorship network.

Design/methodology/approach

The authors propose a latent Dirichlet allocation (LDA)-coauthorship-network-based method that integrates topic information into the links of the coauthorship networks using LDA, and a task-focused technique is developed for recommending literature articles.

Findings

The experimental results using information systems journal articles show that the proposed method is more effective than the previous coauthorship network-based method over all scenarios examined. The authors further develop a hybrid method that combines the results of content-based and LDA-coauthorship-network-based recommendations. The resulting hybrid method achieves greater or comparable recommendation effectiveness under all scenarios when compared to the content-based method.

Originality/value

This paper makes two contributions. The authors first show that topic models are indeed useful and can be incorporated into the construction of the coauthorship network to improve literature recommendation. The authors subsequently demonstrate that coauthorship-network-based and content-based recommendations are complementary in their hit article rank distributions, and then devise a hybrid recommendation method to further improve the effectiveness of literature recommendation.

Details

Online Information Review, vol. 41 no. 3

Type: Research Article

ISSN: 1468-4527
