Search results
Meiwen Li, Liye Xia, Qingtao Wu, Lin Wang, Junlong Zhu and Mingchuan Zhang
Abstract
Purpose
In traditional Chinese medicine (TCM), the mechanism of disease (MD) constitutes an essential element of syndrome differentiation and treatment, elucidating the mechanisms underlying the occurrence, progression, alterations and outcomes of diseases. However, there is a dearth of research in the field of intelligent diagnosis concerning the analysis of MD.
Design/methodology/approach
In this paper, we propose a supervised Latent Dirichlet Allocation (LDA) topic model, termed MD-LDA, which elucidates the process of MD identification. We leverage the label information inherent in the data as prior knowledge and incorporate it into the model’s training. Additionally, we devise two parallel parameter estimation algorithms for efficient training. Furthermore, we introduce a benchmark MD identification dataset, named TMD, for training MD-LDA. Finally, we validate the performance of MD-LDA through comprehensive experiments.
Findings
The results show that MD-LDA is effective and efficient. Moreover, MD-LDA outperforms the state-of-the-art topic models on perplexity, Kullback–Leibler (KL) divergence and classification performance.
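For readers unfamiliar with the perplexity measure mentioned in these findings, the following is a minimal sketch of how it is commonly computed for a topic model. It is not the authors' evaluation code. The names `perplexity`, `docs`, `theta` and `phi` are illustrative assumptions, and the inputs are taken to be already-estimated document-topic and topic-word distributions.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of tokenised documents under a fitted topic model.

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][k] = P(topic k | document d)   (assumed given)
    phi   : phi[k][w]   = P(word w | topic k)       (assumed given)
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginal word probability: sum over topics of P(k|d) * P(w|k).
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    # Exponentiated negative average log-likelihood per token.
    return math.exp(-log_likelihood / n_tokens)
```

Lower values indicate that the model assigns higher probability to the evaluated text. A model that spreads probability uniformly over a vocabulary of V words has perplexity V.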
Originality/value
The proposed MD-LDA can be applied to MD discovery and analysis in TCM clinical diagnosis, so as to improve the interpretability and reliability of intelligent diagnosis and treatment.
Lixue Zou, Xiwen Liu, Wray Buntine and Yanli Liu
Abstract
Purpose
Full text of a document is a rich source of information that can be used to provide meaningful topics. The purpose of this paper is to demonstrate how to use citation context (CC) in the full text to identify the cited topics and citing topics efficiently and effectively by employing automatic text analysis algorithms.
Design/methodology/approach
The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is leveraged to extract the citing text from the full text, which makes it possible to discover topics with accuracy. CC-LDA incorporates CC, citing text, and their latent relationship, while CCRef-LDA incorporates CC, citing text, their latent relationship and reference information in CC. Collapsed Gibbs sampling is used to achieve an approximate estimation. The capacity of CC-LDA to simultaneously learn cited topics and citing topics together with their links is investigated. Moreover, a topic influence measure method based on CC-LDA is proposed and applied to create links between the two-level topics. In addition, the capacity of CCRef-LDA to discover topic influential references is also investigated.
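As background for the collapsed Gibbs sampling mentioned above, here is a minimal sketch for plain LDA only. It does not implement CC-LDA or CCRef-LDA, whose extra variables (citation context, references) are not specified in this abstract. The function name `lda_gibbs`, the hyperparameter defaults and the toy corpus are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for standard LDA (illustrative sketch).

    docs is a list of documents, each a list of integer word ids.
    Returns (theta, phi): document-topic and topic-word estimates.
    """
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total tokens per topic
    z = []                                              # topic assignment of each token

    # Random initial assignment.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional P(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return theta, phi


if __name__ == "__main__":
    toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]  # hypothetical word ids
    theta, phi = lda_gibbs(toy_docs, n_topics=2, vocab_size=4, n_iter=50)
    print(theta)
```

The sampler integrates out the topic and word distributions and resamples one token's topic at a time from counts alone, which is why the estimate is approximate and depends on the number of iterations.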
Findings
The results indicate CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering the cited topics and citing topics with topic influence, and CCRef-LDA is able to find the cited topic influential references.
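The symmetric Kullback–Leibler divergence used as a measure here can be sketched as follows. Definitions vary in the literature (sum of the two directions or their average). This sketch assumes the average, and the smoothing constant `eps` and function names are illustrative assumptions rather than the authors' settings.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q, eps=1e-12):
    """Average of KL(p || q) and KL(q || p), with light smoothing.

    Smoothing (an assumption of this sketch) avoids division by zero
    when one distribution assigns zero probability to a word.
    """
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_norm = sum(p_s)
    q_norm = sum(q_s)
    p_s = [x / p_norm for x in p_s]
    q_s = [x / q_norm for x in q_s]
    return 0.5 * (kl_divergence(p_s, q_s) + kl_divergence(q_s, p_s))
```

Applied to pairs of topic-word distributions, a larger value means the two topics are more distinct from each other.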
Originality/value
The automatic method provides novel knowledge for cited topics and citing topics discovery. Topic influence learnt by our model can link two-level topics and create a semantic topic network. The method can also use topic specificity as a feature to rank references.
Ziang Wang and Feng Yang
Abstract
Purpose
It has always been a hot topic for online retailers to obtain consumers’ product evaluations from massive online reviews. In the process of online shopping, there is no face-to-face interaction between online retailers and customers. After collecting online reviews left by customers, online retailers are eager to acquire answers to some questions. For example, which product attributes will attract consumers? Or which step brings a better experience to consumers during the process of shopping? This paper aims to associate the latent Dirichlet allocation (LDA) model with the consumers’ attitude and provides a method to calculate the numerical measure of consumers’ product evaluation expressed in each word.
Design/methodology/approach
First, each possible pair of reviews is organized as a document to build the corpus. After that, the latent topics of the traditional LDA model, referred to as the standard LDA model, are separated into shared and differential topics. Then, the authors associate the model with consumers’ attitudes toward each review, which is classified as either positive or non-positive. The product evaluation reflected in consumers’ binary attitude is extended to each word that appears in the corpus. Finally, variational optimization is introduced to estimate the parameters of the expanded LDA model.
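The first step, building a corpus from pairs of reviews, can be sketched as below. The abstract does not say whether pairs are ordered or how the two reviews are joined, so this sketch assumes unordered pairs and simple concatenation of token lists. The function name `build_pair_corpus` and the sample reviews are hypothetical.

```python
from itertools import combinations


def build_pair_corpus(reviews):
    """Turn every unordered pair of tokenised reviews into one document.

    reviews is a list of token lists. Each returned document keeps the
    indices of its two source reviews so that their attitude labels
    (positive or non-positive) can be looked up later.
    """
    corpus = []
    for (i, a), (j, b) in combinations(enumerate(reviews), 2):
        corpus.append({"pair": (i, j), "tokens": a + b})
    return corpus


if __name__ == "__main__":
    sample = [["fast", "delivery"], ["poor", "battery"], ["good", "battery"]]
    for doc in build_pair_corpus(sample):
        print(doc["pair"], doc["tokens"])
```

Note that n reviews yield n(n-1)/2 documents, so the corpus grows quadratically with the number of reviews.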
Findings
The experimental results show that the LDA model proposed in this research, referred to as the expanded LDA model, successfully assigns sufficient probability to words related to product attributes or consumers’ product evaluation. Compared with the standard LDA model, the expanded model tends to assign higher probability to the words that rank higher within each topic. In addition, the expanded model has higher precision on the prediction set, which shows that breaking the topics down into two categories fits the data set better than the standard LDA model. The product evaluation of each word is calculated by the expanded model and depicted at the end of the experiment.
Originality/value
This research provides a new method to calculate consumers’ product evaluation from reviews at the word level. Words may be used to describe product attributes or consumers’ experiences in reviews. Assigning numerical measures to words makes it possible to analyze consumers’ product evaluation quantitatively. In addition, because words serve as labels themselves, they can be ranked once a numerical measure is given. Online retailers can benefit from the results for label choosing, advertising or product recommendation.
Muhammad Ali Masood, Rabeeh Ayaz Abbasi, Onaiza Maqbool, Mubashar Mushtaq, Naif R. Aljohani, Ali Daud, Muhammad Ahtisham Aslam and Jalal S. Alowibdi
Abstract
Purpose
Tags are used to annotate resources on social media platforms. Most tag recommendation methods use popular tags, but in the case of new resources that are as yet untagged (the cold start problem), popularity-based tag recommendation methods fail to work. The purpose of this paper is to propose a novel model for tag recommendation, called multi-feature space latent Dirichlet allocation (MFS-LDA), for the cold start problem.
Design/methodology/approach
MFS-LDA is a novel latent Dirichlet allocation (LDA)-based model which exploits multiple feature spaces (title, contents, and tags) for recommending tags. Exploiting multiple feature spaces allows MFS-LDA to recommend tags even if data from a feature space is missing (the cold start problem).
Findings
Evaluation on a publicly available data set consisting of around 20,000 Wikipedia articles that are tagged on a social bookmarking website shows a significant improvement over existing LDA-based tag recommendation methods.
Originality/value
The originality of MFS-LDA lies in the segregation of features to remove bias toward dominant features and in the synchronization of multiple feature spaces for tag recommendation.
Roman Egger and Joanne Yu
Abstract
Purpose
Intrigued by the methodological challenges emerging from text complexity, the purpose of this study is to evaluate the effectiveness of different topic modelling algorithms based on Instagram textual data.
Design/methodology/approach
By taking Instagram posts captioned with #darktourism as the study context, this research applies latent Dirichlet allocation (LDA), correlation explanation (CorEx), and non-negative matrix factorisation (NMF) to uncover tourist experiences.
Findings
CorEx outperforms LDA and NMF by classifying emerging dark sites and activities into 17 distinct topics. The results of LDA appear homogeneous and overlapping, whereas the extracted topics of NMF are not specific enough to gain deep insights.
Originality/value
This study assesses different topic modelling algorithms for knowledge extraction in the highly heterogeneous tourism industry. The findings unfold the complexity of analysing short-text social media data and strengthen the use of CorEx in analysing Instagram content.
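Purpose (translated from the Chinese abstract)
Motivated by an interest in text complexity, this study uses Instagram textual data as a benchmark to compare the effectiveness of different topic modelling algorithms.
Methodology (translated from the Chinese abstract)
Taking Instagram posts tagged #darktourism as its context, this study evaluates the usefulness of latent Dirichlet allocation (LDA), correlation explanation (CorEx) and non-negative matrix factorisation (NMF) for analysing posts related to dark tourism.
Findings (translated from the Chinese abstract)
CorEx identifies 17 emerging dark sites and activities and outperforms LDA and NMF. Although LDA can produce a larger number of topics, their content is almost entirely repetitive. Likewise, although NMF is suited to short-text data, the topics it extracts are rather general and not specific enough.
Originality (translated from the Chinese abstract)
By combining the disciplines of marketing and data science, this study lays a foundation for analysing unstructured text and confirms the benefit of CorEx in analysing short-text social media data such as Instagram data.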
Purpose (translated from the Spanish abstract)
Intrigued by the methodological challenges that arise from text complexity, this study evaluates the effectiveness of different topic modelling algorithms based on Instagram textual data.
Methodology (translated from the Spanish abstract)
Taking Instagram posts with #darktourism as the study context, this research applies latent Dirichlet allocation (LDA), correlation explanation (CorEx) and non-negative matrix factorisation (NMF) to uncover tourist experiences.
Findings (translated from the Spanish abstract)
CorEx outperforms LDA and NMF by classifying emerging dark sites and activities into 17 distinct topics. The results of LDA are homogeneous and overlapping, whereas the topics extracted by NMF are not specific enough to yield deep insights.
Originality (translated from the Spanish abstract)
This study evaluates different topic modelling algorithms for knowledge extraction in the tourism industry. The findings reveal the complexity of analysing short-text social media data and strengthen the use of CorEx for analysing Instagram content.
S. Ravikumar, Bidyut Bikash Boruah and Fullstar Lamin Gayang
Abstract
Purpose
The purpose of the study is to identify the latent topics in 9102 Web of Science (WoS) indexed research articles published by Sri Lankan authors in 2645 journals from 1989 to 2021 by applying Latent Dirichlet Allocation to the abstracts. The article discusses the dominant topics in the corpus, the posterior probability of different terms in the topics and the publication proportions of the topics.
Design/methodology/approach
The authors collected abstracts and other details of the studied articles from the WoS database. Data preprocessing is performed before the analysis. After preprocessing, the R package “ldatuning” is applied to the text to determine the number of topics. Based on its four metrics, twenty latent topics are extracted.
Findings
It is observed that medical science, agriculture, research and development and chemistry-related topics dominate the subject categories as a whole. “Irrigation” and “mortality and health care” show significant growth in publication proportion from 2019 to 2021. Among the most frequently occurring latent topics, terms like “activity” and “acid” carry higher posterior probability.
Practical implications
Topic models make it possible to address high-level questions rapidly and efficiently without human intervention, and they are also helpful in information retrieval and document clustering. A distinctive feature of this study is that it shows how the growth of the universe of knowledge for a specific country can be studied using the LDA topic model.
Originality/value
This study will create an incentive for research in text analysis and information retrieval. The results give an understanding of how the publications of Sri Lankan authors have developed across subject areas and over time. Trends and intensity of publications by Sri Lankan authors on different latent topics help to trace their interests and the most actively pursued areas in different domains.
Jeffrey D. Kushkowski, Charles B. Shrader, Marc H. Anderson and Robert E. White
Abstract
Purpose
Multiple disciplines such as finance, management and economics have contributed to governance research over time. However, the full intellectual structure of the governance “field” including the exchange of knowledge across disciplines and the large variety of governance topics remains to be uncovered. To appreciate the breadth of corporate governance research, it is necessary to understand the disciplinary sources from which the research stems. This manuscript focuses on the interdisciplinary underpinnings of corporate governance research.
Design/methodology/approach
This paper employs bibliometric analysis to trace the evolution of corporate governance using articles included in the ISI Web of Science database between 1990 and 2015. Journals included in these categories encompass a full range of business disciplines and provide evidence of the multi-disciplinary nature of corporate governance. The paper also uncovers the topics treated by disciplines under the governance umbrella using a machine learning method called latent Dirichlet allocation (LDA).
Findings
Corporate governance research deals with a number of strategy-related topics. Unlike strategy topics that reside in a single discipline, corporate governance crosses disciplinary boundaries and includes contributions from accounting, finance, economics, law and management. Our analysis shows that over 80% of corporate governance articles come from outside the field of management. Our LDA solution indicates that the major topics in governance research include corporate governance theory, control of family firms, executive compensation and audit committees.
Originality/value
The results illustrate that corporate governance is far more interdisciplinary than previously thought. This is an important insight for corporate governance academics and may lead to collaborative research. More importantly, this research illustrates the usefulness of LDA for investigating interdisciplinary fields. This method is easily transferable to other interdisciplinary fields and it provides a powerful alternative to existing bibliometric methods. We suggest a number of topic areas within library and information science where this method may be applied, including collection development, support for interdisciplinary faculty and basic research into emerging interdisciplinary areas.
Brahim Dib, Fahd Kalloubi, El Habib Nfaoui and Abdelhak Boulaalam
Abstract
Purpose
The purpose of this study is to facilitate the task of finding appropriate information to read and of searching for people who share the same field of interest. More and more people keep up with new streaming information on the Twitter micro-blogging service. With the immense number of micro-posts shared via the follower/followee network graph, Twitter users find themselves facing millions of tweets, which makes this task crucial.
Design/methodology/approach
This paper proposes a long short-term memory (LSTM) model for followee recommendation that relies on the output vector of latent Dirichlet allocation (LDA), which is applied as the topic modeling strategy.
Findings
This study trains the model using a real-life data set extracted based on the Twitter follower/followee architecture. It confirms the effectiveness and scalability of the proposed approach. The approach improves on the state-of-the-art models average-LSTM and time-LSTM.
Research limitations/implications
This study mainly improves existing followee recommendation systems because, unlike previous studies, it applies a non-hand-crafted method, namely an LSTM neural network combined with an LDA model for topic extraction. The main limitations of this study are that cold-start users cannot be handled and that some active fake accounts may not be detected.
Practical implications
The aim of this approach is to assist users seeking appropriate information to read about, by choosing appropriate profiles to follow.
Social implications
This approach consolidates the social relationship between users in a microblogging platform by suggesting like-minded people to each other. Thus, finding users with the same interests will be easy without spending a lot of time seeking relevant users.
Originality/value
Instead of classic recommendation models, the paper provides an efficient neural network search method that makes it easier to find appropriate users to follow, thereby affording an effective followee recommendation system.
Debin Fang, Haixia Yang, Baojun Gao and Xiaojun Li
Abstract
Purpose
Discovering the research topics and trends from a large quantity of library electronic references is essential for scientific research. Current research of this kind mainly depends on human judgment. The purpose of this paper is to demonstrate how to identify research topics and evolution in trends from library electronic references efficiently and effectively by employing automatic text analysis algorithms.
Design/methodology/approach
The authors used latent Dirichlet allocation (LDA), a probabilistic generative topic model, to extract the latent topics from a large quantity of research abstracts. Then, the authors conducted a regression analysis on the document-topic distributions generated by LDA to identify hot and cold topics.
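The regression step can be illustrated with a minimal sketch that fits a straight line to each topic's yearly mean proportion and labels the topic by the sign of the slope. The abstract does not give the regression specification or any significance test, so the slope-only rule, the `threshold` parameter and the function names here are assumptions for illustration.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def topic_trends(years, theta, threshold=0.0):
    """Label each topic 'hot', 'cold' or 'stable' from its yearly trend.

    years : publication year of each document
    theta : theta[d][k] = proportion of topic k in document d
    threshold : hypothetical cut-off on the absolute slope
    """
    n_topics = len(theta[0])
    distinct_years = sorted(set(years))
    labels = []
    for k in range(n_topics):
        # Mean proportion of topic k among the documents of each year.
        yearly_mean = []
        for year in distinct_years:
            values = [theta[d][k] for d, y in enumerate(years) if y == year]
            yearly_mean.append(sum(values) / len(values))
        slope = ols_slope(distinct_years, yearly_mean)
        if slope > threshold:
            labels.append("hot")
        elif slope < -threshold:
            labels.append("cold")
        else:
            labels.append("stable")
    return labels
```

In practice one would also test whether each slope differs significantly from zero before calling a topic hot or cold. That step is omitted here.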
Findings
First, this paper discovers 32 significant research topics from the abstracts of 3,737 articles published in the six top accounting journals during the period of 1992-2014. Second, based on the document-topic distributions generated by LDA, the authors identified seven hot topics and six cold topics from the 32 topics.
Originality/value
The topics discovered by LDA are highly consistent with the topics identified by human experts, indicating the validity and effectiveness of the methodology. Therefore, this paper provides novel knowledge to the accounting literature and demonstrates a methodology and process for topic discovery with lower cost and higher efficiency than the current methods.
San-Yih Hwang, Chih-Ping Wei, Chien-Hsiang Lee and Yu-Siang Chen
Abstract
Purpose
The information needs of the users of literature database systems often come from the task at hand, which is short term and can be represented as a small number of articles. Previous works on recommending articles to satisfy users’ short-term interests have utilized article content, usage logs, and more recently, coauthorship networks. The usefulness of coauthorship has been demonstrated by some research works, which, however, tend to adopt a simple coauthorship network that records only the strength of coauthorships. The purpose of this paper is to enhance the effectiveness of coauthorship-based recommendation by incorporating scholars’ collaboration topics into the coauthorship network.
Design/methodology/approach
The authors propose a latent Dirichlet allocation (LDA)-coauthorship-network-based method that integrates topic information into the links of the coauthorship networks using LDA, and a task-focused technique is developed for recommending literature articles.
Findings
The experimental results using information systems journal articles show that the proposed method is more effective than the previous coauthorship network-based method over all scenarios examined. The authors further develop a hybrid method that combines the results of content-based and LDA-coauthorship-network-based recommendations. The resulting hybrid method achieves greater or comparable recommendation effectiveness under all scenarios when compared to the content-based method.
Originality/value
This paper makes two contributions. The authors first show that topic models are indeed useful and can be incorporated into the construction of the coauthorship network to improve literature recommendation. The authors subsequently demonstrate that coauthorship-network-based and content-based recommendations are complementary in their hit article rank distributions, and then devise a hybrid recommendation method to further improve the effectiveness of literature recommendation.