Search results

1 – 10 of 1000

Article

Publication date: 4 June 2021

Citation context-based topic models: discovering cited and citing topics from full text

Lixue Zou, Xiwen Liu, Wray Buntine and Yanli Liu

Downloads

374

Abstract

Purpose

The full text of a document is a rich source of information from which meaningful topics can be derived. The purpose of this paper is to demonstrate how citation context (CC) in the full text can be used to identify cited topics and citing topics efficiently and effectively by employing automatic text analysis algorithms.

Design/methodology/approach

The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is used to extract the citing text from the full text, which makes it possible to discover topics accurately. CC-LDA incorporates CC, the citing text and their latent relationship, while CCRef-LDA additionally incorporates the reference information in CC. Collapsed Gibbs sampling is used to obtain approximate estimates of the model parameters. The authors investigate the capacity of CC-LDA to learn cited topics and citing topics simultaneously, together with the links between them. Moreover, a topic influence measure based on CC-LDA is proposed and applied to create links between the two levels of topics. In addition, the capacity of CCRef-LDA to discover references that are influential for a topic is investigated.
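
Collapsed Gibbs sampling, the estimation step named above, can be illustrated on standard LDA. The following is a minimal Python sketch for plain LDA only; it does not model citation context or references, because the abstract does not give the CC-LDA or CCRef-LDA conditionals. The function names `lda_gibbs` and `estimate_phi` and the default hyperparameters `alpha` and `beta` are illustrative assumptions.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (not CC-LDA).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (ndk, nkw): doc-topic and topic-word count matrices.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # topic of every token

    # Random initial topic assignment for every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | rest), up to a constant
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw

def estimate_phi(nkw, beta=0.01):
    """Point estimate of each topic's word distribution from the final counts."""
    v = len(nkw[0])
    return [[(c + beta) / (sum(row) + v * beta) for c in row] for row in nkw]
```

Each token's topic is resampled in proportion to (n_dk + α)(n_kw + β)/(n_k + Vβ) after its own counts are removed, which is what "collapsed" refers to: θ and φ are integrated out and only counts are tracked.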

Findings

The results indicate that CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering cited topics and citing topics together with their topic influence, and CCRef-LDA is able to find the references that are influential for the cited topics.
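
The two evaluation measures named above can be computed as in the sketch below. It assumes perplexity is the exponential of the negative mean per-token log-likelihood under p(w|d) = Σ_k θ_dk φ_kw, and that sKL is KL(P‖Q) + KL(Q‖P); some papers halve this sum, and the abstract does not say which form the authors use. The function names are hypothetical.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetric KL divergence, KL(p||q) + KL(q||p); larger means more distinct."""
    return kl(p, q) + kl(q, p)

def perplexity(docs, theta, phi):
    """exp(-log-likelihood / token count) with p(w|d) = sum_k theta[d][k] * phi[k][w]."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p_w)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)
```

Lower perplexity means held-out text is predicted better; sKL is often averaged over all pairs of topics as a distinctness score, where higher values indicate less overlapping topics.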

Originality/value

The automatic method provides new knowledge for discovering cited topics and citing topics. The topic influence learned by the model can link the two levels of topics and create a semantic topic network. The method can also use topic specificity as a feature for ranking references.

Details

Library Hi Tech, vol. 39 no. 4

Type: Research Article

ISSN: 0737-8831

Article

Publication date: 29 September 2021

Mining numerical measure of consumers’ product evaluation expressed in words based on latent Dirichlet allocation

Ziang Wang and Feng Yang

Downloads

219

Abstract

Purpose

Obtaining consumers’ product evaluations from massive volumes of online reviews has long been a key concern for online retailers. In online shopping, there is no face-to-face interaction between online retailers and customers. After collecting the online reviews left by customers, online retailers are eager to answer questions such as which product attributes attract consumers, or which step of the shopping process gives consumers a better experience. This paper aims to associate the latent Dirichlet allocation (LDA) model with consumers’ attitudes and provides a method for calculating a numerical measure of the consumers’ product evaluation expressed by each word.

Design/methodology/approach

First, every possible pair of reviews is organized as a document to build the corpus. Next, the latent topics of the traditional LDA model, referred to as the standard LDA model, are separated into shared and differential topics. The authors then associate the model with consumers’ attitude toward each review, which is classified as either positive or non-positive. The product evaluation reflected in this binary attitude is extended to each word that appears in the corpus. Finally, variational optimization is introduced to estimate the parameters of the expanded LDA model.

Findings

The experimental results show that the LDA model proposed in this research, referred to as the expanded LDA model, successfully assigns sufficient probability to words related to product attributes or consumers’ product evaluations. Compared with the standard LDA model, the expanded model tends to assign higher probability to words that rank higher within each topic. Moreover, the expanded model achieves higher precision on the prediction set, which shows that dividing the topics into two categories fits the data set better than the standard LDA model does. The product evaluation of each word is calculated by the expanded model and presented at the end of the experiment.

Originality/value

This research provides a new method for calculating consumers’ product evaluations from reviews at the word level. In reviews, words may describe product attributes or consumers’ experiences. Assigning numerical measures to words makes it possible to analyze consumers’ product evaluations quantitatively. Moreover, because words are themselves labels, they can also be ranked once a numerical measure is assigned. Online retailers can use the results for label selection, advertising or product recommendation.

Details

Journal of Modelling in Management, vol. 18 no. 1

Type: Research Article

ISSN: 1746-5664

Article

Publication date: 5 September 2017

MFS-LDA: a multi-feature space tag recommendation model for cold start problem

Muhammad Ali Masood, Rabeeh Ayaz Abbasi, Onaiza Maqbool, Mubashar Mushtaq, Naif R. Aljohani, Ali Daud, Muhammad Ahtisham Aslam and Jalal S. Alowibdi

Downloads

487

Abstract

Purpose

Tags are used to annotate resources on social media platforms. Most tag recommendation methods use popular tags, but for new resources that are not yet tagged (the cold start problem), popularity-based tag recommendation methods fail to work. The purpose of this paper is to propose a novel tag recommendation model for the cold start problem, called multi-feature space latent Dirichlet allocation (MFS-LDA).

Design/methodology/approach

MFS-LDA is a novel latent Dirichlet allocation (LDA)-based model which exploits multiple feature spaces (title, contents, and tags) for recommending tags. Exploiting multiple feature spaces allows MFS-LDA to recommend tags even if data from a feature space is missing (the cold start problem).

Findings

An evaluation on a publicly available data set of around 20,000 Wikipedia articles tagged on a social bookmarking website shows a significant improvement over existing LDA-based tag recommendation methods.

Originality/value

The originality of MFS-LDA lies in segregating features to remove bias toward dominant features and in synchronizing multiple feature spaces for tag recommendation.

Details

Program, vol. 51 no. 3

Type: Research Article

ISSN: 0033-0337

Article

Publication date: 14 October 2021

Identifying hidden semantic structures in Instagram data: a topic modelling comparison

Roman Egger and Joanne Yu

Downloads

738

Abstract

Purpose

Intrigued by the methodological challenges emerging from text complexity, the purpose of this study is to evaluate the effectiveness of different topic modelling algorithms based on Instagram textual data.

Design/methodology/approach

By taking Instagram posts captioned with #darktourism as the study context, this research applies latent Dirichlet allocation (LDA), correlation explanation (CorEx), and non-negative matrix factorisation (NMF) to uncover tourist experiences.
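
Of the three algorithms compared, NMF is the simplest to sketch. The pure-Python version below factorises a non-negative document-term matrix V ≈ WH with the Lee and Seung multiplicative updates for squared (Frobenius) error. It is a teaching sketch under that assumption, not the implementation used in the study, and the function names are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factorise non-negative v (docs x terms) into w (docs x k) and h (k x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h
```

Each row of `h` acts as a topic; its largest entries give the topic's top terms, and each row of `w` gives a post's topic weights. Because the updates only multiply by non-negative ratios, both factors stay non-negative throughout.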

Findings

CorEx outperforms LDA and NMF by classifying emerging dark sites and activities into 17 distinct topics. The results of LDA appear homogeneous and overlapping, whereas the extracted topics of NMF are not specific enough to gain deep insights.

Originality/value

This study assesses different topic modelling algorithms for knowledge extraction in the highly heterogeneous tourism industry. The findings unfold the complexity of analysing short-text social media data and strengthen the use of CorEx in analysing Instagram content.

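Research purpose

Motivated by an interest in text complexity, this study uses Instagram text data as the basis for comparing the effectiveness of different topic modelling algorithms.

Research method

Taking Instagram posts tagged #darktourism as the context, this study evaluates the usefulness of latent Dirichlet allocation (LDA), correlation explanation (CorEx) and non-negative matrix factorisation (NMF) for analysing posts related to dark tourism.

Research findings

CorEx identified 17 emerging dark sites and activities and outperformed LDA and NMF. Although LDA yielded a larger number of topics, their content was almost repetitive. Likewise, although NMF is suitable for short-text data, the topics it extracted were rather general and not specific enough.

Originality

By combining the disciplines of marketing and data science, this study lays a foundation for analysing unstructured text and confirms the usefulness of CorEx for analysing short-text social media data such as Instagram data.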

Details

Tourism Review, vol. 77 no. 4

Type: Research Article

ISSN: 1660-5373

Article

Publication date: 16 February 2023

Latent topics identification from the articles of Sri Lankan authors using LDA

S. Ravikumar, Bidyut Bikash Boruah and Fullstar Lamin Gayang

Downloads

103

Abstract

Purpose

The purpose of the study is to identify the latent topics in 9,102 Web of Science (WoS)-indexed research articles by Sri Lankan authors, published in 2,645 journals from 1989 to 2021, by applying latent Dirichlet allocation (LDA) to their abstracts. The article discusses the dominant topics in the text corpus, the posterior probabilities of terms within the topics and the publication proportions of the topics.

Design/methodology/approach

The abstracts and other details of the studied articles are collected from the WoS database, and the data are preprocessed before the analysis. After text preprocessing, the R package “ldatuning” is applied to decide the number of topics. Based on four metrics, twenty topics are selected for extraction as latent topics.
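
Choosing the number of topics with metrics, as described above, amounts to fitting models for several values of K and scoring each. The sketch below implements one such score in Python: the average pairwise cosine similarity between a model's topic-word vectors, where lower means more distinct topics. This is close in spirit to the CaoJuan2009 criterion that ldatuning reports, but it is an illustration, not the R package's code, and `mean_topic_similarity` and `pick_k` are hypothetical names.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_topic_similarity(phi):
    """Average cosine similarity over all pairs of topic-word distributions."""
    pairs = list(combinations(range(len(phi)), 2))
    return sum(cosine(phi[i], phi[j]) for i, j in pairs) / len(pairs)

def pick_k(phi_by_k):
    """phi_by_k maps each candidate K to that model's K topic-word rows.

    Returns the K whose topics overlap least.
    """
    return min(phi_by_k, key=lambda k: mean_topic_similarity(phi_by_k[k]))
```

In practice several such metrics are compared, since a single overlap score tends to favour particular values of K; the study reports using four.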

Findings

Medical science, agriculture, research and development, and chemistry-related topics dominate the subject categories as a whole. “Irrigation” and “mortality and health care” show significant growth in publication proportion from 2019 to 2021. Among the most frequently occurring latent topics, terms such as “activity” and “acid” carry higher posterior probabilities.
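
A topic's publication proportion in a given year can be defined in more than one way. This sketch assumes each article is assigned to its dominant topic (the largest entry of its document-topic vector θ) and reports, for each year, the share of that year's articles belonging to each topic. The input format and the function name are assumptions.

```python
from collections import Counter, defaultdict

def topic_proportions_by_year(years, theta):
    """years[i] is article i's publication year; theta[i] is its document-topic vector.

    Returns {year: {topic: share of that year's articles whose dominant topic it is}}.
    """
    counts = defaultdict(Counter)
    for year, dist in zip(years, theta):
        dominant = max(range(len(dist)), key=dist.__getitem__)
        counts[year][dominant] += 1
    return {
        year: {topic: n / sum(c.values()) for topic, n in sorted(c.items())}
        for year, c in sorted(counts.items())
    }
```

An alternative that avoids the hard assignment is to average θ over each year's articles; growth of a topic such as "irrigation" would then show up as a rising mean weight rather than a rising article share.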

Practical implications

Topic models make it possible to address high-level questions rapidly and efficiently without human intervention, and they are also helpful for information retrieval and document clustering. This study highlights how the growth of the body of knowledge of a specific country can be studied using the LDA topic model.

Originality/value

This study will provide an incentive for research in text analysis and information retrieval. The results give an understanding of how the research output of Sri Lankan authors has developed across subject areas and over time. The trends and intensity of publications by Sri Lankan authors on different latent topics help trace their interests and most practised areas in different domains.

Details

Global Knowledge, Memory and Communication, ahead-of-print

Type: Research Article

ISSN: 2514-9342

Article

Publication date: 14 June 2021

Incorporating LDA with LSTM for followee recommendation on Twitter network

Brahim Dib, Fahd Kalloubi, El Habib Nfaoui and Abdelhak Boulaalam

Downloads

128

Abstract

Purpose

The purpose of this study is to make it easier to find appropriate information to read and to search for people in the same field of interest. More and more people keep up with new streaming information on the Twitter micro-blogging service. With the immense number of micro-posts shared via the follower/followee network graph, Twitter users are confronted with millions of tweets, which makes this task crucial.

Design/methodology/approach

This paper proposes a long short-term memory (LSTM) model for followee recommendation that relies on the output vector of a latent Dirichlet allocation (LDA) model, with LDA applied as the topic modeling strategy.

Findings

The model is trained on a real-life data set extracted from the Twitter follower/followee structure. The results confirm the effectiveness and scalability of the proposed approach, which improves on the state-of-the-art models average-LSTM and time-LSTM.

Research limitations/implications

This study mainly improves existing followee recommendation systems because, unlike previous studies, it applies a non-hand-crafted method: an LSTM neural network combined with an LDA model for topic extraction. The main limitations of this study are that cold-start users cannot be handled and that some active fake accounts may not be detected.

Practical implications

The aim of this approach is to help users who are seeking appropriate information to read by helping them choose appropriate profiles to follow.

Social implications

This approach strengthens the social relationships between users on a microblogging platform by suggesting like-minded people to each other. Users can thus find others with the same interests without spending a lot of time searching for relevant users.

Originality/value

Instead of relying on classic recommendation models, the paper provides an efficient neural network-based search method that makes it easier to find appropriate users to follow, thereby affording an effective followee recommendation system.

Details

International Journal of Web Information Systems, vol. 17 no. 3

Type: Research Article

ISSN: 1744-0084

Book part

Publication date: 13 December 2017

Semantic Search of Online Reviews on E-Business Platforms

Qiongwei Ye and Baojun Ma

Abstract

Internet+ and Electronic Business in China is a comprehensive resource that provides insight and analysis into E-commerce in China and how it has revolutionized, and continues to revolutionize, business and society. Split into four distinct sections, the book first lays out the theoretical foundations and fundamental concepts of E-Business before moving on to look at Internet+ innovation models and their applications in different industries such as agriculture, finance and commerce. The book then provides a comprehensive analysis of E-business platforms and their applications in China before finishing with four comprehensive case studies of major E-business projects, providing readers with successful examples of implementing E-Business entrepreneurship projects.

Details

Internet+ and Electronic Business in China: Innovation and Applications

Type: Book

ISBN: 978-1-78743-115-7

Article

Publication date: 18 July 2016

A study of user profile representation for personalized cross-language information retrieval

Dong Zhou, Séamus Lawless, Xuan Wu, Wenyu Zhao and Jianxun Liu

Downloads

1144

Abstract

Purpose

With an increasing amount of multilingual content on the World Wide Web, users often need to access information provided in a language of which they are not native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through personalized query expansion.

Design/methodology/approach

The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.
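
The frequency-based profiles described above can be sketched as follows: a user profile is the sum of tf-idf term weights over the user's history documents, and a query is expanded with the profile's top-weighted terms that it does not already contain. The tf-idf variant (length-normalised tf, log(N/df) idf), the `top_n` cut-off and the function names are assumptions; BM25 and the latent semantic weightings the paper also evaluates are not covered here.

```python
import math
from collections import Counter

def idf_table(corpus):
    """Inverse document frequency, log(N / df), for every term in a tokenised corpus."""
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def user_profile(history, idf):
    """Sum of tf-idf weights over the documents a user has interacted with."""
    profile = Counter()
    for doc in history:
        for term, count in Counter(doc).items():
            profile[term] += (count / len(doc)) * idf.get(term, 0.0)
    return profile

def expand_query(query, profile, top_n=2):
    """Append the user's highest-weighted profile terms not already in the query."""
    extra = [t for t, w in profile.most_common() if t not in query and w > 0][:top_n]
    return query + extra
```

For example, a user whose history mentions "compiler" prominently would have a query `["rust"]` expanded to `["rust", "compiler"]` with `top_n=1`. In a CLIR setting the expansion terms would additionally need translating into the document language, which this sketch leaves out.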

Findings

Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained at a cross-lingual level performed better than those trained at a monolingual level.

Originality/value

Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies at a monolingual level. The effect of using such monolingual profiles for personalized CLIR remains unclear. The current study fills this gap through a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that ensures repeatable and controlled experiments can be conducted.

Details

Aslib Journal of Information Management, vol. 68 no. 4

Type: Research Article

ISSN: 2050-3806

Article

Publication date: 12 June 2017

Coauthorship network-based literature recommendation with topic model

San-Yih Hwang, Chih-Ping Wei, Chien-Hsiang Lee and Yu-Siang Chen

Downloads

624

Abstract

Purpose

The information needs of the users of literature database systems often come from the task at hand, which is short term and can be represented as a small number of articles. Previous work on recommending articles to satisfy users’ short-term interests has used article content, usage logs and, more recently, coauthorship networks. Some studies have demonstrated the usefulness of coauthorship, but they tend to adopt a simple coauthorship network that records only the strength of coauthorships. The purpose of this paper is to enhance the effectiveness of coauthorship-based recommendation by incorporating scholars’ collaboration topics into the coauthorship network.

Design/methodology/approach

The authors propose a latent Dirichlet allocation (LDA)-coauthorship-network-based method that uses LDA to integrate topic information into the links of the coauthorship network, and they develop a task-focused technique for recommending literature articles.
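
The abstract does not say how topic information is attached to coauthorship links, so the following is only one plausible construction, assumed for illustration: each coauthor pair's edge accumulates the LDA topic vectors of the papers they wrote together, and a candidate article is scored by how well its topic vector agrees with the edges linking the task article's authors to the candidate's authors. All names here (`build_topic_network`, `score_candidate`) are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_topic_network(papers, n_topics):
    """papers: iterable of (author_list, theta). Each coauthor edge sums the thetas."""
    edges = defaultdict(lambda: [0.0] * n_topics)
    for authors, theta in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            edge = edges[(a, b)]
            for k, p in enumerate(theta):
                edge[k] += p
    return edges

def score_candidate(task_authors, cand_authors, cand_theta, edges):
    """Sum topic agreement over coauthor links between task and candidate authors."""
    score = 0.0
    for a in task_authors:
        for b in cand_authors:
            key = (min(a, b), max(a, b))
            if key in edges:  # membership test does not create a default entry
                score += cosine(edges[key], cand_theta)
    return score
```

Compared with a network that records only coauthorship strength, the topic-labelled edge lets the same pair of scholars support one candidate article (on a topic they collaborate on) but not another.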

Findings

The experimental results using information systems journal articles show that the proposed method is more effective than the previous coauthorship network-based method over all scenarios examined. The authors further develop a hybrid method that combines the results of content-based and LDA-coauthorship-network-based recommendations. The resulting hybrid method achieves greater or comparable recommendation effectiveness under all scenarios when compared to the content-based method.

Originality/value

This paper makes two contributions. The authors first show that topic models are indeed useful and can be incorporated into the construction of the coauthorship network to improve literature recommendation. The authors then demonstrate that coauthorship-network-based and content-based recommendations are complementary in their hit article rank distributions, and devise a hybrid recommendation method to further improve the effectiveness of literature recommendation.

Details

Online Information Review, vol. 41 no. 3

Type: Research Article

ISSN: 1468-4527

Article

Publication date: 9 November 2021

Exploring destination image through online reviews: an augmented mining model using latent Dirichlet allocation combined with probabilistic hesitant fuzzy algorithm

Yuyan Luo, Tao Tong, Xiaoxu Zhang, Zheng Yang and Ling Li

Downloads

425

Abstract

Purpose

In the era of information overload, the density of tourism information and the increasingly sophisticated information needs of consumers have created information confusion for tourists and scenic-area managers. The study aims to help scenic-area managers determine the strengths and weaknesses in the development process of scenic areas and to solve the practical problem of tourists' difficulty in quickly and accurately obtaining the destination image of a scenic area and finding a scenic area that meets their needs.

Design/methodology/approach

The study uses several machine learning methods, namely, the latent Dirichlet allocation (LDA) topic extraction model, the term frequency-inverse document frequency (TF-IDF) weighting method and sentiment analysis. It also incorporates a probabilistic hesitant fuzzy algorithm (PHFA) for multi-attribute decision-making to form an enhanced model for mining and analyzing tourism destination image based on information expressed by visitors. The model is intended to help managers and visitors identify the strengths and weaknesses in the development of scenic areas. Jiuzhaigou is used as an example for empirical analysis.

Findings

In the study, a complete model for mining and analyzing tourism destination image was constructed, and the text of 24,222 online reviews of Jiuzhaigou, China, was analyzed. The results revealed a total of 10 attributes and 100 attribute elements. Among the identified attributes, three were negative: crowdedness, tourism cost and accommodation environment. The study provides suggestions to help tourists select attractions and offers recommendations and improvement measures for Jiuzhaigou in terms of crowd control and post-disaster reconstruction.

Originality/value

Previous research in this area has relied on small samples for qualitative analysis. The current study fills this gap in the literature by proposing a machine learning method that incorporates PHFA, combining ideas from management and multi-attribute decision theory. In addition, the study considers visitors' emotions and thematic preferences as reflected in the information they express, and analyzes the tourism destination image on this basis. Optimization strategies are provided to help scenic-spot managers in their decision-making.

Details

Kybernetes, vol. 52 no. 3

Type: Research Article

ISSN: 0368-492X
