Search results

1 – 10 of over 93000
Article
Publication date: 29 April 2021

Heng-Yang Lu, Yi Zhang and Yuntao Du

Abstract

Purpose

Topic models have been widely applied to discover important information from vast amounts of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model, the Sense Unit based Phrase Topic Model (SenU-PTM), that addresses both the sparsity and readability problems.

Design/methodology/approach

SenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm by exploiting word embeddings, which aims to generate phrases with the original corpus. The second phase introduces a new concept of sense unit, which consists of a set of semantically similar tokens for modeling topics with token vectors generated in the first phase. Finally, SenU-PTM infers topics based on the above two phases.

Findings

Experimental results on two real-world, publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. The results reveal that modeling topics on sense units can solve the sparsity of short texts and improve the readability of topics at the same time.

Originality/value

The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.

Details

Data Technologies and Applications, vol. 55 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 4 June 2021

Lixue Zou, Xiwen Liu, Wray Buntine and Yanli Liu

Abstract

Purpose

Full text of a document is a rich source of information that can be used to provide meaningful topics. The purpose of this paper is to demonstrate how to use citation context (CC) in the full text to identify the cited topics and citing topics efficiently and effectively by employing automatic text analysis algorithms.

Design/methodology/approach

The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is leveraged to extract the citing text from the full text, which makes it possible to discover topics with accuracy. CC-LDA incorporates CC, citing text, and their latent relationship, while CCRef-LDA incorporates CC, citing text, their latent relationship and reference information in CC. Collapsed Gibbs sampling is used to achieve an approximate estimation. The capacity of CC-LDA to simultaneously learn cited topics and citing topics together with their links is investigated. Moreover, a topic influence measure method based on CC-LDA is proposed and applied to create links between the two-level topics. In addition, the capacity of CCRef-LDA to discover topic influential references is also investigated.

Findings

The results indicate CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering the cited topics and citing topics with topic influence, and CCRef-LDA is able to find the cited topic influential references.
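For readers unfamiliar with the symmetric Kullback–Leibler measure mentioned above, the following is a minimal sketch of how it can be computed between two discrete distributions, such as two topic-word distributions. The function names, the smoothing constant `eps` and the choice to sum (rather than average) the two directions are assumptions for illustration; the abstract does not state which convention the authors use.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    eps is a small smoothing constant (an assumption of this sketch) that
    keeps the logarithm finite when q has zero entries.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i = 0 contribute nothing
    )


def symmetric_kl(p, q):
    """Symmetric KL: KL(p || q) + KL(q || p). Some authors halve this sum."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Two hypothetical topic-word distributions over a three-word vocabulary.
topic_a = [0.7, 0.2, 0.1]
topic_b = [0.1, 0.2, 0.7]
print(symmetric_kl(topic_a, topic_b))
```

A larger value indicates that the two distributions are further apart; identical distributions give a value of zero.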

Originality/value

The automatic method provides novel knowledge for cited topics and citing topics discovery. Topic influence learnt by our model can link two-level topics and create a semantic topic network. The method can also use topic specificity as a feature to rank references.

Details

Library Hi Tech, vol. 39 no. 4
Type: Research Article
ISSN: 0737-8831

Open Access
Article
Publication date: 30 November 2021

Federico Barravecchia, Luca Mastrogiacomo and Fiorenzo Franceschini

Abstract

Purpose

Digital voice-of-customer (digital VoC) analysis is gaining much attention in the field of quality management. Digital VoC can be a great source of knowledge about customer needs, habits and expectations. To this end, the most popular approach is based on the application of text mining algorithms named topic modelling. These algorithms can identify latent topics discussed within digital VoC and categorise each source (e.g. each review) based on its content. This paper aims to propose a structured procedure for validating the results produced by topic modelling algorithms.

Design/methodology/approach

The proposed procedure compares, on random samples, the results produced by topic modelling algorithms with those generated by human evaluators. Specific metrics make it possible to compare the two approaches and provide a preliminary empirical validation.
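As an illustration of how such a comparison might be quantified, the sketch below computes Cohen's kappa between the topic assigned by an algorithm and the topic assigned by a human evaluator for the same sample of reviews. Cohen's kappa is a stand-in chosen for this example; the abstract does not name the metrics the authors actually use, and the labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical topic labels for a random sample of six reviews.
model_labels = ["price", "price", "app", "app", "vehicle", "vehicle"]
human_labels = ["price", "app", "app", "app", "vehicle", "price"]
print(cohen_kappa(model_labels, human_labels))
```

Values near 1 indicate strong agreement between the algorithm and the human evaluator, while values near 0 indicate agreement no better than chance.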

Findings

The proposed procedure can guide users of topic modelling algorithms in validating the obtained results. A case study of car-sharing services supports the description.

Originality/value

Despite the vast success of topic modelling-based approaches, metrics and procedures to validate the obtained results are still lacking. This paper provides a first practical and structured validation procedure specifically employed for quality-related applications.

Details

International Journal of Quality & Reliability Management, vol. 39 no. 6
Type: Research Article
ISSN: 0265-671X

Article
Publication date: 5 September 2019

Nastaran Hajiheydari, Mojtaba Talafidaryani, SeyedHossein Khabiri and Masoud Salehi

Abstract

Purpose

Although the business model field of study has been a focus of attention for both researchers and practitioners within the past two decades, it still suffers from concern about its identity. Accordingly, this paper aims to clarify the intellectual structure of the business model field by identifying the research clusters and their sub-clusters, the prominent relations and the dominant research trends.

Design/methodology/approach

This paper uses some common text mining methods including co-word analysis, burst analysis, timeline analysis and topic modeling to analyze and mine the title, abstract and keywords of 14,081 research documents related to the domain of business model.

Findings

The results revealed that the business model field of study consists of three main research areas including electronic business model, business model innovation and sustainable business model, each of which has some sub-areas and has been more evident in some particular industries. Additionally, from the time perspective, research issues in the domain of sustainable development are considered as the hot and emerging topics in this field. In addition, the results confirmed that information technology has been one of the most important drivers, influencing the appearance of different study topics in the various periods.

Originality/value

The contribution of this study is to quantitatively uncover the dominant knowledge structure and prominent research trends in the business model field of study, considering a broad range of scholarly publications and using some promising and reliable text mining techniques.

Details

foresight, vol. 21 no. 6
Type: Research Article
ISSN: 1463-6689

Article
Publication date: 9 October 2023

Xiaoguang Wang, Yue Cheng, Tao Lv and Rongjiang Cai

Abstract

Purpose

The authors hope to filter valuable information from online reviews, obtain objective and accurate information about the demands of auto consumers and help auto companies develop more reasonable production and marketing strategies for healthy and sustainable development. This paper aims to discuss the aforementioned objectives.

Design/methodology/approach

The authors collected review data from online automotive forums and generated a corpus after pre-processing. Then, the authors extracted consumer demands and topics using the LDA model. Finally, the authors used a trained Word2vec tool to extend the consumer demand topics.

Findings

Different types of vehicle consumers have the same demands, such as “Space,” “Power Performance,” and “Brand Comparison,” and distinct demands, such as “Appearance,” “Safety,” “Service,” and “New Energy Features”; consumers who buy new energy vehicles are still accustomed to comparing them with the brands or models of fuel vehicles; new energy vehicle consumers pay more attention to services and service quality during the purchasing and using process.

Research limitations/implications

The development time of new energy vehicles is relatively short, with some models being available for only one year or even six months. The smaller amount of available data may impact the applicability of topic models. The sample size, especially for new energy vehicles, needs to be increased to improve the general applicability of topic models further.

Practical implications

First, this measure helps online review websites improve their existing review publication mechanisms, enhance the overall quality of online review content, increase user traffic and promote the healthy development of online review websites. Second, this allows for timely adjustments in future product production and sales plans and further enhances automotive companies' ability to leverage online reviews for Internet marketing.

Originality/value

The authors have improved the accuracy and stability of the fused topic model, providing a scientific and efficient research tool for multi-dimensional topic mining of online reviews. With the help of research results, consumers can more easily understand the discussion topics and thus filter out valuable reference information. As a result, automotive companies may gain information about consumer demands and product quality feedback and thus quickly adjust production and marketing strategies to increase sales and market share.

Details

Marketing Intelligence & Planning, vol. 41 no. 8
Type: Research Article
ISSN: 0263-4503

Article
Publication date: 10 February 2023

Van-Ho Nguyen and Thanh Ho

Abstract

Purpose

This study aims to analyse online customer experience in the hospitality industry through dynamic topic modelling (DTM) and net promoter score (NPS). A novel model for collecting, pre-processing and analysing online reviews is proposed to uncover the hidden information in the corpus and gain insight into customer experience.

Design/methodology/approach

A corpus with 259,470 customer comments in English was collected. The researchers experimented and selected the best K parameter (number of topics) by perplexity and coherence score measurements as the input parameter for the model. Finally, the team experimented on the corpus using the Latent Dirichlet allocation (LDA) model and DTM with K coefficient to explore latent topics and trends of topics in the corpus over time.

Findings

The results of the topic model show hidden topics with the top high-probability keywords that concern customers, as well as the trends of topics over time. In addition, this study calculated and analysed the NPS from customer rating scores and presented it on an overview dashboard.
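For context, the conventional NPS calculation can be sketched as follows: the percentage of promoters minus the percentage of detractors. The thresholds below assume the standard 0 to 10 scale (promoters rate 9 or 10, detractors rate 6 or below); the abstract does not state which rating scale or thresholds the study applies, so the parameter names and sample ratings are hypothetical.

```python
def net_promoter_score(ratings, promoter_min=9, detractor_max=6):
    """Return the NPS (from -100 to 100) for a list of numeric ratings.

    promoter_min and detractor_max are the conventional 0-10 scale
    thresholds, used here as an assumption of this sketch.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(r >= promoter_min for r in ratings)
    detractors = sum(r <= detractor_max for r in ratings)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical guest ratings on a 0-10 scale.
sample_ratings = [10, 10, 9, 8, 7, 6, 3, 9]
print(net_promoter_score(sample_ratings))
```

Passive ratings (7 or 8 under these thresholds) count toward the total number of responses but toward neither group, which is why they pull the score toward zero.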

Research limitations/implications

The data used in the experiment are only a part of all user comments; therefore, it may not reflect all of the current customer experience.

Practical implications

The management and business development of companies in the hotel industry can also benefit from the empirical findings from the topic model and NPS analytics, which will support decision-making to help businesses improve products and services, increase existing customer satisfaction and draw in new customers.

Originality/value

This study differs from previous works in that it attempts to fill a gap in research focused on online customer experience in the hospitality industry and uses text analytics and NPS to reach this goal.

Details

Journal of Hospitality and Tourism Technology, vol. 14 no. 2
Type: Research Article
ISSN: 1757-9880

Article
Publication date: 29 September 2021

Ziang Wang and Feng Yang

Abstract

Purpose

It has always been a hot topic for online retailers to obtain consumers’ product evaluations from massive online reviews. In the process of online shopping, there is no face-to-face interaction between online retailers and customers. After collecting online reviews left by customers, online retailers are eager to acquire answers to some questions. For example, which product attributes will attract consumers? Or which step brings a better experience to consumers during the process of shopping? This paper aims to associate the latent Dirichlet allocation (LDA) model with consumers’ attitudes and to provide a method for calculating a numerical measure of the product evaluation expressed in each word.

Design/methodology/approach

First, all possible pairs of reviews are organized as a document to build the corpus. After that, the latent topics of the traditional LDA model, referred to as the standard LDA model, are separated into shared and differential topics. Then, the authors associate the model with consumers’ attitudes toward each review, which is classified as either a positive or a non-positive review. The product evaluation reflected in consumers’ binary attitude is extended to each word that appears in the corpus. Finally, variational optimization is introduced to estimate the parameters of the expanded LDA model.

Findings

The experimental results illustrate that the LDA model in this research, referred to as the expanded LDA model, can successfully assign sufficient probability to words related to product attributes or consumers’ product evaluation. Compared with the standard LDA model, the expanded model tended to assign higher probability to words that rank higher within each topic. In addition, the expanded model has higher precision on the prediction set, which shows that breaking down the topics into two categories fits the data set better than the standard LDA model. The product evaluation of each word is calculated by the expanded model and depicted at the end of the experiment.

Originality/value

This research provides a new method to calculate consumers’ product evaluation from reviews at the word level. Words may be used to describe product attributes or consumers’ experiences in reviews. Assigning numerical measures to words makes it possible to analyze consumers’ product evaluation quantitatively. In addition, because words serve as labels themselves, they can also be ranked once a numerical measure is given. Online retailers can benefit from the results for label selection, advertising or product recommendation.

Details

Journal of Modelling in Management, vol. 18 no. 1
Type: Research Article
ISSN: 1746-5664

Article
Publication date: 10 May 2022

Qiang Cao, Xian Cheng and Shaoyi Liao

Abstract

Purpose

Extracting useful information from a very large volume of literature is a great challenge for librarians. Topic modeling, a machine learning technique for uncovering latent thematic structures in large collections of documents, is a widespread approach in literature analysis, especially given the rapid growth of academic literature. This paper compares topic modeling based literature analysis using the full texts and the abstracts of articles.

Design/methodology/approach

The authors conduct a comparison study of topic modeling on full-text papers and their corresponding abstracts to assess the influence of the type of document used as input for topic modeling. In particular, the authors use the large volume of COVID-19 research literature as a case study for topic modeling based literature analysis. The authors illustrate the research topics, research trends and topic similarity of COVID-19 research by using Latent Dirichlet allocation (LDA) and a topic visualization method.

Findings

The authors found 14 research topics for COVID-19 research. The authors also found that the topic similarity between full-text papers and their corresponding abstracts is higher when more documents are analyzed.

Originality/value

First, this study contributes to the literature analysis approach. The comparison study can help us understand the influence of the different types of documents on the results of topic modeling analysis. Second, the authors present an overview of COVID-19 research by summarizing 14 research topics for it. This automated literature analysis can help specialists in the health and medical domain or other people to quickly grasp the structured morphology of the current studies for COVID-19.

Details

Library Hi Tech, vol. 41 no. 2
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 22 July 2021

Linxia Zhong, Wei Wei and Shixuan Li

Abstract

Purpose

Because of the extensive user coverage of news sites and apps, greater social and commercial value can be realized if users can access their favourite news as easily as possible. However, news has a timeliness factor; news recommendation suffers from serious cold-start and data-sparsity problems, and news users are more susceptible to recent topical news. Therefore, this study aims to propose a personalized news recommendation approach based on a topic model and a restricted Boltzmann machine (RBM).

Design/methodology/approach

Firstly, the model extracts the news topic information based on the LDA2vec topic model. Then, the implicit behaviour data are analysed and converted into explicit rating data according to the rules. The highest weight is assigned to recent hot news stories. Finally, the topic information and the rating data are regarded as the conditional layer and visual layer of the conditional RBM (CRBM) model, respectively, to implement news recommendations.

Findings

The experimental results show that using LDA2vec-based news topic as a conditional layer in the CRBM model provides a higher prediction rating and improves the effectiveness of news recommendations.

Originality/value

This study proposes a personalized news recommendation approach based on an improved CRBM. A topic model is applied to news topic extraction and used as the conditional layer of the CRBM. It not only alleviates the sparseness of rating data to improve the efficiency of the CRBM but also considers that readers are more susceptible to popular or trending news.

Details

The Electronic Library, vol. 39 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 29 June 2021

Jongdae Kim, Youseok Lee and Inseong Song

Abstract

Purpose

The purpose of this paper is to develop a predictive model for box office performance based on the textual information in movie scripts in the green-lighting process of movie production.

Design/methodology/approach

The authors use Latent Dirichlet Allocation to determine the hidden textual structure in movie scripts by extracting topic probabilities as predictors for classification. The extracted topic probabilities are used as inputs for the predictive model for the box office performance. For the predictive model, the authors utilize a variety of classification algorithms such as logistic classification, decision trees, random forests, k-nearest neighbor algorithms, support vector machines and artificial neural networks, and compare their relative performances in predicting movies' market performance.

Findings

This approach for extracting textual information from movie scripts produces a valuable typology for movies. Moreover, our modeling approach has significant power to predict movie scripts' profitability. It provides a superior prediction performance compared to previous benchmarks, such as that of Eliashberg et al. (2007).

Research limitations/implications

This work contributes to literature on predicting the box office performance in the green-lighting process and literature regarding suggesting models for the idea screening stage in the new product development process. Besides, this is one of the few studies that use movie script data to predict movies' financial performance by proposing an approach to integrate text mining models and machine learning algorithms with movie experts' intuition.

Practical implications

First, the authors’ approach can significantly reduce the financial risk associated with movie production decisions before the pre-production stage. Second, this paper proposes an approach that is applicable at a very early stage of new product development, such as the idea screening stage. The authors also introduce an online-based movie scenario database system that can help movie studios make more systematic and profitable decisions in the green-lighting process. Third, this approach can help movie studios estimate movie scripts' financial value.

Originality/value

This study is one of the few studies to forecast market performance in the green-lighting process.

Details

Internet Research, vol. 32 no. 3
Type: Research Article
ISSN: 1066-2243
