Search results

1 – 10 of over 93000
Article
Publication date: 29 April 2021

Heng-Yang Lu, Yi Zhang and Yuntao Du

Abstract

Purpose

Topic models have been widely applied to discover important information from vast amounts of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model, the Sense Unit based Phrase Topic Model (SenU-PTM), that addresses both the sparsity and readability problems.

Design/methodology/approach

SenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm that exploits word embeddings to generate phrases from the original corpus. The second phase introduces the new concept of a sense unit, which consists of a set of semantically similar tokens, for modeling topics with the token vectors generated in the first phase. Finally, SenU-PTM infers topics based on these two phases.

Findings

Experimental results on two real-world and publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. They reveal that modeling topics on sense units can solve the sparsity of short texts and improve the readability of topics at the same time.

Originality/value

The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.

Details

Data Technologies and Applications, vol. 55 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 4 June 2021

Lixue Zou, Xiwen Liu, Wray Buntine and Yanli Liu

Abstract

Purpose

Full text of a document is a rich source of information that can be used to provide meaningful topics. The purpose of this paper is to demonstrate how to use citation context (CC) in the full text to identify the cited topics and citing topics efficiently and effectively by employing automatic text analysis algorithms.

Design/methodology/approach

The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is leveraged to extract the citing text from the full text, which makes it possible to discover topics with accuracy. CC-LDA incorporates CC, citing text, and their latent relationship, while CCRef-LDA incorporates CC, citing text, their latent relationship and reference information in CC. Collapsed Gibbs sampling is used to achieve an approximate estimation. The capacity of CC-LDA to simultaneously learn cited topics and citing topics together with their links is investigated. Moreover, a topic influence measure method based on CC-LDA is proposed and applied to create links between the two-level topics. In addition, the capacity of CCRef-LDA to discover topic influential references is also investigated.
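Collapsed Gibbs sampling is easiest to see on plain LDA. The sketch below is not CC-LDA or CCRef-LDA: it is a minimal sampler for standard LDA, written under the assumption of symmetric priors, and the function name `lda_gibbs`, the parameter values and the toy documents are all illustrative choices rather than anything taken from the paper.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampler for standard LDA (illustrative, not CC-LDA).

    docs is a list of documents, each a list of integer word ids.
    Returns the document-topic and topic-word count matrices.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Full conditional p(z = t | everything else), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]

                # Resample the topic and restore the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word

# Toy corpus of word ids (hypothetical data).
toy_docs = [[0, 1, 2, 0], [3, 4, 3], [0, 3, 1]]
doc_topic, topic_word = lda_gibbs(toy_docs, n_topics=2, vocab_size=5, n_iter=20)
```

The parameters are integrated out, so the sampler only tracks counts; point estimates of the topic distributions can be read off the returned matrices after smoothing with `alpha` and `beta`.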

Findings

The results indicate CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering the cited topics and citing topics with topic influence, and CCRef-LDA is able to find the cited topic influential references.
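For readers unfamiliar with the two evaluation measures, here is a small sketch of how they are commonly computed. It assumes the symmetric divergence is the average of the two directed KL divergences (some authors use the plain sum instead, and the abstract does not say which), and the function names and example distributions are illustrative only.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); one common convention."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

def perplexity(token_log_probs):
    """exp of the negative mean per-token log-likelihood (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Two made-up topic-word distributions over a three-word vocabulary.
topic_a = [0.7, 0.2, 0.1]
topic_b = [0.1, 0.2, 0.7]
divergence = symmetric_kl(topic_a, topic_b)
```

Lower perplexity indicates a better fit to held-out text, while a larger symmetric divergence between topics indicates that the topics are more distinct from one another.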

Originality/value

The automatic method provides novel knowledge for cited topics and citing topics discovery. Topic influence learnt by our model can link two-level topics and create a semantic topic network. The method can also use topic specificity as a feature to rank references.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 5 September 2019

Nastaran Hajiheydari, Mojtaba Talafidaryani, SeyedHossein Khabiri and Masoud Salehi

Abstract

Purpose

Although the business model field of study has been a focus of attention for both researchers and practitioners within the past two decades, it still suffers from concerns about its identity. Accordingly, this paper aims to clarify the intellectual structure of the business model field by identifying the research clusters and their sub-clusters, the prominent relations and the dominant research trends.

Design/methodology/approach

This paper uses some common text mining methods including co-word analysis, burst analysis, timeline analysis and topic modeling to analyze and mine the title, abstract and keywords of 14,081 research documents related to the domain of business model.
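Of the methods listed, co-word analysis is the simplest to illustrate: it counts how often two keywords appear in the same document. The sketch below is a generic version of that counting step, not the authors' pipeline; the function name `coword_counts` and the three keyword lists are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def coword_counts(keyword_lists):
    """Count how many documents each unordered keyword pair co-occurs in."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted({k.lower() for k in keywords})
        counts.update(combinations(unique, 2))
    return counts

# Hypothetical keyword lists for three documents.
documents = [
    ["business model", "innovation", "sustainability"],
    ["business model", "innovation"],
    ["business model", "e-business"],
]
pairs = coword_counts(documents)
strongest = pairs.most_common(1)
```

The resulting pair counts form the edge weights of a co-word network, which can then be clustered to reveal research areas.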

Findings

The results revealed that the business model field of study consists of three main research areas, namely, electronic business model, business model innovation and sustainable business model, each of which has some sub-areas and has been more evident in some particular industries. Additionally, from the time perspective, research issues in the domain of sustainable development are considered the hot and emerging topics in this field. In addition, the results confirmed that information technology has been one of the most important drivers, influencing the appearance of different study topics in the various periods.

Originality/value

The contribution of this study is to quantitatively uncover the dominant knowledge structure and prominent research trends in the business model field of study, considering a broad range of scholarly publications and using some promising and reliable text mining techniques.

Details

foresight, vol. 21 no. 6
Type: Research Article
ISSN: 1463-6689

Article
Publication date: 29 September 2021

Ziang Wang and Feng Yang

Abstract

Purpose

It has always been a hot topic for online retailers to obtain consumers’ product evaluations from massive online reviews. In the process of online shopping, there is no face-to-face interaction between online retailers and customers. After collecting online reviews left by customers, online retailers are eager to acquire answers to some questions. For example, which product attributes will attract consumers? Or which step brings a better experience to consumers during the process of shopping? This paper aims to associate the latent Dirichlet allocation (LDA) model with consumers’ attitudes and to provide a method for calculating a numerical measure of the product evaluation expressed in each word.

Design/methodology/approach

First, all possible pairs of reviews are organized as documents to build the corpus. After that, the latent topics of the traditional LDA model, referred to as the standard LDA model, are separated into shared and differential topics. Then, the authors associate the model with consumers’ attitudes toward each review, which is classified as either a positive or a non-positive review. The product evaluation reflected in consumers’ binary attitude is extended to each word that appears in the corpus. Finally, variational optimization is introduced to calculate the parameters of the expanded LDA model.

Findings

The experimental results illustrate that the LDA model in this research, referred to as the expanded LDA model, can successfully assign sufficient probability to words related to product attributes or consumers’ product evaluation. Compared with the standard LDA model, the expanded model tends to assign higher probability to words that rank higher within each topic. Besides, the expanded model also has higher precision on the prediction set, which shows that breaking down the topics into two categories fits the data set better than the standard LDA model does. The product evaluation of each word is calculated by the expanded model and depicted at the end of the experiment.

Originality/value

This research provides a new method to calculate consumers’ product evaluation from reviews at the word level. Words may be used to describe product attributes or consumers’ experiences in reviews. Assigning numerical measures to words makes it possible to analyze consumers’ product evaluation quantitatively. Besides, because words serve as labels themselves, they can also be ranked if a numerical measure is given. Online retailers can benefit from the result for label choosing, advertising or product recommendation.

Details

Journal of Modelling in Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1746-5664

Article
Publication date: 22 July 2021

Linxia Zhong, Wei Wei and Shixuan Li

Abstract

Purpose

Because of the extensive user coverage of news sites and apps, greater social and commercial value can be realized if users can access their favourite news as easily as possible. However, news has a timeliness factor; news recommendation suffers from serious cold-start and data-sparsity problems, and news users are more susceptible to recent topical news. Therefore, this study aims to propose a personalized news recommendation approach based on a topic model and a restricted Boltzmann machine (RBM).

Design/methodology/approach

Firstly, the model extracts the news topic information based on the LDA2vec topic model. Then, the implicit behaviour data are analysed and converted into explicit rating data according to the rules. The highest weight is assigned to recent hot news stories. Finally, the topic information and the rating data are regarded as the conditional layer and visual layer of the conditional RBM (CRBM) model, respectively, to implement news recommendations.

Findings

The experimental results show that using LDA2vec-based news topic as a conditional layer in the CRBM model provides a higher prediction rating and improves the effectiveness of news recommendations.

Originality/value

This study proposes a personalized news recommendation approach based on an improved CRBM. A topic model is applied to news topic extraction and used as the conditional layer of the CRBM. It not only alleviates the sparseness of rating data to improve the efficiency of the CRBM but also considers that readers are more susceptible to popular or trending news.

Details

The Electronic Library, vol. 39 no. 4
Type: Research Article
ISSN: 0264-0473

Book part
Publication date: 30 August 2019

Fulya Ozcan

Abstract

This chapter investigates the behavior of Reddit’s news subreddit users and the relationship between their sentiment and exchange rates. Using graphical models and natural language processing, hidden online communities among Reddit users are discovered. The data set used in this project is a mixture of text and categorical data from Reddit’s news subreddit. These data include the titles of the news pages, as well as a few user characteristics, in addition to users’ comments. This data set is an excellent resource to study user reaction to news since their comments are directly linked to the webpage contents. The model considered in this chapter is a hierarchical mixture model, a generative model that detects overlapping networks using the sentiment from the user-generated content. The advantage of this model is that the communities (or groups) are assumed to follow a Chinese restaurant process, and therefore it can automatically detect and cluster the communities. The hidden variables and the hyperparameters for this model are obtained using Gibbs sampling.
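The Chinese restaurant process is what lets such a model choose the number of communities from the data instead of fixing it in advance. As a rough sketch of the prior alone (not the chapter's full hierarchical model), each new member joins an existing group with probability proportional to its size, or starts a new group with probability proportional to a concentration parameter. The function name and parameter values below are illustrative assumptions.

```python
import random

def chinese_restaurant_process(n_customers, concentration, seed=0):
    """Draw one partition of n_customers from a Chinese restaurant process.

    Returns the table index of each customer and the final table sizes.
    """
    rng = random.Random(seed)
    table_sizes = []
    seating = []
    for _ in range(n_customers):
        # Existing tables are weighted by size; the last slot is a new table.
        weights = table_sizes + [concentration]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        seating.append(table)
    return seating, table_sizes

seating, table_sizes = chinese_restaurant_process(n_customers=50, concentration=1.0)
```

Larger tables attract more customers, so a few big groups and many small ones tend to emerge; raising the concentration value makes new groups more likely.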

Details

Topics in Identification, Limited Dependent Variables, Partial Observability, Experimentation, and Flexible Modeling: Part A
Type: Book
ISBN: 978-1-78973-241-2

Article
Publication date: 29 June 2021

Jongdae Kim, Youseok Lee and Inseong Song

Abstract

Purpose

The purpose of this paper is to develop a predictive model for box office performance based on the textual information in movie scripts in the green-lighting process of movie production.

Design/methodology/approach

The authors use Latent Dirichlet Allocation to determine the hidden textual structure in movie scripts by extracting topic probabilities as predictors for classification. The extracted topic probabilities are used as inputs for the predictive model for the box office performance. For the predictive model, the authors utilize a variety of classification algorithms such as logistic classification, decision trees, random forests, k-nearest neighbor algorithms, support vector machines and artificial neural networks, and compare their relative performances in predicting movies' market performance.
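To make the "topic probabilities as predictors" step concrete, the sketch below feeds per-script topic proportions into a k-nearest-neighbor classifier, one of the algorithm families the abstract lists. It is a toy illustration only: the topic vectors, the "hit"/"miss" labels and the function name `knn_predict` are invented, and the authors' actual features and tuning are not described here.

```python
import math
from collections import Counter

def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k training vectors closest to query."""
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical topic proportions for four scripts over three topics.
topic_vectors = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
]
labels = ["hit", "hit", "miss", "miss"]

prediction = knn_predict(topic_vectors, labels, [0.75, 0.15, 0.10], k=3)
```

Because each script is reduced to a short vector of topic proportions, any standard classifier can be swapped in at this point without changing the feature extraction.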

Findings

This approach for extracting textual information from movie scripts produces a valuable typology for movies. Moreover, our modeling approach has significant power to predict movie scripts' profitability. It provides a superior prediction performance compared to previous benchmarks, such as that of Eliashberg et al. (2007).

Research limitations/implications

This work contributes to the literature on predicting box office performance in the green-lighting process and to the literature on models for the idea screening stage in the new product development process. Besides, this is one of the few studies that use movie script data to predict movies' financial performance by proposing an approach to integrate text mining models and machine learning algorithms with movie experts' intuition.

Practical implications

First, the authors’ approach can significantly reduce the financial risk associated with movie production decisions before the pre-production stage. Second, this paper proposes an approach that is applicable at a very early stage of new product development, such as the idea screening stage. The authors also introduce an online-based movie scenario database system that can help movie studios make more systematic and profitable decisions in the green-lighting process. Third, this approach can help movie studios estimate movie scripts' financial value.

Originality/value

This study is one of the few studies to forecast market performance in the green-lighting process.

Details

Internet Research, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 7 August 2017

Daniel Carnerud

Abstract

Purpose

The purpose of this paper is to explore and describe research presented in the International Journal of Quality & Reliability Management (IJQRM), thereby creating an increased understanding of how the areas of research have evolved through the years. An additional purpose is to show how text mining methodology can be used as a tool for exploration and description of research publications.

Design/methodology/approach

The study applies text mining methodologies to explore and describe the digital library of IJQRM from 1984 up to 2014. To structure and condense the data, k-means clustering and probabilistic topic modeling with latent Dirichlet allocation are applied. The data set consists of research paper abstracts.

Findings

The results support the suggestion of the occurrence of trends, fads and fashion in research publications. Research on quality function deployment (QFD) and reliability management is noted to be on the downturn, whereas research on Six Sigma with a focus on lean, innovation, performance and improvement is on the rise. Furthermore, the study confirms IJQRM as a scientific journal with quality and reliability management as primary areas of coverage, accompanied by specific topics such as total quality management, service quality, process management, ISO, QFD and Six Sigma. The study also gives an insight into how text mining can be used as a way to efficiently explore and describe large quantities of research paper abstracts.

Research limitations/implications

The study focuses on abstracts of research papers, thus topics and categories that could be identified via other journal publications, such as book reviews; general reviews; secondary articles; editorials; guest editorials; awards for excellence (notifications); introductions or summaries from conferences; notes from the publisher; and articles without an abstract, are excluded.

Originality/value

There do not seem to be any prior text mining studies that apply cluster modeling and probabilistic topic modeling to research article abstracts in the IJQRM. This study therefore offers a unique perspective on the journal’s content.

Details

International Journal of Quality & Reliability Management, vol. 34 no. 7
Type: Research Article
ISSN: 0265-671X

Article
Publication date: 20 August 2021

Ming K. Lim, Yan Li and Xinyu Song

Abstract

Purpose

With the fierce competition in the cold chain logistics market, achieving and maintaining excellent customer satisfaction is the key to an enterprise's ability to stand out. This research aims to determine the factors that affect customer satisfaction in cold chain logistics, which helps cold chain logistics enterprises identify the main aspects of the problem. Further, suggestions are provided for cold chain logistics enterprises to improve customer satisfaction.

Design/methodology/approach

This research uses the text mining approach, including topic modeling and sentiment analysis, to analyze the information implicit in customer-generated reviews. First, latent Dirichlet allocation (LDA) model is used to identify the topics that customers focus on. Furthermore, to explore the sentiment polarity of different topics, bi-directional long short-term memory (Bi-LSTM), a type of deep learning model, is adopted to quantify the sentiment score. Last, regression analysis is performed to identify the significant factors that affect positive, neutral and negative sentiment.

Findings

The results show that customers focus on eight topics, namely, speed, price, cold chain transportation, package, quality, error handling, service staff and logistics information. Among them, speed, price, transportation and product quality significantly affect customer positive sentiment, and error handling and service staff are significant factors affecting customer neutral and negative sentiment, respectively.

Research limitations/implications

The data of the customer-generated reviews in this research are in Chinese. In the future, multi-lingual research can be conducted to obtain more comprehensive insights.

Originality/value

Prior studies on customer satisfaction in cold chain logistics predominantly used the questionnaire method, whose disadvantage is that interviewees may fill out the questionnaire arbitrarily, which leads to inaccurate data. For this reason, it is more scientific to discover customer satisfaction from real behavioral data. In response, customer-generated reviews that reflect true emotions are used as the data source for this research.

Details

Industrial Management & Data Systems, vol. 121 no. 12
Type: Research Article
ISSN: 0263-5577

Book part
Publication date: 12 November 2018

Adriana Perez-Encinas and Jesus Rodriguez-Pomeda

Abstract

Studies in higher education tend to use different methods and methodologies, from documentary analysis to auto/biographical and observational studies. Most studies are either quantitative or qualitative. A mixed-methods approach has emerged in recent years, in which the qualitative approach generally plays an important role. The purpose of this chapter is to show the potential of a new methodology that is also appropriate for higher education research and widely used in the social sciences: probabilistic topic models. A probabilistic method can be used to analyse and categorise thousands of words. After collecting large sets of texts, content analysis is used to deeply analyse the meaning of these words. The huge number of texts published today pushes researchers to employ new techniques in their search for hidden structures built upon a set of core ideas. These methods are called topic modelling algorithms, with Latent Dirichlet Allocation being the basic probabilistic topic model. The application of these new techniques to the field of higher education is extremely useful, for two reasons: (1) studies in this area deal in some cases with a great volume of data and (2) these techniques allow one to devise models in a way that is unsupervised by humans (even when researchers operate on the resulting model); thus they are less subjective than other types of analyses and methods used for qualitative purposes. This chapter shows the foundations and recent applications of the technique in the higher education field, as well as challenges related to this new technique.

Details

Theory and Method in Higher Education Research
Type: Book
ISBN: 978-1-78769-277-0
