Search results
1 – 10 of over 1,000
Adriana Perez-Encinas and Jesus Rodriguez-Pomeda
Abstract
Studies in higher education tend to use different methods and methodologies, from documentary analysis to auto/biographical and observational studies. Most studies are either qualitative or quantitative. A mixed-methods approach has emerged in recent years, in which the qualitative approach generally plays an important role. The purpose of this chapter is to show the potential of a new methodology that is also appropriate for higher education research and widely used in the social sciences: probabilistic topic models. A probabilistic method can be used to analyse and categorise thousands of words. After collecting large sets of texts, content analysis is used to analyse the meaning of these words in depth. The huge number of texts published today pushes researchers to employ new techniques in their search for hidden structures built upon a set of core ideas. These methods are called topic modelling algorithms, with Latent Dirichlet Allocation being the basic probabilistic topic model. The application of these new techniques to the field of higher education is extremely useful, for two reasons: (1) studies in this area deal in some cases with a great volume of data and (2) these techniques allow one to devise models in a way that is unsupervised by humans (even when researchers operate on the resulting model); thus they are less subjective than other types of analyses and methods used for qualitative purposes. This chapter shows the foundations and recent applications of the technique in the higher education field, as well as challenges related to this new technique.
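As background for readers unfamiliar with the technique, a minimal sketch of Latent Dirichlet Allocation in Python (scikit-learn, with an invented toy corpus; this is not an example from the chapter) looks like this:

```python
# Minimal illustration of probabilistic topic modelling with LDA
# (scikit-learn); the corpus and parameters are invented toy examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "students learn research methods in higher education",
    "qualitative analysis of interview transcripts",
    "topic models uncover hidden thematic structure in text",
    "universities publish large volumes of research text",
]

# Bag-of-words counts are the standard input for LDA.
counts = CountVectorizer(stop_words="english").fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one topic distribution per document

# Each row is a probability distribution over the 2 topics.
print(doc_topics.shape)  # (4, 2)
```

Each document is thus represented as a mixture of latent topics, which is what makes the method attractive for categorising large text collections without human supervision.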
Federico Barravecchia, Luca Mastrogiacomo and Fiorenzo Franceschini
Abstract
Purpose
Digital voice-of-customer (digital VoC) analysis is gaining much attention in the field of quality management. Digital VoC can be a great source of knowledge about customer needs, habits and expectations. To this end, the most popular approach is based on the application of text mining algorithms named topic modelling. These algorithms can identify latent topics discussed within digital VoC and categorise each source (e.g. each review) based on its content. This paper aims to propose a structured procedure for validating the results produced by topic modelling algorithms.
Design/methodology/approach
The proposed procedure compares, on random samples, the results produced by topic modelling algorithms with those generated by human evaluators. The use of specific metrics allows a comparison between the two approaches and provides a preliminary empirical validation.
Findings
The proposed procedure can guide users of topic modelling algorithms in validating the results they obtain. An application case study related to some car-sharing services supports the description.
Originality/value
Despite the vast success of topic modelling-based approaches, metrics and procedures to validate the obtained results are still lacking. This paper provides a first practical and structured validation procedure specifically employed for quality-related applications.
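The paper's exact validation metrics are not given in the abstract; a hedged sketch of one plausible agreement check, comparing the model's dominant-topic labels against human-assigned labels on a sampled set of reviews with Cohen's kappa, could look like this (the labels below are invented):

```python
# Hedged sketch of one possible agreement check between algorithmic
# topic labels and human labels on a sample of documents; the labels
# are invented and the paper's actual metrics may differ.
from sklearn.metrics import cohen_kappa_score

# Dominant topic assigned by the model vs. by a human evaluator
# for ten sampled documents (topic ids are arbitrary).
model_labels = [0, 1, 1, 2, 0, 2, 1, 0, 2, 1]
human_labels = [0, 1, 1, 2, 0, 2, 0, 0, 2, 1]

kappa = cohen_kappa_score(model_labels, human_labels)
print(round(kappa, 3))  # agreement corrected for chance
```

Kappa corrects raw agreement for the agreement expected by chance, which makes it a common choice when benchmarking automatic labels against human coders.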
Heng-Yang Lu, Yi Zhang and Yuntao Du
Abstract
Purpose
Topic models have been widely applied to discover important information from vast amounts of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from a sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model, called the Sense Unit based Phrase Topic Model (SenU-PTM), to address both the sparsity and readability problems.
Design/methodology/approach
SenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm by exploiting word embeddings, which aims to generate phrases with the original corpus. The second phase introduces a new concept of sense unit, which consists of a set of semantically similar tokens for modeling topics with token vectors generated in the first phase. Finally, SenU-PTM infers topics based on the above two phases.
Findings
Experimental results on two real-world and publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. It reveals that modeling topics on sense units can solve the sparsity of short texts and improve the readability of topics at the same time.
Originality/value
The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.
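The abstract does not detail the phrase-generation algorithm. As a loose, stdlib-only stand-in (an assumption for illustration; the real SenU-PTM exploits word embeddings, which this co-occurrence heuristic does not), frequent adjacent tokens can be merged into single phrase tokens:

```python
# Loose, stdlib-only stand-in for a phrase-generation pass: merge
# frequent bigrams into single tokens. SenU-PTM's actual algorithm
# uses word embeddings; this heuristic is only illustrative.
from collections import Counter

docs = [
    "topic model for short text",
    "short text topic discovery",
    "topic model with word embeddings",
]

tokenized = [d.split() for d in docs]
bigrams = Counter(
    (a, b) for toks in tokenized for a, b in zip(toks, toks[1:])
)

def merge_phrases(tokens, vocab, min_count=2):
    """Greedily join adjacent tokens whose bigram count meets min_count."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and vocab[(tokens[i], tokens[i + 1])] >= min_count:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

print(merge_phrases(tokenized[0], bigrams))
# → ['topic_model', 'for', 'short_text']
```

Treating recurring phrases as single tokens is one common way to make short-text topics more readable, which is the motivation the abstract names.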
Anna L. Neatrour, Elizabeth Callaway and Rebekah Cummings
Abstract
Purpose
This paper aims to determine if the digital humanities technique of topic modeling would reveal interesting patterns in a corpus of library-themed literature focused on the future of libraries and pioneer a collaboration model in librarian-led digital humanities projects. By developing the project, librarians learned how to better support digital humanities by actually doing digital humanities, as well as gaining insight on the variety of approaches taken by researchers and commenters to the idea of the future of libraries.
Design/methodology/approach
The researchers collected a corpus of over 150 texts (articles, blog posts, book chapters, websites, etc.) that all addressed the future of the library. They ran several instances of latent Dirichlet allocation style topic modeling on the corpus using the programming language R. Once they produced a run in which the topics were cohesive and discrete, they produced word-clouds of the words associated with each topic, visualized topics through time and examined in detail the top five documents associated with each topic.
Findings
The research project provided an effective way for librarians to gain practical experience in digital humanities and develop a greater understanding of collaborative workflows in digital humanities. By examining a corpus of library-themed literature, the researchers gained new insight into how the profession grapples with the idea of the future and an appreciation for topic modeling as a form of literature review.
Originality/value
Topic modeling a future-themed corpus of library literature is a unique research project and provides a way to support collaboration between library faculty and researchers from outside the library.
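The step of examining the top documents associated with each topic can be sketched as follows (a Python illustration with an invented document-topic matrix; the authors worked in R, and their code is not reproduced here):

```python
# Hedged sketch (not the authors' R code): given a document-topic
# matrix, list the documents that load most heavily on each topic.
import numpy as np

# Toy document-topic proportions: 6 documents x 2 topics (rows sum to 1).
doc_topics = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.7, 0.3],
    [0.1, 0.9],
    [0.6, 0.4],
    [0.3, 0.7],
])

top_k = 2  # the paper examined the top five; two suffice for toy data
for topic in range(doc_topics.shape[1]):
    # Indices of the documents with the highest proportion of this topic.
    top_docs = np.argsort(doc_topics[:, topic])[::-1][:top_k]
    print(f"topic {topic}: documents {top_docs.tolist()}")
```

Reading the highest-loading documents per topic is a standard sanity check that the topics are cohesive, and it doubles as a guided literature review of the corpus.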
Qiang Cao, Xian Cheng and Shaoyi Liao
Abstract
Purpose
How to extract useful information from a very large volume of literature is a great challenge for librarians. The topic modeling technique, a machine learning algorithm that uncovers latent thematic structures from large collections of documents, is a widespread approach in literature analysis, especially with the rapid growth of academic literature. In this paper, topic modeling based literature analysis is compared using the full texts and the abstracts of articles.
Design/methodology/approach
The authors conduct a comparison study of topic modeling on full-text papers and the corresponding abstracts to assess the influence of the different types of documents used as input for topic modeling. In particular, the authors use the large volume of COVID-19 research literature as a case study for topic modeling based literature analysis. The authors illustrate the research topics, research trends and topic similarity of COVID-19 research by using Latent Dirichlet allocation (LDA) and a topic visualization method.
Findings
The authors found 14 research topics for COVID-19 research. The authors also found that the topic similarity between using full-text paper and corresponding abstract is higher when more documents are analyzed.
Originality/value
First, this study contributes to the literature analysis approach. The comparison study can help us understand the influence of the different types of documents on the results of topic modeling analysis. Second, the authors present an overview of COVID-19 research by summarizing 14 research topics for it. This automated literature analysis can help specialists in the health and medical domain or other people to quickly grasp the structured morphology of the current studies for COVID-19.
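The abstract does not specify how topic similarity was measured. One simple, commonly used option (an illustrative assumption, not necessarily the authors' measure) is the cosine similarity of topic-word vectors over a shared vocabulary:

```python
# Illustrative sketch of comparing a topic from two models (e.g. one
# fitted on full texts, one on abstracts) via cosine similarity of
# their topic-word vectors; the vectors below are invented.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy topic-word weight vectors over a shared 4-word vocabulary.
full_text_topic = np.array([0.5, 0.3, 0.1, 0.1])
abstract_topic = np.array([0.4, 0.4, 0.1, 0.1])

sim = cosine(full_text_topic, abstract_topic)
print(round(sim, 3))  # close to 1.0 means the two topics largely agree
```

Averaging such pairwise similarities over matched topics gives a single score for how well the abstract-based model tracks the full-text model.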
Rachana Jaiswal, Shashank Gupta and Aviral Kumar Tiwari
Abstract
Purpose
Grounded in the stakeholder theory and signaling theory, this study aims to broaden the research agenda on environmental, social and governance (ESG) investing by uncovering public sentiments and key themes using Twitter data spanning from 2009 to 2022.
Design/methodology/approach
Using various machine learning models for text tonality analysis and topic modeling, this research scrutinizes 1,842,985 Twitter texts to extract prevalent ESG investing trends and gauge their sentiment.
Findings
Gibbs Sampling Dirichlet Multinomial Mixture emerges as the optimal topic modeling method, unveiling significant topics such as “Physical risk of climate change,” “Employee Health, Safety and well-being” and “Water management and Scarcity.” RoBERTa, an attention-based model, outperforms other machine learning models in sentiment analysis, revealing a predominantly positive shift in public sentiment toward ESG investing over the past five years.
Research limitations/implications
This study establishes a framework for sentiment analysis and topic modeling on alternative data, offering a foundation for future research. Prospective studies can enhance insights by incorporating data from additional social media platforms like LinkedIn and Facebook.
Practical implications
Leveraging unstructured data on ESG from platforms like Twitter provides a novel avenue to capture company-related information, supplementing traditional self-reported sustainability disclosures. This approach opens new possibilities for understanding a company’s ESG standing.
Social implications
By shedding light on public perceptions of ESG investing, this research uncovers influential factors that often elude traditional corporate reporting. The findings empower both investors and the general public, aiding managers in refining ESG and management strategies.
Originality/value
This study marks a groundbreaking contribution to scholarly exploration, to the best of the authors’ knowledge, by being the first to analyze unstructured Twitter data in the context of ESG investing, offering unique insights and advancing the understanding of this emerging field.
S. Ravikumar, Bidyut Bikash Boruah and Fullstar Lamin Gayang
Abstract
Purpose
The purpose of the study is to identify the latent topics in 9102 Web of Science (WoS) indexed research articles by Sri Lankan authors, published in 2645 journals from 1989 to 2021, by applying Latent Dirichlet Allocation to the abstracts. The dominant topics in the corpus of text, the posterior probability of different terms in the topics and the publication proportions of the topics are discussed in the article.
Design/methodology/approach
Abstracts and other details of the studied articles are collected from the WoS database by the authors, and the data are preprocessed before the analysis. The "ldatuning" R package is then applied to the preprocessed text to decide the number of topics on the basis of statistical metrics; twenty latent topics are extracted, as indicated by four metric methods.
Findings
It is observed that medical science, agriculture, research and development and chemistry-related topics dominate the subject categories as a whole. "Irrigation" and "mortality and health care" show significant growth in publication proportion from 2019 to 2021. Among the most frequently occurring latent topics, terms such as "activity" and "acid" carry the highest posterior probability.
Practical implications
Topic models allow higher-level questions to be addressed rapidly and efficiently without human intervention, and they are also helpful in information retrieval and document clustering. The unique feature of this study is that it shows how the growth of the universe of knowledge for a specific country can be studied using the LDA topic model.
Originality/value
This study provides an incentive for research in text analysis and information retrieval. Its results give an understanding of how the publications of Sri Lankan authors have developed across subject areas and over time. The trends and intensity of publications by Sri Lankan authors on different latent topics help to trace the interests and most practised areas in different domains.
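The metric-driven choice of topic count that ldatuning performs in R can be sketched roughly in Python (an analogy under assumptions, not the authors' pipeline): fit candidate models for several values of K and compare a fit statistic, here held-out perplexity.

```python
# Rough Python analogue of metric-based topic-number selection
# (ldatuning uses four metrics in R; perplexity stands in here).
# Corpus and candidate K values are invented toy examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "irrigation systems improve crop yield",
    "mortality rates and health care access",
    "acid activity in chemical compounds",
    "health care policy and mortality",
    "crop irrigation and water management",
    "chemical acid reactions and activity",
]
counts = CountVectorizer(stop_words="english").fit_transform(docs)

perplexities = {}
for k in (2, 3, 4):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(counts)
    # Lower perplexity suggests a better fit (one of several possible metrics).
    perplexities[k] = lda.perplexity(counts)

best_k = min(perplexities, key=perplexities.get)
print(best_k)
```

In practice perplexity is computed on held-out documents and combined with other diagnostics such as topic coherence, since fit statistics alone can favour too many topics.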
Friso van Dijk, Joost Gadellaa, Chaïm van Toledo, Marco Spruit, Sjaak Brinkkemper and Matthieu Brinkhuis
Abstract
Purpose
This paper argues that privacy research is divided into distinct communities and rarely considered as a singular field, harming its disciplinary identity. The authors collected 119,810 publications and over 3 million references to perform a bibliometric domain analysis, a quantitative approach to uncovering the structures within the privacy research field.
Design/methodology/approach
The bibliometric domain analysis consists of a combined directed network and topic model of published privacy research. The network contains 83,159 publications and 462,633 internal references. A Latent Dirichlet allocation (LDA) topic model from the same dataset offers an additional lens on structure by classifying each publication on 36 topics with the network data. The combined outcomes of these methods are used to investigate the structural position and topical make-up of the privacy research communities.
Findings
The authors identified the research communities as well as categorised their structural positioning. Four communities form the core of privacy research: individual privacy and law, cloud computing, location data and privacy-preserving data publishing. The latter is a macro-community of data mining, anonymity metrics and differential privacy. Surrounding the core are applied communities. Further removed are communities with little influence, most notably the medical communities that make up 14.4% of the network. The topic model shows system design as a potentially latent community. Noteworthy is the absence of a centralised body of knowledge on organisational privacy management.
Originality/value
This is the first in-depth, quantitative mapping study of all privacy research.
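The citation counting that underlies such a directed network can be sketched in a few lines of stdlib Python (publication ids invented; the study's actual network construction over 462,633 internal references is far richer):

```python
# Illustrative sketch (not the authors' pipeline) of a directed
# citation network: edges point from a citing publication to the
# publication it references. Publication ids are invented.
from collections import Counter

references = [
    ("paper_a", "paper_c"),
    ("paper_b", "paper_c"),
    ("paper_b", "paper_d"),
    ("paper_e", "paper_c"),
]

# In-degree = number of internal citations a publication receives.
citations = Counter(cited for _, cited in references)
most_cited = max(citations, key=citations.get)
print(most_cited)  # paper_c receives three internal references
```

In-degree within the collected corpus is the simplest structural signal; community detection on the full network then groups publications that cite each other heavily.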
Roman Egger and Joanne Yu
Abstract
Purpose
Intrigued by the methodological challenges emerging from text complexity, the purpose of this study is to evaluate the effectiveness of different topic modelling algorithms based on Instagram textual data.
Design/methodology/approach
By taking Instagram posts captioned with #darktourism as the study context, this research applies latent Dirichlet allocation (LDA), correlation explanation (CorEx), and non-negative matrix factorisation (NMF) to uncover tourist experiences.
Findings
CorEx outperforms LDA and NMF by classifying emerging dark sites and activities into 17 distinct topics. The results of LDA appear homogeneous and overlapping, whereas the extracted topics of NMF are not specific enough to gain deep insights.
Originality/value
This study assesses different topic modelling algorithms for knowledge extraction in the highly heterogeneous tourism industry. The findings unfold the complexity of analysing short-text social media data and strengthen the use of CorEx in analysing Instagram content.
Eunhye (Olivia) Park, Bongsug (Kevin) Chae and Junehee Kwon
Abstract
Purpose
The purpose of this study was to explore the influence of review-related information on topical proportions and on the pattern of word appearances in each topic (topical content) using the structural topic model (STM).
Design/methodology/approach
STM-based topic modeling, with the inclusion of covariates, was applied to 173,607 Yelp.com reviews written in 2005-2016, in addition to traditional statistical analyses.
Findings
Differences in topic prevalence and topical content were found between certified green and non-certified restaurants. Customers' recognition of sustainable food topics changed over time.
Research limitations/implications
This study demonstrates the application of STM for the systematic analysis of a large amount of text data.
Originality/value
Few studies in the hospitality literature have examined the influence of review-level metadata on topic and term estimation. Through topic modeling, customers' natural responses toward green practices were identified.