Search results
1 – 10 of over 122000
Bernardo Cerqueira de Lima, Renata Maria Abrantes Baracho, Thomas Mandl and Patricia Baracho Porto
Abstract
Purpose
Social media platforms that disseminate scientific information to the public during the COVID-19 pandemic highlighted the importance of the topic of scientific communication. Content creators in the field, as well as researchers who study the impact of scientific information online, are interested in how people react to these information resources and how they judge them. This study aims to devise a framework for extracting large social media datasets and find specific feedback to content delivery, enabling scientific content creators to gain insights into how the public perceives scientific information.
Design/methodology/approach
To collect public reactions to scientific information, the study focused on Twitter users who are doctors, researchers, science communicators or representatives of research institutes, and processed their replies for two years from the start of the pandemic. The study aimed to develop a solution, powered by topic modeling enhanced by manual validation and other machine learning techniques such as word embeddings, that is capable of filtering massive social media datasets in search of documents related to reactions to scientific communication. The architecture developed in this paper can be replicated for finding any documents related to niche topics in social media data. As a final step of our framework, we also fine-tuned a large language model to perform the classification task with even more accuracy, forgoing the need for further human validation after the first step.
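The filtering idea described above, keeping only documents that resemble a few validated examples, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the bag-of-words vector is a toy stand-in for real word embeddings, and the names `bow_vector`, `filter_by_seeds`, the threshold value and the sample replies are all hypothetical.

```python
import math
from collections import Counter

def bow_vector(text):
    """Toy stand-in for an embedding: lowercase token counts (assumption)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def filter_by_seeds(documents, seeds, threshold=0.25):
    """Keep documents whose best similarity to any seed meets the threshold."""
    seed_vecs = [bow_vector(s) for s in seeds]
    kept = []
    for doc in documents:
        vec = bow_vector(doc)
        score = max(cosine(vec, sv) for sv in seed_vecs)
        if score >= threshold:
            kept.append((doc, score))
    return kept

# Hypothetical seed example (validated by a human) and unlabeled replies.
seeds = ["thank you for explaining the study so clearly"]
replies = [
    "thank you for explaining this study",
    "what time is the match tonight",
    "the explanation of the study was unclear to me",
]
selected = filter_by_seeds(replies, seeds)
```

In practice the threshold would be tuned against a manually validated sample, which mirrors the small amount of human validation the abstract mentions.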
Findings
We provided a framework that receives a large document dataset and, with a small degree of human validation at different stages, identifies the documents within the corpus that are relevant to a very underrepresented niche theme inside the database, with much higher precision than traditional state-of-the-art machine learning algorithms. Performance was improved even further by fine-tuning a large language model based on BERT, which would allow such a model to classify even larger unseen datasets in search of reactions to scientific communication without the need for further manual validation or topic modeling.
Research limitations/implications
The challenges of scientific communication are even greater with the rampant increase of misinformation in social media and the difficulty of competing in the saturated attention economy of the social media landscape. Our study aimed to create a solution that scientific content creators could use to better locate and understand constructive feedback toward their content and how it is received, which can be hidden as a minor subject among hundreds of thousands of comments. By leveraging an ensemble of techniques ranging from heuristics to state-of-the-art machine learning algorithms, we created a framework that is able to detect texts related to very niche subjects in very large datasets, given only a small number of example texts related to the subject as input.
Practical implications
With this tool, scientific content creators can sift through their social media following and quickly understand how to adapt their content to their current users’ needs and standards of content consumption.
Originality/value
This study aimed to find reactions to scientific communication in social media. We applied three methods with human intervention and compared their performance. This study shows, for the first time, the topics of interest that were discussed in Brazil during the COVID-19 pandemic.
Wondwesen Tafesse and Anders Wien
Abstract
Purpose
ChatGPT is a versatile technology with practical use cases spanning many professional disciplines including marketing. Being a recent innovation, however, there is a lack of academic insight into its tangible applications in the marketing realm. To address this gap, the current study explores ChatGPT’s application in marketing by mining social media data. Additionally, the study employs the stages-of-growth model to assess the current state of ChatGPT’s adoption in marketing organizations.
Design/methodology/approach
The study collected tweets related to ChatGPT and marketing using a web-scraping technique (N = 23,757). A topic model was trained on the tweet corpus using latent Dirichlet allocation to delineate ChatGPT’s major areas of applications in marketing.
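To make the topic-modeling step concrete, here is a minimal sketch of latent Dirichlet allocation trained with collapsed Gibbs sampling. The abstract does not say which inference method or library the authors used, so the sampler, the hyperparameters `alpha` and `beta`, and the toy tweets below are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]       # tokens of doc d in topic k
    topic_word = [dict() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                     # tokens assigned to topic k
    assignments = []

    def bump(d, w, k, delta):
        """Add or remove one token of word w in document d from topic k."""
        doc_topic[d][k] += delta
        topic_word[k][w] = topic_word[k].get(w, 0) + delta
        topic_total[k] += delta

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        assignments.append(zs)
        for w, k in zip(doc, zs):
            bump(d, w, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, w, assignments[d][i], -1)  # exclude the current token
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k].get(w, 0) + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k_new
                bump(d, w, k_new, +1)
    return doc_topic, topic_word

# Hypothetical, already tokenized tweets.
tweets = [
    ["chatgpt", "seo", "keywords", "content"],
    ["seo", "content", "blog", "keywords"],
    ["prompt", "engineering", "chatgpt", "prompt"],
    ["prompt", "engineering", "tips"],
]
doc_topic, topic_word = lda_gibbs(tweets, num_topics=2)
```

Sorting each dictionary in `topic_word` by count gives the top words per topic, which is the usual starting point for labeling topics by hand.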
Findings
The topic model produced seven latent topics that encapsulated ChatGPT’s major areas of applications in marketing including content marketing, digital marketing, search engine optimization, customer strategy, B2B marketing and prompt engineering. Further analyses reveal the popularity of and interest in these topics among marketing practitioners.
Originality/value
The findings contribute to the literature by offering empirical evidence of ChatGPT’s applications in marketing. They demonstrate the core use cases of ChatGPT in marketing. Further, the study applies the stages-of-growth model to situate ChatGPT’s current state of adoption in marketing organizations and anticipate its future trajectory.
Donghui Yang, Yan Wang, Zhaoyang Shi and Huimin Wang
Abstract
Purpose
Improving the diversity of recommendation information has become one of the latest research hotspots for addressing information cocoons. To achieve both high accuracy and high diversity in a recommender system, this paper proposes a hybrid method. This study aims to discuss the aforementioned method.
Design/methodology/approach
This paper integrates a latent Dirichlet allocation (LDA) model and a locality-sensitive hashing (LSH) algorithm to design a topic recommendation system. To measure the effectiveness of the method, this paper builds three-level categories of journal paper abstracts from the Web of Science platform as experimental data.
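As a rough illustration of the hashing side of this design, the sketch below groups abstracts into candidate buckets with MinHash signatures and banding, one common form of locality-sensitive hashing. The abstract does not specify which LSH family the paper uses or how it is combined with LDA, so the MinHash choice, the function names and the sample token sets are assumptions.

```python
import hashlib
from collections import defaultdict

def minhash_signature(tokens, num_hashes=16):
    """MinHash signature of a non-empty token set, using salted MD5 hashes."""
    return [
        min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for i in range(num_hashes)
    ]

def lsh_buckets(signatures, bands=8):
    """Split each signature into bands; items sharing a band share a bucket."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(doc_id)
    return buckets

def candidate_pairs(buckets):
    """All pairs of items that collide in at least one bucket."""
    pairs = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs

# Hypothetical token sets standing in for paper abstracts.
abstracts = {
    "a": {"topic", "model", "recommendation", "diversity"},
    "b": {"topic", "model", "recommendation", "diversity"},
    "c": {"protein", "folding", "simulation", "energy"},
}
signatures = {doc_id: minhash_signature(toks) for doc_id, toks in abstracts.items()}
pairs = candidate_pairs(lsh_buckets(signatures))
```

More bands with fewer rows each make collisions more likely for loosely similar items, which is the knob that trades precision for broader, more diverse candidate sets.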
Findings
(1) The results illustrate that the diversity of recommended items is significantly enhanced by leveraging a hashing function to overcome information cocoons. (2) By integrating the topic model and the hashing algorithm, diversity can be achieved without losing accuracy at certain levels of topic refinement.
Originality/value
The hybrid recommendation algorithm developed in this paper can overcome the dilemma of high accuracy and low diversity. The method could improve recommendations in the business and service industries to address the problems of information overload and information cocoons.
Heng-Yang Lu, Yi Zhang and Yuntao Du
Abstract
Purpose
Topic models have been widely applied to discover important information from vast amounts of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model called the Sense Unit based Phrase Topic Model (SenU-PTM) to address both the sparsity and readability problems.
Design/methodology/approach
SenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm by exploiting word embeddings, which aims to generate phrases with the original corpus. The second phase introduces a new concept of sense unit, which consists of a set of semantically similar tokens for modeling topics with token vectors generated in the first phase. Finally, SenU-PTM infers topics based on the above two phases.
Findings
Experimental results on two real-world, publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. They reveal that modeling topics on sense units can solve the sparsity of short texts and improve the readability of topics at the same time.
Originality/value
The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.
Lixue Zou, Xiwen Liu, Wray Buntine and Yanli Liu
Abstract
Purpose
Full text of a document is a rich source of information that can be used to provide meaningful topics. The purpose of this paper is to demonstrate how to use citation context (CC) in the full text to identify the cited topics and citing topics efficiently and effectively by employing automatic text analysis algorithms.
Design/methodology/approach
The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is leveraged to extract the citing text from the full text, which makes it possible to discover topics with accuracy. CC-LDA incorporates CC, citing text, and their latent relationship, while CCRef-LDA incorporates CC, citing text, their latent relationship and reference information in CC. Collapsed Gibbs sampling is used to achieve an approximate estimation. The capacity of CC-LDA to simultaneously learn cited topics and citing topics together with their links is investigated. Moreover, a topic influence measure method based on CC-LDA is proposed and applied to create links between the two-level topics. In addition, the capacity of CCRef-LDA to discover topic influential references is also investigated.
Findings
The results indicate CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering the cited topics and citing topics with topic influence, and CCRef-LDA is able to find the cited topic influential references.
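For readers unfamiliar with the two evaluation measures named above, the following sketch computes them on toy inputs. It assumes the common definitions: perplexity as the exponential of the negative mean per-token log-likelihood, and sKL as the sum of the two directed Kullback–Leibler divergences (some papers halve this sum). The paper's exact variants are not stated in the abstract.

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean log-likelihood per held-out token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def kl(p, q):
    """KL(p || q) for discrete distributions; q must be > 0 wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    """Symmetric KL divergence, here defined as KL(p||q) + KL(q||p)."""
    return kl(p, q) + kl(q, p)

# Toy topic-word distributions over a four-word vocabulary (hypothetical).
topic_a = [0.7, 0.1, 0.1, 0.1]
topic_b = [0.1, 0.1, 0.1, 0.7]
distance = symmetric_kl(topic_a, topic_b)

# A model that assigns probability 0.25 to every held-out token.
uniform_ppl = perplexity([math.log(0.25)] * 8)
```

Lower perplexity indicates a better fit to held-out text, while a larger sKL between topics indicates that the topics are more clearly separated.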
Originality/value
The automatic method provides novel knowledge for cited topics and citing topics discovery. Topic influence learnt by our model can link two-level topics and create a semantic topic network. The method can also use topic specificity as a feature to rank references.
Federico Barravecchia, Luca Mastrogiacomo and Fiorenzo Franceschini
Abstract
Purpose
Digital voice-of-customer (digital VoC) analysis is gaining much attention in the field of quality management. Digital VoC can be a great source of knowledge about customer needs, habits and expectations. To this end, the most popular approach is based on the application of text mining algorithms named topic modelling. These algorithms can identify latent topics discussed within digital VoC and categorise each source (e.g. each review) based on its content. This paper aims to propose a structured procedure for validating the results produced by topic modelling algorithms.
Design/methodology/approach
The proposed procedure compares, on random samples, the results produced by topic modelling algorithms with those generated by human evaluators. The use of specific metrics allows a comparison between the two approaches and provides a preliminary empirical validation.
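One way to quantify agreement between algorithm and human labels on a random sample is Cohen's kappa, sketched below. The abstract does not name the metrics the authors use, so kappa is an assumed stand-in, and the topic labels and reviews are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences of equal length."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected if both raters labeled independently at their own rates.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)

# Topic assigned to six sampled reviews by the model and by a human evaluator.
model_topics = ["price", "price", "app", "app", "support", "price"]
human_topics = ["price", "app", "app", "app", "support", "price"]
kappa = cohen_kappa(model_topics, human_topics)
```

Here the two label sets agree on five of six reviews, and kappa discounts the share of that agreement that would be expected by chance alone.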
Findings
The proposed procedure can guide users of topic modelling algorithms in validating the obtained results. An application case study related to some car-sharing services supports the description.
Originality/value
Despite the vast success of topic modelling-based approaches, metrics and procedures to validate the obtained results are still lacking. This paper provides a first practical and structured validation procedure specifically employed for quality-related applications.
Nastaran Hajiheydari, Mojtaba Talafidaryani, SeyedHossein Khabiri and Masoud Salehi
Abstract
Purpose
Although the business model field of study has been a focus of attention for both researchers and practitioners within the past two decades, it still suffers from concern about its identity. Accordingly, this paper aims to clarify the intellectual structure of the business model field by identifying the research clusters and their sub-clusters, the prominent relations and the dominant research trends.
Design/methodology/approach
This paper uses some common text mining methods including co-word analysis, burst analysis, timeline analysis and topic modeling to analyze and mine the title, abstract and keywords of 14,081 research documents related to the domain of business model.
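Of the methods listed, co-word analysis is the simplest to illustrate: it counts how often pairs of keywords appear in the same document. The sketch below is a generic version under that assumption; the function name and the keyword lists are hypothetical and do not reproduce the paper's data or tooling.

```python
from collections import Counter
from itertools import combinations

def coword_counts(keyword_lists):
    """Count how many documents contain each unordered keyword pair."""
    counts = Counter()
    for keywords in keyword_lists:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted({k.lower() for k in keywords})
        counts.update(combinations(unique, 2))
    return counts

# Hypothetical keyword lists for three documents.
documents = [
    ["business model", "innovation", "sustainability"],
    ["Business Model", "innovation"],
    ["e-business", "business model"],
]
pairs = coword_counts(documents)
strongest_pair, strongest_count = pairs.most_common(1)[0]
```

The resulting pair counts form the edge weights of a keyword network, on which clusters and sub-clusters can then be detected.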
Findings
The results revealed that the business model field of study consists of three main research areas including electronic business model, business model innovation and sustainable business model, each of which has some sub-areas and has been more evident in some particular industries. Additionally, from the time perspective, research issues in the domain of sustainable development are considered as the hot and emerging topics in this field. In addition, the results confirmed that information technology has been one of the most important drivers, influencing the appearance of different study topics in the various periods.
Originality/value
The contribution of this study is to quantitatively uncover the dominant knowledge structure and prominent research trends in the business model field of study, considering a broad range of scholarly publications and using some promising and reliable text mining techniques.
Abstract
This chapter investigates the behavior of Reddit’s news subreddit users and the relationship between their sentiment and exchange rates. Using graphical models and natural language processing, hidden online communities among Reddit users are discovered. The data set used in this project is a mixture of text and categorical data from Reddit’s news subreddit. These data include the titles of the news pages, as well as a few user characteristics, in addition to users’ comments. This data set is an excellent resource to study user reaction to news since their comments are directly linked to the webpage contents. The model considered in this chapter is a hierarchical mixture model, a generative model that detects overlapping networks using the sentiment from the user-generated content. The advantage of this model is that the communities (or groups) are assumed to follow a Chinese restaurant process, and therefore it can automatically detect and cluster the communities. The hidden variables and the hyperparameters for this model are obtained using Gibbs sampling.
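The Chinese restaurant process mentioned above lets the number of communities grow with the data instead of being fixed in advance. The sketch below simulates the prior alone, under the standard definition in which a new customer joins an existing table with probability proportional to its size and opens a new table with probability proportional to a concentration parameter. It is not the chapter's full hierarchical model, and the parameter values are arbitrary.

```python
import random

def chinese_restaurant_process(num_customers, concentration=1.0, seed=0):
    """Sample a seating arrangement (a random partition) from a CRP prior."""
    rng = random.Random(seed)
    table_sizes = []   # number of customers at each table
    seating = []       # table index chosen by each customer, in arrival order
    for _ in range(num_customers):
        # Existing tables are weighted by size; the last weight opens a new table.
        weights = table_sizes + [concentration]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        seating.append(table)
    return seating, table_sizes

seating, table_sizes = chinese_restaurant_process(50, concentration=2.0)
```

Reading tables as communities and customers as users, a larger concentration value tends to produce more, smaller communities.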
Rachana Jaiswal, Shashank Gupta and Aviral Kumar Tiwari
Abstract
Purpose
Grounded in the stakeholder theory and signaling theory, this study aims to broaden the research agenda on environmental, social and governance (ESG) investing by uncovering public sentiments and key themes using Twitter data spanning from 2009 to 2022.
Design/methodology/approach
Using various machine learning models for text tonality analysis and topic modeling, this research scrutinizes 1,842,985 Twitter texts to extract prevalent ESG investing trends and gauge their sentiment.
Findings
Gibbs Sampling Dirichlet Multinomial Mixture emerges as the optimal topic modeling method, unveiling significant topics such as “Physical risk of climate change,” “Employee Health, Safety and well-being” and “Water management and Scarcity.” RoBERTa, an attention-based model, outperforms other machine learning models in sentiment analysis, revealing a predominantly positive shift in public sentiment toward ESG investing over the past five years.
Research limitations/implications
This study establishes a framework for sentiment analysis and topic modeling on alternative data, offering a foundation for future research. Prospective studies can enhance insights by incorporating data from additional social media platforms like LinkedIn and Facebook.
Practical implications
Leveraging unstructured data on ESG from platforms like Twitter provides a novel avenue to capture company-related information, supplementing traditional self-reported sustainability disclosures. This approach opens new possibilities for understanding a company’s ESG standing.
Social implications
By shedding light on public perceptions of ESG investing, this research uncovers influential factors that often elude traditional corporate reporting. The findings empower both investors and the general public, aiding managers in refining ESG and management strategies.
Originality/value
This study marks a groundbreaking contribution to scholarly exploration, to the best of the authors’ knowledge, by being the first to analyze unstructured Twitter data in the context of ESG investing, offering unique insights and advancing the understanding of this emerging field.
Xiaoguang Wang, Yue Cheng, Tao Lv and Rongjiang Cai
Abstract
Purpose
The authors hope to filter valuable information from online reviews, obtain objective and accurate information about the demands of auto consumers and help auto companies develop more reasonable production and marketing strategies for healthy and sustainable development. This paper aims to discuss the aforementioned objectives.
Design/methodology/approach
The authors collected review data from online automotive forums and generated a corpus after pre-processing. Then, the authors extracted consumer demands and topics using the LDA model. Finally, the authors used a trained Word2vec tool to extend the consumer demand topics.
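The last step, extending topic words with a trained embedding model, usually amounts to a nearest-neighbor search in vector space. The sketch below shows that idea with hand-written three-dimensional vectors in place of real Word2vec embeddings; the vectors, the words and the `expand_topic` helper are hypothetical and only illustrate the mechanism.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_topic(topic_words, vectors, top_n=2):
    """Return the top_n words closest to any word already in the topic."""
    scores = {}
    for candidate, vec in vectors.items():
        if candidate in topic_words:
            continue
        scores[candidate] = max(
            (cosine(vec, vectors[w]) for w in topic_words if w in vectors),
            default=0.0,
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy vectors standing in for trained embeddings (assumption).
vectors = {
    "space": [0.9, 0.1, 0.0],
    "trunk": [0.8, 0.2, 0.1],
    "legroom": [0.85, 0.15, 0.05],
    "horsepower": [0.1, 0.9, 0.2],
    "warranty": [0.0, 0.2, 0.9],
}
extended = expand_topic({"space"}, vectors)
```

With real embeddings, the same search would surface forum vocabulary that the topic model's top words missed.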
Findings
Different types of vehicle consumers share some demands, such as “Space,” “Power Performance,” and “Brand Comparison,” and have distinct demands, such as “Appearance,” “Safety,” “Service,” and “New Energy Features”; consumers who buy new energy vehicles are still accustomed to comparing them with brands or models of fuel vehicles; new energy vehicle consumers pay more attention to services and service quality during the purchasing and using process.
Research limitations/implications
The development time of new energy vehicles is relatively short, with some models being available for only one year or even six months. The smaller amount of available data may impact the applicability of topic models. The sample size, especially for new energy vehicles, needs to be increased to improve the general applicability of topic models further.
Practical implications
First, this measure helps online review websites improve their existing review publication mechanisms, enhance the overall quality of online review content, increase user traffic and promote the healthy development of online review websites. Second, this allows for timely adjustments in future product production and sales plans and further enhances automotive companies' ability to leverage online reviews for Internet marketing.
Originality/value
The authors have improved the accuracy and stability of the fused topic model, providing a scientific and efficient research tool for multi-dimensional topic mining of online reviews. With the help of research results, consumers can more easily understand the discussion topics and thus filter out valuable reference information. As a result, automotive companies may gain information about consumer demands and product quality feedback and thus quickly adjust production and marketing strategies to increase sales and market share.