Search results
1 – 10 of 1000
Lixue Zou, Xiwen Liu, Wray Buntine and Yanli Liu
Abstract
Purpose
Full text of a document is a rich source of information that can be used to provide meaningful topics. The purpose of this paper is to demonstrate how to use citation context (CC) in the full text to identify the cited topics and citing topics efficiently and effectively by employing automatic text analysis algorithms.
Design/methodology/approach
The authors present two novel topic models, Citation-Context-LDA (CC-LDA) and Citation-Context-Reference-LDA (CCRef-LDA). CC is leveraged to extract the citing text from the full text, which makes it possible to discover topics with accuracy. CC-LDA incorporates CC, citing text, and their latent relationship, while CCRef-LDA incorporates CC, citing text, their latent relationship and reference information in CC. Collapsed Gibbs sampling is used to achieve an approximate estimation. The capacity of CC-LDA to simultaneously learn cited topics and citing topics together with their links is investigated. Moreover, a topic influence measure method based on CC-LDA is proposed and applied to create links between the two-level topics. In addition, the capacity of CCRef-LDA to discover topic influential references is also investigated.
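As a rough illustration of the collapsed Gibbs sampling step mentioned above, the following is a minimal sketch for plain LDA only. It does not implement CC-LDA or CCRef-LDA, whose extra variables for citation context and references are not specified in this abstract. The function name `lda_gibbs`, the integer word-id input format and the hyperparameter defaults are assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative, not CC-LDA).

    docs is a list of documents, each a list of integer word ids.
    """
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(num_topics)]
    n_k = [0] * num_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates of topic-word (phi) and document-topic (theta) distributions.
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta
```

Calling `lda_gibbs([[0, 1, 0, 1], [2, 3, 2, 3]], num_topics=2, vocab_size=4)` returns one word distribution per topic and one topic distribution per document; the estimate is approximate because it is read from a single final sample.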
Findings
The results indicate CC-LDA and CCRef-LDA achieve improved or comparable performance in terms of both perplexity and symmetric Kullback–Leibler (sKL) divergence. Moreover, CC-LDA is effective in discovering the cited topics and citing topics with topic influence, and CCRef-LDA is able to find the cited topic influential references.
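The two evaluation measures named here can be sketched as follows. This is a minimal illustration, assuming the common convention that symmetric KL is the sum of the two directed divergences (some papers use the average instead) and that perplexity is the exponential of the negative mean log-likelihood per held-out token. The abstract does not state which variants the authors use, and the function names are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetric KL divergence, taken here as KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def perplexity(token_probs):
    """Perplexity from the model probability assigned to each held-out token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))
```

Lower perplexity indicates a better fit to held-out text, and a larger sKL between two topics' word distributions indicates that the topics are more distinct.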
Originality/value
The automatic method provides novel knowledge for cited topics and citing topics discovery. Topic influence learnt by our model can link two-level topics and create a semantic topic network. The method can also use topic specificity as a feature to rank references.
Ziang Wang and Feng Yang
Abstract
Purpose
Obtaining consumers’ product evaluations from massive numbers of online reviews has long been a hot topic for online retailers. In the process of online shopping, there is no face-to-face interaction between online retailers and customers. After collecting online reviews left by customers, online retailers are eager to answer questions such as which product attributes attract consumers and which step of the shopping process gives consumers a better experience. This paper aims to associate the latent Dirichlet allocation (LDA) model with consumers’ attitudes and to provide a method for calculating a numerical measure of the product evaluation expressed by each word.
Design/methodology/approach
First, all possible pairs of reviews are organized as documents to build the corpus. After that, the latent topics of the traditional LDA model, referred to as the standard LDA model, are separated into shared and differential topics. Then, the authors associate the model with consumers’ attitudes toward each review, which is classified as either a positive or a non-positive review. The product evaluation reflected in consumers’ binary attitudes is extended to each word that appears in the corpus. Finally, variational optimization is introduced to estimate the parameters of the expanded LDA model.
Findings
The experimental results show that the LDA model used in this research, referred to as the expanded LDA model, can successfully assign sufficient probability to words related to product attributes or consumers’ product evaluations. Compared with the standard LDA model, the expanded model tends to assign higher probability to words that rank higher within each topic. In addition, the expanded model has higher precision on the prediction set, which shows that breaking the topics down into two categories fits the data set better than the standard LDA model. The product evaluation of each word is calculated by the expanded model and depicted at the end of the experiment.
Originality/value
This research provides a new method for calculating consumers’ product evaluations from reviews at the word level. Words may be used to describe product attributes or consumers’ experiences in reviews. Assigning numerical measures to words makes it possible to analyze consumers’ product evaluations quantitatively. In addition, because the words themselves serve as labels, they can also be ranked once a numerical measure is given. Online retailers can benefit from the results for label selection, advertising or product recommendation.
Muhammad Ali Masood, Rabeeh Ayaz Abbasi, Onaiza Maqbool, Mubashar Mushtaq, Naif R. Aljohani, Ali Daud, Muhammad Ahtisham Aslam and Jalal S. Alowibdi
Abstract
Purpose
Tags are used to annotate resources on social media platforms. Most tag recommendation methods use popular tags, but in the case of new resources that are as yet untagged (the cold start problem), popularity-based tag recommendation methods fail to work. The purpose of this paper is to propose a novel model for tag recommendation, called multi-feature space latent Dirichlet allocation (MFS-LDA), for the cold start problem.
Design/methodology/approach
MFS-LDA is a novel latent Dirichlet allocation (LDA)-based model which exploits multiple feature spaces (title, contents, and tags) for recommending tags. Exploiting multiple feature spaces allows MFS-LDA to recommend tags even if data from a feature space is missing (the cold start problem).
Findings
Evaluation on a publicly available data set consisting of around 20,000 Wikipedia articles that are tagged on a social bookmarking website shows a significant improvement over existing LDA-based tag recommendation methods.
Originality/value
The originality of MFS-LDA lies in the segregation of features to remove bias toward dominant features and in the synchronization of multiple feature spaces for tag recommendation.
Roman Egger and Joanne Yu
Abstract
Purpose
Intrigued by the methodological challenges emerging from text complexity, the purpose of this study is to evaluate the effectiveness of different topic modelling algorithms based on Instagram textual data.
Design/methodology/approach
By taking Instagram posts captioned with #darktourism as the study context, this research applies latent Dirichlet allocation (LDA), correlation explanation (CorEx), and non-negative matrix factorisation (NMF) to uncover tourist experiences.
Findings
CorEx outperforms LDA and NMF by classifying emerging dark sites and activities into 17 distinct topics. The results of LDA appear homogeneous and overlapping, whereas the extracted topics of NMF are not specific enough to gain deep insights.
Originality/value
This study assesses different topic modelling algorithms for knowledge extraction in the highly heterogeneous tourism industry. The findings unfold the complexity of analysing short-text social media data and strengthen the use of CorEx in analysing Instagram content.
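Research purpose
Motivated by an interest in text complexity, this study uses Instagram textual data as a benchmark to compare the effectiveness of different topic modelling algorithms.
Research method
Taking Instagram posts tagged with #darktourism as the context, this study evaluates the usefulness of latent Dirichlet allocation (LDA), correlation explanation (CorEx) and non-negative matrix factorisation (NMF) for analysing posts related to dark tourism.
Research results
CorEx identifies 17 emerging dark sites and activities and outperforms LDA and NMF. Although LDA can surface a larger number of topics, their content is almost repetitive. Likewise, although NMF is suited to short-text data, the topics it extracts are rather general and not specific enough.
Originality
By combining the disciplines of marketing and data science, this study lays a foundation for analysing unstructured text and confirms the benefits of CorEx for analysing short-text social media data such as Instagram data.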
Purpose
Intrigued by the methodological challenges that arise from text complexity, this study evaluates the effectiveness of different topic modelling algorithms based on Instagram textual data.
Methodology
Taking Instagram posts with #darktourism as the study context, this research applies latent Dirichlet allocation (LDA), correlation explanation (CorEx) and non-negative matrix factorisation (NMF) to uncover tourist experiences.
Results
CorEx outperforms LDA and NMF by classifying emerging dark sites and activities into 17 distinct topics. The LDA results are homogeneous and overlapping, whereas the topics extracted by NMF are not specific enough to yield deep insights.
Originality
This study assesses different topic modelling algorithms for knowledge extraction in the tourism industry. The findings reveal the complexity of analysing short-text social media data and strengthen the use of CorEx for analysing Instagram content.
S. Ravikumar, Bidyut Bikash Boruah and Fullstar Lamin Gayang
Abstract
Purpose
The purpose of the study is to identify the latent topics in 9,102 Web of Science (WoS) indexed research articles by Sri Lankan authors, published in 2,645 journals from 1989 to 2021, by applying latent Dirichlet allocation to the abstracts. The dominant topics in the corpus, the posterior probability of different terms in the topics and the publication proportions of the topics are discussed in the article.
Design/methodology/approach
Abstracts and other details of the studied articles were collected from the WoS database by the authors. Data preprocessing is performed before the analysis. After text preprocessing, the R package “ldatuning” is applied to determine the number of topics on statistical grounds. Based on four metrics, twenty latent topics are selected for extraction.
Findings
Topics related to medical science, agriculture, research and development, and chemistry dominate the subject categories as a whole. “Irrigation” and “mortality and health care” show significant growth in publication proportion from 2019 to 2021. For the most frequently occurring latent topics, terms such as “activity” and “acid” carry higher posterior probability.
Practical implications
Topic models allow high-level questions about a corpus to be addressed rapidly and efficiently without human intervention, and they are also helpful in information retrieval and document clustering. This study highlights how the growth of the universe of knowledge for a specific country can be studied using the LDA topic model.
Originality/value
This study may encourage further research in text analysis and information retrieval. The results give an understanding of how the publications of Sri Lankan authors have developed across subject areas and over time. The trends and intensity of publications by Sri Lankan authors on different latent topics help trace their interests and the most actively pursued areas in different domains.
Brahim Dib, Fahd Kalloubi, El Habib Nfaoui and Abdelhak Boulaalam
Abstract
Purpose
The purpose of this study is to facilitate the task of finding appropriate information to read and of finding people who share the same field of interest, given that more and more people keep up with new streaming information on the Twitter micro-blogging service. With the immense number of micro-posts shared via the follower/followee network graph, Twitter users face millions of tweets, which makes this task crucial.
Design/methodology/approach
This paper proposes a long short-term memory (LSTM) model that relies on the latent Dirichlet allocation (LDA) output vector for followee recommendation, with LDA applied as the topic modeling strategy.
Findings
This study trains the model using a real-life data set extracted based on the Twitter follower/followee architecture. It confirms the effectiveness and scalability of the proposed approach. The approach improves on the state-of-the-art models average-LSTM and time-LSTM.
Research limitations/implications
This study mainly improves existing followee recommendation systems because, unlike previous studies, it applies a non-hand-crafted method, namely an LSTM neural network combined with an LDA model for topic extraction. The main limitations of this study are that cold-start users cannot be handled and that some active fake accounts may not be detected.
Practical implications
The aim of this approach is to assist users seeking appropriate information to read about, by choosing appropriate profiles to follow.
Social implications
This approach consolidates the social relationship between users in a microblogging platform by suggesting like-minded people to each other. Thus, finding users with the same interests will be easy without spending a lot of time seeking relevant users.
Originality/value
Instead of classic recommendation models, the paper provides an efficient neural network search method that makes it easier to find appropriate users to follow, thereby affording an effective followee recommendation system.
Qiongwei Ye and Baojun Ma
Abstract
Internet + and Electronic Business in China is a comprehensive resource that provides insight and analysis into E-commerce in China and how it has revolutionized and continues to revolutionize business and society. Split into four distinct sections, the book first lays out the theoretical foundations and fundamental concepts of E-Business before moving on to look at internet+ innovation models and their applications in different industries such as agriculture, finance and commerce. The book then provides a comprehensive analysis of E-business platforms and their applications in China before finishing with four comprehensive case studies of major E-business projects, providing readers with successful examples of implementing E-Business entrepreneurship projects.
Internet + and Electronic Business in China is a comprehensive resource that provides insights and analysis into how E-commerce has revolutionized and continues to revolutionize business and society in China.
Dong Zhou, Séamus Lawless, Xuan Wu, Wenyu Zhao and Jianxun Liu
Abstract
Purpose
With an increase in the amount of multilingual content on the World Wide Web, users are often striving to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion.
Design/methodology/approach
The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.
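To make the frequency-based baseline concrete, the sketch below builds a tf-idf weighted user profile from a user's past documents and uses its top terms to expand a query. It is a simplified monolingual illustration, not the paper's cross-language pipeline. The function names, the smoothed idf formula and the tokenised-list input format are assumptions.

```python
import math
from collections import Counter


def tfidf_profile(user_docs, background_docs, top_n=3):
    """Rank the terms in a user's documents by tf-idf against a background corpus.

    Both arguments are lists of tokenised documents (lists of strings).
    """
    n = len(background_docs)
    # Document frequency: number of background documents containing each term.
    df = Counter(term for doc in background_docs for term in set(doc))
    # Term frequency pooled over all of the user's documents.
    tf = Counter(term for doc in user_docs for term in doc)
    # Smoothed idf (an assumed variant) keeps unseen terms from dividing by zero.
    weights = {t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}
    # Sort by descending weight, breaking ties alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def expand_query(query_terms, profile, k=2):
    """Append up to k highest-weighted profile terms not already in the query."""
    extra = [t for t, _ in profile if t not in query_terms][:k]
    return list(query_terms) + extra
```

A latent semantic profile would replace the tf-idf weights with term weights derived from a trained topic or embedding model, while the expansion step could stay the same.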
Findings
Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level gained better performance than the models trained on a monolingual level.
Originality/value
Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills the gap by a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology to ensure repeatable and controlled experiments can be conducted.
San-Yih Hwang, Chih-Ping Wei, Chien-Hsiang Lee and Yu-Siang Chen
Abstract
Purpose
The information needs of the users of literature database systems often come from the task at hand, which is short term and can be represented as a small number of articles. Previous works on recommending articles to satisfy users’ short-term interests have utilized article content, usage logs, and more recently, coauthorship networks. The usefulness of coauthorship has been demonstrated by some research works, which, however, tend to adopt a simple coauthorship network that records only the strength of coauthorships. The purpose of this paper is to enhance the effectiveness of coauthorship-based recommendation by incorporating scholars’ collaboration topics into the coauthorship network.
Design/methodology/approach
The authors propose a latent Dirichlet allocation (LDA)-coauthorship-network-based method that integrates topic information into the links of the coauthorship networks using LDA, and a task-focused technique is developed for recommending literature articles.
Findings
The experimental results using information systems journal articles show that the proposed method is more effective than the previous coauthorship network-based method over all scenarios examined. The authors further develop a hybrid method that combines the results of content-based and LDA-coauthorship-network-based recommendations. The resulting hybrid method achieves greater or comparable recommendation effectiveness under all scenarios when compared to the content-based method.
Originality/value
This paper makes two contributions. The authors first show that a topic model is indeed useful and can be incorporated into the construction of the coauthorship network to improve literature recommendation. The authors subsequently demonstrate that coauthorship-network-based and content-based recommendations are complementary in their hit article rank distributions, and then devise a hybrid recommendation method to further improve the effectiveness of literature recommendation.
Yuyan Luo, Tao Tong, Xiaoxu Zhang, Zheng Yang and Ling Li
Abstract
Purpose
In the era of information overload, the density of tourism information and the increasingly sophisticated information needs of consumers have created information confusion for tourists and scenic-area managers. The study aims to help scenic-area managers determine the strengths and weaknesses in the development process of scenic areas and to solve the practical problem of tourists' difficulty in quickly and accurately obtaining the destination image of a scenic area and finding a scenic area that meets their needs.
Design/methodology/approach
The study uses a variety of machine learning methods, namely, the latent Dirichlet allocation (LDA) theme extraction model, term frequency-inverse document frequency (TF-IDF) weighting method and sentiment analysis. This work also incorporates probabilistic hesitant fuzzy algorithm (PHFA) in multi-attribute decision-making to form an enhanced tourism destination image mining and analysis model based on visitor expression information. The model is intended to help managers and visitors identify the strengths and weaknesses in the development of scenic areas. Jiuzhaigou is used as an example for empirical analysis.
Findings
In the study, a complete model for mining and analyzing tourism destination image was constructed, and the text of 24,222 online reviews of Jiuzhaigou, China was analyzed. The results revealed a total of 10 attributes and 100 attribute elements. Among the identified attributes, three were negative, namely, crowdedness, tourism cost and accommodation environment. The study provides suggestions for tourists to select attractions and offers recommendations and improvement measures for Jiuzhaigou in terms of crowd control and post-disaster reconstruction.
Originality/value
Previous research in this area has used small sample data for qualitative analysis. Thus, the current study fills this gap in the literature by proposing a machine learning method that incorporates PHFA through the combination of the ideas of management and multi-attribute decision theory. In addition, the study considers visitors' emotions and thematic preferences from the perspective of their expressed information, based on which the tourism destination image is analyzed. Optimization strategies are provided to help managers of scenic spots in their decision-making.