Search results

1 – 10 of over 1000
Article
Publication date: 2 January 2020

Futao Zhao, Zhong Yao, Jing Luan and Hao Liu

The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media…

Abstract

Purpose

The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets.

Design/methodology/approach

This paper presents a novel method to automatically generate financial lexicons using a unique data set that comprises news articles, analyst reports and social media. Specifically, a novel method based on keyword extraction is used to build a high-quality seed lexicon and an ensemble mechanism is developed to integrate the knowledge derived from distinct language sources. Meanwhile, two different methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure is performed to validate the effectiveness of the method compared with four traditional lexicons.
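
As a rough illustration of the word-association step, the following Python sketch scores candidate words by their PMI association with positive and negative seed words, in the spirit of SO-PMI. It is a minimal reconstruction under stated assumptions, not the authors' code; the corpus format and seed sets are placeholders.

```python
import math
from collections import Counter
from itertools import combinations

def so_pmi_scores(docs, pos_seeds, neg_seeds):
    """Score candidate words by PMI association with seed words.

    docs: list of token lists; pos_seeds/neg_seeds: sets of seed words.
    Returns {word: score}; positive scores lean toward the positive seeds.
    """
    word_df = Counter()    # document frequency of each word
    pair_df = Counter()    # document-level co-occurrence of word pairs
    for doc in docs:
        vocab = set(doc)
        word_df.update(vocab)
        pair_df.update(combinations(sorted(vocab), 2))
    n = len(docs)

    def pmi(w1, w2):
        joint = pair_df.get(tuple(sorted((w1, w2))), 0)
        if joint == 0:
            return 0.0
        return math.log2(joint * n / (word_df[w1] * word_df[w2]))

    return {w: sum(pmi(w, s) for s in pos_seeds) - sum(pmi(w, s) for s in neg_seeds)
            for w in word_df}
```

A Word2vec variant of the same step would replace the PMI score with cosine similarity between word vectors trained on the same corpus.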

Findings

The experimental results from the three real-world testing data sets show that the ensemble lexicons can significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting the usefulness of leveraging knowledge derived from diverse media in domain-specific lexicon generation and corresponding sentiment analysis tasks.

Originality/value

This work appears to be the first to construct financial sentiment lexicons from over 2m posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable to extract collective sentiment from multiple media sources and provide decision-making support for stock market participants.

Details

Industrial Management & Data Systems, vol. 120 no. 3
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 16 March 2021

P. Padmavathy, S. Pakkir Mohideen and Zameer Gulzar

The purpose of this paper is to first perform Senti-WordNet (SWN)- and pointwise mutual information (PMI)-based polarity computation and PMI-based polarity updating. When the SWN…

Abstract

Purpose

The purpose of this paper is to first perform Senti-WordNet (SWN)- and pointwise mutual information (PMI)-based polarity computation and PMI-based polarity updating. When the SWN polarity and the PMI polarity mismatch, the vote flipping algorithm (VFA) is employed.

Design/methodology/approach

Recently, research on sentiment analysis (SA) has increased massively in domains such as social media (SM), healthcare, hotels, cars and product data. However, there is no approach for analyzing the positive or negative orientation of every single aspect in a document (a tweet, a review or a piece of news, among others). For SA as well as polarity classification, several researchers have used SWN as a lexical resource. Nevertheless, such general-purpose lexicons show lower performance for sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is used differently in domain and general knowledge lexicons: across different domains, many words carry one sentiment class in SWN while their occurrence in the annotated data set signals a strong inclination toward the other sentiment class. Hence, this paper concentrates on the drawbacks of adapting a domain-dependent sentiment lexicon (DDSL) from a collection of labeled user reviews and a domain-independent lexicon (DIL), and proposes a framework centered on information theory that predicts the correct polarity of words (positive, neutral and negative). The proposed work first performs SWN- and PMI-based polarity computation and PMI-based polarity updating. When the SWN polarity and the PMI polarity mismatch, the vote flipping algorithm (VFA) is employed. Finally, the predicted polarity is input to the mtf-idf-based SVM-NN classifier for the SC of reviews. The outcomes are examined and contrasted with other existing techniques to verify that the proposed work predicts the class of the reviews more effectively for different datasets.
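
A minimal Python sketch of the SWN/PMI combination with a vote flip on mismatch, assuming NLTK's SentiWordNet corpus is installed; the abstract does not specify the actual VFA decision rule, so the flip rule below (trusting the corpus-derived PMI vote) is an assumption.

```python
from nltk.corpus import sentiwordnet as swn  # needs nltk.download('sentiwordnet') and ('wordnet')

def swn_polarity(word):
    """Average (positive - negative) score across a word's SentiWordNet synsets."""
    synsets = list(swn.senti_synsets(word))
    if not synsets:
        return 0.0
    return sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets)

def resolve_polarity(word, pmi_score):
    """Combine the SWN vote and the corpus PMI vote; flip on mismatch.

    Assumed rule: when the two votes disagree, the domain evidence (PMI) wins,
    since domain usage can override general-purpose lexicon polarity.
    """
    swn_vote = 1 if swn_polarity(word) > 0 else -1
    pmi_vote = 1 if pmi_score > 0 else -1
    return pmi_vote if swn_vote != pmi_vote else swn_vote
```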

Findings

There is no approach for analyzing the positive or negative orientation of every single aspect in a document (a tweet, a review or a piece of news, among others). For SA as well as polarity classification, several researchers have used SWN as a lexical resource. Nevertheless, such general-purpose lexicons show lower performance for sentiment classification (SC) than domain-specific lexicons (DSL). Likewise, in some scenarios, the same term is used differently in domain and general knowledge lexicons: across different domains, many words carry one sentiment class in SWN while their occurrence in the annotated data set signals a strong inclination toward the other sentiment class.

Originality/value

The proposed work first performs SWN- and PMI-based polarity computation and PMI-based polarity updating. When the SWN polarity and the PMI polarity mismatch, the vote flipping algorithm (VFA) is employed.

Open Access
Article
Publication date: 29 June 2022

Ibtissam Touahri

This paper proposes a multi-facet sentiment analysis system.

Abstract

Purpose

This paper proposes a multi-facet sentiment analysis system.

Design/methodology/approach

This paper uses multidomain resources to build a sentiment analysis system. Manual lexicon-based features extracted from these resources are fed into a machine learning classifier, and their performance is compared. The manual lexicon is then replaced with a custom bag-of-words (BOW) to avoid its time-consuming construction. To help the system run faster and make the model interpretable, the feature set is reduced using different existing and custom approaches, such as term occurrence, information gain, principal component analysis, semantic clustering and POS-tagging filters, as sketched below.
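
A hedged scikit-learn sketch of such a pipeline: a custom BOW feeds a mutual-information selector (standing in for information gain) and a truncated SVD (standing in for PCA on a sparse matrix) before classification. All component choices and sizes are illustrative, not the paper's configuration.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("bow", CountVectorizer(max_features=20000)),           # custom BOW
    ("select", SelectKBest(mutual_info_classif, k=2000)),   # information-gain-style filter
    ("reduce", TruncatedSVD(n_components=200)),             # PCA-like reduction for sparse data
    ("clf", LinearSVC()),                                   # any classifier fits here
])
# pipeline.fit(train_texts, train_labels)
# predictions = pipeline.predict(test_texts)
```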

Findings

The proposed system, featuring automated lexicon extraction and feature-set size optimization, proved its efficiency when applied to multidomain and benchmark datasets, reaching 93.59% accuracy, which makes it competitive with state-of-the-art systems.

Originality/value

The construction of a custom BOW, and the optimization of features based on existing and custom feature selection and clustering approaches.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 7 November 2016

Ismail Hmeidi, Mahmoud Al-Ayyoub, Nizar A. Mahyoub and Mohammed A. Shehab

Multi-label Text Classification (MTC) is one of the most recent research trends in data mining and information retrieval domains for many reasons, such as the rapid growth…

Abstract

Purpose

Multi-label Text Classification (MTC) is one of the most recent research trends in data mining and information retrieval domains for many reasons, such as the rapid growth of online data and the increasing tendency of internet users to be more comfortable assigning multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of labels makes MTC more difficult and challenging than traditional single-label text classification (TC). Because MTC is a natural extension of TC, several methods have been proposed to benefit from the rich literature of TC through what are called problem transformation (PT) methods. Basically, PT methods transform multi-label data into single-label data suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of MTC of Arabic articles.

Design/methodology/approach

This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
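
Conceptually, the per-label keyword-and-threshold decision could look like the following sketch; the scoring and data structures are illustrative assumptions, since the abstract does not detail them.

```python
def predict_labels(doc_tokens, label_keywords, label_thresholds):
    """Assign every label whose keyword score clears that label's threshold.

    label_keywords: {label: {keyword: weight}} learned from training data;
    label_thresholds: {label: float} tuned on held-out documents.
    Both structures are hypothetical stand-ins for the paper's lexicon.
    """
    tokens = set(doc_tokens)
    assigned = []
    for label, keywords in label_keywords.items():
        score = sum(w for kw, w in keywords.items() if kw in tokens)
        if score >= label_thresholds[label]:
            assigned.append(label)
    return assigned
```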

Findings

The experiments show that the presented approach outperforms the currently available approaches. Specifically, the best accuracy obtained from existing approaches is only 18 per cent, whereas the presented lexicon-based approach can reach an accuracy of 31 per cent.

Originality/value

Although there exist some tools that can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC. The experiments show that the presented approach outperforms the currently available approaches.

Details

International Journal of Web Information Systems, vol. 12 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 15 March 2013

Eleni Kaliva, Eleni Panopoulou, Efthimios Tambouris and Konstantinos Tarabanis

The purpose of this paper is to develop a domain model for online community building and collaboration in e‐government and policy modelling.

Abstract

Purpose

The purpose of this paper is to develop a domain model for online community building and collaboration in e‐government and policy modelling.

Design/methodology/approach

The authors followed a structured approach comprising five distinct steps: define the domain to be investigated; collect domain knowledge from both existing online community building and collaboration platforms and from domain experts; analyse the gathered knowledge; develop the domain model; and evaluate it.

Findings

A domain model was developed for community building and collaboration in e-government and policy modelling, including the domain definition, the domain lexicon and the conceptual models describing the basic entities and functions of the domain. In particular, a UML class diagram was used to model the domain entities and a UML use case diagram to model the domain functions.

Originality/value

A literature search revealed a lack of domain models for online community building and collaboration, not only in e‐government and policy modelling but also in general. The proposed model provides a better understanding of the domain. It can also be used in the development of relevant platforms, leading to the reduction of software development costs and delivery time, as well as the improvement of software quality and reliability, by minimising domain analysis errors.

Details

Transforming Government: People, Process and Policy, vol. 7 no. 1
Type: Research Article
ISSN: 1750-6166

Article
Publication date: 24 June 2020

Yilu Zhou and Yuan Xue

Strategic alliances among organizations are some of the central drivers of innovation and economic growth. However, the discovery of alliances has relied on pure manual search and…

Abstract

Purpose

Strategic alliances among organizations are some of the central drivers of innovation and economic growth. However, the discovery of alliances has relied on pure manual search and has limited scope. This paper proposes a text-mining framework, ACRank, that automatically extracts alliances from news articles. ACRank aims to provide human analysts with a higher coverage of strategic alliances compared to existing databases, yet maintain a reasonable extraction precision. It has the potential to discover alliances involving less well-known companies, a situation often neglected by commercial databases.

Design/methodology/approach

The proposed framework is a systematic process of alliance extraction and validation using natural language processing techniques and alliance domain knowledge. The process integrates news article search, entity extraction, and syntactic and semantic linguistic parsing techniques. In particular, the Alliance Discovery Template (ADT) component identifies a number of linguistic templates expanded from expert domain knowledge and extracts potential alliances at the sentence level, as illustrated below. Alliance Confidence Ranking (ACRank) further validates each unique alliance based on multiple features at the document level. The framework is designed to deal with extremely skewed, noisy data from news articles.
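
To make the template idea concrete, here is a toy sentence-level matcher with two regex templates; the paper's actual ADT templates are derived from expert domain knowledge and are far richer than this sketch.

```python
import re

# Two illustrative alliance-signalling templates (not the paper's own).
TEMPLATES = [
    re.compile(r"(?P<a>[A-Z][\w&-]*(?:\s[A-Z][\w&-]*)*)\s+"
               r"(?:announced|formed|entered into|signed)\s+"
               r"(?:a|an)\s+(?:strategic\s+)?(?:alliance|partnership|joint venture)\s+"
               r"with\s+(?P<b>[A-Z][\w&-]*(?:\s[A-Z][\w&-]*)*)"),
    re.compile(r"(?P<a>[A-Z][\w&-]*)\s+and\s+(?P<b>[A-Z][\w&-]*)\s+"
               r"(?:will\s+)?(?:partner|team up|collaborate)"),
]

def extract_alliances(sentence):
    """Return (company_a, company_b) pairs matched at the sentence level."""
    return [(m.group("a"), m.group("b"))
            for pattern in TEMPLATES for m in pattern.finditer(sentence)]

print(extract_alliances("IBM announced a strategic alliance with Ricoh"))
# [('IBM', 'Ricoh')]
```

A document-level ranker in the spirit of ACRank would then score each extracted pair on multiple features before accepting it.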

Findings

An evaluation of ACRank on a gold standard data set of IBM alliances (2006–2008) showed that sentence-level ADT-based extraction achieved 78.1% recall and 44.7% precision and eliminated over 99% of the noise in news articles. ACRank further improved precision to 97% for the top 20% of extracted alliance instances. Further comparison with the Thomson Reuters SDC database showed that SDC covered less than 20% of total alliances, while ACRank covered 67%. When applied to Dow 30 company news articles, ACRank is estimated to achieve a recall between 0.48 and 0.95, and only 15% of the alliances appeared in SDC.

Originality/value

The research framework proposed in this paper indicates a promising direction of building a comprehensive alliance database using automatic approaches. It adds value to academic studies and business analyses that require in-depth knowledge of strategic alliances. It also encourages other innovative studies that use text mining and data analytics to study business relations.

Details

Information Technology & People, vol. 33 no. 5
Type: Research Article
ISSN: 0959-3845

Article
Publication date: 20 November 2017

Xiangbin Yan, Yumei Li and Weiguo Fan

Getting high-quality data by removing the noisy data from the user-generated content (UGC) is the first step toward data mining and effective decision-making based on ubiquitous…

Abstract

Purpose

Getting high-quality data by removing the noisy data from user-generated content (UGC) is the first step toward data mining and effective decision-making based on ubiquitous and unstructured social media data. This paper aims to design a framework for removing noisy data from UGC.

Design/methodology/approach

In this paper, the authors consider a classification-based framework to remove noise from unstructured UGC in a social media community. They treat noise as messages that are not relevant to the topic of concern and apply a text classification-based approach to remove it. They introduce a domain lexicon to help distinguish the topic of concern from noise and compare the performance of several classification algorithms combined with different feature selection methods, along the lines sketched below.
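
A minimal scikit-learn sketch of the idea: inject domain-lexicon hits as an extra feature, select features by mutual information (a proxy for information gain) and classify with an SVM, the combination the findings favour. The lexicon terms, sizes and the pseudo-token trick are all illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

domain_lexicon = {"dividend", "broker", "limit-up", "position"}  # hypothetical terms

def lexicon_boost(texts):
    """Append one pseudo-token per domain-lexicon hit so the vectorizer sees it."""
    return [t + " __domainhit__" * sum(tok in domain_lexicon for tok in t.lower().split())
            for t in texts]

noise_filter = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=10000)),
    ("select", SelectKBest(mutual_info_classif, k=1000)),  # information-gain proxy
    ("svm", LinearSVC()),
])
# noise_filter.fit(lexicon_boost(train_msgs), labels)  # 1 = on-topic, 0 = noise
# keep = noise_filter.predict(lexicon_boost(new_msgs))
```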

Findings

Experimental results based on a Chinese stock forum show that 84.9 per cent of all the noise data in the UGC could be removed with little loss of valuable information. The support vector machine classifier combined with the information gain feature selection model is the best choice for this system. Message length was also found to affect system performance, with longer messages yielding better classification performance.

Originality/value

The proposed method could be used for preprocessing in text mining and new knowledge discovery from big data.

Details

Information Discovery and Delivery, vol. 45 no. 4
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 8 February 2021

Alejandra Segura Navarrete, Claudia Martinez-Araneda, Christian Vidal-Castro and Clemente Rubio-Manzano

This paper aims to describe the process used to create an emotion lexicon enriched with the emotional intensity of words and focuses on improving the emotion analysis process in…

Abstract

Purpose

This paper aims to describe the process used to create an emotion lexicon enriched with the emotional intensity of words and focuses on improving the emotion analysis process in texts.

Design/methodology/approach

The process includes setting, preparation and labelling stages. In the first stage, a lexicon is selected; it must include a translation to the target language and labelling according to Plutchik’s eight emotions. The second stage starts with the validation of the translations; the lexicon is then expanded with the synonyms of the emotion synsets of each word. In the labelling stage, the similarity of words is calculated and displayed using WordNet similarity, as sketched below.
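
The synonym-expansion and similarity steps can be approximated with NLTK's WordNet interface, as in this sketch; the choice of Wu-Palmer similarity is an assumption, since the abstract only says "WordNet similarity".

```python
from nltk.corpus import wordnet as wn  # needs nltk.download('wordnet')

def expand_with_synonyms(word):
    """Collect synonym lemmas from every synset of `word`."""
    return {lemma.name().replace("_", " ")
            for synset in wn.synsets(word) for lemma in synset.lemmas()}

def emotion_similarity(word, emotion_word):
    """Max Wu-Palmer similarity between the two words' synsets."""
    best = 0.0
    for s1 in wn.synsets(word):
        for s2 in wn.synsets(emotion_word):
            sim = s1.wup_similarity(s2)
            if sim is not None and sim > best:
                best = sim
    return best

print(expand_with_synonyms("joy"))            # e.g. {'joy', 'delight', 'rejoice', ...}
print(emotion_similarity("joy", "happiness"))
```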

Findings

The authors’ approach shows better performance in identifying the predominant emotion for the selected corpus. Most relevant is the improvement obtained in the emotion analysis results of a hybrid approach compared with those of a purist approach.

Research limitations/implications

The proposed lexicon can still be enriched by incorporating elements such as emojis, idioms and colloquial expressions.

Practical implications

This work is part of a research project that aids in solving problems in a digital society, such as detecting cyberbullying, abusive language and gender violence in texts, or exercising parental control. Detection of depressive states in young people and children is a further application.

Originality/value

This semi-automatic process can be applied to any language to generate an emotion lexicon. This resource will be available in a software tool that implements a crowdsourcing strategy allowing the intensity to be re-labelled and new words to be automatically incorporated into the lexicon.

Details

The Electronic Library, vol. 39 no. 1
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 12 January 2021

Hui Yuan, Yuanyuan Tang, Wei Xu and Raymond Yiu Keung Lau

Despite the extensive academic interest in social media sentiment for financial fields, multimodal data in the stock market has been neglected. The purpose of this paper is to…

Abstract

Purpose

Despite the extensive academic interest in social media sentiment for financial fields, multimodal data in the stock market has been neglected. The purpose of this paper is to explore the influence of multimodal social media data on stock performance, and investigate the underlying mechanism of two forms of social media data, i.e. text and pictures.

Design/methodology/approach

This research employs panel vector autoregressive models to quantify the effect of the sentiment derived from two modalities in social media, i.e. text information and picture information. Through the models, the authors examine the short-term and long-term associations between social media sentiment and stock performance, measured by three metrics. Specifically, the authors first design an enhanced sentiment analysis method, integrating random walk and word embeddings through Global Vectors for Word Representation (GloVe), to construct a domain-specific lexicon and apply it to textual sentiment analysis (see the sketch below). Second, they exploit a deep learning framework based on convolutional neural networks to analyze the sentiment in picture data.
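
One simple way to grow a domain lexicon from pretrained embeddings is to rank candidate words by cosine similarity to a seed centroid, as below. This is only a stand-in for the paper's GloVe-plus-random-walk construction; the embedding source and seed words are assumptions.

```python
import numpy as np

def expand_lexicon(seed_words, embeddings, candidates, top_k=20):
    """Rank candidate words by cosine similarity to the seed centroid.

    embeddings: {word: np.ndarray}, e.g. loaded from pretrained GloVe vectors.
    """
    centroid = np.mean([embeddings[w] for w in seed_words if w in embeddings], axis=0)

    def cosine(v):
        return float(v @ centroid / (np.linalg.norm(v) * np.linalg.norm(centroid)))

    scored = [(w, cosine(embeddings[w])) for w in candidates
              if w in embeddings and w not in seed_words]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
```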

Findings

The empirical results derived from vector autoregressive models reveal that both measures of the sentiment extracted from textual information and pictorial information in social media are significant leading indicators of stock performance. Moreover, pictorial information and textual information have similar relationships with stock performance.

Originality/value

To the best of the authors’ knowledge, this is the first study that incorporates multimodal social media data for sentiment analysis, which is valuable in understanding pictures in social media data. The study offers significant implications for researchers and practitioners: it draws researchers’ attention to multimodal social media data, and its findings provide some managerial recommendations, e.g. watching not only words but also pictures in social media.

Details

Internet Research, vol. 31 no. 3
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 8 February 2016

Tsung-Yi Chen, Yan-Chen Liu and Yuh-Min Chen

Customer acquisition and retention methods are the most critical issues for any enterprise. By identifying potential customers and targeting them through marketing activities…

Abstract

Purpose

Customer acquisition and retention methods are the most critical issues for any enterprise. By identifying potential customers and targeting them through marketing activities, enterprises can minimize marketing costs and maximize transaction probability. However, because market surveys are labor- and time-consuming, and data mining is ineffective for obtaining competitor data, enterprises may be unable to understand real-time changes in market trends and consumer preferences. The paper aims to discuss these issues.

Design/methodology/approach

This study developed a mechanism that automatically searches for potential customers in virtual communities. In addition, a common product attribute (CPA) model was developed based on the five dimensions of the theory of consumption values and a questionnaire survey was conducted to verify the corresponding relationships. Subsequently, the authors quantified and applied the relationship between the proposed CPA model and consumption values theory.

Findings

During the experiment, functional and social values yielded more accurate predictions. Contrary to expectations, emotional value yielded an inaccurate prediction of potential customers. The overall precision was 0.74, with a threshold of 0.5.

Research limitations/implications

Because each industry has distinctive characteristics and product attributes, the methods and models were tested for effectiveness only in the food industry.

Practical implications

Considering the food industry as an example, this study adopted the case study method to screen potential customers based on 400 articles from virtual communities, and combined a latent semantic analysis method with a backpropagation neural network to verify the effectiveness of the proposed method.
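
The LSA-plus-backpropagation combination can be sketched in scikit-learn as a TF-IDF/truncated-SVD front end feeding an MLP classifier; every hyperparameter here is an illustrative guess, not the study's setting.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

potential_customer_model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000)),
    ("lsa", TruncatedSVD(n_components=100)),                          # latent semantic analysis
    ("bpnn", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500)),  # backpropagation NN
])
# potential_customer_model.fit(community_posts, is_potential_customer)
# Predictions above the 0.5 threshold flag potential customers.
```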

Originality/value

By adopting the proposed enterprise-product profile model, enterprises can compile basic information related to their products and industry. The proposed system can be used by enterprises to identify potential customers in areas with potential for market development.

Details

Online Information Review, vol. 40 no. 1
Type: Research Article
ISSN: 1468-4527
