Search results

1 – 10 of over 1000
Article
Publication date: 3 February 2020

Nikola Nikolić, Olivera Grljević and Aleksandar Kovačević

Student recruitment and retention are important issues for all higher education institutions. Constant monitoring of student satisfaction levels is therefore crucial…

Abstract

Purpose

Student recruitment and retention are important issues for all higher education institutions. Constant monitoring of student satisfaction levels is therefore crucial. Traditionally, students voice their opinions through official surveys organized by the universities. In addition to that, nowadays, social media and review websites such as “Rate my professors” are rich sources of opinions that should not be ignored. Automated mining of students’ opinions can be realized via aspect-based sentiment analysis (ABSA). ABSA s is a sub-discipline of natural language processing (NLP) that focusses on the identification of sentiments (negative, neutral, positive) and aspects (sentiment targets) in a sentence. The purpose of this paper is to introduce a system for ABSA of free text reviews expressed in student opinion surveys in the Serbian language. Sentiment analysis was carried out at the finest level of text granularity – the level of sentence segment (phrase and clause).

Design/methodology/approach

The presented system relies on NLP techniques, machine learning models, rules and dictionaries. The corpora collected and annotated for system development and evaluation comprise students’ reviews of teaching staff at the Faculty of Technical Sciences, University of Novi Sad, Serbia, and a corpus of publicly available reviews from the Serbian equivalent of the “Rate my professors” website.

Findings

The research results indicate that positive sentiment can successfully be identified with the F-measure of 0.83, while negative sentiment can be detected with the F-measure of 0.94. While the F-measure for the aspect’s range is between 0.49 and 0.89, depending on their frequency in the corpus. Furthermore, the authors have concluded that the quality of ABSA depends on the source of the reviews (official students’ surveys vs review websites).

Practical implications

The system for ABSA presented in this paper could improve the quality of service provided by the Serbian higher education institutions through a more effective search and summary of students’ opinions. For example, a particular educational institution could very easily find out which aspects of their service the students are not satisfied with and to which aspects of their service more attention should be directed.

Originality/value

To the best of the authors’ knowledge, this is the first study of ABSA carried out at the level of sentence segment for the Serbian language. The methodology and findings presented in this paper provide a much-needed bases for further work on sentiment analysis for the Serbian language that is well under-resourced and under-researched in this area.

Article
Publication date: 2 September 2019

Guellil Imane, Darwish Kareem and Azouaou Faical

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used in order to analyse sentiments of Arabic users on social…

Abstract

Purpose

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used in order to analyse sentiments of Arabic users on social medias. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address the automated processing in general and the sentiment analysis in specific for Algerian.

Design/methodology/approach

The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text that is extracted from Facebook. Using this approach allow to significantly increase the size of the training corpus without calling the manual annotation. The annotated corpus is then vectorized using document embedding (doc2vec), which is an extension of word embeddings (word2vec). For sentiments classification, the authors used different classifiers such as support vector machines (SVM), Naive Bayes (NB) and logistic regression (LR).

Findings

The results suggest that NB and SVM classifiers generally led to the best results and MLP generally had the worst results. Further, the threshold that the authors use in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW led to slightly higher results than using PV-DM. Combining PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier with F1 up to 86.9 per cent.

Originality/value

The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.

Details

International Journal of Web Information Systems, vol. 15 no. 5
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 12 April 2022

Mengjuan Zha, Changping Hu and Yu Shi

Sentiment lexicon is an essential resource for sentiment analysis of user reviews. By far, there is still a lack of domain sentiment lexicon with large scale and high accuracy for…

Abstract

Purpose

Sentiment lexicon is an essential resource for sentiment analysis of user reviews. By far, there is still a lack of domain sentiment lexicon with large scale and high accuracy for Chinese book reviews. This paper aims to construct a large-scale sentiment lexicon based on the ultrashort reviews of Chinese books.

Design/methodology/approach

First, large-scale ultrashort reviews of Chinese books, whose length is no more than six Chinese characters, are collected and preprocessed as candidate sentiment words. Second, non-sentiment words are filtered out through certain rules, such as part of speech rules, context rules, feature word rules and user behaviour rules. Third, the relative frequency is used to select and judge the polarity of sentiment words. Finally, the performance of the sentiment lexicon is evaluated through experiments.

Findings

This paper proposes a method of sentiment lexicon construction based on ultrashort reviews and successfully builds one for Chinese books with nearly 40,000 words based on the Douban book.

Originality/value

Compared with the idea of constructing a sentiment lexicon based on a small number of reviews, the proposed method can give full play to the advantages of data scale to build a corpus. Moreover, different from the computer segmentation method, this method helps to avoid the problems caused by immature segmentation technology and an imperfect N-gram language model.

Details

The Electronic Library , vol. 40 no. 3
Type: Research Article
ISSN: 0264-0473

Keywords

Article
Publication date: 25 October 2022

Victor Diogho Heuer de Carvalho and Ana Paula Cabral Seixas Costa

This article presents two Brazilian Portuguese corpora collected from different media concerning public security issues in a specific location. The primary motivation is…

Abstract

Purpose

This article presents two Brazilian Portuguese corpora collected from different media concerning public security issues in a specific location. The primary motivation is supporting analyses, so security authorities can make appropriate decisions about their actions.

Design/methodology/approach

The corpora were obtained through web scraping from a newspaper's website and tweets from a Brazilian metropolitan region. Natural language processing was applied considering: text cleaning, lemmatization, summarization, part-of-speech and dependencies parsing, named entities recognition, and topic modeling.

Findings

Several results were obtained based on the methodology used, highlighting some: an example of a summarization using an automated process; dependency parsing; the most common topics in each corpus; the forty named entities and the most common slogans were extracted, highlighting those linked to public security.

Research limitations/implications

Some critical tasks were identified for the research perspective, related to the applied methodology: the treatment of noise from obtaining news on their source websites, passing through textual elements quite present in social network posts such as abbreviations, emojis/emoticons, and even writing errors; the treatment of subjectivity, to eliminate noise from irony and sarcasm; the search for authentic news of issues within the target domain. All these tasks aim to improve the process to enable interested authorities to perform accurate analyses.

Practical implications

The corpora dedicated to the public security domain enable several analyses, such as mining public opinion on security actions in a given location; understanding criminals' behaviors reported in the news or even on social networks and drawing their attitudes timeline; detecting movements that may cause damage to public property and people welfare through texts from social networks; extracting the history and repercussions of police actions, crossing news with records on social networks; among many other possibilities.

Originality/value

The work on behalf of the corpora reported in this text represents one of the first initiatives to create textual bases in Portuguese, dedicated to Brazil's specific public security domain.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Keywords

Article
Publication date: 18 May 2021

Prajwal Eachempati and Praveen Ranjan Srivastava

A composite sentiment index (CSI) from quantitative proxy sentiment indicators is likely to be a lag sentiment measure as it reflects only the information absorbed in the market…

Abstract

Purpose

A composite sentiment index (CSI) from quantitative proxy sentiment indicators is likely to be a lag sentiment measure as it reflects only the information absorbed in the market. Information theories and behavioral finance research suggest that market prices may not adjust to all the available information at a point in time. This study hypothesizes that the sentiment from the unincorporated information may provide possible market leads. Thus, this paper aims to discuss a method to identify the un-incorporated qualitative Sentiment from information unadjusted in the market price to test whether sentiment polarity from the information can impact stock returns. Factoring market sentiment extracted from unincorporated information (residual sentiment or sentiment backlog) in CSI is an essential step for developing an integrated sentiment index to explain deviation in asset prices from their intrinsic value. Identifying the unincorporated Sentiment also helps in text analytics to distinguish between current and future market sentiment.

Design/methodology/approach

Initially, this study collects the news from various textual sources and runs the NVivo tool to compute the corpus data’s sentiment polarity. Subsequently, using the predictability horizon technique, this paper mines the unincorporated component of the news’s sentiment polarity. This study regresses three months’ sentiment polarity (the current period and its lags for two months) on the NIFTY50 index of the National Stock Exchange of India. If the three-month lags are significant, it indicates that news sentiment from the three months is unabsorbed and is likely to impact the future NIFTY50 index. The sentiment is also conditionally tested for firm size, volatility and specific industry sector-dependence. This paper discusses the implications of the results.

Findings

Based on information theories and empirical findings, the paper demonstrates that it is possible to identify unincorporated information and extract the sentiment polarity to predict future market direction. The sentiment polarity variables are significant for the current period and two-month lags. The magnitude of the sentiment polarity coefficient has decreased from the current period to lag one and lag two. This study finds that the unabsorbed component or backlog of news consisted of mainly negative market news or unconfirmed news of the previous period, as illustrated in Tables 1 and 2 and Figure 2. The findings on unadjusted news effects vary with firm size, volatility and sectoral indices as depicted in Figures 3, 4, 5 and 6.

Originality/value

The related literature on sentiment index describes top-down/ bottom-up models using quantitative proxy sentiment indicators and natural language processing (NLP)/machine learning approaches to compute the sentiment from qualitative information to explain variance in market returns. NLP approaches use current period sentiment to understand market trends ignoring the unadjusted sentiment carried from the previous period. The underlying assumption here is that the market adjusts to all available information instantly, which is proved false in various empirical studies backed by information theories. The paper discusses a novel approach to identify and extract sentiment from unincorporated information, which is a critical sentiment measure for developing a holistic sentiment index, both in text analytics and in top-down quantitative models. Practitioners may use the methodology in the algorithmic trading models and conduct stock market research.

Article
Publication date: 16 September 2021

Sireesha Jasti

Internet has endorsed a tremendous change with the advancement of the new technologies. The change has made the users of the internet to make comments regarding the service or…

Abstract

Purpose

Internet has endorsed a tremendous change with the advancement of the new technologies. The change has made the users of the internet to make comments regarding the service or product. The Sentiment classification is the process of analyzing the reviews for helping the user to decide whether to purchase the product or not.

Design/methodology/approach

A rider feedback artificial tree optimization-enabled deep recurrent neural networks (RFATO-enabled deep RNN) is developed for the effective classification of sentiments into various grades. The proposed RFATO algorithm is modeled by integrating the feedback artificial tree (FAT) algorithm in the rider optimization algorithm (ROA), which is used for training the deep RNN classifier for the classification of sentiments in the review data. The pre-processing is performed by the stemming and the stop word removal process for removing the redundancy for smoother processing of the data. The features including the sentiwordnet-based features, a variant of term frequency-inverse document frequency (TF-IDF) features and spam words-based features are extracted from the review data to form the feature vector. Feature fusion is performed based on the entropy of the features that are extracted. The metrics employed for the evaluation in the proposed RFATO algorithm are accuracy, sensitivity, and specificity.

Findings

By using the proposed RFATO algorithm, the evaluation metrics such as accuracy, sensitivity and specificity are maximized when compared to the existing algorithms.

Originality/value

The proposed RFATO algorithm is modeled by integrating the FAT algorithm in the ROA, which is used for training the deep RNN classifier for the classification of sentiments in the review data. The pre-processing is performed by the stemming and the stop word removal process for removing the redundancy for smoother processing of the data. The features including the sentiwordnet-based features, a variant of TF-IDF features and spam words-based features are extracted from the review data to form the feature vector. Feature fusion is performed based on the entropy of the features that are extracted.

Details

International Journal of Web Information Systems, vol. 17 no. 6
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 9 May 2022

Lei Zhao, Yingyi Zhang and Chengzhi Zhang

To understand the meaning of a sentence, humans can focus on important words in the sentence, which reflects our eyes staying on each word in different gaze time or times. Thus…

Abstract

Purpose

To understand the meaning of a sentence, humans can focus on important words in the sentence, which reflects our eyes staying on each word in different gaze time or times. Thus, some studies utilize eye-tracking values to optimize the attention mechanism in deep learning models. But these studies lack to explain the rationality of this approach. Whether the attention mechanism possesses this feature of human reading needs to be explored.

Design/methodology/approach

The authors conducted experiments on a sentiment classification task. Firstly, they obtained eye-tracking values from two open-source eye-tracking corpora to describe the feature of human reading. Then, the machine attention values of each sentence were learned from a sentiment classification model. Finally, a comparison was conducted to analyze machine attention values and eye-tracking values.

Findings

Through experiments, the authors found the attention mechanism can focus on important words, such as adjectives, adverbs and sentiment words, which are valuable for judging the sentiment of sentences on the sentiment classification task. It possesses the feature of human reading, focusing on important words in sentences when reading. Due to the insufficient learning of the attention mechanism, some words are wrongly focused. The eye-tracking values can help the attention mechanism correct this error and improve the model performance.

Originality/value

Our research not only provides a reasonable explanation for the study of using eye-tracking values to optimize the attention mechanism but also provides new inspiration for the interpretability of attention mechanism.

Details

Aslib Journal of Information Management, vol. 75 no. 1
Type: Research Article
ISSN: 2050-3806

Keywords

Article
Publication date: 15 August 2018

Heng-Li Yang and August F.Y. Chao

The purpose of this paper is to propose sentiment annotation at sentence level to reduce information overloading while reading product/service reviews in the internet.

Abstract

Purpose

The purpose of this paper is to propose sentiment annotation at sentence level to reduce information overloading while reading product/service reviews in the internet.

Design/methodology/approach

The keyword-based sentiment analysis is applied for highlighting review sentences. An experiment is conducted for demonstrating its effectiveness.

Findings

A prototype is built for highlighting tourism review sentences in Chinese with positive or negative sentiment polarity. An experiment results indicates that sentiment annotation can increase information quality and user’s intention to read tourism reviews.

Research limitations/implications

This study has made two major contributions: proposing the approach of adding sentiment annotation at sentence level of review texts for assisting decision-making; validating the relationships among the information quality constructs. However, in this study, sentiment analysis was conducted on a limited corpus; future research may try a larger corpus. Besides, the annotation system was built on the tourism data. Future studies might try to apply to other areas.

Practical implications

If the proposed annotation systems become popular, both tourists and attraction providers would obtain benefits. In this era of smart tourism, tourists could browse through the huge amount of internet information more quickly. Attraction providers could understand what are the strengths and weaknesses of their facilities more easily. The application of this sentiment analysis is possible for other languages, especially for non-spaced languages.

Originality/value

Facing large amounts of data, past researchers were engaged in automatically constructing a compact yet meaningful abstraction of the texts. However, users have different positions and purposes. This study proposes an alternative approach to add sentiment annotation at sentence level for assisting users.

Details

Online Information Review, vol. 42 no. 5
Type: Research Article
ISSN: 1468-4527

Keywords

Open Access
Article
Publication date: 29 June 2022

Ibtissam Touahri

This paper purposed a multi-facet sentiment analysis system.

Abstract

Purpose

This paper purposed a multi-facet sentiment analysis system.

Design/methodology/approach

Hence, This paper uses multidomain resources to build a sentiment analysis system. The manual lexicon based features that are extracted from the resources are fed into a machine learning classifier to compare their performance afterward. The manual lexicon is replaced with a custom BOW to deal with its time consuming construction. To help the system run faster and make the model interpretable, this will be performed by employing different existing and custom approaches such as term occurrence, information gain, principal component analysis, semantic clustering, and POS tagging filters.

Findings

The proposed system featured by lexicon extraction automation and characteristics size optimization proved its efficiency when applied to multidomain and benchmark datasets by reaching 93.59% accuracy which makes it competitive to the state-of-the-art systems.

Originality/value

The construction of a custom BOW. Optimizing features based on existing and custom feature selection and clustering approaches.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Keywords

Article
Publication date: 16 February 2021

Sonia Osorio Angel, Adriana Peña Pérez Negrón and Aurora Espinoza-Valdez

Most studies on Sentiment Analysis are performed in English. However, as the third most spoken language on the Internet, Sentiment Analysis for Spanish presents its challenges…

Abstract

Purpose

Most studies on Sentiment Analysis are performed in English. However, as the third most spoken language on the Internet, Sentiment Analysis for Spanish presents its challenges from a semantic and syntactic point of view. This review presents a scope of the recent advances in this area.

Design/methodology/approach

A systematic literature review on Sentiment Analysis for the Spanish language was conducted on recognized databases by the research community.

Findings

Results show classification systems through three different approaches: Lexicon based, Machine Learning based and hybrid approaches. Additionally, different linguistic resources as Lexicon or corpus explicitly developed for the Spanish language were found.

Originality/value

This study provides academics and professionals, a review of advances in Sentiment Analysis for the Spanish language. Most reviews on Sentiment Analysis are for English, and other languages such as Chinese or Arabic, but no updated reviews were found for Spanish.

Details

Data Technologies and Applications, vol. 55 no. 4
Type: Research Article
ISSN: 2514-9288

Keywords

1 – 10 of over 1000