Search results
Nikola Nikolić, Olivera Grljević and Aleksandar Kovačević
Abstract
Purpose
Student recruitment and retention are important issues for all higher education institutions. Constant monitoring of student satisfaction levels is therefore crucial. Traditionally, students voice their opinions through official surveys organized by the universities. In addition, social media and review websites such as “Rate my professors” are now rich sources of opinions that should not be ignored. Automated mining of students’ opinions can be realized via aspect-based sentiment analysis (ABSA). ABSA is a sub-discipline of natural language processing (NLP) that focusses on the identification of sentiments (negative, neutral, positive) and aspects (sentiment targets) in a sentence. The purpose of this paper is to introduce a system for ABSA of free-text reviews expressed in student opinion surveys in the Serbian language. Sentiment analysis was carried out at the finest level of text granularity: the level of the sentence segment (phrase and clause).
Design/methodology/approach
The presented system relies on NLP techniques, machine learning models, rules and dictionaries. The corpora collected and annotated for system development and evaluation comprise students’ reviews of teaching staff at the Faculty of Technical Sciences, University of Novi Sad, Serbia, and a corpus of publicly available reviews from the Serbian equivalent of the “Rate my professors” website.
Findings
The research results indicate that positive sentiment can be identified with an F-measure of 0.83, while negative sentiment can be detected with an F-measure of 0.94. The F-measure for aspects ranges between 0.49 and 0.89, depending on their frequency in the corpus. Furthermore, the authors have concluded that the quality of ABSA depends on the source of the reviews (official students’ surveys vs review websites).
Practical implications
The system for ABSA presented in this paper could improve the quality of service provided by the Serbian higher education institutions through a more effective search and summary of students’ opinions. For example, a particular educational institution could very easily find out which aspects of their service the students are not satisfied with and to which aspects of their service more attention should be directed.
Originality/value
To the best of the authors’ knowledge, this is the first study of ABSA carried out at the level of sentence segment for the Serbian language. The methodology and findings presented in this paper provide a much-needed basis for further work on sentiment analysis for the Serbian language, which is under-resourced and under-researched in this area.
Guellil Imane, Darwish Kareem and Azouaou Faical
Abstract
Purpose
This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used to analyse the sentiments of Arabic users on social media. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address automated processing in general, and sentiment analysis in particular, for Algerian.
Design/methodology/approach
The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text extracted from Facebook. This approach makes it possible to significantly increase the size of the training corpus without resorting to manual annotation. The annotated corpus is then vectorized using document embedding (doc2vec), which is an extension of word embeddings (word2vec). For sentiment classification, the authors used different classifiers such as support vector machines (SVM), Naive Bayes (NB) and logistic regression (LR).
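As a rough illustration of lexicon-based automatic annotation, the sketch below labels messages from a toy lexicon and keeps only confidently labelled ones for training. The lexicon entries, the `annotate` and `build_training_set` helpers, and the definition of confidence used with `min_confidence` are assumptions made for this example, not the authors' implementation.

```python
# Hypothetical toy lexicon: word -> polarity score (+1 positive, -1 negative).
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}


def annotate(message, lexicon=LEXICON, min_confidence=0.6):
    """Return 'positive', 'negative', or None when the message is skipped."""
    scores = [lexicon[w] for w in message.lower().split() if w in lexicon]
    pos = sum(s for s in scores if s > 0)
    neg = -sum(s for s in scores if s < 0)
    if pos + neg == 0:
        return None  # no lexicon evidence at all
    # Assumed confidence: how one-sided the lexicon evidence is (0 to 1).
    confidence = abs(pos - neg) / (pos + neg)
    if confidence < min_confidence:
        return None  # evidence too mixed to trust the label
    return "positive" if pos > neg else "negative"


def build_training_set(messages, **kwargs):
    """Keep only the messages that received a label."""
    labelled = []
    for message in messages:
        label = annotate(message, **kwargs)
        if label is not None:
            labelled.append((message, label))
    return labelled
```

The resulting (message, label) pairs could then be vectorized and passed to a classifier; that step is left out here because it depends on the embedding library used.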
Findings
The results suggest that NB and SVM classifiers generally led to the best results and MLP generally had the worst results. Further, the threshold that the authors use in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW led to slightly higher results than using PV-DM. Combining PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier with F1 up to 86.9 per cent.
Originality/value
The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.
Mengjuan Zha, Changping Hu and Yu Shi
Abstract
Purpose
A sentiment lexicon is an essential resource for sentiment analysis of user reviews. To date, there is still no large-scale, high-accuracy domain sentiment lexicon for Chinese book reviews. This paper aims to construct a large-scale sentiment lexicon based on the ultrashort reviews of Chinese books.
Design/methodology/approach
First, large-scale ultrashort reviews of Chinese books, whose length is no more than six Chinese characters, are collected and preprocessed as candidate sentiment words. Second, non-sentiment words are filtered out through certain rules, such as part of speech rules, context rules, feature word rules and user behaviour rules. Third, the relative frequency is used to select and judge the polarity of sentiment words. Finally, the performance of the sentiment lexicon is evaluated through experiments.
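The third step can be sketched as follows, under the assumption (not stated in the abstract) that each ultrashort review comes with a positive or negative label, for example derived from a rating. The `build_lexicon` function and its `min_count` and `min_ratio` parameters are hypothetical names chosen for this illustration.

```python
from collections import Counter


def build_lexicon(reviews, min_count=2, min_ratio=0.7):
    """Assign a polarity to each candidate word by relative frequency.

    reviews: iterable of (candidate_word, label), label in {'pos', 'neg'}.
    A word is kept only if it occurs at least `min_count` times and one
    polarity accounts for at least `min_ratio` of its occurrences.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for word, label in reviews:
        if label == "pos":
            pos_counts[word] += 1
        else:
            neg_counts[word] += 1

    lexicon = {}
    for word in set(pos_counts) | set(neg_counts):
        p, n = pos_counts[word], neg_counts[word]
        total = p + n
        if total < min_count:
            continue  # too rare to judge
        rel_pos = p / total
        if rel_pos >= min_ratio:
            lexicon[word] = "positive"
        elif 1 - rel_pos >= min_ratio:
            lexicon[word] = "negative"
        # otherwise the word is ambiguous and is left out
    return lexicon
```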
Findings
This paper proposes a method of sentiment lexicon construction based on ultrashort reviews and successfully builds a lexicon of nearly 40,000 words for Chinese books, based on Douban book reviews.
Originality/value
Compared with constructing a sentiment lexicon from a small number of reviews, the proposed method can take full advantage of data scale to build a corpus. Moreover, unlike automatic word segmentation methods, this method helps to avoid the problems caused by immature segmentation technology and an imperfect N-gram language model.
Victor Diogho Heuer de Carvalho and Ana Paula Cabral Seixas Costa
Abstract
Purpose
This article presents two Brazilian Portuguese corpora collected from different media concerning public security issues in a specific location. The primary motivation is supporting analyses, so security authorities can make appropriate decisions about their actions.
Design/methodology/approach
The corpora were obtained through web scraping from a newspaper's website and tweets from a Brazilian metropolitan region. Natural language processing was applied, comprising text cleaning, lemmatization, summarization, part-of-speech tagging and dependency parsing, named entity recognition, and topic modeling.
Findings
Several results were obtained with this methodology, including: an example of automated summarization; dependency parsing; the most common topics in each corpus; and the forty named entities and the most common slogans extracted, with emphasis on those linked to public security.
Research limitations/implications
Several critical tasks for future research were identified, related to the applied methodology: handling noise introduced when obtaining news from source websites, as well as textual elements common in social network posts, such as abbreviations, emojis/emoticons and writing errors; handling subjectivity, to eliminate noise from irony and sarcasm; and searching for authentic news on issues within the target domain. All these tasks aim to improve the process so that interested authorities can perform accurate analyses.
Practical implications
The corpora dedicated to the public security domain enable several analyses, such as mining public opinion on security actions in a given location; understanding criminals' behaviors reported in the news or on social networks and drawing a timeline of their attitudes; detecting, through texts from social networks, movements that may damage public property and people's welfare; and extracting the history and repercussions of police actions by cross-referencing news with records on social networks, among many other possibilities.
Originality/value
The work on the corpora reported in this text represents one of the first initiatives to create textual bases in Portuguese dedicated to Brazil's specific public security domain.
Prajwal Eachempati and Praveen Ranjan Srivastava
Abstract
Purpose
A composite sentiment index (CSI) built from quantitative proxy sentiment indicators is likely to be a lagging sentiment measure, as it reflects only the information already absorbed in the market. Information theories and behavioral finance research suggest that market prices may not adjust to all the available information at a point in time. This study hypothesizes that the sentiment from the unincorporated information may provide possible market leads. Thus, this paper aims to discuss a method to identify the unincorporated qualitative sentiment from information not yet adjusted in the market price, and to test whether sentiment polarity from that information can impact stock returns. Factoring market sentiment extracted from unincorporated information (residual sentiment or sentiment backlog) into the CSI is an essential step for developing an integrated sentiment index to explain deviation in asset prices from their intrinsic value. Identifying the unincorporated sentiment also helps in text analytics to distinguish between current and future market sentiment.
Design/methodology/approach
Initially, this study collects the news from various textual sources and runs the NVivo tool to compute the corpus data’s sentiment polarity. Subsequently, using the predictability horizon technique, this paper mines the unincorporated component of the news’s sentiment polarity. This study regresses three months’ sentiment polarity (the current period and its lags for two months) on the NIFTY50 index of the National Stock Exchange of India. If the three-month lags are significant, it indicates that news sentiment from the three months is unabsorbed and is likely to impact the future NIFTY50 index. The sentiment is also conditionally tested for firm size, volatility and specific industry sector-dependence. This paper discusses the implications of the results.
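The lagged regression described above can be illustrated with a small ordinary least squares sketch in plain Python, with the index value as the dependent variable and the current and two lagged sentiment values as regressors. The `ols` and `lagged_design` helpers are hypothetical names, the data would be supplied by the reader, and the significance testing the study relies on is not shown here.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def lagged_design(sentiment, max_lag=2):
    """Rows of [1, s_t, s_(t-1), ..., s_(t-max_lag)] for t >= max_lag."""
    return [
        [1.0] + [sentiment[t - lag] for lag in range(max_lag + 1)]
        for t in range(max_lag, len(sentiment))
    ]
```

With a monthly sentiment series `s` and matching index values `y` aligned from the third month onward, `ols(lagged_design(s), y)` returns the intercept followed by the coefficients for the current month, lag one and lag two.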
Findings
Based on information theories and empirical findings, the paper demonstrates that it is possible to identify unincorporated information and extract the sentiment polarity to predict future market direction. The sentiment polarity variables are significant for the current period and two-month lags. The magnitude of the sentiment polarity coefficient has decreased from the current period to lag one and lag two. This study finds that the unabsorbed component or backlog of news consisted of mainly negative market news or unconfirmed news of the previous period, as illustrated in Tables 1 and 2 and Figure 2. The findings on unadjusted news effects vary with firm size, volatility and sectoral indices as depicted in Figures 3, 4, 5 and 6.
Originality/value
The related literature on sentiment indices describes top-down/bottom-up models using quantitative proxy sentiment indicators and natural language processing (NLP)/machine learning approaches to compute the sentiment from qualitative information to explain variance in market returns. NLP approaches use current period sentiment to understand market trends, ignoring the unadjusted sentiment carried over from the previous period. The underlying assumption here is that the market adjusts to all available information instantly, which has been shown to be false in various empirical studies backed by information theories. The paper discusses a novel approach to identify and extract sentiment from unincorporated information, which is a critical sentiment measure for developing a holistic sentiment index, both in text analytics and in top-down quantitative models. Practitioners may use the methodology in algorithmic trading models and in stock market research.
Abstract
Purpose
The internet has undergone tremendous change with the advancement of new technologies. This change has led internet users to comment on services and products. Sentiment classification is the process of analyzing these reviews to help the user decide whether to purchase a product.
Design/methodology/approach
A rider feedback artificial tree optimization-enabled deep recurrent neural network (RFATO-enabled deep RNN) is developed for the effective classification of sentiments into various grades. The proposed RFATO algorithm is modeled by integrating the feedback artificial tree (FAT) algorithm into the rider optimization algorithm (ROA), and is used to train the deep RNN classifier for the classification of sentiments in the review data. Pre-processing consists of stemming and stop word removal, which remove redundancy for smoother processing of the data. SentiWordNet-based features, a variant of term frequency-inverse document frequency (TF-IDF) features and spam words-based features are extracted from the review data to form the feature vector. Feature fusion is performed based on the entropy of the extracted features. The metrics employed to evaluate the proposed RFATO algorithm are accuracy, sensitivity and specificity.
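For reference, the three evaluation metrics named above can be computed from predictions as in the sketch below. It assumes a binary setting with one class treated as positive, whereas the abstract describes classification into several grades; the `binary_metrics` helper is a hypothetical name used only for this illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against classes that never occur in y_true.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```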
Findings
By using the proposed RFATO algorithm, the evaluation metrics such as accuracy, sensitivity and specificity are maximized when compared to the existing algorithms.
Originality/value
The proposed RFATO algorithm is modeled by integrating the FAT algorithm into the ROA, and is used to train the deep RNN classifier for the classification of sentiments in the review data. Pre-processing consists of stemming and stop word removal, which remove redundancy for smoother processing of the data. SentiWordNet-based features, a variant of TF-IDF features and spam words-based features are extracted from the review data to form the feature vector. Feature fusion is performed based on the entropy of the extracted features.
Lei Zhao, Yingyi Zhang and Chengzhi Zhang
Abstract
Purpose
To understand the meaning of a sentence, humans focus on its important words, which is reflected in how long and how often the eyes rest on each word. Some studies therefore use eye-tracking values to optimize the attention mechanism in deep learning models. However, these studies do not explain the rationale for this approach. Whether the attention mechanism shares this feature of human reading needs to be explored.
Design/methodology/approach
The authors conducted experiments on a sentiment classification task. Firstly, they obtained eye-tracking values from two open-source eye-tracking corpora to describe the feature of human reading. Then, the machine attention values of each sentence were learned from a sentiment classification model. Finally, a comparison was conducted to analyze machine attention values and eye-tracking values.
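One simple way to compare per-word attention values with per-word eye-tracking values is a rank correlation. The abstract does not say which comparison the authors used, so the Spearman correlation below is an assumption, and `ranks`, `pearson` and `spearman` are hypothetical helper names.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(attention, gaze):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(attention), ranks(gaze))
```

A value near 1 for a sentence would indicate that the model and the human readers rank its words by importance in a similar order.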
Findings
Through experiments, the authors found that the attention mechanism can focus on important words, such as adjectives, adverbs and sentiment words, which are valuable for judging the sentiment of sentences in the sentiment classification task. It thus shares the feature of human reading of focusing on the important words in a sentence. Because the attention mechanism is insufficiently trained, it wrongly focuses on some words. The eye-tracking values can help the attention mechanism correct this error and improve model performance.
Originality/value
This research not only provides a reasonable explanation for studies that use eye-tracking values to optimize the attention mechanism but also offers new inspiration for the interpretability of the attention mechanism.
Heng-Li Yang and August F.Y. Chao
Abstract
Purpose
The purpose of this paper is to propose sentiment annotation at sentence level to reduce information overload when reading product/service reviews on the internet.
Design/methodology/approach
Keyword-based sentiment analysis is applied to highlight review sentences. An experiment is conducted to demonstrate its effectiveness.
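A minimal sketch of keyword-based sentence annotation is given below. It uses English text and toy keyword sets for readability, whereas the prototype described in this abstract handles Chinese tourism reviews; the keyword lists and the `annotate_review` function are assumptions made for illustration.

```python
import re

# Hypothetical keyword lists; a real system would use a curated lexicon.
POSITIVE = {"beautiful", "friendly", "clean"}
NEGATIVE = {"crowded", "dirty", "expensive"}


def annotate_review(review):
    """Split a review into sentences and tag each with a polarity label."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    annotated = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        pos = len(words & POSITIVE)
        neg = len(words & NEGATIVE)
        if pos > neg:
            label = "positive"
        elif neg > pos:
            label = "negative"
        else:
            label = "neutral"
        annotated.append((sentence, label))
    return annotated
```

A reader-facing interface could then highlight the positive and negative sentences in different colours and leave neutral ones unmarked.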
Findings
A prototype is built for highlighting tourism review sentences in Chinese with positive or negative sentiment polarity. The experimental results indicate that sentiment annotation can increase information quality and users’ intention to read tourism reviews.
Research limitations/implications
This study has made two major contributions: proposing the approach of adding sentiment annotation at the sentence level of review texts to assist decision-making, and validating the relationships among the information quality constructs. However, in this study, sentiment analysis was conducted on a limited corpus; future research may try a larger corpus. In addition, the annotation system was built on tourism data. Future studies might apply it to other domains.
Practical implications
If the proposed annotation systems become popular, both tourists and attraction providers would benefit. In this era of smart tourism, tourists could browse the huge amount of information on the internet more quickly. Attraction providers could more easily understand the strengths and weaknesses of their facilities. This sentiment analysis can also be applied to other languages, especially non-spaced languages.
Originality/value
Facing large amounts of data, past researchers were engaged in automatically constructing a compact yet meaningful abstraction of the texts. However, users have different positions and purposes. This study proposes an alternative approach to add sentiment annotation at sentence level for assisting users.
Abstract
Purpose
This paper proposes a multi-facet sentiment analysis system.
Design/methodology/approach
This paper uses multidomain resources to build a sentiment analysis system. The manual lexicon-based features extracted from the resources are fed into machine learning classifiers, whose performance is then compared. The manual lexicon is replaced with a custom bag-of-words (BOW) to avoid its time-consuming construction. To help the system run faster and make the model interpretable, the feature set is reduced by employing different existing and custom approaches such as term occurrence, information gain, principal component analysis, semantic clustering, and POS tagging filters.
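Two of the feature-reduction approaches listed above, a term occurrence filter and information gain, can be sketched over a bag-of-words representation as follows. The `select_features` function and its `min_docs` and `top_k` parameters are hypothetical; the paper's own custom approaches are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Reduction in label entropy from knowing whether `term` is present."""
    with_term = [l for d, l in zip(docs, labels) if term in d]
    without = [l for d, l in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(part) for part in (with_term, without) if part
    )
    return entropy(labels) - conditional


def select_features(texts, labels, min_docs=2, top_k=3):
    """Drop rare terms, then keep the top_k terms by information gain."""
    docs = [set(t.lower().split()) for t in texts]
    doc_freq = Counter(term for d in docs for term in d)
    vocab = [t for t, c in doc_freq.items() if c >= min_docs]
    # Ties in information gain are broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))[:top_k]
```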
Findings
The proposed system, which features automated lexicon extraction and feature-set size optimization, proved efficient when applied to multidomain and benchmark datasets, reaching 93.59% accuracy, which makes it competitive with state-of-the-art systems.
Originality/value
The originality lies in the construction of a custom BOW and in optimizing features using existing and custom feature selection and clustering approaches.
Sonia Osorio Angel, Adriana Peña Pérez Negrón and Aurora Espinoza-Valdez
Abstract
Purpose
Most studies on Sentiment Analysis are performed in English. However, Spanish is the third most spoken language on the Internet, and Sentiment Analysis for Spanish presents its own challenges from a semantic and syntactic point of view. This review presents an overview of the recent advances in this area.
Design/methodology/approach
A systematic literature review on Sentiment Analysis for the Spanish language was conducted on databases recognized by the research community.
Findings
Results show classification systems following three different approaches: lexicon-based, machine learning-based and hybrid. Additionally, different linguistic resources, such as lexicons and corpora explicitly developed for the Spanish language, were found.
Originality/value
This study provides academics and professionals with a review of advances in Sentiment Analysis for the Spanish language. Most reviews on Sentiment Analysis cover English and other languages such as Chinese or Arabic, but no updated reviews were found for Spanish.