Search results

1 – 10 of 88
Article
Publication date: 13 March 2020

Jinwook Choi, Yongmoo Suh and Namchul Jung

Abstract

Purpose

The purpose of this study is to investigate the effectiveness of qualitative information extracted from a firm’s annual report in predicting its corporate credit rating. In practice, qualitative information, represented by published reports or management interviews, has long been an important input to credit rating assignment, alongside quantitative information represented by financial values. Nevertheless, prior studies leave room for further research in that they rarely employed qualitative information when developing corporate credit rating prediction models.

Design/methodology/approach

This study adopted three document vectorization methods, Bag-Of-Words (BOW), Word to Vector (Word2Vec) and Document to Vector (Doc2Vec), to transform unstructured textual data into numeric vectors that Machine Learning (ML) algorithms can accept as input. For the experiments, the authors used the corpus of the Management’s Discussion and Analysis (MD&A) section in 10-K financial reports as well as financial variables and corporate credit rating data.
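As an illustration, the simplest of the three methods can be sketched in a few lines (a minimal sketch with whitespace tokenisation and toy documents; Word2Vec and Doc2Vec would instead learn dense vectors from a corpus):

```python
from collections import Counter

def bag_of_words(documents):
    """Build a shared vocabulary and turn each document into a term-count vector."""
    vocab = sorted({word for doc in documents for word in doc.split()})
    vectors = []
    for doc in documents:
        counts = Counter(doc.split())
        vectors.append([counts.get(word, 0) for word in vocab])
    return vocab, vectors

# Toy documents standing in for MD&A text.
vocab, vectors = bag_of_words(["credit risk rating", "credit rating model"])
```

Each document becomes a count vector over the shared vocabulary, which can then be concatenated with financial variables as classifier input.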

Findings

Experimental results from a series of multi-class classification experiments show that predictive models trained on both financial variables and vectors extracted from MD&A data outperform benchmark models trained only on traditional financial variables.

Originality/value

This study proposed a new approach for corporate credit rating prediction by using qualitative information extracted from MD&A documents as an input to ML-based prediction models. Also, this research adopted and compared three textual vectorization methods in the domain of corporate credit rating prediction and showed that BOW mostly outperformed Word2Vec and Doc2Vec.

Details

Data Technologies and Applications, vol. 54 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 27 August 2019

Barkha Bansal and Sangeet Srivastava

Abstract

Purpose

Vast volumes of rich online consumer-generated content (CGC) can be used effectively to gain important insights for decision-making, product improvement and brand management. Recently, many studies have proposed semi-supervised aspect-based sentiment classification of unstructured CGC. However, most of the existing CGC mining methods rely on explicitly detecting aspect-based sentiments while overlooking the context of sentiment-bearing words. Therefore, this study aims to extract implicit context-sensitive sentiment, and to handle slang, ambiguous, informal and special words used in CGC.

Design/methodology/approach

A novel text mining framework is proposed to detect and evaluate implicit semantic word relations and context. First, POS (part of speech) tagging is used for detecting aspect descriptions and sentiment-bearing words. Then, LDA (latent Dirichlet allocation) is used to group similar aspects together and to form an attribute. Semantically and contextually similar words are found using the skip-gram model for distributed word vectorisation. Finally, to find context-sensitive sentiment of each attribute, cosine similarity is used along with a set of positive and negative seed words.
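The final step — scoring a word against positive and negative seed words by cosine similarity — can be sketched as follows (toy 2-D vectors and hypothetical seed words assumed; real skip-gram embeddings have hundreds of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def seed_sentiment(word_vec, pos_seeds, neg_seeds):
    """Score a word vector against positive and negative seed-word vectors;
    a positive result means the word leans towards positive sentiment."""
    pos = max(cosine(word_vec, s) for s in pos_seeds)
    neg = max(cosine(word_vec, s) for s in neg_seeds)
    return pos - neg

# Toy 2-D embeddings for hypothetical seed words.
pos_seeds = [[1.0, 0.1]]   # e.g. a vector for "excellent"
neg_seeds = [[0.1, 1.0]]   # e.g. a vector for "terrible"
score = seed_sentiment([0.9, 0.2], pos_seeds, neg_seeds)
```

Because the comparison happens in embedding space rather than against a fixed dictionary, slang and informal words that appear near the seeds in the corpus still receive sensible scores.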

Findings

Experimental results using more than 400,000 Amazon mobile phone reviews showed that the proposed method efficiently found product attributes and corresponding context-aware sentiments. The method also achieves higher classification accuracy than the baseline model and state-of-the-art techniques using context-sensitive information on data sets from two different domains.

Practical implications

Extracted attributes can be easily classified into consumer issues and brand merits. A brand-based comparative study is presented to demonstrate the practical significance of the proposed approach.

Originality/value

This paper presents a novel method for context-sensitive attribute-based sentiment analysis of CGC, which is useful for both brand and product improvement.

Details

Kybernetes, vol. 50 no. 2
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 11 September 2019

Duen-Ren Liu, Yu-Shan Liao and Jun-Yi Lu

Abstract

Purpose

Providing online news recommendations to users has become an important trend for online media platforms, enabling them to attract more users. The purpose of this paper is to propose an online news recommendation system for recommending news articles to users when browsing news on online media platforms.

Design/methodology/approach

A Collaborative Semantic Topic Modeling (CSTM) method and an ensemble model (EM) are proposed to predict user preferences based on the combination of matrix factorization with articles’ semantic latent topics derived from word embedding and latent topic modeling. The proposed EM further integrates an online interest adjustment (OIA) mechanism to adjust users’ online recommendation lists based on their current news browsing.

Findings

This study evaluated the proposed approach using offline experiments as well as an online evaluation on an existing online media platform. The evaluation shows that the proposed method can improve recommendation quality and achieve better performance than other recommendation methods. The online evaluation also shows that integrating the proposed method with OIA can improve the click-through rate of online news recommendation.

Originality/value

The novel CSTM and EM combined with OIA are proposed for news recommendation. The proposed novel recommendation system can improve the click-through rate of online news recommendations, thus increasing online media platforms’ commercial value.

Details

Industrial Management & Data Systems, vol. 119 no. 8
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 2 February 2022

Deepak Suresh Asudani, Naresh Kumar Nagwani and Pradeep Singh

Abstract

Purpose

Classifying emails as ham or spam based on their content is essential. Determining the semantic and syntactic meaning of words and putting them in a high-dimensional feature vector form for processing is the most difficult challenge in email categorization. The purpose of this paper is to examine the effectiveness of the pre-trained embedding model for the classification of emails using deep learning classifiers such as the long short-term memory (LSTM) model and convolutional neural network (CNN) model.

Design/methodology/approach

In this paper, global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps to classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experiments.
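A common way to feed such pre-trained embeddings to an LSTM or CNN is to build an embedding matrix over the email vocabulary; a minimal sketch, with hypothetical 3-dimensional vectors standing in for real 100-dimensional GloVe embeddings:

```python
import numpy as np

def embedding_matrix(vocab, pretrained, dim):
    """Map each vocabulary word to its pre-trained vector; out-of-vocabulary
    words fall back to the all-zeros row."""
    matrix = np.zeros((len(vocab), dim))
    for i, word in enumerate(vocab):
        if word in pretrained:
            matrix[i] = pretrained[word]
    return matrix

# Hypothetical tiny vectors standing in for real GloVe embeddings.
pretrained = {"spam": [0.1, 0.9, 0.0], "offer": [0.2, 0.8, 0.1]}
M = embedding_matrix(["spam", "offer", "meeting"], pretrained, 3)
```

The resulting matrix initialises the network's embedding layer, so words that GloVe places close together start out close together for the classifier as well.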

Findings

In the first set of experiments, among the machine learning classifiers, the support vector machine (SVM) model performs better than the other machine learning methodologies. The second set of experiments compares deep learning model performance with no embedding, GloVe embedding and BERT embedding. The experiments show that GloVe embedding can be helpful for faster execution with better performance on large-sized datasets.

Originality/value

The experiment reveals that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and than traditional machine learning algorithms when classifying an email as ham or spam. It is concluded that word embedding models improve email classifiers’ accuracy.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 19 April 2023

Milad Soltani, Alexios Kythreotis and Arash Roshanpoor

Abstract

Purpose

The emergence of machine learning has opened a new way for researchers. It allows them to supplement the traditional manual methods for conducting a literature review and turning it into smart literature. This study aims to present a framework for incorporating machine learning into financial statement fraud (FSF) literature analysis. This framework facilitates the analysis of a large amount of literature to show the trend of the field and identify the most productive authors, journals and potential areas for future research.

Design/methodology/approach

In this study, a framework was introduced that merges bibliometric analysis techniques, such as word frequency, co-word analysis and co-authorship analysis, with the Latent Dirichlet Allocation topic modeling approach. This framework was used to uncover subtopics from 20 years of financial fraud research articles. Furthermore, the hierarchical clustering method was applied to selected subtopics to demonstrate the primary contexts in the literature on FSF.
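Of the bibliometric techniques named, co-word analysis is the easiest to sketch: count how often each keyword pair co-occurs across articles (hypothetical keyword lists assumed):

```python
from collections import Counter
from itertools import combinations

def co_word_counts(keyword_lists):
    """Count how often each keyword pair co-occurs across articles."""
    pairs = Counter()
    for keywords in keyword_lists:
        # Sorting makes each pair a canonical (a, b) key regardless of order.
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical article keyword lists.
articles = [
    ["fraud", "machine learning", "audit"],
    ["fraud", "machine learning"],
    ["audit", "topic modeling"],
]
pairs = co_word_counts(articles)
```

The resulting pair counts form the co-occurrence network whose clusters reveal thematic groupings in the literature.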

Findings

This study has contributed to the literature in two ways. First, this study has determined the top journals, articles, countries and keywords based on various bibliometric metrics. Second, using topic modeling and then hierarchical clustering, this study demonstrates the four primary contexts in FSF detection.

Research limitations/implications

In this study, the authors tried to take a comprehensive view of the studies on financial fraud conducted over two decades. However, this research has limitations that can be an opportunity for future researchers. The first limitation is language bias: this study focused on English-language articles, so it is suggested that other researchers consider other languages as well. The second limitation is citation bias: the authors showed the top articles based on citation counts, but judging by citations alone can be misleading. Therefore, this study suggests that researchers consider other measures of citation quality and assess the studies’ precision by applying meta-analysis.

Originality/value

Despite the popularity of bibliometric analysis and topic modeling, there have been limited efforts to use machine learning for literature review. This novel approach of using hierarchical clustering on topic modeling results enabled the authors to uncover four primary contexts. Furthermore, this method allowed them to show the keywords of each context and highlight significant articles within each context.

Details

Journal of Financial Crime, vol. 30 no. 5
Type: Research Article
ISSN: 1359-0790

Article
Publication date: 20 February 2023

Elena Fedorova, Igor Demin and Elena Silina

Abstract

Purpose

The paper aims to estimate how corporate philanthropy expenditures and corporate philanthropy disclosure (in general and in different spheres) affect investment attractiveness of Russian companies.

Design/methodology/approach

To assess the degree of corporate philanthropy disclosure, the authors compiled lexicons based on a set of techniques: text and frequency analysis, correlation analysis and principal component analysis. To adjust the existing classifications of corporate philanthropic activities to the Russian market, the authors employed expert analysis. The empirical research base includes 83 Russian publicly traded companies over the period 2013–2019. To estimate the impact of the corporate philanthropy disclosure indicators on a company’s investment attractiveness, the authors utilized panel data regression and a random forest algorithm.
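A lexicon-based disclosure measure of this kind can be sketched as the share of document tokens found in the lexicon (hypothetical English lexicon entries; the authors' Russian lexicons are far richer):

```python
def disclosure_score(text, lexicon):
    """Share of document tokens found in a philanthropy lexicon — a simple
    proxy for the degree of corporate philanthropy disclosure."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in lexicon) / len(tokens)

# Hypothetical lexicon entries.
lexicon = {"charity", "donation", "sponsorship"}
score = disclosure_score("The company made a charity donation this year", lexicon)
```

Scores computed per report and per sphere-specific lexicon would then serve as regressors alongside financial controls.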

Findings

The authors compiled two Russian lexicons: one on general issues of corporate philanthropy and another on philanthropic activities in various spheres (sports and healthcare; support for certain groups of people; social infrastructure; children protection and youth policy; culture, education and science). The paper observes that the disclosure of non-financial data, including data related to general issues of corporate philanthropy as well as to different spheres, affects the market capitalization of the largest Russian companies. The results of regression analysis suggest that disclosure of altruism-driven philanthropic activities (such as corporate philanthropy in the sphere of culture, education and science) has a lesser impact on a company’s investment attractiveness than that of activities driven by business-related motives (sports and healthcare, children protection and youth policy).

Research limitations/implications

Our findings are important to management, investors, financial analysts, regulators and various agencies providing guidance on corporate governance and sustainability reporting. However, the authors acknowledge that the research results may lack generalizability due to the sample covering a single national context. Researchers are encouraged to test the proposed approach further on other countries' data by using the authors’ compiled lexicons.

Originality/value

The study aims to expand the domains of signaling and agency theories. First, this subject has not been widely examined in emerging markets; the authors’ study is the first to focus on the Russian market. Second, while the majority of scholars examine only the impact of charitable donations, the authors use text analysis to also examine the effect of corporate philanthropy disclosure. Third, the authors provide their own lexicon of corporate philanthropy disclosure based on machine learning techniques and expert analysis. Fourth, to estimate the impact of corporate philanthropy on a company’s investment attractiveness, the authors use an original approach based on a combination of linear methods (regression) and non-linear methods (permutation importance). The findings extend the theoretical concept of Peterson et al. (2021): corporate philanthropy is viewed as a company strategy to reinforce its reputation; it helps to establish more efficient relationships with stakeholders, which in turn results in increased business value.

Details

Corporate Communications: An International Journal, vol. 28 no. 3
Type: Research Article
ISSN: 1356-3289

Article
Publication date: 29 December 2022

Xiaoguang Tian, Robert Pavur, Henry Han and Lili Zhang

Abstract

Purpose

Studies on mining text and generating intelligence on human resource documents are rare. This research aims to use artificial intelligence and machine learning techniques to facilitate the employee selection process through latent semantic analysis (LSA), bidirectional encoder representations from transformers (BERT) and support vector machines (SVM). The research also compares the performance of different machine learning, text vectorization and sampling approaches on the human resource (HR) resume data.

Design/methodology/approach

LSA and BERT are used to discover and understand the hidden patterns from a textual resume dataset, and SVM is applied to build the screening model and improve performance.
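LSA itself reduces to a truncated SVD of the term-document matrix; a minimal sketch on a toy count matrix (the resulting low-dimensional document vectors would then be passed to the SVM):

```python
import numpy as np

def lsa(term_doc, k):
    """Latent semantic analysis: a truncated SVD of the term-document matrix
    yields a k-dimensional representation of each document."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document

# Toy 4-term x 3-document count matrix standing in for resume text.
X = np.array([[2, 0, 1],
              [1, 0, 0],
              [0, 3, 1],
              [0, 1, 0]], dtype=float)
docs_2d = lsa(X, 2)
```

Truncating to the top k singular directions merges terms that co-occur across resumes into shared latent topics, which is what makes the downstream screening model interpretable.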

Findings

Based on the results of this study, LSA and BERT proved useful in retrieving critical topics, and SVM can optimize the prediction model performance with the help of cross-validation and variable selection strategies.

Research limitations/implications

The technique and its empirical conclusions provide a practical, theoretical basis and reference for HR research.

Practical implications

The novel methods proposed in the study can assist HR practitioners in designing and improving their existing recruitment process. The topic detection techniques used in the study provide HR practitioners insights to identify the skill set of a particular recruiting position.

Originality/value

To the best of the authors’ knowledge, this research is the first study that uses LSA, BERT, SVM and other machine learning models in human resource management and resume classification. Compared with the existing machine learning-based resume screening system, the proposed system can provide more interpretable insights for HR professionals to understand the recommendation results through the topics extracted from the resumes. The findings of this study can also help organizations to find a better and effective approach for resume screening and evaluation.

Details

Business Process Management Journal, vol. 29 no. 1
Type: Research Article
ISSN: 1463-7154

Article
Publication date: 24 March 2022

Shu-Ying Lin, Duen-Ren Liu and Hsien-Pin Huang

Abstract

Purpose

Financial price forecast issues are always a concern of investors. However, the financial applications based on machine learning methods mainly focus on stock market predictions. Few studies have explored credit risk predictions. Understanding credit risk trends can help investors avoid market risks. The purpose of this study is to investigate the prediction model that can effectively predict credit default swaps (CDS).

Design/methodology/approach

A novel generative adversarial network (GAN) for CDS prediction is proposed. The authors take three features into account that are highly relevant to the future trends of CDS: historical CDS price, news and financial leverage. The main goal of this model is to improve the existing GAN-based regression model by adding finance and news feature extraction approaches. The proposed model adopts an attentional long short-term memory network and convolution network to process historical CDS data and news information, respectively. In addition to enhancing the effectiveness of the GAN model, the authors also design a data sampling strategy to alleviate the overfitting issue.

Findings

The authors conduct an experiment with a real dataset and evaluate the performance of the proposed model. The components and selected features of the model are evaluated for their ability to improve the prediction performance. The experimental results show that the proposed model performs better than other machine learning algorithms and traditional regression GAN.

Originality/value

There are very few studies on prediction models for CDS. With the proposed novel approach, the authors can improve the performance of CDS predictions. The proposed work can thereby increase the commercial value of CDS predictions to support trading decisions.

Details

Data Technologies and Applications, vol. 56 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 26 May 2021

Ly Thi Hai Tran, Thoa Thi Kim Tu, Tran Thi Hong Nguyen, Hoa Thi Lien Nguyen and Xuan Vinh Vo

Abstract

Purpose

This paper examines the role of the annual report’s linguistic tone in predicting future firm performance in an emerging market, Vietnam.

Design/methodology/approach

Both a manual coding approach and the naïve Bayesian algorithm are employed to determine the annual report tone, which is then used to investigate the tone’s impact on future firm performance.
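A minimal multinomial naïve Bayes tone classifier with add-one smoothing, of the kind the algorithm implements (hypothetical report snippets and tone labels assumed):

```python
import math
from collections import Counter, defaultdict

def train_nb(labelled_docs):
    """Fit a multinomial naive Bayes model from (document, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for doc, label in labelled_docs:
        class_counts[label] += 1
        for w in doc.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, class_counts, vocab

def predict_nb(model, doc):
    """Return the label with the highest smoothed log-posterior."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        lp = math.log(class_counts[label] / total_docs)
        for w in doc.split():
            lp += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical annual-report snippets labelled by tone.
model = train_nb([("growth strong profit", "positive"),
                  ("loss decline risk", "negative")])
tone = predict_nb(model, "strong profit growth")
```

Trained on the manually coded subset, such a classifier can label the remaining reports at negligible marginal cost, which is the cost advantage the study highlights.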

Findings

The study finds that tone can predict firm performance one year ahead. The predictability of tone is strengthened for firms that have a high degree of information asymmetry. Besides, the government’s regulatory reforms on corporate disclosures enhance the predictive ability of tone.

Research limitations/implications

The study suggests the naïve Bayesian algorithm as a cost-efficient alternative for human coding in textual analysis. Also, information asymmetry and regulation changes should be modeled in future research on narrative disclosures.

Practical implications

The study sends messages to both investors and policymakers in emerging markets. Investors should pay more attention to the tone of annual reports for improving the accuracy of future firm performance prediction. Policymakers should regularly revise and update regulations on qualitative disclosure to reduce information asymmetry.

Originality/value

This study enhances understanding of the annual report’s role in a non-Western country that has been under-investigated. The research also provides original evidence of the link between annual report tone and future firm performance under different degrees of information asymmetry. Furthermore, this study justifies the effectiveness of governments’ regulatory reforms on corporate disclosure in developing countries. Finally, by applying both the human coding and machine learning approaches, this research contributes to the literature on textual analysis methodology.

Details

International Journal of Emerging Markets, vol. 18 no. 2
Type: Research Article
ISSN: 1746-8809

Article
Publication date: 8 April 2021

Mariem Bounabi, Karim Elmoutaouakil and Khalid Satori

Abstract

Purpose

This paper aims to present a new term weighting approach for text classification as a text mining task. The original method, neutrosophic term frequency–inverse term frequency (NTF-IDF), is an extended version of the popular fuzzy TF-IDF (FTF-IDF) and uses neutrosophic reasoning to analyze and generate weights for terms in natural languages. The paper also proposes a comparative study between the popular FTF-IDF and NTF-IDF and their impact on different machine learning (ML) classifiers for document categorization goals.

Design/methodology/approach

After preprocessing the textual data, the original neutrosophic TF-IDF applies the neutrosophic inference system (NIS) to produce weights for the terms representing a document. Using the local frequency (TF), the global frequency (IDF) and the text length N as NIS inputs, this study generates two neutrosophic weights for a given term. The first measure provides information on the relevance degree of a word, and the second represents its ambiguity degree. Next, the Zhang combination function is applied to combine the neutrosophic weight outputs and produce the final term weight, which is inserted in the document’s representative vector. To analyze the impact of NTF-IDF on the classification phase, this study uses a set of ML algorithms.
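The TF and IDF components that serve as NIS inputs are the standard ones; a plain TF-IDF sketch, before any neutrosophic weighting (whitespace tokenisation and toy documents assumed):

```python
import math
from collections import Counter

def tf_idf(documents):
    """Plain TF-IDF: local term frequency times log inverse document frequency."""
    tokenised = [doc.split() for doc in documents]
    n = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(w for doc in tokenised for w in set(doc))
    weights = []
    for doc in tokenised:
        tf = Counter(doc)
        weights.append({w: (tf[w] / len(doc)) * math.log(n / df[w]) for w in tf})
    return weights

w = tf_idf(["data mining text", "data mining web"])
```

Terms appearing in every document get zero weight, while distinctive terms score higher; the neutrosophic extension replaces this crisp product with NIS-inferred relevance and ambiguity degrees.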

Findings

Exploiting the characteristics of neutrosophic logic (NL), the authors were able to study the ambiguity of terms and their degree of relevance in representing a document. The choice of NL has proven effective in defining significant text vectorization weights, especially for text classification tasks. The experimentation demonstrates that the new method positively impacts categorization. Moreover, the adopted system’s recognition rate is higher than 91%, an accuracy score not attained using the FTF-IDF. Also, on benchmark data sets from different text mining fields and with many ML classifiers, such as SVM and feed-forward networks, applying the proposed NTF-IDF term scores improves accuracy by 10%.

Originality/value

The novelty of this paper lies in two aspects: first, a new term weighting method that uses term frequencies as components to define the relevance and ambiguity of a term; second, the application of NL to infer weights, which is considered an original model in this paper and which also aims to correct the shortcomings of the FTF-IDF, which relies on fuzzy logic with its attendant drawbacks. The introduced technique was combined with different ML models to improve the accuracy and relevance of the feature vectors fed to the classification mechanism.

Details

International Journal of Web Information Systems, vol. 17 no. 3
Type: Research Article
ISSN: 1744-0084
