Search results

1 – 10 of 57
Article
Publication date: 13 March 2020

Jinwook Choi, Yongmoo Suh and Namchul Jung

The purpose of this study is to investigate the effectiveness of qualitative information extracted from a firm’s annual report in predicting corporate credit rating. Qualitative…

Abstract

Purpose

The purpose of this study is to investigate the effectiveness of qualitative information extracted from a firm’s annual report in predicting corporate credit rating. In practice, qualitative information, represented by published reports or management interviews, has long been recognized as an important source alongside quantitative information represented by financial values when assigning corporate credit ratings. Nevertheless, prior studies leave room for further research in that they have rarely employed qualitative information in developing corporate credit rating prediction models.

Design/methodology/approach

This study adopted three document vectorization methods, Bag-Of-Words (BOW), Word to Vector (Word2Vec) and Document to Vector (Doc2Vec), to transform unstructured textual data into numeric vectors so that machine learning (ML) algorithms can accept them as input. For the experiments, we used the corpus of the Management’s Discussion and Analysis (MD&A) sections of 10-K financial reports, as well as financial variables and corporate credit rating data.
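
As a rough, hedged illustration of this kind of pipeline (not the authors’ implementation), the sketch below builds BOW vectors from MD&A-style text, concatenates them with financial variables and fits a multi-class classifier; gensim’s Word2Vec or Doc2Vec could replace the vectorization step. All texts, variables and labels are invented stand-ins.

```python
# Illustrative sketch only: BOW vectors from MD&A-style text, concatenated with
# financial variables, feeding a multi-class rating classifier.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for the MD&A corpus, financial variables and rating labels.
mdna_texts = [
    "revenue growth was strong and liquidity remains adequate",
    "the company faces significant refinancing risk and covenant pressure",
    "stable cash flows support continued debt reduction",
]
financial_vars = np.array([[0.35, 1.8], [0.72, 0.9], [0.28, 2.1]])  # e.g. leverage, coverage
ratings = ["A", "B", "A"]

# Bag-of-Words vectorization; Word2Vec/Doc2Vec (e.g. gensim) could replace this step.
bow = CountVectorizer().fit_transform(mdna_texts).toarray()
X = np.hstack([bow, financial_vars])      # qualitative + quantitative inputs

clf = LogisticRegression(max_iter=1000).fit(X, ratings)
print(clf.predict(X))                     # in-sample sanity check only
```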

Findings

Experimental results from a series of multi-class classification experiments show that the predictive models trained on both financial variables and vectors extracted from MD&A data outperform the benchmark models trained only on traditional financial variables.

Originality/value

This study proposed a new approach to corporate credit rating prediction that uses qualitative information extracted from MD&A documents as input to ML-based prediction models. This research also adopted and compared three text vectorization methods in the domain of corporate credit rating prediction and showed that BOW mostly outperformed Word2Vec and Doc2Vec.

Details

Data Technologies and Applications, vol. 54 no. 2
Type: Research Article
ISSN: 2514-9288

Keywords

Open Access
Article
Publication date: 5 March 2021

Xuan Ji, Jiachen Wang and Zhijun Yan

Stock price prediction is a hot topic and traditional prediction methods are usually based on statistical and econometric models. However, these models have difficulty dealing with…

Abstract

Purpose

Stock price prediction is a hot topic and traditional prediction methods are usually based on statistical and econometric models. However, these models have difficulty dealing with nonstationary time series data. With the rapid development of the internet and the increasing popularity of social media, online news and comments often reflect investors’ emotions and attitudes toward stocks, and they contain a lot of important information for predicting stock prices. This paper aims to develop a stock price prediction method that takes full advantage of social media data.

Design/methodology/approach

This study proposes a new prediction method based on deep learning technology, which integrates traditional stock financial index variables and social media text features as inputs to the prediction model. The study uses Doc2Vec to build long-text feature vectors from social media and then reduces the dimensionality of the text feature vectors with a stacked auto-encoder to balance the dimensions between the text feature variables and the stock financial index variables. Meanwhile, the stock price time series is decomposed using a wavelet transform to eliminate the random noise caused by stock market fluctuations. Finally, the study uses a long short-term memory (LSTM) model to predict the stock price.
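
The following is a minimal sketch of the wavelet-denoising step described above, not the authors’ code: a synthetic price series is decomposed, the detail coefficients are soft-thresholded, and the reconstructed series is windowed as it might be for an LSTM. The series, wavelet choice and threshold rule are all assumptions.

```python
# Hedged sketch: wavelet-based denoising of a (synthetic) price series, then windowing.
import numpy as np
import pywt

prices = np.cumsum(np.random.randn(256)) + 100.0     # synthetic stand-in for a price series

# Multi-level discrete wavelet transform; soft-threshold detail coefficients to remove noise.
coeffs = pywt.wavedec(prices, "db4", level=3)
threshold = np.std(coeffs[-1]) * np.sqrt(2 * np.log(len(prices)))
coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")[: len(prices)]

# Sliding windows of the denoised series would then be combined with Doc2Vec text
# features (compressed by a stacked auto-encoder) and fed to an LSTM.
window = 20
X = np.stack([denoised[i : i + window] for i in range(len(denoised) - window)])
y = denoised[window:]
print(X.shape, y.shape)
```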

Findings

The experimental results show that the method outperforms all three benchmark models on every evaluation indicator and can effectively predict the stock price.

Originality/value

This paper proposes a new deep learning-based stock price prediction model that incorporates traditional financial features and text features derived from social media.

Details

International Journal of Crowd Science, vol. 5 no. 1
Type: Research Article
ISSN: 2398-7294

Keywords

Article
Publication date: 5 July 2021

Jenish Dhanani, Rupa Mehta and Dipti P. Rana

In the Indian judicial system, the court considers interpretations of similar previous judgments for the present case. An essential requirement of legal practitioners is to…

Abstract

Purpose

In the Indian judicial system, the court considers interpretations of similar previous judgments for the present case. An essential requirement of legal practitioners is to determine the most relevant judgments from an enormous number of judgments in order to prepare supportive, beneficial and favorable arguments against the opponent. This creates a strong demand for a Legal Document Recommendation System (LDRS) to automate the process. In existing works, a traditionally preprocessed judgment corpus is processed by Doc2Vec to learn a semantically rich judgment embedding space (i.e. vector space), in which the vectors of semantically relevant judgments lie in close proximity, as Doc2Vec can effectively capture semantic meaning. The enormous number of judgments produces a huge, noisy corpus and vocabulary, which poses a significant challenge: traditional preprocessing cannot fully eliminate noisy data from the corpus, and as a result Doc2Vec demands huge memory and time to learn the judgment embedding. It also adversely affects the recommendation performance in terms of correctness. This paper aims to develop an effective and efficient LDRS to support civilians and the legal fraternity.

Design/methodology/approach

To overcome the previously mentioned challenges, this research proposes an LDRS that uses the proposed Generalized English and Indian Legal Dictionary (GEILD), which retains only relevant dictionary words in the corpus and discards noisy elements. Accordingly, the proposed LDRS significantly reduces the corpus size, which can potentially improve the space and time efficiency of Doc2Vec.
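
As a hedged illustration of this idea (not the authors’ implementation, and with an invented mini-dictionary standing in for GEILD), the sketch below filters each judgment’s tokens against a legal dictionary, learns Doc2Vec embeddings and recommends the judgments closest to a query.

```python
# Sketch: dictionary-filtered preprocessing + Doc2Vec similarity for judgment recommendation.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

legal_dictionary = {"appeal", "bail", "contract", "damages", "evidence", "petition"}

judgments = {
    "J1": "the appeal for bail was dismissed for lack of evidence xyz123",
    "J2": "damages awarded for breach of contract noisy-token",
    "J3": "petition admitted and fresh evidence allowed",
}

def preprocess(text):
    # Keep only tokens present in the dictionary, discarding noisy vocabulary.
    return [t for t in text.lower().split() if t in legal_dictionary]

corpus = [TaggedDocument(preprocess(txt), [doc_id]) for doc_id, txt in judgments.items()]
model = Doc2Vec(corpus, vector_size=32, min_count=1, epochs=50)

query_vec = model.infer_vector(preprocess("evidence in a bail appeal"))
print(model.dv.most_similar([query_vec], topn=2))    # most relevant judgments
```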

Findings

The experimental results confirm that the proposed LDRS with GEILD yields superior performance in terms of accuracy, F1-score and MCC score, with significant improvements in space and time efficiency.

Originality/value

The proposed LDRS uses customized domain-specific preprocessing and a novel legal dictionary (i.e. GEILD) to precisely recommend relevant judgments. It can be incorporated into online legal search repositories/engines to enrich their functionality.

Details

International Journal of Web Information Systems, vol. 17 no. 3
Type: Research Article
ISSN: 1744-0084

Keywords

Abstract

Details

Big Data Analytics for the Prediction of Tourist Preferences Worldwide
Type: Book
ISBN: 978-1-83549-339-7

Article
Publication date: 28 April 2020

Siham Eddamiri, Asmaa Benghabrit and Elmoukhtar Zemmouri

The purpose of this paper is to present a generic pipeline for Resource Description Framework (RDF) graph mining to provide a comprehensive review of each step in the knowledge…

Abstract

Purpose

The purpose of this paper is to present a generic pipeline for Resource Description Framework (RDF) graph mining and to provide a comprehensive review of each step in the knowledge discovery from data process. The authors also investigate different approaches and combinations for extracting feature vectors from RDF graphs for the clustering and theme identification tasks.

Design/methodology/approach

The proposed methodology comprises four steps. First, the authors generate several graph substructures (Walks, Set of Walks, Walks with backward and Set of Walks with backward). Second, the authors build neural language models to extract numerical vectors from the generated sequences by using word embedding techniques (Word2Vec and Doc2Vec) combined with term frequency-inverse document frequency (TF-IDF). Third, the authors use the well-known K-means algorithm to cluster the RDF graph. Finally, the authors extract the most relevant rdf:type from the grouped vertices to describe the semantics of each theme by generating labels.
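
The sketch below illustrates the general shape of such a pipeline on an assumed toy graph (not AIFB, BGS or Conference, and not the authors’ code): short walks are generated per entity, each entity’s set of walks is treated as a document, embedded with Doc2Vec and clustered with K-means.

```python
# Hedged sketch: entity walks -> Doc2Vec embeddings -> K-means clustering.
import random
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

# Toy RDF-like graph as subject -> list of (predicate, object) edges.
graph = {
    "PersonA": [("worksAt", "Lab1"), ("authorOf", "Paper1")],
    "PersonB": [("worksAt", "Lab1"), ("authorOf", "Paper2")],
    "City1":   [("locatedIn", "Country1")],
    "City2":   [("locatedIn", "Country1")],
}

def walks_from(entity, depth=2, n_walks=5):
    tokens = []
    for _ in range(n_walks):
        node, walk = entity, [entity]
        for _ in range(depth):
            edges = graph.get(node, [])
            if not edges:
                break
            pred, obj = random.choice(edges)
            walk += [pred, obj]
            node = obj
        tokens.extend(walk)
    return tokens

entities = list(graph)
corpus = [TaggedDocument(walks_from(e), [e]) for e in entities]
model = Doc2Vec(corpus, vector_size=16, min_count=1, epochs=50)
vectors = [model.dv[e] for e in entities]
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(dict(zip(entities, labels)))       # cluster assignment per entity
```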

Findings

The experimental evaluation on state-of-the-art data sets (AIFB, BGS and Conference) shows that the combination of Set of Walks with backward, TF-IDF and Doc2Vec gives excellent results. In fact, the clustering results reach more than 97% and 90% in terms of purity and F-measure, respectively. Concerning theme identification, the results show that, using the same combination, the purity and F-measure criteria reach more than 90% for all the considered data sets.

Originality/value

The originality of this paper lies in two aspects: first, a new machine learning pipeline for RDF data is presented; second, an efficient process to identify and extract relevant graph substructures from an RDF graph is proposed. The proposed techniques were combined with different neural language models to improve the accuracy and relevance of the obtained feature vectors that will be fed to the clustering mechanism.

Details

International Journal of Web Information Systems, vol. 16 no. 2
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 21 March 2024

Thamaraiselvan Natarajan, P. Pragha, Krantiraditya Dhalmahapatra and Deepak Ramanan Veera Raghavan

The metaverse, which is now revolutionizing how brands strategize their business needs, necessitates understanding individual opinions. Sentiment analysis deciphers emotions and…

Abstract

Purpose

The metaverse, which is now revolutionizing how brands strategize their business needs, necessitates understanding individual opinions. Sentiment analysis deciphers emotions and uncovers a deeper understanding of user opinions and trends within this digital realm. Further, sentiment is an underlying factor that triggers one’s intent to use a technology like the metaverse. Positive sentiments often correlate with positive user experiences, while negative sentiments may signify issues or frustrations. Brands may take these sentiments into account on their metaverse platforms to deliver a seamless user experience.

Design/methodology/approach

The current study adopts machine learning sentiment analysis techniques using Support Vector Machine, Doc2Vec, RNN and CNN to explore the sentiment of individuals toward the metaverse in a user-generated context. Topics were first discovered using topic modeling, and sentiment analysis was performed subsequently.
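
As a toy sketch of one of the pipelines named above (Doc2Vec features feeding an SVM; the CNN/RNN variants would replace the classifier), and not the study’s actual implementation, the example posts and labels below are invented.

```python
# Hedged sketch: Doc2Vec document vectors + SVM sentiment classifier.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.svm import SVC

posts = [
    ("the metaverse experience feels immersive and exciting", 1),
    ("virtual events are a fun new way to meet people", 1),
    ("i worry about data privacy and cyber security in the metaverse", 0),
    ("the metaverse economy looks like a speculative mess", 0),
]
corpus = [TaggedDocument(text.split(), [i]) for i, (text, _) in enumerate(posts)]
d2v = Doc2Vec(corpus, vector_size=32, min_count=1, epochs=60)

X = [d2v.infer_vector(text.split()) for text, _ in posts]
y = [label for _, label in posts]
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([d2v.infer_vector("cyber security risks concern me".split())]))
```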

Findings

The results revealed that users had a positive notion of the experience and orientation of the metaverse while having a negative attitude toward the economy, data and cyber security. The accuracy of each model was analyzed, and it was concluded that the CNN provides the best accuracy, averaging 89%, compared with the other models.

Research limitations/implications

Analyzing sentiment can reveal how the general public perceives the metaverse. Positive sentiment may suggest enthusiasm and readiness for adoption, while negative sentiment might indicate skepticism or concerns. Given the positive user notions about the metaverse’s experience and orientation, developers should continue to focus on creating innovative and immersive virtual environments. At the same time, users' concerns about data, cybersecurity and the economy are critical. The negative attitude toward the metaverse’s economy suggests a need for innovation in economic models within the metaverse. Also, developers and platform operators should prioritize robust data security measures. Implementing strong encryption and two-factor authentication and educating users about cybersecurity best practices can address these concerns and enhance user trust.

Social implications

In terms of societal dynamics, the metaverse could revolutionize communication and relationships by altering traditional notions of proximity and the presence of its users. Further, virtual economies might emerge, with virtual assets having real-world value, presenting both opportunities and challenges for industries and regulators.

Originality/value

The current study contributes to research as it is the first of its kind to explore the sentiments of individuals toward the metaverse using deep learning techniques and evaluate the accuracy of these models.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Keywords

Abstract

Details

Big Data Analytics for the Prediction of Tourist Preferences Worldwide
Type: Book
ISBN: 978-1-83549-339-7

Article
Publication date: 2 September 2019

Guellil Imane, Darwish Kareem and Azouaou Faical

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used to analyse the sentiments of Arabic users on social…

Abstract

Purpose

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used to analyse the sentiments of Arabic users on social media. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address automated processing in general, and sentiment analysis in particular, for Algerian.

Design/methodology/approach

The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text extracted from Facebook. This approach allows the size of the training corpus to be significantly increased without resorting to manual annotation. The annotated corpus is then vectorized using document embeddings (doc2vec), which are an extension of word embeddings (word2vec). For sentiment classification, the authors used different classifiers such as support vector machines (SVM), naive Bayes (NB) and logistic regression (LR).
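
A minimal sketch of the lexicon-based auto-annotation step is shown below; the mini-lexicon, polarity values and messages are invented stand-ins, not the paper’s resources. The resulting automatically labelled corpus would then be vectorized with doc2vec and passed to SVM, NB or LR classifiers.

```python
# Hedged sketch: lexicon scoring + threshold-based automatic annotation of messages.
sentiment_lexicon = {"mlih": 1.0, "chaba": 0.8, "khayeb": -1.0}   # invented mini-lexicon

def score(message):
    # Average polarity of lexicon words found in the message.
    hits = [w for w in sentiment_lexicon if w in message]
    return sum(sentiment_lexicon[w] for w in hits) / len(hits) if hits else 0.0

def auto_annotate(messages, threshold=0.6):
    # Keep only messages whose absolute polarity exceeds the threshold (0.6 gave the
    # best results in the paper); they form the automatically labelled training set.
    labelled = []
    for m in messages:
        s = score(m)
        if abs(s) >= threshold:
            labelled.append((m, "positive" if s > 0 else "negative"))
    return labelled

messages = ["had el film mlih chaba", "service khayeb bezzaf", "normal"]
print(auto_annotate(messages))
```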

Findings

The results suggest that the NB and SVM classifiers generally led to the best results and MLP generally had the worst results. Further, the threshold that the authors used in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW led to slightly higher results than using PV-DM, and combining the PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier, with an F1 of up to 86.9 per cent.

Originality/value

The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.

Details

International Journal of Web Information Systems, vol. 15 no. 5
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 8 June 2022

Qingqing Zhou

Citations have been used as a common basis to measure the academic accomplishments of scientific books. However, traditional citation analysis has ignored content mining and has not…

Abstract

Purpose

Citations have been used as a common basis to measure the academic accomplishments of scientific books. However, traditional citation analysis has ignored content mining and has not considered citation equivalence, which may reduce the reliability of the evaluation. Hence, this paper aims to integrate multi-level citation information to conduct a multi-dimensional analysis.

Design/methodology/approach

In this paper, books’ academic impacts were measured by integrating multi-level citation resources, including books’ citation frequencies and citation-related contents. Specifically, books’ citation frequencies were first counted as the frequency-level metric. Second, content-level metrics were detected from multi-dimensional citation contents based on finer-grained mining, including topic extraction on the metadata and citation classification on the citation contexts. Finally, different metric weighting methods were compared for integrating the multi-level metrics and computing books’ academic impacts.
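
The sketch below shows, in a heavily simplified and hypothetical form, what weighting and combining normalized metric levels into a single impact score could look like; the metric values and weights are purely illustrative and are not taken from the paper.

```python
# Hedged sketch: weighted integration of frequency-level and content-level metrics.
import numpy as np

def min_max(values):
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return (values - values.min()) / span if span else np.zeros_like(values)

# Invented metrics for three hypothetical books.
citation_freq  = [120, 15, 60]     # frequency-level metric
topic_breadth  = [0.8, 0.3, 0.5]   # content-level: topic coverage of citing works
positive_cites = [0.9, 0.6, 0.7]   # content-level: share of positive citation contexts

weights = {"freq": 0.4, "topic": 0.3, "positive": 0.3}   # one candidate weighting
impact = (weights["freq"] * min_max(citation_freq)
          + weights["topic"] * min_max(topic_breadth)
          + weights["positive"] * min_max(positive_cites))
print(impact)    # higher value = higher estimated academic impact
```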

Findings

The experimental results indicate that the integration of multiple citation resources is necessary, as it can significantly improve the comprehensiveness of the evaluation results. Meanwhile, compared with differences in book type, disciplinary differences need more attention when evaluating the academic impacts of books.

Originality/value

Academic impact assessment of books that integrates multi-level citation information can provide more detailed evaluation information and cover the shortcomings of methods based on a single source of citation data. Moreover, the method proposed in this paper is publication-independent and can be used to measure publications other than books.

Details

The Electronic Library, vol. 40 no. 4
Type: Research Article
ISSN: 0264-0473

Keywords

Article
Publication date: 22 August 2024

Guanghui Ye, Songye Li, Lanqi Wu, Jinyu Wei, Chuan Wu, Yujie Wang, Jiarong Li, Bo Liang and Shuyan Liu

Community question answering (CQA) platforms play a significant role in knowledge dissemination and information retrieval. Expert recommendation can assist users by helping them…

Abstract

Purpose

Community question answering (CQA) platforms play a significant role in knowledge dissemination and information retrieval. Expert recommendation can assist users by helping them find valuable answers efficiently. Existing works mainly use content and user behavioural features for expert recommendation, and fail to effectively leverage the correlation across multi-dimensional features.

Design/methodology/approach

To address the above issue, this work proposes a multi-dimensional feature fusion-based method for expert recommendation, aiming to integrate features of question–answerer pairs from three dimensions: network features, content features and user behaviour features. Specifically, network features are extracted by first learning user and tag representations using network representation learning methods and then calculating questioner–answerer similarities and answerer–tag similarities. Second, content features are extracted from the textual contents of questions and answerer-generated contents using text representation models. Third, user behaviour features are extracted from user actions observed in CQA platforms, such as follows and likes. Finally, given a question–answerer pair, the features from the three dimensions are fused and used to predict the probability of the candidate expert answering the given question.
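
As a hedged sketch of the fusion step only (not the authors’ model), the example below concatenates assumed network, content and behaviour features for question–answerer pairs and scores them with a binary classifier; the feature values and classifier choice are illustrative assumptions.

```python
# Sketch: fuse multi-dimensional question–answerer features and predict answering probability.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse(network_feats, content_feats, behaviour_feats):
    # e.g. [questioner–answerer similarity, answerer–tag similarity] +
    #      [question/answer-history text similarity] + [follows, likes]
    return np.concatenate([network_feats, content_feats, behaviour_feats])

pairs = np.array([
    fuse([0.82, 0.75], [0.68], [34, 120]),   # this answerer did answer -> label 1
    fuse([0.15, 0.22], [0.30], [2, 5]),      # did not answer          -> label 0
    fuse([0.70, 0.66], [0.55], [20, 80]),
    fuse([0.10, 0.18], [0.25], [1, 3]),
])
answered = [1, 0, 1, 0]

model = LogisticRegression(max_iter=1000).fit(pairs, answered)
print(model.predict_proba(pairs)[:, 1])   # probability each candidate answers the question
```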

Findings

The proposed method is evaluated on a data set collected from a publicly available CQA platform. The results show that the proposed method is effective compared with baseline methods. An ablation study shows that network features are the most important of the three feature dimensions.

Practical implications

This work identifies three feature dimensions for expert recommendation in CQA platforms and conducts a comprehensive investigation into the importance of these features for the performance of expert recommendation. The results suggest that network features are the most important among the three dimensions, which indicates that the performance of expert recommendation in CQA platforms is likely to improve by further mining network features with advanced techniques, such as graph neural networks. One broader implication is that it is always important to include multi-dimensional features for expert recommendation and to conduct a systematic investigation to identify the most important features, thereby finding directions for improvement.

Originality/value

This work proposes features from three dimensions, given that existing works mostly focus on one or two dimensions, and demonstrates the effectiveness of the newly proposed features.

Details

The Electronic Library, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0264-0473

Keywords
