Search results

1 – 10 of 675
Article
Publication date: 20 July 2023

Mu Shengdong, Liu Yunjie and Gu Jijian

Abstract

Purpose

By introducing the Stacking algorithm to solve the underfitting problem caused by insufficient data in traditional machine learning, this paper provides a new solution to the cold start problem of entrepreneurial borrowing risk control.

Design/methodology/approach

The authors introduce semi-supervised learning and integrated learning into the field of migration learning, and innovatively propose the Stacking model migration learning, which can independently train models on entrepreneurial borrowing credit data, and then use the migration strategy itself as the learning object, and use the Stacking algorithm to combine the prediction results of the source domain model and the target domain model.

Findings

The effectiveness of the two migration learning models is evaluated with real data from an entrepreneurial borrowing. The algorithmic performance of the Stacking-based model migration learning is further improved compared to the benchmark model without migration learning techniques, with the model area under curve value rising to 0.8. Comparing the two migration learning models reveals that the model-based migration learning approach performs better. The reason for this is that the sample-based migration learning approach only eliminates the noisy samples that are relatively less similar to the entrepreneurial borrowing data. However, the calculation of similarity and the weighing of similarity are subjective, and there is no unified judgment standard and operation method, so there is no guarantee that the retained traditional credit samples have the same sample distribution and feature structure as the entrepreneurial borrowing data.
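The area under curve (AUC) value reported above can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch of that calculation follows; the labels and scores are made-up illustrative values, not data from the paper, and the function name `auc` is an assumption.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = default) and model scores, for illustration only.
labels = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.2, 0.4, 0.35]
print(auc(labels, scores))
```

The pairwise form is quadratic in the sample count, which is fine for illustration; a rank-based version would be preferable on large data sets.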

Practical implications

From a practical standpoint, this paper provides a new solution to the cold start problem of entrepreneurial borrowing risk control. The small number of labeled high-quality samples cannot support the learning and deployment of big data risk control models, which is the cold start problem of the entrepreneurial borrowing risk control system. By extending the training sample set with auxiliary domain data through suitable migration learning methods, the prediction performance of the model can be improved to a certain extent and more generalized laws can be learned.

Originality/value

This paper introduces the thought method of migration learning to the entrepreneurial borrowing scenario, provides a new solution to the cold start problem of the entrepreneurial borrowing risk control system and verifies the feasibility and effectiveness of the migration learning method applied in the risk control field through empirical data.

Details

Management Decision, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0025-1747

Article
Publication date: 21 December 2023

Majid Rahi, Ali Ebrahimnejad and Homayun Motameni

Abstract

Purpose

Agricultural produce such as rice requires water for growth, so the optimal consumption of this valuable resource is important. Unfortunately, the traditional use of water for agricultural purposes is far from optimal. Therefore, designing and implementing a mechanized irrigation system is of the highest importance. Such a system includes hardware such as liquid altimeter sensors, valves and pumps, which are inherently subject to failure and therefore cause faults in the system. These faults occur at random time intervals, and an exponential probability distribution is used to simulate these intervals. Because such systems are costly, evaluating them during the design phase, before implementation, is essential.
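The idea of simulating fault intervals with an exponential distribution can be sketched as follows. This is only an illustration under stated assumptions: the function name, the failure rate and the observation horizon are hypothetical and are not taken from the paper.

```python
import random


def simulate_failure_times(rate, horizon, seed=0):
    """Return failure timestamps in [0, horizon) with exponential inter-arrival times."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


# Hypothetical component: on average one fault every 50 hours, observed for 1000 hours.
failures = simulate_failure_times(rate=1 / 50, horizon=1000.0)
print(len(failures), "faults simulated")
```

A stream of timestamps like this could feed the offline data set described in the methodology, with each fault tagged by the component that failed.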

Design/methodology/approach

The proposed approach comprised two main phases: offline and online. The offline phase included the simulation of the studied system (i.e. the irrigation system of paddy fields) and the acquisition of a data set for training machine learning algorithms such as decision trees to detect, locate (classification) and evaluate faults. In the online phase, C5.0 decision trees trained in the offline phase were used on a stream of data generated by the system.

Findings

The proposed approach is a comprehensive online component-oriented method, which is a combination of supervised machine learning methods to investigate system faults. Each of these methods is considered a component determined by the dimensions and complexity of the case study (to discover, classify and evaluate fault tolerance). These components are placed together in the form of a process framework so that the appropriate method for each component is obtained based on comparison with other machine learning methods. As a result, depending on the conditions under study, the most efficient method is selected in the components. Before the system implementation phase, its reliability is checked by evaluating the predicted faults (in the system design phase). Therefore, this approach avoids the construction of a high-risk system. Compared to existing methods, the proposed approach is more comprehensive and has greater flexibility.

Research limitations/implications

By expanding the dimensions of the problem, the model verification space grows exponentially using automata.

Originality/value

Unlike the existing methods that only examine one or two aspects of fault analysis such as fault detection, classification and fault-tolerance evaluation, this paper proposes a comprehensive process-oriented approach that investigates all three aspects of fault analysis concurrently.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 27 February 2023

Dilawar Ali, Kenzo Milleville, Steven Verstockt, Nico Van de Weghe, Sally Chambers and Julie M. Birkholz

Abstract

Purpose

Historical newspaper collections provide a wealth of information about the past. Although the digitization of these collections significantly improves their accessibility, a large portion of digitized historical newspaper collections, such as those of KBR, the Royal Library of Belgium, are not yet searchable at article-level. However, recent developments in AI-based research methods, such as document layout analysis, have the potential for further enriching the metadata to improve the searchability of these historical newspaper collections. This paper aims to discuss the aforementioned issue.

Design/methodology/approach

In this paper, the authors explore how existing computer vision and machine learning approaches can be used to improve access to digitized historical newspapers. To do this, the authors propose a workflow, using computer vision and machine learning approaches to (1) provide article-level access to digitized historical newspaper collections using document layout analysis, (2) extract specific types of articles (e.g. feuilletons – literary supplements from Le Peuple from 1938), (3) conduct image similarity analysis using (un)supervised classification methods and (4) perform named entity recognition (NER) to link the extracted information to open data.

Findings

The results show that the proposed workflow improves the accessibility and searchability of digitized historical newspapers, and also contributes to the building of corpora for digital humanities research. The AI-based methods enable automatic extraction of feuilletons, clustering of similar images and dynamic linking of related articles.

Originality/value

The proposed workflow enables automatic extraction of articles, including detection of a specific type of article, such as a feuilleton or literary supplement. This is particularly valuable for humanities researchers as it improves the searchability of these collections and enables corpora to be built around specific themes. Article-level access to, and improved searchability of, KBR's digitized newspapers are demonstrated through the online tool (https://tw06v072.ugent.be/kbr/).

Article
Publication date: 9 January 2024

Ning Chen, Zhenyu Zhang and An Chen

Abstract

Purpose

Consequence prediction is an emerging topic in safety management concerning the severity outcome of accidents. In practical applications, it is usually implemented through supervised learning methods; however, the evaluation of classification results remains a challenge. The previous studies mostly adopted simplex evaluation based on empirical and quantitative assessment strategies. This paper aims to shed new light on the comprehensive evaluation and comparison of diverse classification methods through visualization, clustering and ranking techniques.

Design/methodology/approach

An empirical study is conducted using 9 state-of-the-art classification methods on a real-world data set of 653 construction accidents in China for predicting the consequence with respect to 39 carefully featured factors and accident type. The proposed comprehensive evaluation enriches the interpretation of classification results from different perspectives. Furthermore, the critical factors leading to severe construction accidents are identified by analyzing the coefficients of a logistic regression model.

Findings

This paper identifies the critical factors that significantly influence the consequence of construction accidents, which include accident type (particularly collapse), improper accident reporting and handling (E21), inadequate supervision engineers (O41), no special safety department (O11), delayed or low-quality drawings (T11), unqualified contractor (C21), schedule pressure (C11), multi-level subcontracting (C22), lacking safety examination (S22), improper operation of mechanical equipment (R11) and improper construction procedure arrangement (T21). The prediction models and findings of critical factors help make safety intervention measures in a targeted way and enhance the experience of safety professionals in the construction industry.

Research limitations/implications

The empirical study using some well-known classification methods for forecasting the consequences of construction accidents provides some evidence for the comprehensive evaluation of multiple classifiers. These techniques can be used jointly with other evaluation approaches for a comprehensive understanding of the classification algorithms. Despite the limitation of specific methods used in the study, the presented methodology can be configured with other classification methods and performance metrics and even applied to other decision-making problems such as clustering.

Originality/value

This study sheds new light on the comprehensive comparison and evaluation of classification results through visualization, clustering and ranking techniques using an empirical study of consequence prediction of construction accidents. The relevance of construction accident type is discussed with the severity of accidents. The critical factors influencing the accident consequence are identified for the sake of taking prevention measures for risk reduction. The proposed method can be applied to other decision-making tasks where the evaluation is involved as an important component.

Details

Construction Innovation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1471-4175

Article
Publication date: 10 April 2024

Aslıhan Dursun-Cengizci and Meltem Caber

Abstract

Purpose

This study aims to predict customer churn in resort hotels by calculating the churn probability of repeat customers for future stays in the same hotel brand.

Design/methodology/approach

Based on the recency, frequency, monetary (RFM) paradigm, random forest and logistic regression supervised machine learning algorithms were used to predict churn behavior. The model with superior performance was used to detect potential churners and generate a priority matrix.
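The recency, frequency, monetary (RFM) features mentioned above can be derived from raw stay records along the following lines. This is a minimal sketch; the record layout, the function name `rfm` and the sample bookings are assumptions made for illustration and do not come from the study.

```python
from datetime import date


def rfm(stays, today):
    """Compute (recency_days, frequency, monetary) per customer.

    stays: iterable of (customer_id, checkout_date, amount_spent).
    """
    summary = {}
    for customer, checkout, amount in stays:
        last, count, total = summary.get(customer, (None, 0, 0.0))
        if last is None or checkout > last:
            last = checkout  # keep the most recent stay
        summary[customer] = (last, count + 1, total + amount)
    return {
        customer: ((today - last).days, count, total)
        for customer, (last, count, total) in summary.items()
    }


# Hypothetical booking records, for illustration only.
stays = [
    ("A", date(2023, 7, 1), 1200.0),
    ("A", date(2024, 1, 10), 800.0),
    ("B", date(2022, 8, 15), 450.0),
]
features = rfm(stays, today=date(2024, 3, 1))
print(features)
```

The resulting tuples could then serve as input columns for a classifier such as random forest or logistic regression.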

Findings

The random forest algorithm showed a higher prediction performance with an 80% accuracy rate. The most important variables were RFM-based, followed by hotel sector-specific variables such as market, season, accompaniers and booker. Some managerial strategies were proposed to retain future churners, clustered as “hesitant,” “economy,” “alternative seeker,” and “opportunity chaser” customer groups.

Research limitations/implications

This study contributes to the theoretical understanding of customer behavior in the hospitality industry and provides valuable insight for hotel practitioners by demonstrating the methods that facilitate the identification of potential churners and their characteristics.

Originality/value

Most customer retention studies in hospitality either concentrate on the antecedents of retention or customers’ revisit intentions using traditional methods. Taking a unique place within the literature, this study conducts churn prediction analysis for repeat hotel customers by opening a new area for inquiry in hospitality studies.

Details

International Journal of Contemporary Hospitality Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0959-6119

Article
Publication date: 26 September 2022

Christian Nnaemeka Egwim, Hafiz Alaka, Oluwapelumi Oluwaseun Egunjobi, Alvaro Gomes and Iosif Mporas

Abstract

Purpose

This study aims to compare and evaluate the application of commonly used machine learning (ML) algorithms used to develop models for assessing energy efficiency of buildings.

Design/methodology/approach

This study first combined building energy efficiency ratings from several data sources and used them to create predictive models using a variety of ML methods. Second, to test the hypothesis of ensemble techniques, this study designed a hybrid stacking ensemble approach based on the best-performing bagging and boosting ensemble methods identified in its predictive analytics.

Findings

Based on the performance evaluation metrics, the extra trees model was shown to be the best predictive model. More importantly, this study demonstrated that the combined result of ensemble ML algorithms is usually better in terms of predictive accuracy than that of a single method. Finally, it was found that stacking is a superior ensemble approach to bagging and boosting for analysing building energy efficiency.

Research limitations/implications

While the proposed method of analysis is assumed to be applicable to assessing the energy efficiency of buildings within the sector, the unique data transformation used in this study may not, as is typical of any data-driven model, be transferable to data from regions other than the UK.

Practical implications

This study aids in the initial selection of appropriate and high-performing ML algorithms for future analysis. This study also assists building managers, residents, government agencies and other stakeholders in better understanding contributing factors and making better decisions about building energy performance. Furthermore, this study will assist the general public in proactively identifying buildings with high energy demands, potentially lowering energy costs by promoting avoidance behaviour and assisting government agencies in making informed decisions about energy tariffs when this novel model is integrated into an energy monitoring system.

Originality/value

This study fills a gap in the literature, namely the lack of a rationale for selecting appropriate ML algorithms for assessing building energy efficiency. More importantly, this study demonstrated that the combined result of ensemble ML algorithms is usually better in terms of predictive accuracy than that of a single method.

Details

Journal of Engineering, Design and Technology, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1726-0531

Article
Publication date: 12 September 2023

Sumei Yao, Fan Wang, Jing Chen and Quan Lu

Abstract

Purpose

Social media texts as a data source in depression research have emerged as a significant convergence between Information Management and Public Health in recent years. This paper aims to review depression-related studies conducted on social media texts, with particular attention to research themes and methods.

Design/methodology/approach

The authors selected 57 research articles published in the Web of Science, Wiley, ACM Digital Library, EBSCO, IEEE Xplore and JMIR databases.

Findings

(1) According to the coding results, Depression Prediction and Linguistic Characteristics and Information Behavior are the two most popular themes. The theme of Patient Needs has progressed over the past few years. Still, there is a lesser focus on Stigma and Antidepressants. (2) Researchers prefer quantitative methods such as machine learning and statistical analysis to qualitative ones. (4) According to the analysis of the data collection platforms, more researchers used comprehensive social media sites like Reddit and Facebook than depression-specific communities like Sunforum and Alonelylife.

Practical implications

The authors recommend employing machine learning and statistical analysis to explore factors related to Stigmatization and Antidepressants thoroughly. Additionally, conducting mixed-methods studies incorporating data from diverse sources would be valuable. Such approaches would provide insights beneficial to policymakers and pharmaceutical companies seeking a comprehensive understanding of depression.

Originality/value

This article signifies a pioneering effort in systematically gathering and examining the themes and methodologies within the intersection of health-related texts on social media and depression.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Open Access
Article
Publication date: 23 November 2023

Reema Khaled AlRowais and Duaa Alsaeed

Abstract

Purpose

Automatically extracting stance information from natural language texts is a significant research problem with various applications, particularly after the recent explosion of data on the internet via platforms like social media sites. A stance detection system helps determine whether the author is in favor of, against or neutral toward a given target. Most research in stance detection focuses on the English language, while little research has been conducted on the Arabic language.

Design/methodology/approach

This paper aimed to address stance detection on Arabic tweets by building and comparing different stance detection models using four transformers, namely Araelectra, MARBERT, AraBERT and Qarib. Using different weights for these transformers, the authors performed extensive experiments, fine-tuning each of the four transformers for the task of stance detection on Arabic tweets.

Findings

The results showed that the AraBERT model learned better than the other three models with a 70% F1 score followed by the Qarib model with a 68% F1 score.
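For readers unfamiliar with the metric, an F1 score for a three-way stance task can be computed as sketched below. The abstract does not say which averaging scheme was used, so macro averaging here is an assumption, and the labels and predictions are invented for illustration.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical gold labels and predictions, for illustration only.
y_true = ["favor", "against", "neutral", "favor", "against", "favor"]
y_pred = ["favor", "against", "favor", "favor", "neutral", "against"]
print(round(macro_f1(y_true, y_pred), 4))
```

Macro averaging weights every class equally, which matters on imbalanced data sets because a rarely predicted class can pull the score down sharply.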

Research limitations/implications

The limitations of this study are the imbalanced dataset and the limited availability of annotated datasets for stance detection (SD) in Arabic.

Originality/value

This paper provides a comprehensive overview of the current resources for stance detection in the literature, including the datasets and machine learning methods used. The authors also examined the models to analyze and interpret the findings in order to recommend the best-performing models for the stance detection task.

Details

Arab Gulf Journal of Scientific Research, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1985-9899

Article
Publication date: 13 April 2023

Dandan He, Zhong Yao, Futao Zhao and Yue Wang

Abstract

Purpose

Retail investors are prone to be affected by information dissemination in social media with the rapid development of Web 2.0. The purpose of this study is to recognize the factors that may impact users' retweet behavior, namely information dissemination in the online financial community, through machine learning techniques.

Design/methodology/approach

This paper crawled data from the Chinese online financial community (Xueqiu.com) and extracted author-related, content-related, situation-related, stock-related and stock market-related features from the dataset. The best information dissemination prediction model based on these features was determined by evaluating five classifiers with various performance metrics, and the predictability of different feature groups was tested.

Findings

Five prevalent classifiers were evaluated with various performance metrics, and the random forest classifier proved to be the best retweet prediction model in the authors’ experiments. Moreover, the predictability of author-related, content-related and market-related features was shown to be relatively better than that of the other two feature groups. Finally, several particularly important features, such as the author's followers and the rise and fall of the stock index, were identified.

Originality/value

This study contributes to in-depth research on information dissemination in the financial domain. The findings of this study have important practical implications for government regulators to supervise public opinion in the financial market.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 22 April 2022

Sreedhar Jyothi and Geetanjali Nelloru

Abstract

Purpose

The electrocardiogram (ECG) is used to study patients with ventricular arrhythmias and atrial fibrillation, which are early markers of stroke and sudden cardiac death, as well as benign subjects. To identify cardiac anomalies, ECG signals record the heart's electrical activity and display it as waveforms. Patients with these disorders must be identified as soon as possible. Manual inspection of ECG signals can be difficult, time-consuming and subject to inter-observer variability.

Design/methodology/approach

Various forms of arrhythmia are difficult to distinguish in complicated non-linear ECG data, so computer-aided decision support (CAD) systems may be beneficial. A CAD system, which uses machine learning algorithms to identify tiny changes in cardiac rhythms, can classify arrhythmias in a rapid, accurate, repeatable and objective manner. Cardiac infarctions can be classified and detected using this method. The authors' primary objective is to categorize arrhythmias more accurately and in less computational time. The paper's main contribution is the use of signal and axis characteristics and their association n-grams as features. An experimental investigation was conducted using a benchmark dataset as input to multi-label multi-fold cross-validation.
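The abstract does not specify how the n-gram features are built, so the following is only a generic sketch of n-gram counting over a sequence of discretized symbols. The beat labels and the function name `ngram_counts` are hypothetical and are not the authors' feature definition.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count contiguous n-grams in a sequence of symbols."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return Counter(
        tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)
    )


# Hypothetical per-beat symbols, for illustration only.
beats = ["N", "N", "V", "N", "N", "V", "A"]
bigrams = ngram_counts(beats, 2)
print(bigrams.most_common(2))
```

Counts like these can be turned into a fixed-length feature vector by choosing a vocabulary of n-grams and reading off each count in a fixed order.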

Findings

This dataset was used as input for cross-validation on contemporary models, and the resulting cross-validation metrics were weighed against the performance metrics of other contemporary models. The suggested model showed high sensitivity and specificity, with few false alarms.

Originality/value

The results of cross validation are significant. In terms of specificity, sensitivity, and decision accuracy, the proposed model outperforms other contemporary models.

Details

International Journal of Intelligent Unmanned Systems, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2049-6427
