Search results

1 – 10 of over 2000
Article
Publication date: 26 February 2024

Chong Wu, Xiaofang Chen and Yongjie Jiang


Abstract

Purpose

While the Chinese securities market is booming, the phenomenon of listed companies falling into financial distress is also emerging, which affects the operation and development of enterprises and also jeopardizes the interests of investors. Therefore, it is important to understand how to accurately and reasonably predict the financial distress of enterprises.

Design/methodology/approach

In the present study, ensemble feature selection (EFS) and improved stacking were used for financial distress prediction (FDP). Mutual information, analysis of variance (ANOVA), random forest (RF), genetic algorithms and recursive feature elimination (RFE) were chosen as the EFS selectors. Because information may be lost when the base learners’ outputs are fed directly into the meta-learner, the features with high importance were fed into the meta-learner alongside those outputs. A screening layer was added to select the meta-learner with the best performance. Finally, the hyperparameters of the learners were tuned to their optimal values.
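
The paper does not include code; the following is a minimal sketch of the two ideas in this design, using scikit-learn on a synthetic stand-in for the A-share data. Part (1) approximates EFS by majority vote over several selectors (the genetic-algorithm selector is omitted for brevity), and part (2) approximates the improved-stacking idea with scikit-learn’s passthrough option, which feeds the selected features to the meta-learner alongside the base learners’ outputs. All dataset sizes and thresholds here are assumptions, not the paper’s settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import (RFE, SelectKBest, f_classif,
                                       mutual_info_classif)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the financial-distress table (imbalanced classes).
X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           weights=[0.85], random_state=0)
k = 10  # number of features each selector keeps (assumed)
votes = np.zeros(X.shape[1], dtype=int)

# (1) EFS: filter methods (mutual information, ANOVA F-test) ...
for score_fn in (mutual_info_classif, f_classif):
    votes += SelectKBest(score_fn, k=k).fit(X, y).get_support().astype(int)

# ... an embedded method (RF importances) ...
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
votes[np.argsort(rf.feature_importances_)[::-1][:k]] += 1

# ... and a wrapper method (RFE); keep features picked by >= 2 selectors.
votes += RFE(LogisticRegression(max_iter=1000),
             n_features_to_select=k).fit(X, y).get_support().astype(int)
X_sel = X[:, votes >= 2]

# (2) Stacking whose meta-learner also sees the selected raw features.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True,  # raw features reach the meta-learner too
)
print("F1:", cross_val_score(stack, X_sel, y, cv=5, scoring="f1").mean())
```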

Findings

An empirical study was conducted with a sample of A-share listed companies in China. The F1-score of the model constructed using the features screened by EFS reached 84.55%, representing an improvement of 4.37% compared to the original features. To verify the effectiveness of improved stacking, benchmark model comparison experiments were conducted. Compared to the original stacking model, the accuracy of the improved stacking model was improved by 0.44%, and the F1-score was improved by 0.51%. In addition, the improved stacking model had the highest area under the curve (AUC) value (0.905) among all the compared models.

Originality/value

Compared to previous models, the proposed FDP model has better performance, thus bridging the research gap of feature selection. The present study provides new ideas for stacking improvement research and a reference for subsequent research in this field.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X


Article
Publication date: 22 March 2024

Mohd Mustaqeem, Suhel Mustajab and Mahfooz Alam


Abstract

Purpose

Software defect prediction (SDP) is a critical aspect of software quality assurance, aiming to identify and manage potential defects in software systems. In this paper, we have proposed a novel hybrid approach that combines Gray Wolf Optimization with Feature Selection (GWOFS) and multilayer perceptron (MLP) for SDP. The GWOFS-MLP hybrid model is designed to optimize feature selection, ultimately enhancing the accuracy and efficiency of SDP. Gray Wolf Optimization, inspired by the social hierarchy and hunting behavior of gray wolves, is employed to select a subset of relevant features from an extensive pool of potential predictors. This study investigates the key challenges that traditional SDP approaches encounter and proposes promising solutions to overcome time complexity and the curse of dimensionality.

Design/methodology/approach

The integration of GWOFS and MLP results in a robust hybrid model that can adapt to diverse software datasets. This feature selection process harnesses the cooperative hunting behavior of wolves, allowing for the exploration of critical feature combinations. The selected features are then fed into an MLP, a powerful artificial neural network (ANN) known for its capability to learn intricate patterns within software metrics. MLP serves as the predictive engine, utilizing the curated feature set to model and classify software defects accurately.
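
For concreteness, here is a hedged sketch of how a binary gray wolf optimizer can drive feature selection for an MLP, with the cross-validated accuracy of the masked feature set as the fitness function. The dataset, population size and iteration count are illustrative, not the paper’s settings.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)  # placeholder tabular data
n_feat = X.shape[1]

def fitness(pos):
    """Cross-validated MLP accuracy of the feature mask (pos > 0.5)."""
    mask = pos > 0.5
    if not mask.any():
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

n_wolves, n_iter = 6, 5  # tiny, for illustration only
wolves = rng.random((n_wolves, n_feat))  # continuous positions in [0, 1]
scores = np.array([fitness(w) for w in wolves])

for t in range(n_iter):
    a = 2 - 2 * t / n_iter  # exploration factor decays toward 0
    alpha, beta, delta = wolves[np.argsort(scores)[::-1][:3]]  # three leaders
    for i in range(n_wolves):
        new = np.zeros(n_feat)
        for leader in (alpha, beta, delta):  # move toward each leader
            r1, r2 = rng.random(n_feat), rng.random(n_feat)
            A, C = 2 * a * r1 - a, 2 * r2
            new += leader - A * np.abs(C * leader - wolves[i])
        wolves[i] = np.clip(new / 3, 0, 1)
        scores[i] = fitness(wolves[i])

best = wolves[np.argmax(scores)] > 0.5
print(f"kept {best.sum()} of {n_feat} features, CV accuracy {scores.max():.3f}")
```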

Findings

The performance evaluation of the GWOFS-MLP hybrid model on a real-world software defect dataset demonstrates its effectiveness. The model achieves a remarkable training accuracy of 97.69% and a testing accuracy of 97.99%. Additionally, the receiver operating characteristic area under the curve (ROC-AUC) score of 0.89 highlights the model’s ability to discriminate between defective and defect-free software components.

Originality/value

Experimental implementations using machine learning-based techniques with feature reduction are conducted to validate the proposed solutions. The goal is to enhance SDP’s accuracy, relevance and efficiency, ultimately improving software quality assurance processes. The confusion matrix further illustrates the model’s performance, with only a small number of false positives and false negatives.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X


Article
Publication date: 18 April 2024

Vaishali Rajput, Preeti Mulay and Chandrashekhar Madhavrao Mahajan


Abstract

Purpose

Nature’s evolution has shaped intelligent behaviors in creatures like insects and birds, inspiring the field of Swarm Intelligence. Researchers have developed bio-inspired algorithms to address complex optimization problems efficiently. These algorithms strike a balance between computational efficiency and solution optimality, attracting significant attention across domains.

Design/methodology/approach

Bio-inspired optimization techniques for feature engineering and their applications are systematically reviewed, with the chief objective of assessing the statistical influence and significance of “bio-inspired optimization”-based computational models by surveying the research literature published between 2015 and 2022.

Findings

The Scopus and Web of Science databases were explored for the review, with a focus on parameters such as country-wise publications, keyword occurrences and citations per year. Springer and IEEE emerge as the most prolific publishers, with prominent journals including PLoS ONE, Neural Computing and Applications, Lecture Notes in Computer Science and the IEEE Transactions series. The National Natural Science Foundation of China and the Ministry of Electronics and Information Technology of India lead in funding projects in this area. China, India and Germany stand out as leaders in publications on bio-inspired algorithms for feature engineering research.

Originality/value

The review findings integrate various bio-inspired algorithm selection techniques across a diverse spectrum of optimization techniques. Ant colony optimization contributes decentralized and cooperative search strategies, bee colony optimization (BCO) improves collaborative decision-making, particle swarm optimization balances exploration and exploitation and bio-inspired algorithms in general offer a range of nature-inspired heuristics.

Article
Publication date: 19 May 2023

Anil Kumar Swain, Aleena Swetapadma, Jitendra Kumar Rout and Bunil Kumar Balabantaray


Abstract

Purpose

The objective of the proposed work is to identify the most commonly occurring non–small cell carcinoma types, such as adenocarcinoma and squamous cell carcinoma, within the human population. Another objective of the work is to reduce the false positive rate during the classification.

Design/methodology/approach

In this work, a hybrid method using convolutional neural networks (CNNs), extreme gradient boosting (XGBoost) and long short-term memory (LSTM) networks has been proposed to distinguish between lung adenocarcinoma and squamous cell carcinoma. To extract features from non–small cell lung carcinoma images, a three-layer convolution and three-layer max-pooling-based CNN is used. The most important features are then selected from the extracted features using the XGBoost algorithm. Finally, an LSTM is used for the classification of carcinoma types. The accuracy of the proposed method is 99.57 per cent, and the false positive rate is 0.427 per cent.
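
A hedged sketch of this pipeline’s data flow follows, using TensorFlow/Keras and XGBoost on random placeholder images; the study’s histopathology data are not public, the CNN here is left untrained purely to illustrate the flow (in practice it would be trained first) and every size is an assumption.

```python
import numpy as np
import tensorflow as tf
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X_img = rng.random((200, 64, 64, 3)).astype("float32")  # placeholder images
y = rng.integers(0, 2, 200)  # 0 = adenocarcinoma, 1 = squamous cell

# 1) Three convolution + three max-pooling layers as a feature extractor.
inp = tf.keras.Input(shape=(64, 64, 3))
x = inp
for filters in (16, 32, 64):
    x = tf.keras.layers.Conv2D(filters, 3, activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D()(x)
extractor = tf.keras.Model(inp, tf.keras.layers.Flatten()(x))
F = extractor.predict(X_img, verbose=0)

# 2) XGBoost ranks the extracted features; keep the top k as "optimal".
k = 64
xgb = XGBClassifier(n_estimators=100).fit(F, y)
F_sel = F[:, np.argsort(xgb.feature_importances_)[::-1][:k]]

# 3) An LSTM classifies the selected features, viewed as a length-k sequence.
lstm = tf.keras.Sequential([
    tf.keras.Input(shape=(k, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
lstm.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
lstm.fit(F_sel[..., None], y, epochs=3, verbose=0)
```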

Findings

The proposed CNN–XGBoost–LSTM hybrid method has significantly improved the results in distinguishing between adenocarcinoma and squamous cell carcinoma. The importance of the method can be outlined as follows: it has a very low false positive rate of 0.427 per cent; it has very high accuracy, i.e. 99.57 per cent; CNN-based features provide accurate results in classifying lung carcinoma; and it has the potential to serve as an assistive aid for doctors.

Practical implications

It can be used by doctors as a secondary tool for the analysis of non–small cell lung cancers.

Social implications

It can help rural doctors identify patients who should be referred to specialists for further analysis of lung cancer.

Originality/value

In this work, a hybrid method using CNN, XGBoost and LSTM has been proposed to distinguish between lung adenocarcinoma and squamous cell carcinoma. A three-layer convolution and three-layer max-pooling-based CNN is used to extract features from the non–small cell lung carcinoma images. The most important features are then selected from the extracted features using the XGBoost algorithm. Finally, an LSTM is used for the classification of carcinoma types.

Details

Data Technologies and Applications, vol. 58 no. 1
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 19 July 2023

Gaurav Kumar, Molla Ramizur Rahman, Abhinav Rajverma and Arun Kumar Misra


Abstract

Purpose

This study aims to analyse the systemic risk emitted by all publicly listed commercial banks in a key emerging economy, India.

Design/methodology/approach

The study makes use of the Adrian and Brunnermeier (2016) estimator to quantify the systemic risk (ΔCoVaR) that banks contribute to the system. The methodology addresses a classification problem based on the probability that a particular bank will emit high or moderate systemic risk. The study applies machine learning models such as logistic regression, random forest (RF), neural networks and gradient boosting machines (GBM), addresses the issue of imbalanced data sets and investigates the balance-sheet and stock features that may potentially determine a bank’s systemic risk emission.
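
A hedged sketch of this classification setting follows, with class weighting as one simple way to handle the imbalance; the feature names mirror the Findings below, but the data are synthetic and the thresholds are invented.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400  # synthetic bank-quarter observations
X = pd.DataFrame({
    "dcovar_lag": rng.normal(size=n),       # lagged systemic-risk estimator
    "stock_beta": rng.normal(1.0, 0.3, n),
    "stock_vol":  rng.gamma(2.0, 0.1, n),
    "roe":        rng.normal(0.1, 0.05, n),
})
# High-risk label, deliberately imbalanced (roughly 20% positives).
y = (X["dcovar_lag"] + 0.5 * X["stock_beta"]
     + rng.normal(0, 1, n) > 1.4).astype(int)

models = {
    "logit": LogisticRegression(class_weight="balanced", max_iter=1000),
    "RF":    RandomForestClassifier(class_weight="balanced", random_state=0),
    "GBM":   GradientBoostingClassifier(random_state=0),
}
for name, m in models.items():
    auc = cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```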

Findings

The study reports that, across various performance metrics, two specifications are preferred: RF and GBM. The study identifies the lag of the systemic risk estimator, stock beta, stock volatility and return on equity as important features for explaining the emission of systemic risk.

Practical implications

The findings will help banks and regulators identify the key features that can be used to formulate policy decisions.

Originality/value

This study contributes to the existing literature by suggesting classification algorithms that can be used to model the probability of systemic risk emission in a classification problem setting. Further, the study identifies the features responsible for the likelihood of systemic risk.

Details

Journal of Modelling in Management, vol. 19 no. 2
Type: Research Article
ISSN: 1746-5664


Article
Publication date: 19 December 2023

Guilherme Dayrell Mendonça, Stanley Robson de Medeiros Oliveira, Orlando Fontes Lima Jr and Paulo Tarso Vilela de Resende


Abstract

Purpose

The objective of this paper is to evaluate whether the data from consignors, logistics service providers (LSPs) and consignees contribute to the prediction of air transport shipment delays in a machine learning application.

Design/methodology/approach

The research database contained 2,244 intercontinental air freight shipments to four automotive production plants in Latin America. Different algorithm classes were tested in the knowledge discovery in databases (KDD) process: support vector machine (SVM), random forest (RF), artificial neural networks (ANN) and k-nearest neighbors (KNN).
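
A minimal sketch of such an algorithm-class comparison under cross-validation, on a synthetic imbalanced stand-in for the shipment table (the delay data are proprietary); the class-balancing procedure reported in the Findings is omitted here for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 2,244 shipments, roughly 20% delayed.
X, y = make_classification(n_samples=2244, n_features=20, weights=[0.8],
                           random_state=0)
models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF":  RandomForestClassifier(random_state=0),
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=500, random_state=0)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```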

Findings

Attribute selection over the shipper, consignee and LSP data achieved 86% accuracy with the RF algorithm in a cross-validation scenario, after a combined class-balancing procedure.

Originality/value

These findings expand the current literature on machine learning applied to air freight delay management, which has mostly focused on weather, airport structure, flight schedule, ground delay and congestion as explanatory attributes.

Details

International Journal of Physical Distribution & Logistics Management, vol. 54 no. 1
Type: Research Article
ISSN: 0960-0035


Article
Publication date: 29 July 2021

Aarathi S. and Vasundra S.


Abstract

Purpose

Pervasive analytics play a prominent role in the computer-aided prediction of non-communicable diseases. Early-stage arrhythmia detection helps prevent sudden death from heart failure or stroke. The scope of an arrhythmia can be identified from an electrocardiogram (ECG) report.

Design/methodology/approach

ECG reports are used extensively by clinical experts; however, diagnostic accuracy depends on clinical experience. For computer-aided heart disease prediction methods, both accuracy and sensitivity metrics play a remarkable part. Hence, existing research contributions have optimized machine-learning approaches, which are of great significance in computer-aided methods that perform predictive analysis for arrhythmia detection.

Findings

In this context, this paper develops regression heuristics over tridimensional optimum features of ECG reports to perform pervasive analytics for computer-aided arrhythmia prediction. The intent of these reports is arrhythmia detection. The empirical outcomes show that the proposed model is more optimal and more advantageous than existing and contemporary approaches.

Originality/value

The originality lies in the regression heuristics built over tridimensional optimum features of ECG reports for pervasive, computer-aided arrhythmia prediction, which the empirical outcomes show to be more advantageous than existing and contemporary approaches.

Details

International Journal of Pervasive Computing and Communications, vol. 20 no. 1
Type: Research Article
ISSN: 1742-7371


Article
Publication date: 3 April 2024

Samar Shilbayeh and Rihab Grassa


Abstract

Purpose

Bank creditworthiness refers to the evaluation of a bank’s ability to meet its financial obligations. It is an assessment of the bank’s financial health, stability and capacity to manage risks. This paper aims to investigate the credit rating patterns that are crucial for assessing the creditworthiness of Islamic banks, thereby evaluating the stability of their industry.

Design/methodology/approach

Three distinct machine learning algorithms are exploited and evaluated for this objective. The research initially uses the decision tree algorithm as a base learner, conducting an in-depth comparison with the ensemble decision tree and random forest. Subsequently, the Apriori algorithm is deployed to uncover the attributes with the most significant impact on a bank’s credit rating. To appraise these models, ten-fold cross-validation is applied: the data sets are segmented into ten folds, with nine used for training and one for testing, rotated across ten runs. This approach aims to mitigate any potential biases that could arise during the learning and training phases. Following this process, the accuracy is assessed and depicted in a confusion matrix, as outlined in the methodology section.
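
As an illustration of this evaluation protocol, here is a hedged sketch of ten-fold cross-validation of a decision tree against a random forest, with a confusion matrix built from the out-of-fold predictions; the bank-rating data are replaced by a synthetic table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=15, random_state=0)

for name, model in [("decision tree", DecisionTreeClassifier(random_state=0)),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    # cv=10: nine folds train, one tests, rotated across ten runs.
    pred = cross_val_predict(model, X, y, cv=10)
    print(name, "accuracy:", round(accuracy_score(y, pred), 3))
    print(confusion_matrix(y, pred))
```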

Findings

The findings of this investigation reveal that the random forest algorithm outperforms the others, achieving an impressive 90.5% accuracy in predicting credit ratings. Notably, the research sheds light on the significance of the loan-to-deposit ratio as the primary attribute affecting credit rating predictions. Moreover, the study uncovers additional pivotal banking features that strongly impact the measurements under study. The findings provide evidence that the loan-to-deposit ratio appears to be the single bank attribute with the greatest effect on credit rating prediction. In addition, the deposit-to-assets ratio and the profit-sharing investment account ratio are found to be effective in credit rating prediction, and ownership structure emerges as one of the essential bank attributes.

Originality/value

These findings contribute significant evidence to the understanding of attributes that strongly influence credit rating predictions within the banking sector. This study uniquely contributes by uncovering patterns that have not been previously documented in the literature, broadening our understanding in this field.

Details

International Journal of Islamic and Middle Eastern Finance and Management, vol. 17 no. 2
Type: Research Article
ISSN: 1753-8394


Article
Publication date: 26 May 2022

Ismail Abiodun Sulaimon, Hafiz Alaka, Razak Olu-Ajayi, Mubashir Ahmad, Saheed Ajayi and Abdul Hye



Abstract

Purpose

Road traffic emissions are generally believed to contribute immensely to air pollution, but the effect of road traffic data sets on air quality (AQ) predictions has not been fully investigated. This paper aims to investigate the effects traffic data sets have on the performance of machine learning (ML) predictive models in AQ prediction.

Design/methodology/approach

To achieve this, the authors set up an experiment in which the control data set comprises only the AQ and meteorological (Met) data sets, while the experimental data set comprises the AQ, Met and traffic data sets. Several ML models (such as extra trees regressor, eXtreme gradient boosting regressor, random forest regressor, K-neighbors regressor and two others) were trained, tested and compared on these combinations of data sets to predict the concentrations of PM2.5, PM10, NO2 and O3 in the atmosphere at various times of the day.
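
A hedged sketch of the control-versus-experimental comparison on synthetic data: the same regressor is cross-validated with and without a traffic feature, and the errors are compared. Column names are invented for illustration, not the study’s variables.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "no2_lag": rng.random(n),          # AQ history
    "temp": rng.random(n),             # Met
    "wind": rng.random(n),             # Met
    "traffic_volume": rng.random(n),   # traffic data set
})
y = 2 * df["no2_lag"] + 0.5 * df["traffic_volume"] + rng.normal(0, 0.1, n)

control = ["no2_lag", "temp", "wind"]        # AQ + Met only
experimental = control + ["traffic_volume"]  # AQ + Met + traffic

for name, cols in [("control", control), ("experimental", experimental)]:
    mae = -cross_val_score(ExtraTreesRegressor(random_state=0), df[cols], y,
                           cv=5, scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.4f}")
```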

Findings

The results showed that the various ML algorithms react differently to the traffic data set, although it generally improved the performance of all the ML algorithms considered in this study by at least 20% and reduced their error by at least 18.97%.

Research limitations/implications

This research is limited in terms of the study area, and the results cannot be generalized outside of the UK, as some of the inherent conditions may not be similar elsewhere. Additionally, only the ML algorithms commonly used in the literature are considered in this research, leaving out a few other ML algorithms.

Practical implications

This study reinforces the belief that traffic data sets have a significant effect on improving the performance of air pollution ML prediction models. It also indicates that ML algorithms behave differently when trained with a traffic data set in the development of an AQ prediction model. This implies that developers and researchers in AQ prediction need to identify which ML algorithms behave in their best interest before implementation.

Originality/value

The results of this study will enable researchers to focus on the algorithms of most benefit when using traffic data sets in AQ prediction.

Details

Journal of Engineering, Design and Technology, vol. 22 no. 3
Type: Research Article
ISSN: 1726-0531


Open Access
Article
Publication date: 23 January 2024

Luís Jacques de Sousa, João Poças Martins, Luís Sanhudo and João Santos Baptista


Abstract

Purpose

This study aims to review recent advances towards the implementation of artificial neural network (ANN) and natural language processing (NLP) applications during the budgeting phase of the construction process. During this phase, construction companies must assess the scope of each task and map the client’s expectations to an internal database of tasks, resources and costs. Quantity surveyors carry out this assessment manually, with little to no computer aid and under very austere time constraints, even though the results determine the company’s bid quality and are contractually binding.
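
The assessment the review targets is, at its core, text classification: mapping free-text task descriptions to entries in an internal database. A minimal sketch of that task with TF-IDF features and a linear classifier follows; the tiny corpus and cost-code labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training corpus: task descriptions mapped to internal cost codes.
tasks = [
    "excavate foundation trench to 1.5 m depth",
    "pour reinforced concrete slab, C30",
    "install interior drywall partitions",
    "supply and fix ceramic floor tiles",
]
codes = ["earthworks", "concrete", "partitions", "finishes"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(tasks, codes)
print(clf.predict(["cast concrete ground slab"]))  # expected: 'concrete'
```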

Design/methodology/approach

This paper seeks to compile applications of machine learning (ML) and natural language processing in the architectural engineering and construction sector to find which methodologies can assist this assessment. The paper carries out a systematic literature review, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, to survey the main scientific contributions within the topic of text classification (TC) for budgeting in construction.

Findings

This work concludes that it is necessary to develop data sets that represent the variety of tasks in construction, achieve higher accuracy algorithms, widen the scope of their application and reduce the need for expert validation of the results. Although full automation is not within reach in the short term, TC algorithms can provide helpful support tools.

Originality/value

Given the increasing interest in ML for construction and recent developments, the findings disclosed in this paper contribute to the body of knowledge, provide a more automated perspective on budgeting in construction and break ground for further implementation of text-based ML in budgeting for construction.

Details

Construction Innovation, vol. 24 no. 7
Type: Research Article
ISSN: 1471-4175

