Search results

1 – 10 of 117
Article
Publication date: 22 March 2024

Mohd Mustaqeem, Suhel Mustajab and Mahfooz Alam

Software defect prediction (SDP) is a critical aspect of software quality assurance, aiming to identify and manage potential defects in software systems. In this paper, we have…

Abstract

Purpose

Software defect prediction (SDP) is a critical aspect of software quality assurance, aiming to identify and manage potential defects in software systems. In this paper, we propose a novel hybrid approach that combines Gray Wolf Optimization with Feature Selection (GWOFS) and a multilayer perceptron (MLP) for SDP. The GWOFS-MLP hybrid model is designed to optimize feature selection, ultimately enhancing the accuracy and efficiency of SDP. Gray Wolf Optimization, inspired by the social hierarchy and hunting behavior of gray wolves, is employed to select a subset of relevant features from an extensive pool of potential predictors. This study investigates the key challenges that traditional SDP approaches encounter and proposes promising solutions to overcome time complexity and the curse of dimensionality.

Design/methodology/approach

The integration of GWOFS and MLP results in a robust hybrid model that can adapt to diverse software datasets. This feature selection process harnesses the cooperative hunting behavior of wolves, allowing for the exploration of critical feature combinations. The selected features are then fed into an MLP, a powerful artificial neural network (ANN) known for its capability to learn intricate patterns within software metrics. MLP serves as the predictive engine, utilizing the curated feature set to model and classify software defects accurately.
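
The article itself does not include code; the sketch below is a minimal, hypothetical illustration of the wrapper-style idea described above, in which a binary Gray Wolf search proposes feature subsets that are scored by an MLP. The pack size, iteration count, network size and synthetic data are assumptions, not the authors' settings.

```python
# Illustrative sketch (not the authors' code): wrapper-style binary Gray Wolf
# feature selection whose fitness is the cross-validated accuracy of an MLP.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           random_state=0)          # synthetic stand-in for defect data

def fitness(mask):
    if mask.sum() == 0:                             # empty subsets get the worst score
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

n_wolves, n_iter, dim = 6, 10, X.shape[1]           # assumed pack size and iterations
pos = rng.random((n_wolves, dim))                   # continuous positions in [0, 1]

for t in range(n_iter):
    scores = np.array([fitness(p > 0.5) for p in pos])
    alpha, beta, delta = pos[np.argsort(scores)[::-1][:3]]   # three best wolves
    a = 2 - 2 * t / n_iter                          # exploration factor decays to 0
    for i in range(n_wolves):
        new = np.zeros(dim)
        for leader in (alpha, beta, delta):         # move toward the three leaders
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2
            new += leader - A * np.abs(C * leader - pos[i])
        pos[i] = np.clip(new / 3, 0, 1)

best_mask = alpha > 0.5                             # threshold positions into a feature subset
print("selected features:", np.flatnonzero(best_mask))
print("CV accuracy on selected features:", round(float(fitness(best_mask)), 3))
```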

Findings

The performance evaluation of the GWOFS-MLP hybrid model on a real-world software defect dataset demonstrates its effectiveness. The model achieves a remarkable training accuracy of 97.69% and a testing accuracy of 97.99%. Additionally, the receiver operating characteristic area under the curve (ROC-AUC) score of 0.89 highlights the model’s ability to discriminate between defective and defect-free software components.

Originality/value

Experimental implementations using machine learning-based techniques with feature reduction are conducted to validate the proposed solutions. The goal is to enhance SDP’s accuracy, relevance and efficiency, ultimately improving software quality assurance processes. The confusion matrix further illustrates the model’s performance, with only a small number of false positives and false negatives.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 4 May 2023

Zeping Wang, Hengte Du, Liangyan Tao and Saad Ahmed Javed

The traditional failure mode and effect analysis (FMEA) has some limitations, such as the neglect of relevant historical data, subjective use of rating numbering and the less…

Abstract

Purpose

The traditional failure mode and effect analysis (FMEA) has some limitations, such as the neglect of relevant historical data, subjective use of rating numbering and the limited rationality and accuracy of the Risk Priority Number. The current study proposes a machine learning–enhanced FMEA (ML-FMEA) method based on a popular machine learning tool, the Waikato Environment for Knowledge Analysis (WEKA).

Design/methodology/approach

This work uses the collected FMEA historical data to predict the probability of component/product failure risk by machine learning based on different commonly used classifiers. To compare the correct classification rate of ML-FMEA across these classifiers, 10-fold cross-validation is employed. Moreover, the prediction error is estimated by repeated experiments with different random seeds under varying initialization settings. Finally, the case of the submersible pump in Bhattacharjee et al. (2020) is used to test the performance of the proposed method.
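
The workflow above is described in terms of WEKA; as a rough stand-in, the sketch below reproduces the same evaluation pattern in scikit-learn: several commonly used classifiers compared by 10-fold cross-validation, repeated over different random seeds. The classifiers, seeds and synthetic data are illustrative assumptions, not the paper's configuration.

```python
# Hedged scikit-learn stand-in for the WEKA workflow: compare commonly used
# classifiers on (synthetic) historical FMEA records with 10-fold cross-validation,
# repeated over several random seeds to estimate variability.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=1)  # stand-in FMEA data

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "gradient_boosting": GradientBoostingClassifier(),
}

for name, model in models.items():
    accs = []
    for seed in range(5):                            # repeated runs with different seeds
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        accs.append(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
    print(f"{name}: mean accuracy {np.mean(accs):.3f} +/- {np.std(accs):.3f}")
```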

Findings

The results show that ML-FMEA, based on most of the commonly used classifiers, outperforms the Bhattacharjee model. For example, ML-FMEA based on the Random Committee classifier improves the correct classification rate from 77.47 to 90.09 per cent and the area under the receiver operating characteristic curve (ROC-AUC) from 80.9 to 91.8 per cent, respectively.

Originality/value

The proposed method not only enables the decision-maker to use the historical failure data and predict the probability of the risk of failure but also may pave a new way for the application of machine learning techniques in FMEA.

Details

Data Technologies and Applications, vol. 58 no. 1
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 5 October 2023

Babitha Philip and Hamad AlJassmi

To proactively draw efficient maintenance plans, road agencies should be able to forecast main road distress parameters, such as cracking, rutting, deflection and International…

Abstract

Purpose

To proactively draw efficient maintenance plans, road agencies should be able to forecast main road distress parameters, such as cracking, rutting, deflection and International Roughness Index (IRI). Nonetheless, the behavior of those parameters throughout pavement life cycles is associated with high uncertainty, resulting from various interrelated factors that fluctuate over time. This study aims to propose the use of dynamic Bayesian belief networks for the development of time-series prediction models to probabilistically forecast road distress parameters.

Design/methodology/approach

While the Bayesian belief network (BBN) has the merit of capturing uncertainty associated with variables in a domain, dynamic BBNs, in particular, are deemed ideal for forecasting road distress over time due to their Markovian and time-invariant transition probability properties. Four dynamic BBN models are developed to represent rutting, deflection, cracking and IRI, using pavement data collected from 32 major road sections in the United Arab Emirates between 2013 and 2019. Those models are based on several factors affecting pavement deterioration, which are classified into three categories: traffic factors, environmental factors and road-specific factors.
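
As a hedged illustration of the Markovian, time-invariant transition property the authors rely on (not their actual dynamic BBN models), the toy sketch below propagates a discretized road-condition belief forward with a fixed transition matrix; the states and probabilities are invented.

```python
# Toy illustration of a Markovian, time-invariant transition model for one road
# distress indicator: the condition distribution is pushed forward each year with
# the same transition matrix. States and probabilities are invented.
import numpy as np

states = ["good", "fair", "poor"]           # hypothetical discretized IRI states
T = np.array([[0.85, 0.12, 0.03],           # P(next state | current state)
              [0.00, 0.80, 0.20],
              [0.00, 0.00, 1.00]])

belief = np.array([1.0, 0.0, 0.0])          # section assumed to start in "good" condition
for year in range(1, 7):                    # roughly the 2013-2019 horizon
    belief = belief @ T                     # Markov update: same matrix at every step
    print(year, dict(zip(states, belief.round(3))))
```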

Findings

The four developed performance prediction models achieved an overall precision and reliability rate of over 80%.

Originality/value

The proposed approach provides flexibility to illustrate road conditions under various scenarios, which is beneficial for pavement maintainers in obtaining a realistic representation of expected future road conditions, where maintenance efforts could be prioritized and optimized.

Details

Construction Innovation, vol. 24 no. 1
Type: Research Article
ISSN: 1471-4175

Article
Publication date: 19 July 2023

Gaurav Kumar, Molla Ramizur Rahman, Abhinav Rajverma and Arun Kumar Misra

This study aims to analyse the systemic risk emitted by all publicly listed commercial banks in a key emerging economy, India.

Abstract

Purpose

This study aims to analyse the systemic risk emitted by all publicly listed commercial banks in a key emerging economy, India.

Design/methodology/approach

The study makes use of the Tobias and Brunnermeier (2016) estimator to quantify the systemic risk (ΔCoVaR) that banks contribute to the system. The methodology addresses a classification problem based on the probability that a particular bank will emit high systemic risk or moderate systemic risk. The study applies machine learning models such as logistic regression, random forest (RF), neural networks and gradient boosting machine (GBM), addresses the issue of imbalanced data sets and investigates the banks' balance-sheet and stock features that may potentially determine systemic risk emission.
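
A minimal sketch of the classification step described above, assuming synthetic, imbalanced data and class weighting as the imbalance treatment (the paper's exact features and resampling choices are not reproduced here):

```python
# Hedged sketch: predict whether a bank emits high vs moderate systemic risk from
# balance-sheet and stock features. Data, features and imbalance handling are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced stand-in data: few "high systemic risk" observations (class 1).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.85, 0.15], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)

models = {
    "logistic": LogisticRegression(class_weight="balanced", max_iter=1000),
    "random_forest": RandomForestClassifier(class_weight="balanced", n_estimators=300),
    "gbm": GradientBoostingClassifier(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: ROC-AUC {auc:.3f}")
```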

Findings

Across various performance metrics, the authors find that two specifications are preferred: RF and GBM. The study identifies the lag of the systemic risk estimator, stock beta, stock volatility and return on equity as important features for explaining the emission of systemic risk.

Practical implications

The findings provide banks and regulators with the key features that can be used to formulate policy decisions.

Originality/value

This study contributes to the existing literature by suggesting classification algorithms that can be used to model the probability of systemic risk emission in a classification problem setting. Further, the study identifies the features responsible for the likelihood of systemic risk.

Details

Journal of Modelling in Management, vol. 19 no. 2
Type: Research Article
ISSN: 1746-5664

Article
Publication date: 5 April 2022

Saeed Pahlevan Sharif, Navaz Naghavi, Hassam Waheed and Kizito Uyi Ehigiamusoe

This study aims to investigate whether gender predicts financial inclusion and whether education can fill the gender gap in financial inclusion when controlling for the effects of…

Abstract

Purpose

This study aims to investigate whether gender predicts financial inclusion and whether education can fill the gender gap in financial inclusion when controlling for the effects of supply side factors of financial inclusion in low-income economies.

Design/methodology/approach

This study aims to investigate whether gender predicts financial inclusion and whether education can fill the gender gap in financial inclusion when controlling for the effects of supply side factors of financial inclusion in low-income economies.

Findings

The findings provided support for the gender gap in financial inclusion using the most basic measure of financial inclusion. However, using formal savings and access to credit, the gender gap hypothesis is not supported. Moreover, the results revealed that education reduces the gender gap in the basic form of financial inclusion. However, this study could not find any significant difference between men's and women's financial inclusion in terms of saving at a bank or borrowing from a bank, though men tend to save more than women informally.

Originality/value

The current study contributes to the literature by examining the role of education in the relationship between gender gap and financial inclusion when controlling for the effects of heterogeneous infrastructure and the supply side factors of financial inclusion among the selected countries.

Details

International Journal of Emerging Markets, vol. 18 no. 12
Type: Research Article
ISSN: 1746-8809

Book part
Publication date: 25 October 2023

Md Aminul Islam and Md Abu Sufian

This research navigates the confluence of data analytics, machine learning, and artificial intelligence to revolutionize the management of urban services in smart cities. The…

Abstract

This research navigates the confluence of data analytics, machine learning, and artificial intelligence to revolutionize the management of urban services in smart cities. The study uses advanced tools to thoroughly scrutinize key performance indicators integral to the functioning of smart cities, thereby enhancing leadership and decision-making strategies. Our work involves applying various machine learning models, such as Logistic Regression, Support Vector Machine, Decision Tree, Naive Bayes, and Artificial Neural Networks (ANN), to the data. Notably, the Support Vector Machine and Bernoulli Naive Bayes models exhibit robust performance, with a precision score of 70%. In particular, the study underscores the employment of an ANN model on our existing dataset, optimized using the Adam optimizer. Although the model yields an overall accuracy of 61% and a precision score of 58%, implying correct predictions for the positive class 58% of the time, a comprehensive performance assessment using the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) metric was necessary. This evaluation results in a score of 0.475 at a threshold of 0.5, indicating that there is room for model enhancement. These models and their performance metrics serve as a key cog in our data analytics pipeline, providing decision-makers and city leaders with actionable insights that can steer urban service management decisions. Through real-time data availability and intuitive visualization dashboards, these leaders can promptly comprehend the current state of their services, pinpoint areas requiring improvement, and make informed decisions to bolster these services. This research illuminates the potential for data analytics, machine learning, and AI to significantly upgrade urban service management in smart cities, fostering sustainable and livable communities. Moreover, our findings contribute valuable knowledge to other cities aiming to adopt similar strategies, thus aiding the continued development of smart cities globally.
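
As a rough, hypothetical sketch of the evaluation workflow described above (not the chapter's KPI dataset or exact configurations), the snippet below trains a support vector machine, a Bernoulli Naive Bayes model and an Adam-optimized neural network on synthetic data and reports accuracy, precision and AUC-ROC.

```python
# Minimal evaluation sketch on synthetic data: compare several classifiers and an
# Adam-optimized neural network on accuracy, precision and AUC-ROC.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=12, random_state=3)  # stand-in KPI data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

models = {
    "svm": SVC(probability=True),
    "bernoulli_nb": BernoulliNB(),
    "ann_adam": MLPClassifier(hidden_layer_sizes=(32, 16), solver="adam", max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    pred = model.predict(X_te)
    print(name,
          f"acc={accuracy_score(y_te, pred):.2f}",
          f"precision={precision_score(y_te, pred):.2f}",
          f"auc={roc_auc_score(y_te, proba):.2f}")
```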

Details

Technology and Talent Strategies for Sustainable Smart Cities
Type: Book
ISBN: 978-1-83753-023-6

Article
Publication date: 20 July 2023

Mu Shengdong, Liu Yunjie and Gu Jijian

By introducing Stacking algorithm to solve the underfitting problem caused by insufficient data in traditional machine learning, this paper provides a new solution to the cold…

Abstract

Purpose

By introducing the Stacking algorithm to solve the underfitting problem caused by insufficient data in traditional machine learning, this paper provides a new solution to the cold start problem of entrepreneurial borrowing risk control.

Design/methodology/approach

The authors introduce semi-supervised learning and ensemble learning into the field of migration learning and propose Stacking-based model migration learning, which independently trains models on entrepreneurial borrowing credit data, treats the migration strategy itself as the learning object and uses the Stacking algorithm to combine the prediction results of the source-domain model and the target-domain model.
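
A simplified, hypothetical sketch of the stacking idea described above: a source-domain model and a target-domain model each output probabilities, and a logistic regression meta-learner combines them. The data, base learners and meta-learner are assumptions, not the authors' implementation.

```python
# Simplified stacking of a source-domain and a target-domain model; a fuller
# implementation would use out-of-fold predictions for the meta-learner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_src, y_src = make_classification(n_samples=2000, n_features=10, random_state=1)  # "traditional credit" stand-in
X_tgt, y_tgt = make_classification(n_samples=300, n_features=10, random_state=2)   # scarce target-domain stand-in
X_tr, X_te, y_tr, y_te = train_test_split(X_tgt, y_tgt, random_state=2)

src_model = GradientBoostingClassifier().fit(X_src, y_src)     # source-domain model
tgt_model = GradientBoostingClassifier().fit(X_tr, y_tr)       # target-domain model

def meta_features(X):
    # Level-1 inputs for the stacker: predicted probabilities of both base models.
    return np.column_stack([src_model.predict_proba(X)[:, 1],
                            tgt_model.predict_proba(X)[:, 1]])

stacker = LogisticRegression().fit(meta_features(X_tr), y_tr)
print("stacked migration model AUC:",
      round(roc_auc_score(y_te, stacker.predict_proba(meta_features(X_te))[:, 1]), 3))
```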

Findings

The effectiveness of the two migration learning models is evaluated with real entrepreneurial borrowing data. The algorithmic performance of the Stacking-based model migration learning is further improved compared to the benchmark model without migration learning techniques, with the model's area under the curve (AUC) value rising to 0.8. Comparing the two migration learning models reveals that the model-based migration learning approach performs better. The reason is that the sample-based migration learning approach only eliminates the noisy samples that are relatively less similar to the entrepreneurial borrowing data. However, the calculation and weighting of similarity are subjective, with no unified judgment standard or operation method, so there is no guarantee that the retained traditional credit samples have the same sample distribution and feature structure as the entrepreneurial borrowing data.

Practical implications

From a practical standpoint, this work provides a new solution to the cold start problem of entrepreneurial borrowing risk control. The small number of labeled high-quality samples cannot support the learning and deployment of big data risk control models, which is the cold start problem of the entrepreneurial borrowing risk control system. By extending the training sample set with auxiliary domain data through suitable migration learning methods, the prediction performance of the model can be improved to a certain extent and more generalized laws can be learned.

Originality/value

This paper introduces the thought method of migration learning to the entrepreneurial borrowing scenario, provides a new solution to the cold start problem of the entrepreneurial borrowing risk control system and verifies the feasibility and effectiveness of the migration learning method applied in the risk control field through empirical data.

Details

Management Decision, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0025-1747

Article
Publication date: 5 May 2023

Nguyen Thi Dinh, Nguyen Thi Uyen Nhi, Thanh Manh Le and Thanh The Van

The problem of image retrieval and image description exists in various fields. In this paper, a model of content-based image retrieval and image content extraction based on the…

Abstract

Purpose

The problem of image retrieval and image description exists in various fields. In this paper, a model of content-based image retrieval and image content extraction based on the KD-Tree structure was proposed.

Design/methodology/approach

A Random Forest structure was built to classify the objects in each image on the basis of the balanced multibranch KD-Tree structure. For that purpose, a KD-Tree structure was generated by the Random Forest to retrieve a set of similar images for an input image. A KD-Tree structure is applied to determine a relationship word at the leaves to extract the relationships between objects in an input image. The content of an input image is then described based on class names and the relationships between objects.
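
As a loose, hypothetical analogue of this pipeline (not the authors' balanced multibranch KD-Tree Random Forest), the sketch below indexes image feature vectors in a KD-Tree for similarity retrieval and uses a Random Forest to predict the class names that feed the description; the feature vectors and labels are synthetic.

```python
# Illustrative sketch: KD-Tree similarity retrieval over image feature vectors plus
# a Random Forest classifier for object class names. All data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
features = rng.random((1000, 64))                 # stand-in image feature vectors
labels = rng.integers(0, 5, size=1000)            # stand-in object class ids

tree = KDTree(features)                           # KD-Tree index over the dataset
forest = RandomForestClassifier(n_estimators=100).fit(features, labels)

query = rng.random((1, 64))                       # feature vector of an input image
dist, idx = tree.query(query, k=5)                # retrieve the 5 most similar images
print("similar image indices:", idx[0])
print("predicted class for the description:", forest.predict(query)[0])
```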

Findings

A model of image retrieval and image content extraction was proposed based on the proposed theoretical basis; simultaneously, experiments were conducted on multi-object image datasets, including Microsoft COCO and Flickr, with average image retrieval precisions of 0.9028 and 0.9163, respectively. The experimental results were compared with those of other works on the same image datasets to demonstrate the effectiveness of the proposed method.

Originality/value

A balanced multibranch KD-Tree structure was built for relationship classification on the basis of the original KD-Tree structure. Then, a KD-Tree Random Forest was built to improve classifier performance and retrieve a set of similar images for an input image. Concurrently, the image content was described by combining class names and the relationships between objects.

Details

Data Technologies and Applications, vol. 57 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 28 September 2023

Moh. Riskiyadi

This study aims to compare machine learning models, datasets and training-testing splits using data mining methods to detect financial statement fraud.

Abstract

Purpose

This study aims to compare machine learning models, datasets and training-testing splits using data mining methods to detect financial statement fraud.

Design/methodology/approach

This study uses a quantitative approach with secondary data from the financial reports of companies listed on the Indonesia Stock Exchange over the last ten years, from 2010 to 2019. The research variables comprise financial and non-financial variables. Indicators of financial statement fraud are determined based on notes or sanctions from regulators and financial statement restatements with special supervision.

Findings

The findings show that the Extremely Randomized Trees (ERT) model performs better than other machine learning models. The original-sampling dataset performs best compared to other dataset treatments, and an 80:10 training-testing split performs best compared to other splitting treatments. Thus, the ERT model with an original-sampling dataset and an 80:10 training-testing split is the most appropriate for detecting future financial statement fraud.
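
A minimal sketch of the winning configuration as reported above, assuming an Extremely Randomized Trees classifier and an 80:10 training-testing split on synthetic, imbalanced stand-in data (the treatment of the remaining 10% is an assumption; the abstract does not specify it):

```python
# Hedged sketch: ERT classifier with an 80:10 training-testing split; the leftover
# 10% is simply held out here. Data are synthetic stand-ins for the fraud indicators.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=15, weights=[0.9, 0.1], random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, test_size=0.1,
                                          stratify=y, random_state=5)

ert = ExtraTreesClassifier(n_estimators=300, random_state=5).fit(X_tr, y_tr)
print(classification_report(y_te, ert.predict(X_te)))
```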

Practical implications

This study can be used by regulators, investors, stakeholders and financial crime experts to add insight into better methods of detecting financial statement fraud.

Originality/value

This study proposes a machine learning model that has not been discussed in previous studies and performs comparisons to obtain the best financial statement fraud detection results. Practitioners and academics can use the findings for further research development.

Details

Asian Review of Accounting, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1321-7348

Article
Publication date: 19 April 2024

Jitendra Gaur, Kumkum Bharti and Rahul Bajaj

Allocation of the marketing budget has become increasingly challenging due to the diverse channel exposure to customers. This study aims to enhance global marketing knowledge by…

Abstract

Purpose

Allocation of the marketing budget has become increasingly challenging due to the diverse channel exposure to customers. This study aims to enhance global marketing knowledge by introducing an ensemble attribution model to optimize marketing budget allocation for online marketing channels. As empirical research, this study demonstrates the supremacy of the ensemble model over standalone models.

Design/methodology/approach

The transactional data set for car insurance from an Indian insurance aggregator is used in this empirical study. The data set contains information from more than three million platform visitors. A robust ensemble model is created by combining results from two probabilistic models, namely, the Markov chain model and the Shapley value. These results are compared and validated with heuristic models. Also, the performances of online marketing channels and attribution models are evaluated based on the devices used (i.e. desktop vs mobile).
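
As a hedged illustration of the ensembling step (not the authors' full Markov chain or Shapley value computations), the sketch below normalizes per-channel credit from the two models, averages them and converts the result into a budget split; the channel names and numbers are invented.

```python
# Toy ensemble attribution: average normalized per-channel credit from a Markov
# chain model and a Shapley value model, then split a hypothetical budget.
markov_credit = {"cpc": 0.42, "crm_email": 0.30, "organic": 0.18, "display": 0.10}
shapley_credit = {"cpc": 0.38, "crm_email": 0.34, "organic": 0.20, "display": 0.08}

def normalize(credit):
    total = sum(credit.values())
    return {ch: v / total for ch, v in credit.items()}

m, s = normalize(markov_credit), normalize(shapley_credit)
ensemble = {ch: (m[ch] + s[ch]) / 2 for ch in m}          # simple average of the two models

budget = 1_000_000                                        # hypothetical marketing budget
for ch, share in sorted(ensemble.items(), key=lambda kv: -kv[1]):
    print(f"{ch}: share {share:.2%}, allocation {budget * share:,.0f}")
```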

Findings

Channel importance charts for desktop and mobile devices are analyzed to understand the top contributing online marketing channels. Customer relationship management emailers and Google cost-per-click (paid advertising) are identified as the top two marketing channels for desktop and mobile devices. The research reveals that the ensemble model's accuracy is better than that of the standalone models, that is, the Markov chain model and the Shapley value.

Originality/value

To the best of the authors’ knowledge, the current research is the first of its kind to introduce ensemble modeling for solving attribution problems in online marketing. A comparison with heuristic models using different devices (desktop and mobile) offers insights into the results with heuristic models.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9342
