Search results

1 – 10 of 537
Book part
Publication date: 29 September 2023

Torben Juul Andersen

Abstract

This chapter first analyzes how the data-cleaning process affects the share of missing values in the extracted European and North American datasets. It then examines how three different approaches to treating missing values, Complete Case analysis, Multiple Imputation by Chained Equations (MICE), and K-Nearest Neighbor (KNN) imputation, affect the number of firms and their average lifespan in the datasets compared to the original sample, assessed across different SIC industry divisions. This is extended to consider the implied effects on the distribution of a key performance indicator, return on assets (ROA), calculating skewness and kurtosis measures for each of the treatment methods and across industry contexts. The results consistently show highly negatively skewed distributions with high positive excess kurtosis across all industries, where the KNN imputation treatment produces distribution characteristics closest to the original untreated data. We further analyze the persistence of the (extreme) left-skewed tails, measured as the share of outliers and extreme outliers, which shows consistently high percentages of outliers (around 15% of the full sample) and extreme outliers (around 7.5%), indicating pervasive skewness in the data. Of the three approaches to dealing with missing values, KNN imputation generates final datasets that most closely resemble the original data, even though the Complete Case approach remains the norm in mainstream studies. One consequence is that most empirical studies are likely to underestimate the prevalence of extreme negative performance outcomes.
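As a rough illustration of the comparison the chapter describes, the sketch below contrasts Complete Case deletion with KNN imputation and reports skewness and excess kurtosis of a left-skewed ROA-like variable. It uses scikit-learn's `KNNImputer` on synthetic, made-up data, not the chapter's firm-level datasets:

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Synthetic left-skewed ROA-like variable (illustrative only, not the chapter's data)
roa = -rng.gamma(shape=2.0, scale=0.05, size=(1000, 1)) + 0.1
X = np.hstack([roa, roa * 0.5 + rng.normal(0, 0.02, (1000, 1))])

# Introduce 20% missing values completely at random
mask = rng.random(X.shape) < 0.2
X_missing = X.copy()
X_missing[mask] = np.nan

# Complete Case: drop any row with a missing value
complete_case = X_missing[~np.isnan(X_missing).any(axis=1)]

# KNN imputation: fill each gap from the k nearest observations
knn = KNNImputer(n_neighbors=5)
X_knn = knn.fit_transform(X_missing)

# Compare the distribution shape of the treated ROA column with the original
for name, col in [("original", X[:, 0]),
                  ("complete case", complete_case[:, 0]),
                  ("KNN imputed", X_knn[:, 0])]:
    print(f"{name:14s} skew={skew(col):+.2f}  excess kurtosis={kurtosis(col):+.2f}")
```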

Details

A Study of Risky Business Outcomes: Adapting to Strategic Disruption
Type: Book
ISBN: 978-1-83797-074-2

Article
Publication date: 28 February 2023

Mohamed Lachaab and Abdelwahed Omri

Abstract

Purpose

The goal of this study is to investigate the predictive performance of machine and deep learning methods in forecasting the CAC 40 index and its 40 constituent stock prices on the French stock market during the COVID-19 pandemic. In forecasting the CAC 40 index, the objective is to analyze whether the index and the individual prices will preserve the continuous increase they acquired at the beginning of the vaccination and containment measures, or whether the negative effect of the pandemic will be reflected in the future.

Design/methodology/approach

The authors apply two machine and deep learning methods (KNN and LSTM) and compare their performance to an ARIMA time series model. Two scenarios are considered, optimistic (high values) and pessimistic (low values), and three periods are examined: the period before the COVID-19 pandemic, the period during the pandemic, and the period of vaccination and containment. The last period is divided into two sub-periods, the test period and the prediction period, giving four periods in total.
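A minimal sketch of how KNN can be applied to index forecasting with lagged features, on a synthetic series standing in for the CAC 40. The lag structure and split are assumptions for illustration; the authors' actual feature setup is not described in the abstract:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)

# Synthetic trending index series (stand-in for the CAC 40, illustrative only)
t = np.arange(600)
series = 5000 + 2 * t + 50 * np.sin(t / 20) + rng.normal(0, 20, 600)

# Lagged feature matrix: predict the next value from the previous 5 observations
lags = 5
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]

# Chronological split: train on the past, test on the most recent period
split = int(0.8 * len(y))
knn = KNeighborsRegressor(n_neighbors=5).fit(X[:split], y[:split])
pred = knn.predict(X[split:])

mape = np.mean(np.abs((y[split:] - pred) / y[split:])) * 100
print(f"KNN test MAPE: {mape:.2f}%")
```

Note that KNN cannot extrapolate beyond the range of its training targets, which is one reason trend periods such as the post-vaccination recovery are a demanding test for it.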

Findings

The authors found that the KNN method outperformed LSTM and ARIMA in forecasting the CAC 40 index under both scenarios. They also found that the positive effect of vaccination and containment outweighs the negative effect of the pandemic, and that the recovery pattern is uneven among major companies in the stock market.

Practical implications

The empirical results have valuable practical implications for companies in the stock market seeking to respond to unexpected events such as COVID-19, improve operational efficiency and enhance long-term competitiveness. Companies in the transportation sector should consider additional R&D investment in communication and information technology, accelerate their digital capabilities in at least some parts of their businesses, develop plans for lights-out factories and supply chains to keep pace with changing times, and draw on big data resources. They should also use a mix of financing sources and securities to diversify their capital structure rather than relying only on equity financing, as their share prices are volatile and below pre-pandemic levels. Regarding portfolio allocation, the transportation sector was severely affected by the pandemic, which indicates that transportation equities are a poor diversifier during the health crisis. Diversification would, however, be worthwhile when including assets from the banking and industrial sectors. Finally, the instability of this period induced informational asymmetry among investors; this pessimistic mood affected asset values and created a state of disequilibrium, opening up more opportunities to benefit from potential arbitrage profits.

Originality/value

The impact of COVID-19 on stock markets is significant and affects the behavior of investors, who suffered amplified losses in a very short period of time. Correct and well-informed decision-making by investors and other market participants therefore requires careful analysis and accurate prediction of stock markets during the pandemic. However, few studies have been conducted in this area, and those studies have either concentrated on specific stock markets or have not applied powerful machine learning and deep learning techniques such as LSTM and KNN. To the best of our knowledge, no research has used these techniques to assess and forecast the French CAC 40 stock market during the pandemic. This study aims to close this gap in the literature.

Details

EuroMed Journal of Business, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1450-2194

Open Access
Article
Publication date: 1 April 2021

Arunit Maity, P. Prakasam and Sarthak Bhargava

Abstract

Purpose

Due to the continuous and rapid evolution of telecommunication equipment, the demand for more efficient and noise-robust detection of dual-tone multi-frequency (DTMF) signals has become highly significant.

Design/methodology/approach

A novel machine learning-based approach is proposed to detect DTMF tones affected by noise, frequency and time variations by employing the k-nearest neighbour (KNN) algorithm. The features required for training the proposed KNN classifier are extracted using Goertzel's algorithm, which estimates the absolute discrete Fourier transform (DFT) coefficient values for the fundamental DTMF frequencies, with or without their second harmonic frequencies. The proposed KNN classifier is configured in four different ways, differing in whether it is trained with or without augmented data and whether the second harmonic frequency DFT coefficient values are included as features.
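The feature-extraction step can be sketched as follows. This is a generic implementation of Goertzel's algorithm applied to a synthetic DTMF "1" tone (697 Hz + 1209 Hz), not the authors' code; sampling rate and window length are assumptions:

```python
import numpy as np

def goertzel_mag(samples, fs, freq):
    """Magnitude of the DFT coefficient at `freq` via Goertzel's algorithm."""
    n = len(samples)
    k = round(n * freq / fs)          # nearest DFT bin
    w = 2 * np.pi * k / n
    coeff = 2 * np.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:                  # second-order IIR recursion
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return np.sqrt(s_prev2**2 + s_prev**2 - coeff * s_prev * s_prev2)

# DTMF '1' = 697 Hz + 1209 Hz; extract features at all 8 fundamental frequencies
fs, dur = 8000, 0.05
t = np.arange(int(fs * dur)) / fs
tone = np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1209 * t)

dtmf_freqs = [697, 770, 852, 941, 1209, 1336, 1477, 1633]
features = [goertzel_mag(tone, fs, f) for f in dtmf_freqs]
# The two component frequencies dominate the 8-element feature vector
print({f: round(m, 1) for f, m in zip(dtmf_freqs, features)})
```

These eight magnitudes (optionally extended with the second-harmonic magnitudes) would then form the input vector for the KNN classifier.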

Findings

The model trained on the augmented data set that additionally includes the absolute DFT values of the second harmonic frequencies of the eight fundamental DTMF frequencies as features achieved the best performance, with a macro classification F1 score of 0.980835, a five-fold stratified cross-validation accuracy of 98.47% and a test data set detection accuracy of 98.1053%.

Originality/value

The generated DTMF signals are classified and detected using the proposed KNN classifier, which utilizes the DFT coefficients of the fundamental frequencies along with their second harmonics for better classification. Additionally, the proposed KNN classifier is compared with existing models to demonstrate its state-of-the-art performance.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 17 April 2024

Jahanzaib Alvi and Imtiaz Arif

Abstract

Purpose

The crux of this paper is to unveil efficient features and practical tools that can predict credit default.

Design/methodology/approach

Annual data of non-financial listed companies were taken from 2000 to 2020, along with 71 financial ratios. The dataset was bifurcated into three panels with three default assumptions. Logistic regression (LR) and k-nearest neighbor (KNN) binary classification algorithms were used to estimate credit default in this research.
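A hedged sketch of the paper's classifier comparison on synthetic data; the actual 71 financial ratios and the three default definitions are not reproduced here, so the class-imbalance level and feature counts below are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 71 financial ratios with a rare default class
X, y = make_classification(n_samples=2000, n_features=71, n_informative=10,
                           weights=[0.85, 0.15], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Scaling matters for KNN: distances are meaningless across unscaled ratios
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

lr = LogisticRegression(max_iter=1000).fit(X_tr_s, y_tr)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr_s, y_tr)

print(f"LR  accuracy: {lr.score(X_te_s, y_te):.3f}")
print(f"KNN accuracy: {knn.score(X_te_s, y_te):.3f}")
```

With imbalanced default data, accuracy alone can be misleading; recall on the default class is usually worth reporting alongside it.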

Findings

The study’s findings revealed that the features used in Model 3 (Case 3) were comparatively the most efficient. The results also showed that KNN achieved higher accuracy than LR, demonstrating its superiority over LR for this task.

Research limitations/implications

Using only two classifiers limits the comprehensiveness of the comparison, and the research was based only on financial data, which leaves sizeable room for including non-financial parameters in default estimation. Both limitations suggest directions for future research in this domain.

Originality/value

This study introduces efficient features and tools for credit default prediction using financial data, demonstrating KNN’s superior accuracy over LR and suggesting future research directions.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 26 December 2023

Farshad Peiman, Mohammad Khalilzadeh, Nasser Shahsavari-Pour and Mehdi Ravanshadnia

Abstract

Purpose

Earned value management (EVM)–based models for estimating project actual duration (AD) and cost at completion are continuously developed to improve the accuracy and timeliness of predicted values. This study primarily aimed to examine natural gradient boosting (NGBoost-2020) with a classification and regression trees (CART) base model (base learner). To the best of the authors' knowledge, this concept has never been applied to the EVM AD forecasting problem. The authors therefore compared this method to the single K-nearest neighbor (KNN) method, the ensemble method of extreme gradient boosting (XGBoost-2016) with the CART base model, and the optimal EVM equation, the earned schedule (ES) equation with a performance factor equal to 1 (ES1). The paper also sought to determine the extent to which the World Bank's two legal factors affect countries and how the two legal causes of delay (related to institutional flaws) influence AD prediction models.
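The ES1 benchmark mentioned above can be illustrated with a short worked example. The planned-value numbers are made up, and the linear interpolation shown is one common convention for the earned schedule calculation, not necessarily the exact variant the paper uses:

```python
import numpy as np

def earned_schedule(pv_cum, ev_t):
    """ES: the time at which cumulative planned value equals the current EV,
    with linear interpolation between the bracketing tracking periods."""
    n = np.searchsorted(pv_cum, ev_t)   # first period with PV >= EV
    if n == 0:
        return ev_t / pv_cum[0]
    prev = pv_cum[n - 1]
    return n + (ev_t - prev) / (pv_cum[n] - prev)

# Hypothetical 6-month project: cumulative planned value per month
pv_cum = np.array([10., 25., 45., 70., 90., 100.])
PD = len(pv_cum)                         # planned duration = 6 months
AT, EV = 4, 55.0                         # 4 months elapsed, 55 units earned

ES = earned_schedule(pv_cum, EV)
eac_t = AT + (PD - ES) / 1.0             # duration forecast with PF = 1 (ES1)
print(f"ES = {ES:.2f} months, EAC(t) = {eac_t:.2f} months")
```

Here the project has earned in 4 months what was planned for 3.4 months, so the ES1 forecast extends the planned 6-month duration to 6.6 months.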

Design/methodology/approach

In this paper, data from 30 construction projects of various building types in Iran, Pakistan, India, Turkey, Malaysia and Nigeria (countries chosen for their high number of delayed projects and the detrimental effects of these delays) were used to develop three models. The target variable was a dimensionless output: the ratio of the estimated duration to completion (ETC(t)) to the planned duration (PD). A total of 426 tracking periods were used to build the three models, with 353 samples from 23 projects in the training set and 73 samples (17% of the total) from six projects (21% of the total) in the testing set. Seventeen dimensionless input variables were used, including ten based on the main variables and performance indices of EVM and several others detailed in the study. The three models were created using Python and several GitHub-hosted codes.

Findings

For the testing set of the optimal model (NGBoost), the better percentage mean (better%) of the prediction error relative to the two single models, KNN and ES1 (based on projects with a lower error percentage), together with the total mean absolute percentage error (MAPE) and mean lags (MeLa, indicating model stability), were 100%, 83.33%, 5.62% and 3.17%, respectively. Notably, the total MAPE and MeLa for the NGBoost testing set with ten EVM-based input variables were 6.74% and 5.20%, respectively. The ensemble artificial intelligence (AI) models exhibited a much lower MAPE than ES1, and ES1 was less stable in prediction than NGBoost. Excessive and unusual MAPE and MeLa values occurred only in the two single models, although on some data sets ES1 outperformed the AI models. NGBoost also outperformed the other models, especially the single models, for most developing countries, and was more accurate than previously presented optimized models. In addition, sensitivity analysis was conducted on the NGBoost predicted outputs of the 30 projects using the SHapley Additive exPlanations (SHAP) method. All variables demonstrated an effect on ETC(t)/PD. The most influential input variables, in order of importance, were actual time (AT) to PD, regulatory quality (RQ), earned duration (ED) to PD, schedule cost index (SCI), planned complete percentage, rule of law (RL), actual complete percentage (ACP) and the ETC(t) of the optimal ES equation to PD. A probabilistic hybrid model was then selected based on the outputs predicted by the NGBoost and XGBoost models and the MAPE values of the three AI models. The 95% prediction interval of the NGBoost–XGBoost model revealed that 96.10% and 98.60% of the actual output values of the testing and training sets, respectively, fall within this interval.

Research limitations/implications

Because the projects were performed in different countries, it was not possible to distribute a questionnaire to the managers and stakeholders of the 30 projects in six developing countries. Due to the low number of EVM-based projects in the various references, it was not feasible to utilize other types of projects. Future prospects include evaluating the accuracy and stability of NGBoost for timely and non-fluctuating projects (mostly in developed countries), considering a greater number of legal/institutional variables as inputs, using legal/institutional/internal/inflation inputs for complex projects with extremely high uncertainty (such as bridge and road construction), and integrating these inputs and NGBoost with new technologies such as blockchain, radio frequency identification (RFID) systems, building information modeling (BIM) and the Internet of things (IoT).

Practical implications

The legal/institutional recommendations made to governments are strict control of prices, adequate supervision, removal of additional rules, removal of unfair regulations, clarification of the future trend of law changes, strict monitoring of property rights, simplification of the processes for obtaining permits, and elimination of unnecessary changes, particularly in developing countries and at the onset of irregular projects with limited information and numerous uncertainties. Furthermore, the managers and stakeholders of this group of projects are informed, at an early stage, of the significance of seven construction variables (institutional/legal external risks, internal factors and inflation): using time series (dynamic) models to predict AD, accurately calculating progress percentage variables, the effectiveness of building type in non-residential projects, regularly updating inflation during implementation, the effectiveness of employer type in the early stage of public projects and the late stage of private projects, and allocating reserve duration (buffer) to respond to institutional/legal risks.

Originality/value

Ensemble methods were optimized in 70% of the references. To the authors' knowledge, NGBoost, among the ensemble methods, has not previously been used to estimate construction project duration and delays. NGBoost is an effective method for considering uncertainties in irregular projects, which are often implemented in developing countries. Furthermore, existing AD estimation models fail to incorporate RQ and RL from the World Bank's worldwide governance indicators (WGI) as risk-based inputs, and the various WGI, EVM and inflation variables have not been combined with substantial degrees of institutional delay risk as inputs. Consequently, given the critical and complex risks present in different countries, it is vital to consider legal and institutional factors, especially when an in-depth, accurate and reality-based method such as SHAP is used for analysis.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Open Access
Article
Publication date: 17 May 2022

M'hamed Bilal Abidine, Mourad Oussalah, Belkacem Fergani and Hakim Lounis

Abstract

Purpose

Mobile phone-based human activity recognition (HAR) consists of inferring the user’s activity type from the analysis of inertial mobile sensor data. This paper mainly introduces a new classification approach called adaptive k-nearest neighbors (AKNN) for intelligent HAR using smartphone inertial sensors, with a potential real-time implementation on the smartphone platform.

Design/methodology/approach

The proposed method puts forward several modifications to the AKNN baseline, using kernel discriminant analysis for feature reduction and hybridizing weighted support vector machines and KNN to tackle imbalanced class data sets.

Findings

Extensive experiments on five large-scale daily activity recognition data sets demonstrate the effectiveness of the method in terms of error rate, recall, precision, F1-score and computational/memory resources, with several comparisons against state-of-the-art methods and other hybridization modes. The results showed that the proposed method achieves more than 50% improvement in the error rate metric and up to 5.6% in F1-score. The training phase is also reduced by a factor of six compared to the baseline, which provides solid assets for smartphone implementation.

Practical implications

This work builds a bridge to the already growing body of machine learning work on learning with small data sets. Besides, the availability of systems able to perform on-the-fly activity recognition on smartphones will have a significant impact in the field of pervasive health care, supporting a variety of practical applications such as elderly care, ambient assisted living and remote monitoring.

Originality/value

The purpose of this study is to build and test an accurate offline model using only a compact training data set, which can reduce the computational and memory complexity of the system. This provides grounds for developing new innovative hybridization modes in the context of daily activity recognition and smartphone-based implementation. The study demonstrates that the new AKNN can classify the data without a separate training step, because it does not fit a model and only uses memory resources to store the corresponding support vectors.

Details

Sensor Review, vol. 42 no. 4
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 29 April 2021

Omobolanle Ruth Ogunseiju, Johnson Olayiwola, Abiola Abosede Akanmu and Chukwuma Nnaji

Abstract

Purpose

Construction action recognition is essential to efficiently managing productivity and health and safety risks, which can be achieved by tracking and monitoring construction work. This study examines the performance of a variant of deep convolutional neural networks (CNNs) for recognizing the actions of construction workers from signal images of time-series data.

Design/methodology/approach

This paper adopts Inception v1 to classify actions involved in carpentry and painting activities from images of motion data. Augmented time-series data from wearable sensors attached to workers' lower arms are converted to signal images to train an Inception v1 network. The performance of Inception v1 is compared with that of the highest-performing supervised learning classifier, k-nearest neighbor (KNN).
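One simple way to turn sensor windows into signal images is to normalise each axis and reshape it into a 2D grid. This is a generic encoding for illustration, not necessarily the one the authors use; the window length, sampling rate and layout are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic tri-axial acceleration window (stand-in for lower-arm sensor data)
fs, window_s = 100, 2.0
n = int(fs * window_s)
t = np.arange(n) / fs
accel = np.stack([np.sin(2 * np.pi * 1.5 * t),        # x: sawing-like motion
                  0.5 * np.sin(2 * np.pi * 3.0 * t),  # y
                  rng.normal(0, 0.1, n)])             # z: mostly noise

def to_signal_image(window, size=20):
    """Min-max normalise each axis to [0, 255] and stack the axes as
    row-blocks of a single 2D uint8 image."""
    img_rows = []
    for axis in window:
        norm = (axis - axis.min()) / (axis.max() - axis.min() + 1e-12)
        img_rows.append((norm * 255).astype(np.uint8).reshape(size, -1))
    return np.vstack(img_rows)

image = to_signal_image(accel)
print(image.shape, image.dtype)   # one image per window, fed to the CNN
```

A batch of such images, one per labelled activity window, is what a network like Inception v1 would then be trained on.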

Findings

Results show that the performance of the Inception v1 network improved when trained with signal images of the augmented data, but at a high computational cost. The Inception v1 network and KNN achieved accuracies of 95.2% and 99.8%, respectively, when trained with the 50-fold augmented carpentry dataset. With the 10-fold augmented painting dataset, the accuracies of Inception v1 and KNN were 95.3% and 97.1%, respectively.

Research limitations/implications

Only acceleration data of the lower arm of the two trades were used for action recognition. Each signal image comprises 20 datasets.

Originality/value

Little has been reported on recognizing construction workers' actions from signal images. This study adds value to the existing literature, in particular by providing insights into the extent to which a deep CNN can classify subtasks from patterns in signal images compared to a traditional best performing shallow network.

Article
Publication date: 5 December 2017

Rabeb Faleh, Sami Gomri, Mehdi Othman, Khalifa Aguir and Abdennaceur Kachouri

Abstract

Purpose

This paper proposes a novel hybrid approach to the problem of cross-selectivity of gases in an electronic nose (E-nose), combining support vector machine (SVM) and k-nearest neighbors (KNN) classifiers.

Design/methodology/approach

First, an E-nose system with three WO3 sensors was used for data acquisition to detect three gases: ozone, ethanol and acetone. Then, two transient parameters, the derivative and the integral, were extracted for each gas response. Next, principal component analysis (PCA) was applied to extract the most relevant sensor data and reduce dimensionality. The new coordinates calculated by PCA were used as inputs for classification by the SVM method. Finally, classification by the KNN method was carried out using only the support vectors (SVs), not all the data.
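The three-step pipeline can be sketched with scikit-learn on synthetic sensor-like data. The blobs below stand in for the ozone/ethanol/acetone responses, and the transient-parameter extraction step is omitted, so this is only an illustration of the PCA → SVM → KNN-on-support-vectors idea:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic 3-class sensor responses (stand-ins for the three gases)
X, y = make_blobs(n_samples=300, centers=3, n_features=6, cluster_std=1.5,
                  random_state=7)

# Step 1: PCA projects the sensor features onto the most informative axes
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Step 2: SVM is fit on the PCA scores; keep only its support vectors
svm = SVC(kernel="rbf").fit(X_pca, y)
sv_X = svm.support_vectors_
sv_y = y[svm.support_]

# Step 3: the final KNN classifier uses only the support vectors as references
knn = KNeighborsClassifier(n_neighbors=3).fit(sv_X, sv_y)
print(f"{len(sv_X)} support vectors out of {len(X)} samples, "
      f"accuracy on full set: {knn.score(X_pca, y):.2f}")
```

Restricting KNN to the support vectors shrinks the reference set to the boundary-defining samples, which is the source of the speed-up the method claims.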

Findings

This work shows that the proposed fusion method leads to the highest classification rate (100 per cent) compared to the individual classifiers, KNN, SVM-linear, SVM-RBF and SVM-polynomial, which achieve classification rates of 89, 75.2, 80 and 79.9 per cent, respectively.

Originality/value

The authors propose a fusion classifier approach to improve the classification rate. In this method, the extracted features are projected into the PCA subspace to reduce dimensionality. The obtained principal components are then introduced to the SVM classifier, and the calculated SVs are used in the KNN method.

Details

Sensor Review, vol. 38 no. 1
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 1 August 2005

Songbo Tan

Abstract

Purpose

With the ever-increasing volume of text data on the internet, it is important that documents are classified into manageable and easy-to-understand categories. This paper proposes the use of the binary k-nearest neighbour (BKNN) algorithm for text categorization.

Design/methodology/approach

The paper describes the traditional k-nearest neighbor (KNN) classifier, introduces BKNN and outlines experimental results.
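BKNN's exact formulation is not given in the abstract; the sketch below shows the general idea of KNN text categorization over binary term-presence features, on a toy corpus invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Tiny illustrative corpus (not the paper's data)
docs = ["stock market index rises", "market shares fall on stock news",
        "team wins football match", "football season final match tonight",
        "index futures and stock options", "match ends in a draw"]
labels = ["finance", "finance", "sport", "sport", "finance", "sport"]

# Binary presence/absence features: the kind of representation BKNN exploits
vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs)

# Cosine distance over sparse binary vectors; majority vote of 3 neighbours
knn = KNeighborsClassifier(n_neighbors=3, metric="cosine").fit(X, labels)
print(knn.predict(vec.transform(["stock index news today"])))
```

Binary vectors make the distance computation cheap, which is consistent with the CPU-time saving the paper reports for BKNN over standard KNN.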

Findings

The experimental results indicate that BKNN requires much less CPU time than KNN, without loss of classification performance.

Originality/value

The paper demonstrates that BKNN is an efficient and effective algorithm for text categorization.

Details

Online Information Review, vol. 29 no. 4
Type: Research Article
ISSN: 1468-4527

Book part
Publication date: 18 July 2022

Yakub Kayode Saheed, Usman Ahmad Baba and Mustafa Ayobami Raji

Abstract

Purpose: This chapter aims to examine machine learning (ML) models for predicting credit card fraud (CCF).

Need for the study: With the advance of technology, the world increasingly relies on credit cards rather than cash in daily life, which creates a slew of new opportunities for fraudulent individuals to abuse these cards. As of December 2020, global card losses reached $28.65 billion, up 2.9% from $27.85 billion in 2018, according to the Nilson 2019 research. To safeguard credit card users, the card issuer should include a service that protects customers from potential risks. CCF has become a severe threat as internet buying has grown. To this end, various studies in the field of automatic and real-time fraud detection are required. Due to their advantageous properties, the most recent ones employ a variety of ML algorithms and techniques to construct well-fitting models to detect fraudulent transactions. When it comes to recognising credit card fraud in huge, high-dimensional data, feature selection (FS) is critical for improving classification accuracy and fraud detection.

Methodology/design/approach: The objective of this chapter is to construct a new model for credit card fraud detection (CCFD) based on principal component analysis (PCA) for FS, using supervised ML techniques such as K-nearest neighbour (KNN), ridge classifier, gradient boosting, quadratic discriminant analysis, AdaBoost and random forest to classify fraudulent and legitimate transactions. Compared to earlier experiments, the suggested approach demonstrates a high capacity for detecting fraudulent transactions. More precisely, the model’s resilience is built by integrating the power of PCA to determine the most useful predictive features. The experimental analysis was performed on the German credit card and Taiwan credit card data sets.

Findings: The experimental findings revealed that KNN achieved an accuracy of 96.29%, recall of 100% and precision of 96.29%, making it the best performing model on the German data set, while the ridge classifier was the best performing model on the Taiwan credit data, with an accuracy of 81.75%, recall of 34.89% and precision of 66.61%.

Practical implications: The poor performance of the models on the Taiwan data revealed that it is an imbalanced credit card data set. A comparison of the proposed models with state-of-the-art credit card ML models showed that the results are competitive.
