Search results

1 – 10 of over 5000
Article
Publication date: 17 March 2023

Stewart Jones

Abstract

Purpose

This study updates the literature review of Jones (1987) published in this journal. The study pays particular attention to two important themes that have shaped the field over the past 35 years: (1) the development of a range of innovative new statistical learning methods, particularly advanced machine learning methods such as stochastic gradient boosting, adaptive boosting, random forests and deep learning, and (2) the emergence of a wide variety of bankruptcy predictor variables extending beyond traditional financial ratios, including market-based variables, earnings management proxies, auditor going concern opinions (GCOs) and corporate governance attributes. Several directions for future research are discussed.
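
As a hedged illustration of the class of methods the review covers, the sketch below fits a gradient boosting classifier to synthetic financial-ratio-style features to flag firm failure. The feature set and data are assumptions for demonstration, not the review's or Jones (1987)'s data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical predictors: e.g. leverage, liquidity, profitability, excess return.
X = rng.normal(size=(n, 4))
# Synthetic failure flag loosely tied to leverage and profitability.
y = (X[:, 0] - X[:, 2] + rng.normal(scale=1.5, size=n) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3)
model.fit(X_tr, y_tr)
print("holdout AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```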

Design/methodology/approach

This study provides a systematic review of the corporate failure literature over the past 35 years, with a particular focus on the emergence of new statistical learning methodologies and predictor variables. The synthesis evaluates the strengths and limitations of different modelling approaches under different circumstances and provides an overall evaluation of the relative contribution of alternative predictor variables. The study aims to provide a transparent, reproducible and interpretable review of the literature. The review also takes a theme-centric rather than author-centric approach, focusing on the structured themes that have dominated the literature since 1987.

Findings

There are several major findings. First, advanced machine learning methods appear to hold the most promise for future firm failure research. Not only do these methods predict significantly better than conventional models, but they also possess many appealing statistical properties. Second, there is now a much wider range of variables being used to model and predict firm failure; however, the literature needs to be interpreted with some caution given the many mixed findings. Finally, a number of unresolved methodological issues arising from the Jones (1987) study still require research attention.

Originality/value

The study explains the connections and derivations between a wide range of firm failure models, from simpler linear models to advanced machine learning methods such as gradient boosting, random forests, adaptive boosting and deep learning. The paper highlights the most promising models for future research, particularly in terms of their predictive power, underlying statistical properties and issues of practical implementation. The study also draws together an extensive literature on alternative predictor variables and provides insights into the role and behaviour of alternative predictor variables in firm failure research.

Details

Journal of Accounting Literature, vol. 45 no. 2
Type: Research Article
ISSN: 0737-4607

Article
Publication date: 1 July 2024

Aamir Rashid, Rizwana Rasheed, Abdul Hafaz Ngah and Noor Aina Amirah

Abstract

Purpose

Recent disruptions have sparked concern about building a resilient and sustainable manufacturing supply chain. While artificial intelligence (AI) strengthens resilience, research is needed to understand how cloud adoption can foster integration, collaboration, adaptation and sustainable manufacturing. Therefore, this study aimed to unleash the power of cloud adoption and AI in optimizing resilience and sustainable performance through collaboration and adaptive capabilities at manufacturing firms.

Design/methodology/approach

This research followed a deductive approach and employed a quantitative method with a survey technique to collect data from its target population. The study used stratified random sampling with a sample size of 1,279 participants working in diverse manufacturing industries across California, Texas and New York.
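
As a hedged illustration of the sampling design, the sketch below draws a proportional stratified random sample with pandas. The strata and column names are hypothetical, not the study's actual sampling frame.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical sampling frame of manufacturing workers by industry stratum.
frame = pd.DataFrame({
    "worker_id": range(10_000),
    "industry": rng.choice(["automotive", "electronics", "food"], size=10_000),
})
# Proportional allocation: sample the same fraction from every stratum.
sample = frame.groupby("industry").sample(frac=1_279 / len(frame), random_state=0)
print(len(sample))  # roughly 1,279 respondents, proportional to stratum size
```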

Findings

This research investigated how companies can make their manufacturing supply chains more resilient and sustainable. The findings revealed that integrating the manufacturing supply chains can foster collaboration and enhance adaptability, leading to better performance (hypotheses H1-H7, except H5). Additionally, utilizing artificial intelligence helps improve adaptability, further strengthening resilience and sustainability (H8-H11). Interestingly, the study found that internal integration alone does not significantly impact collaboration (H5). This suggests that external factors are more critical in fostering collaboration within the manufacturing supply chain during disruptions.

Originality/value

This study dives into the complex world of interconnected factors (formative constructs in higher order) influencing manufacturing supply chains. Using advanced modeling techniques, it highlights the powerful impact of cloud-based integration. Cloud-based integration and artificial intelligence unlock significant improvements for manufacturers and decision-makers by enabling information processing and dynamic capabilities.

Details

Journal of Manufacturing Technology Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1741-038X

Article
Publication date: 14 November 2016

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

Email is an important medium for sharing information rapidly. However, spam, being a nuisance in such communication, motivates the building of a robust filtering system with high classification accuracy and good sensitivity towards false positives. In that context, this paper aims to present a combined classifier technique using a committee selection mechanism, where the main objective is to identify a set of classifiers whose individual decisions can be combined by a committee selection procedure for accurate detection of spam.

Design/methodology/approach

For training and testing of the relevant machine learning classifiers, text mining approaches are used in this research. Three data sets (Enron, SpamAssassin and LingSpam) have been used to test the classifiers. Initially, pre-processing is performed to extract the features associated with the email files. In the next step, the extracted features are taken through a dimensionality reduction method where non-informative features are removed. Subsequently, an informative feature subset is selected using genetic feature search. Thereafter, the proposed classifiers are tested on those informative features and the results compared with those of other classifiers.

Findings

For building the proposed combined classifier, three different studies were performed. The first study identified the effect of boosting algorithms on two probabilistic classifiers: Bayesian and Naïve Bayes. In that study, AdaBoost was found to be the best algorithm for performance boosting. The second study examined the effect of different kernel functions on the support vector machine (SVM) classifier, where SVM with a normalized polynomial (NP) kernel was observed to be the best. The last study combined classifiers with committee selection, where the committee members were the best classifiers identified by the first study, i.e. Bayesian and Naïve Bayes with AdaBoost, and the committee president was selected from the second study, i.e. SVM with the NP kernel. Results show that combining the identified classifiers to form a committee machine gives excellent performance accuracy with a low false positive rate.
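
A hedged scikit-learn approximation of this committee idea is sketched below: AdaBoost-boosted Naïve Bayes members combined with a polynomial-kernel SVM by soft voting. The paper's exact committee-selection procedure, its "president" role and the normalized polynomial kernel are not reproduced, and the toy corpus stands in for Enron/SpamAssassin/LingSpam.

```python
from sklearn.ensemble import AdaBoostClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

texts = ["win cash now", "free pills offer", "claim your prize", "cheap loans today",
         "lunch at noon?", "minutes from meeting", "project deadline is friday",
         "see attached report", "urgent prize waiting", "discount meds online",
         "agenda for tomorrow", "thanks for the update"]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]  # 1 = spam

committee = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            # Boosted probabilistic member, as in the paper's first study.
            ("boosted_nb", AdaBoostClassifier(MultinomialNB(), n_estimators=25)),
            # Polynomial-kernel SVM standing in for the NP-kernel "president".
            ("svm_poly", SVC(kernel="poly", degree=2, probability=True)),
        ],
        voting="soft",
    ),
)
committee.fit(texts, labels)
print(committee.predict(["exclusive offer win big", "meeting notes attached"]))
```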

Research limitations/implications

This research is focused on the classification of email spam written in the English language. Only the body (text) parts of the emails have been used; image spam has not been included in this work. The work is restricted to email messages: other message types, such as short message service (SMS) or multimedia messaging service (MMS) messages, were not part of this study.

Practical implications

This research proposes a method of dealing with the issues and challenges faced by internet service providers and organizations that use email. The proposed model provides not only better classification accuracy but also a low false positive rate.

Originality/value

The proposed combined classifier is a novel classifier designed for accurate classification of email spam.

Details

VINE Journal of Information and Knowledge Management Systems, vol. 46 no. 4
Type: Research Article
ISSN: 2059-5891

Article
Publication date: 7 November 2023

Christian Nnaemeka Egwim, Hafiz Alaka, Youlu Pan, Habeeb Balogun, Saheed Ajayi, Abdul Hye and Oluwapelumi Oluwaseun Egunjobi

Abstract

Purpose

The study aims to develop a multilayer, highly effective ensemble-of-ensembles predictive model (stacking ensemble) using several hyperparameter-optimized ensemble machine learning (ML) methods (bagging and boosting ensembles) trained with high-volume data points retrieved from Internet of Things (IoT) emission sensors and time-corresponding meteorology and traffic data.

Design/methodology/approach

First, the study tested the big data hypothesis by developing sample ensemble predictive models on different data sample sizes and comparing their results. Second, it developed a standalone model and several bagging and boosting ensemble models and compared their results. Finally, it used the best-performing bagging and boosting predictive models as input estimators to develop a novel multilayer, highly effective stacking ensemble predictive model.
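
A minimal sketch of the "ensemble of ensembles" design, assuming scikit-learn: a StackingRegressor whose base estimators are themselves a bagging ensemble (random forest) and a boosting ensemble (gradient boosting). The features and synthetic target stand in for the paper's IoT sensor, meteorology and traffic inputs.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(42)
# Stand-in features: e.g. temperature, wind speed, humidity, traffic volume, hour.
X = rng.normal(size=(500, 5))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=500)  # stand-in PM2.5

stack = StackingRegressor(
    estimators=[
        ("bagging", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("boosting", GradientBoostingRegressor(n_estimators=200, random_state=0)),
    ],
    final_estimator=RidgeCV(),  # meta-learner stacked on the base predictions
)
stack.fit(X, y)
print("training R^2:", stack.score(X, y))
```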

Findings

Results showed data size to be one of the main determinants of ensemble ML predictive power. Second, they showed that the cumulative result from ensemble ML algorithms is consistently better in terms of predictive accuracy than that of a single algorithm. Finally, the stacking ensemble proved to be a better model for predicting PM2.5 concentration levels than the bagging and boosting ensemble models.

Research limitations/implications

A limitation of this study is the trade-off between the performance of this novel model and the computational time required to train it. Whether this gap can be closed remains an open research question, and future research should attempt to close it. Future studies can also integrate this novel model into a personal air quality messaging system to inform the public of pollution levels and improve public access to air quality forecasts.

Practical implications

The outcome of this study will help the public proactively identify highly polluted areas, potentially reducing pollution-associated or pollution-triggered COVID-19 (and other lung disease) deaths, complications and transmission by encouraging avoidance behaviour, and will support informed lockdown decisions by government bodies when integrated into an air pollution monitoring system.

Originality/value

This study fills a gap in the literature by providing a justification for selecting appropriate ensemble ML algorithms for PM2.5 concentration level predictive modeling. Second, it contributes to the big data hypothesis, which suggests that data size is one of the most important factors in ML predictive capability. Third, it supports the premise that when using ensemble ML algorithms, the cumulative output is consistently better in terms of predicted accuracy than that of a single algorithm. Finally, it develops a novel multilayer, high-performance, hyperparameter-optimized ensemble-of-ensembles predictive model that can accurately predict PM2.5 concentration levels with improved interpretability and enhanced generalizability, and provides a novel databank of historic pollution data from IoT emission sensors that can be purchased for research, consultancy and policymaking.

Details

Journal of Engineering, Design and Technology, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1726-0531

Article
Publication date: 19 August 2022

Ahed Habib and Umut Yildirim

Abstract

Purpose

Currently, many experimental studies on the properties and behavior of rubberized concrete are available in the literature. These findings have motivated scholars to propose models for estimating properties of rubberized concrete using traditional and advanced techniques. However, with the advancement of computational techniques and new estimation models, selecting the model that best estimates a given property of the concrete is becoming challenging.

Design/methodology/approach

In this study, over 1,000 experimental findings were obtained from the literature and used to investigate the capabilities of ten different machine learning algorithms in modeling the hardened density, compressive, splitting tensile and flexural strengths, static and dynamic moduli, and damping ratio of rubberized concrete, adopting three different prediction approaches that differ in the inputs supplied to the model.
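
As a hedged example of the study's "direct" approach, the sketch below regresses a concrete property on hypothetical mix-design inputs with XGBoost; the feature names and synthetic data are illustrative assumptions, not the study's 1,000-plus experimental records.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor  # assumes the xgboost package is installed

rng = np.random.default_rng(1)
# Hypothetical mix-design inputs: cement, water, rubber content, aggregate, age.
X = rng.uniform(size=(300, 5))
# Synthetic "compressive strength" rising with cement and falling with rubber.
y = 40 * X[:, 0] - 15 * X[:, 2] + 5 * np.log1p(X[:, 4]) + rng.normal(scale=2, size=300)

model = XGBRegressor(n_estimators=400, learning_rate=0.05, max_depth=4)
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```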

Findings

In general, the study's findings show that the XGBoost and FFBP models achieve the best performance compared to the other techniques.

Originality/value

Previous studies have focused on the compressive strength of rubberized concrete as the main parameter to be estimated and rarely went into other characteristics of the material. In this study, the capabilities of different machine learning algorithms in predicting the properties of rubberized concrete were investigated and compared. Additionally, most of the studies adopted the direct estimation approach in which the concrete constituent materials are used as inputs to the prediction model. In contrast, this study evaluates three different prediction approaches based on the input parameters used, referred to as direct, generalized, and nondestructive methods.

Details

Engineering Computations, vol. 39 no. 8
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 20 June 2022

Lokesh Singh, Rekh Ram Janghel and Satya Prakash Sahu

Abstract

Purpose

Automated skin lesion analysis plays a vital role in early detection. The relatively small size and class imbalance of available skin lesion datasets impede learning and dominate research in automated skin lesion analysis. The unavailability of adequate data makes it difficult to develop classification methods because of the skewed class distribution.

Design/methodology/approach

Boosting-based transfer learning (TL) paradigms like the Transfer AdaBoost algorithm can compensate for such a lack of samples by taking advantage of auxiliary data. However, in such methods, beneficial source instances representing the target undergo fast, stochastic weight convergence, resulting in a "weight-drift" that negates transfer. In this paper, a framework is designed utilizing "Rare-Transfer" (RT), a boosting-based TL algorithm that prevents "weight-drift" and simultaneously addresses absolute-rarity in skin lesion datasets. RT prevents the weights of source samples from converging too quickly. It addresses absolute-rarity using an instance-transfer approach that incorporates the best-fit set of auxiliary examples, improving balanced error minimization. It thus compensates for class imbalance and the scarcity of training samples simultaneously, inducing balanced error optimization.
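
A hedged sketch of the TrAdaBoost-style instance-transfer loop that such boosting-based TL methods build on is given below; the paper's Rare-Transfer correction against "weight-drift" is only indicated in a comment, not reimplemented.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost_sketch(Xs, ys, Xt, yt, n_rounds=10):
    """Instance-transfer boosting over joint source (Xs) and target (Xt) data."""
    ns, nt = len(Xs), len(Xt)
    X, y = np.vstack([Xs, Xt]), np.concatenate([ys, yt])
    w = np.ones(ns + nt) / (ns + nt)          # joint weights over source + target
    beta_src = 1 / (1 + np.sqrt(2 * np.log(ns) / n_rounds))
    learners = []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (h.predict(X) != y).astype(float)
        # Error measured on the target portion only, clipped away from 0 and 0.5.
        eps = np.clip((w[ns:] * miss[ns:]).sum() / w[ns:].sum(), 1e-10, 0.49)
        beta_tgt = eps / (1 - eps)
        w[:ns] *= beta_src ** miss[:ns]       # slowly shed misclassified source points
        w[ns:] *= beta_tgt ** -miss[ns:]      # up-weight misclassified target points
        # Rare-Transfer adds a correction here so source weights do not collapse
        # too quickly ("weight-drift"); that step is omitted in this sketch.
        w /= w.sum()
        learners.append(h)
    return learners
```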

Findings

Promising results are obtained with RT compared with state-of-the-art techniques on absolute-rare skin lesion datasets, with an accuracy of 92.5%. A Wilcoxon signed-rank test confirms significant differences between the proposed RT algorithm and the conventional algorithms used in the experiment.

Originality/value

Experimentation is performed on four absolute-rare skin lesion datasets, and the effectiveness of RT is assessed in terms of accuracy, sensitivity, specificity and area under the curve. The performance is compared with that of existing ensemble and boosting-based TL methods.

Details

Data Technologies and Applications, vol. 57 no. 1
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 29 November 2021

Ziming Zeng, Tingting Li, Shouqiang Sun, Jingjing Sun and Jie Yin

Abstract

Purpose

Twitter fake accounts are bot accounts created by third-party organizations to influence public opinion, spread commercial propaganda or impersonate others. Effective identification of bot accounts helps the public accurately judge the information being disseminated. However, in practice, manually labeling Twitter accounts is expensive and inefficient, and the labeled data are usually class-imbalanced. To this end, the authors propose a novel framework to solve these problems.

Design/methodology/approach

In the proposed framework, the authors introduce the concept of semi-supervised self-training and apply it to a real Twitter account data set from Kaggle. Specifically, the authors first train the classifier on the initial small amount of labeled account data, then use the trained classifier to automatically label large-scale unlabeled account data. Next, high-confidence instances are iteratively selected from the unlabeled data to expand the labeled data, yielding an expanded Twitter account training set. It is worth mentioning that a resampling technique is integrated into the self-training process, so that the data classes are balanced at the initial stage of the self-training iteration.
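
A minimal sketch of this self-training-with-resampling loop, assuming scikit-learn and a generic random forest base classifier (the paper's exact classifiers and resampling technique may differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_iter=10):
    # Balance the initial labeled seed set by oversampling the minority class
    # (binary labels 0/1 assumed).
    idx_pos, idx_neg = np.where(y_lab == 1)[0], np.where(y_lab == 0)[0]
    minority, majority = sorted([idx_pos, idx_neg], key=len)
    boosted = resample(minority, n_samples=len(majority), random_state=0)
    keep = np.concatenate([majority, boosted])
    X_lab, y_lab = X_lab[keep], y_lab[keep]

    clf = RandomForestClassifier(random_state=0)
    for _ in range(max_iter):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Pseudo-label high-confidence instances and move them to the labeled set.
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
        X_unlab = X_unlab[~confident]
    return clf
```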

Findings

The proposed framework effectively improves labeling efficiency and reduces the influence of class imbalance. It shows excellent identification results across six different base classifiers, especially when the initial labeled set of Twitter accounts is small.

Originality/value

This paper provides novel insights into identifying Twitter fake accounts. First, the authors take the lead in introducing a self-training method to automatically label Twitter accounts in a semi-supervised setting. Second, a resampling technique is integrated into the self-training process to effectively reduce the influence of class imbalance on identification.

Details

Data Technologies and Applications, vol. 56 no. 3
Type: Research Article
ISSN: 2514-9288

Book part
Publication date: 24 March 2006

Valeriy V. Gavrishchaka

Abstract

The increasing availability of financial data has opened new opportunities for quantitative modeling. It has also exposed limitations of the existing frameworks, such as the low accuracy of simplified analytical models and the insufficient interpretability and stability of adaptive data-driven algorithms. I make the case that boosting (a novel ensemble learning technique) can serve as a simple and robust framework for combining the best features of the analytical and data-driven models. Boosting-based frameworks for typical financial and econometric applications are outlined. The implementation of a standard boosting procedure is illustrated in the context of symbolic volatility forecasting for the IBM stock time series. It is shown that the boosted collection of generalized autoregressive conditional heteroskedastic (GARCH)-type models is systematically more accurate than both the best single model in the collection and the widely used GARCH(1,1) model.
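
As a hedged illustration of combining GARCH-type models rather than relying on GARCH(1,1) alone, the sketch below (assuming the `arch` package) weights several fitted specifications by in-sample log-likelihood; this is a crude stand-in for the chapter's boosting procedure, not a reimplementation of it.

```python
import numpy as np
from arch import arch_model  # assumes the `arch` package is installed

rng = np.random.default_rng(7)
returns = rng.standard_t(df=5, size=1000)  # synthetic stand-in for IBM returns

specs = {
    "garch11": dict(vol="GARCH", p=1, q=1),
    "gjr": dict(vol="GARCH", p=1, o=1, q=1),   # asymmetric (GJR) variant
    "egarch": dict(vol="EGARCH", p=1, q=1),
}
fits = {name: arch_model(returns, **kw).fit(disp="off") for name, kw in specs.items()}

# Weight each model by in-sample log-likelihood (softmax-style normalization).
ll = np.array([f.loglikelihood for f in fits.values()])
weights = np.exp(ll - ll.max())
weights /= weights.sum()

one_step = np.array([f.forecast(horizon=1).variance.iloc[-1, 0] for f in fits.values()])
print("combined 1-step variance forecast:", float(weights @ one_step))
```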

Details

Econometric Analysis of Financial and Economic Time Series
Type: Book
ISBN: 978-1-84950-388-4

Article
Publication date: 28 November 2023

Shiqin Zeng, Frederick Chung and Baabak Ashuri

Abstract

Purpose

Completing the Right-of-Way (ROW) acquisition process on schedule is critical to avoiding delays and cost overruns on transportation projects. However, transportation agencies face challenges in accurately forecasting ROW acquisition timelines in the early stages of projects due to the complex nature of the acquisition process and limited design information. There is a need to improve the accuracy of estimates of ROW acquisition duration during the early phase of project development and to quantitatively identify the risk factors affecting that duration.

Design/methodology/approach

The quantitative research methodology used to develop the forecasting model is an ensemble algorithm based on decision tree and adaptive boosting techniques. A dataset of Georgia Department of Transportation projects from 2010 to 2019 is utilized to demonstrate the building of the forecasting model. Furthermore, sensitivity analysis is performed to identify the critical drivers of ROW acquisition durations.
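
A minimal sketch of this setup, assuming scikit-learn: AdaBoost over decision trees regressing acquisition duration on early-stage project features. The feature names mirror the drivers reported below, but the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
n = 400
# Hypothetical early-stage project features (names assumed for illustration).
parcels = rng.integers(1, 200, n)
cost_per_parcel = rng.uniform(10, 500, n)   # cost estimate per parcel
project_length = rng.uniform(0.1, 20, n)    # miles
condemnations = rng.integers(0, 30, n)
X = np.column_stack([parcels, cost_per_parcel, project_length, condemnations])
y = 2 * parcels + 5 * condemnations + rng.normal(scale=30, size=n)  # duration

model = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4), n_estimators=300)
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```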

Findings

The forecasting model developed in this research achieves high accuracy, explaining 74% of the variance in ROW acquisition durations using project features and outperforming a single regression tree, multiple linear regression and a support vector machine. Moreover, the number of parcels, average cost estimate per parcel, project length, number of condemnations, number of relocations and type of work are found to be influential drivers of ROW acquisition duration.

Originality/value

This research contributes to the state of knowledge in estimating ROW acquisition timelines by (1) developing a novel machine learning model to accurately estimate ROW acquisition timelines and (2) identifying drivers (i.e. risk factors) of ROW acquisition durations. The findings will provide transportation agencies with insights on how to improve practices in scheduling the ROW acquisition process.

Details

Built Environment Project and Asset Management, vol. 14 no. 2
Type: Research Article
ISSN: 2044-124X

Article
Publication date: 8 June 2021

Jyoti Godara, Rajni Aron and Mohammad Shabaz

Abstract

Purpose

Sentiment analysis has attracted growing interest over the past decade in the field of social media analytics. With major advances in the volume, rationality and veracity of social networking data, the misunderstanding, uncertainty and inaccuracy within the data have multiplied. Locating sarcasm in textual data is a challenging task: sarcasm is a different way of expressing sentiment, in which people write or say something other than what they actually intend. Researchers are therefore interested in developing techniques for detecting sarcasm in text to boost the performance of sentiment analysis. This paper aims to give an overview of sentiment analysis, sarcasm and related work on sarcasm detection. Further, this paper aims to support health-care professionals in making decisions based on patients' sentiments.

Design/methodology/approach

This paper has compared the performance of five different classifiers – support vector machine, naïve Bayes classifier, decision tree classifier, AdaBoost classifier and K-nearest neighbour on the Twitter data set.
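
A compact sketch of such a five-classifier comparison, assuming scikit-learn; the paper's Twitter preprocessing and exact classifier configurations are not specified in the abstract, so a placeholder feature matrix is used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder features standing in for vectorized tweets.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "support vector machine": SVC(),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "K-nearest neighbour": KNeighborsClassifier(),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```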

Findings

This paper observed that naïve Bayes performed best, with the highest accuracy of 61.18%, and the decision tree performed worst, with an accuracy of 54.27%. The measured accuracies of AdaBoost, K-nearest neighbour and support vector machine were 56.13%, 54.81% and 59.55%, respectively.

Originality/value

This research work is original.

Details

World Journal of Engineering, vol. 19 no. 1
Type: Research Article
ISSN: 1708-5284
