Search results

1 – 10 of 48

Details

Machine Learning and Artificial Intelligence in Marketing and Sales
Type: Book
ISBN: 978-1-80043-881-1

Article
Publication date: 19 August 2022

Anjali More and Dipti Rana

Abstract

Purpose

The referenced data set provides reliable information about network flows and common attacks that meet real-world criteria. Accordingly, this study focuses on the imbalanced intrusion detection benchmark from knowledge discovery in databases (KDD), the KDD data set, which many researchers prefer for experimentation and analysis. The proposed algorithm, improvised random forest classification with error tuning factors (IRFCETF), is run on the KDD data set and evaluated across the complete set of network traffic features.

Design/methodology/approach

Researchers are increasingly drawn to the many real-time applications that involve imbalanced data classification (ImDC). In application areas such as artificial intelligence (AI) and the Industrial Internet of Things (IIoT), ImDC suffers from degraded classification performance because of skewed data distribution (SkDD). The volume of computer network data and of related applications has recently grown exponentially, and intrusion detection is one of the more demanding applications of ImDC. The proposed study focuses on the imbalanced intrusion benchmark KDD data set and other benchmark data sets using the proposed IRFCETF approach, and reports improved classification performance on imbalanced data sets over the existing approach. The purpose of this work is to review imbalanced data applications in numerous areas, including AI and IIoT, and to tune performance with respect to principal component analysis. The study also focuses on the out-of-bag error as a performance-tuning factor.
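
The out-of-bag error referred to here comes from the bootstrap sampling inside a random forest: each tree is trained on rows drawn with replacement, and the rows it never saw can be used to score it. The sketch below shows only that split, using the standard library. The function name `bootstrap_split`, the row count and the seed are illustrative assumptions and are not taken from IRFCETF.

```python
import random

def bootstrap_split(n_rows, seed=0):
    """Return (in_bag, out_of_bag) row indices for one bootstrap sample."""
    rng = random.Random(seed)
    # Draw n_rows indices with replacement, as is done for each tree.
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Rows that were never drawn are "out of bag" for this tree.
    seen = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in seen]
    return in_bag, out_of_bag

in_bag, oob = bootstrap_split(1000)
# For large samples the out-of-bag share tends towards 1/e, roughly 0.37.
oob_fraction = len(oob) / 1000
```

Averaging each row's prediction error over the trees for which it was out of bag gives an error estimate without a separate validation set.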

Findings

Experimental results on the KDD data set show that the proposed algorithm gives improved performance. For the referenced intrusion detection data set, the IRFCETF classification accuracy is 99.57% and the error rate is 0.43%.

Research limitations/implications

This research can be extended to further improve classification techniques with multiple correspondence analysis (MCA); hierarchical MCA could be combined with classification models for a wide range of skewed data sets.

Practical implications

The improvement in metrics is measurable and helps in handling imbalanced applications related to intrusion detection systems in current domains such as security, AI and IIoT digitization. Analytical results show that the proposed approach achieves better metrics than traditional machine learning algorithms. The proposed IRFCETF thus supports the claim that the error-tuning parameter has a measurable impact on classification accuracy.

Social implications

The proposed algorithm is useful in numerous IIoT applications such as health care and machinery automation.

Originality/value

This research presents IRFCETF, an approach for improving classification metrics. The proposed method yields a test-set classification for each case, together with an error reduction mechanism.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 9 September 2022

Lucie Maruejols, Hanjie Wang, Qiran Zhao, Yunli Bai and Linxiu Zhang

Abstract

Purpose

Despite rising incomes and the reduction of extreme poverty, the feeling of being poor remains widespread. Support programs can improve well-being, but they first require identifying which households judge their income insufficient to meet their basic needs, and what factors are associated with subjective poverty.

Design/methodology/approach

First, households report the income level they judge sufficient to make ends meet; they are classified as subjectively poor if their own monetary income is below that level. Second, the study compares the performance of three machine learning algorithms (random forest, support vector machines and least absolute shrinkage and selection operator (LASSO) regression) applied to a set of socioeconomic variables to predict subjective poverty status.
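
The labelling rule in the first step can be written down directly. The following is a minimal sketch with made-up household figures; the function name and the numbers are illustrative assumptions, not data from the study.

```python
def is_subjectively_poor(actual_income, sufficient_income):
    """True when actual income is below the household's own stated threshold."""
    return actual_income < sufficient_income

# Hypothetical households: (actual monetary income, self-reported sufficient income)
households = [(30000, 25000), (18000, 24000), (40000, 40000)]
labels = [is_subjectively_poor(a, s) for a, s in households]
# Share of households labelled subjectively poor in this toy sample.
share_poor = sum(labels) / len(labels)
```

These boolean labels are the target variable that the three classifiers are then trained to predict from the socioeconomic variables.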

Findings

The random forest generates 85.29% of correct predictions using a range of income and non-income predictors, closely followed by the other two techniques. For the middle-income group, the LASSO regression outperforms random forest. Subjective poverty is mostly associated with monetary income for low-income households. However, a combination of low income, low endowment (land, consumption assets) and unusually large expenditure (medical, gifts) constitutes the key predictors of feeling poor for the middle-income households.

Practical implications

To reduce the feeling of poverty, policy intervention should continue to focus on increasing incomes. However, improvements in nonincome domains such as health expenditure, education and family demographics can also relieve the feeling of income inadequacy. Methodologically, which algorithm performs best depends on the data at hand.

Originality/value

For the first time, the authors show that prediction techniques can reliably identify subjective poverty prevalence, with an example from rural China. The analysis pays specific attention to modest-income households, who may feel poor but not be identified as such by objective poverty lines, and is relevant when policy-makers seek to address the “next step” after ending extreme poverty. Prediction performance and mechanisms for three machine learning algorithms are compared.

Details

China Agricultural Economic Review, vol. 15 no. 2
Type: Research Article
ISSN: 1756-137X

Article
Publication date: 26 February 2024

Chong Wu, Xiaofang Chen and Yongjie Jiang

Abstract

Purpose

While the Chinese securities market is booming, the phenomenon of listed companies falling into financial distress is also emerging, which affects the operation and development of enterprises and also jeopardizes the interests of investors. Therefore, it is important to understand how to accurately and reasonably predict the financial distress of enterprises.

Design/methodology/approach

In the present study, ensemble feature selection (EFS) and improved stacking were used for financial distress prediction (FDP). Mutual information, analysis of variance (ANOVA), random forest (RF), genetic algorithms, and recursive feature elimination (RFE) were chosen for EFS to select features. Since there may be missing information when feeding the results of the base learner directly into the meta-learner, the features with high importance were fed into the meta-learner together. A screening layer was added to select the meta-learner with better performance. Finally, Optima hyperparameters were used for parameter tuning by the learners.
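
The abstract does not say how the five selectors' outputs are combined. One common way to build an ensemble feature selector is a simple vote, sketched below as an assumption. The function name `ensemble_select`, the vote threshold and all feature names are hypothetical.

```python
from collections import Counter

def ensemble_select(selections, min_votes):
    """Keep features chosen by at least min_votes of the individual selectors."""
    # Count each feature once per selector that picked it.
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, v in votes.items() if v >= min_votes)

# Hypothetical outputs of the five selectors named in the abstract.
selections = [
    ["roa", "leverage", "cash_ratio"],   # mutual information
    ["roa", "leverage", "turnover"],     # ANOVA
    ["roa", "cash_ratio", "growth"],     # random forest importance
    ["leverage", "cash_ratio", "roa"],   # genetic algorithm
    ["roa", "turnover", "leverage"],     # recursive feature elimination
]
kept = ensemble_select(selections, min_votes=3)
```

A voting rule like this keeps features that several different criteria agree on, which is the usual motivation for combining selectors.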

Findings

An empirical study was conducted with a sample of A-share listed companies in China. The F1-score of the model constructed using the features screened by EFS reached 84.55%, representing an improvement of 4.37% compared to the original features. To verify the effectiveness of improved stacking, benchmark model comparison experiments were conducted. Compared to the original stacking model, the accuracy of the improved stacking model was improved by 0.44%, and the F1-score was improved by 0.51%. In addition, the improved stacking model had the highest area under the curve (AUC) value (0.905) among all the compared models.

Originality/value

Compared to previous models, the proposed FDP model has better performance, thus bridging the research gap of feature selection. The present study provides new ideas for stacking improvement research and a reference for subsequent research in this field.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 7 November 2016

Hsu-Che Wu and Yu-Ting Wu

Abstract

Purpose

An increasing number of investors have begun using financial data to develop optimal investment portfolios; therefore, the public financial data shared in the capital market plays a critical role in credit ratings. These data enable investors to understand the credit levels of debtors from a bank perspective; this facilitates predicting the debtor default rate to efficiently evaluate investment risks. The paper aims to discuss these issues.

Design/methodology/approach

A credit rating model can be developed to reduce the risk of adverse selection and moral hazard caused by information asymmetry in the loan market. In this study, a random forest (RF) was used to evaluate financial variables and construct credit rating prediction models. Data-mining techniques, including an RF, decision tree, neural networks, and support vector machine, were used to search for suitable credit rating forecasting methods. The distance to default from the KMV model was then incorporated into the credit rating model as a research variable to increase predictive power of various data-mining techniques. In addition, four-level and nine-level classification were set to investigate the accuracy rates of various models.
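
For orientation, a distance to default is often computed in the Merton-style form shown below. The abstract does not state which variant the authors used, so this sketch is an assumption, and the parameter names and input values are illustrative only.

```python
import math

def distance_to_default(asset_value, default_point, drift, asset_vol, horizon=1.0):
    """Merton-style distance to default, in standard deviations of asset value."""
    # Expected log distance between asset value and the default point at the horizon.
    numerator = (math.log(asset_value / default_point)
                 + (drift - 0.5 * asset_vol ** 2) * horizon)
    # Scale by asset volatility over the same horizon.
    return numerator / (asset_vol * math.sqrt(horizon))

# Hypothetical firm: assets 120, default point 80, 5% drift, 25% asset volatility.
dd = distance_to_default(asset_value=120.0, default_point=80.0,
                         drift=0.05, asset_vol=0.25)
```

A larger value means the firm's assets sit further above its default point, so it can be fed to a classifier as one more numeric variable.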

Findings

The experimental results indicated that applying the RF in the variable feature selection process and developing a forecasting model was the most effective method of predicting credit ratings; the four-level and nine-level feature-selection settings achieved 95.5 and 87.8 percent accuracy rates, respectively, indicating that RF demonstrated outstanding feature selection and forecasting capacity.

Research limitations/implications

The experimental cases were based on financial data from public companies in North America.

Practical implications

The study indicates that the most effective financial variables were dividends common/ordinary, cash dividends, volatility assumption and risk-free rate assumption.

Originality/value

The RF model can be used to perform feature selection and efficiently filter numerous financial variables to obtain credit rating information instantly.

Details

Kybernetes, vol. 45 no. 10
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 20 December 2022

Ganisha N.P. Athaudage, H. Niles Perera, P.T. Ranil S. Sugathadasa, M. Mavin De Silva and Oshadhi K. Herath

Abstract

Purpose

The crude oil supply chain (COSC) is one of the most complex and largest supply chains in the world. It is easily vulnerable to extreme events. Recently, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (often known as COVID-19) pandemic created a massive imbalance between supply and demand which caused significant price fluctuations. The purpose of this study is to explore the influential factors affecting the international COSC in terms of consumption, production and price. Furthermore, it develops a model to predict the international crude oil price during disease outbreaks using Random Forest (RF) regression.

Design/methodology/approach

This study uses both qualitative and quantitative approaches. A qualitative study is conducted using a literature review to explore the influential factors on COSC. All the data are extracted from Web sources. In addition to COVID-19, four other diseases are considered to optimize the accuracy of predictive results. A principal component analysis is deployed to reduce the number of variables. A forecasting model is developed using RF regression.

Findings

The findings of the qualitative analysis characterize the factors that influence international COSC. The findings of the quantitative analysis emphasize that production and consumption contribute more to the variance of the data set. The study also found that the impact on crude oil price varies by region. Most importantly, the model built with the RF technique provides high predictive ability over short horizons such as infectious disease outbreaks. This study offers future directions and insights for researchers and practitioners who wish to expand the work.

Originality/value

This is one of the few available pieces of research which uses the RF method in the context of crude oil price forecasting. Additionally, this study examines international COSC in the events of emergencies, specifically disease outbreaks using machine learning techniques.

Details

International Journal of Energy Sector Management, vol. 17 no. 6
Type: Research Article
ISSN: 1750-6220

Article
Publication date: 4 July 2016

Stanislaw Osowski, Krzysztof Siwek and Tomasz Grzywacz

Abstract

Purpose

The paper is concerned with the exploration of sensor signals in a differential electronic nose. This is a special type of nose that applies double sensor matrices and exploits only their differential signals, which are used to recognize the patterns associated with them. The purpose of this paper is to study the application of the differential nose in dynamic measurement of the aroma of 11 brands of cigarettes.

Design/methodology/approach

The most important requirement in pattern recognition with an electronic nose is resistance to the noise that corrupts the measurement. The authors analyze and compare the performance of the nose in a noisy environment using two classifier systems: the support vector machine (SVM) and the random forest (RF) of decision trees.

Findings

On the basis of numerical experiments, the authors found that SVM is a more advantageous classifier for the electronic nose than RF, especially at high noise levels and with a small number of measuring sensors. It recognized the 11 brands of cigarettes with accuracy close to 100 percent.

Practical implications

Because two identical sensors work in a differential mode, the authors avoid baseline estimation, which makes the solution well suited to on-line dynamic measurements of the process.
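
The reason a differential arrangement removes the need for baseline estimation can be seen in a toy calculation: any drift common to both sensors cancels in the difference. The sketch below assumes that one sensor sees the aroma while the other serves as a reference, which the abstract does not spell out. All signal values are invented.

```python
def differential_signal(measure, reference):
    """Subtract the reference sensor reading from the measuring sensor reading."""
    return [m - r for m, r in zip(measure, reference)]

# Hypothetical drift that affects both identical sensors in the same way.
baseline_drift = [0.0, 0.5, 1.0, 1.5]
# Hypothetical response to the aroma, seen only by the measuring sensor.
aroma_response = [0.0, 2.0, 3.0, 2.5]

measure = [b + a for b, a in zip(baseline_drift, aroma_response)]
reference = baseline_drift
# The common drift cancels, leaving only the aroma response.
diff = differential_signal(measure, reference)
```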

Originality/value

The paper studies the advantages and limitations of the differential electronic nose that follow from noise corrupting the measurements. It points out the important role of the applied classifier system in obtaining an electronic nose of the highest quality.

Details

COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, vol. 35 no. 4
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 3 November 2021

Irfan Haider Shakri

Abstract

Purpose

The purpose of this study is to compare five data-driven-based ML techniques to predict the time series data of Bitcoin returns, namely, alternating model tree, random forest (RF), multiple linear regression, multi-layer perceptron regression and M5 Tree algorithms.

Design/methodology/approach

The data used to forecast the time series of Bitcoin returns range from 8 July 2010 to 30 August 2020. This study used several predictors of Bitcoin returns, including economic policy uncertainty, equity market volatility index, S&P returns, USD/EURO exchange rates, oil and gold prices, volatilities and returns. Five statistical indexes are computed: correlation coefficient, mean absolute error, root mean square error, relative absolute error and root relative squared error. The results of these metrics are used to develop a colour intensity ranking.
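
The five indexes named above have standard definitions, sketched here with the standard library. The relative errors are returned as fractions (some tools report them as percentages), and the sample values are invented for illustration.

```python
import math

def regression_metrics(actual, predicted):
    """Correlation, MAE, RMSE, relative absolute error, root relative squared error."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the naive predictor that always outputs the mean of the actual values.
    abs_dev = sum(abs(a - mean_a) for a in actual)
    sq_dev = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "corr": cov / math.sqrt(sq_dev * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / abs_dev,
        "rrse": math.sqrt(sq_err / sq_dev),
    }

# Hypothetical actual and predicted returns.
m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

The two relative measures compare the model against always predicting the mean, so values below 1 indicate that the model beats that naive baseline.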

Findings

Among the machine learning (ML) techniques used in this study, RF models showed superior predictive ability for estimating Bitcoin returns.

Originality/value

This study is the first of its kind to use and compare ML models in the prediction of Bitcoin returns. Future studies can extend the work to other cryptocurrencies and other data-driven ML models.

Details

Studies in Economics and Finance, vol. 39 no. 3
Type: Research Article
ISSN: 1086-7376

Article
Publication date: 21 December 2021

Ling Jiang, Tingsheng Zhao, Chuxuan Feng and Wei Zhang

Abstract

Purpose

This research is aimed at predicting tower crane accident phases with incomplete data.

Design/methodology/approach

Tower crane accident records are collected to train the prediction model, and a random forest (RF) is used for prediction. Missing values in new inputs must be filled in before prediction, yet complete data are difficult to collect on a construction site. The authors therefore use a multiple imputation (MI) method to improve the RF. Finally, the prediction model is applied to a case study.
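
The general idea of predicting from an incomplete record through several imputed copies can be sketched as follows. This is a simplified stand-in and not the authors' MIRF: missing fields are filled by random draws from observed values, a toy rule replaces the trained random forest, and the field names and phase labels are hypothetical.

```python
import random
from collections import Counter

def predict_with_imputation(record, observed, predict, n_imputations=5, seed=0):
    """Predict for a record with None values by voting over several imputed copies."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_imputations):
        # Fill each missing field with a value drawn from the observed data for that field.
        completed = {k: (rng.choice(observed[k]) if v is None else v)
                     for k, v in record.items()}
        votes[predict(completed)] += 1
    # Return the label predicted most often across the imputed copies.
    return votes.most_common(1)[0][0]

# Hypothetical observed values per field, and a toy rule in place of a trained model.
observed = {"wind_speed": [3, 5, 12, 14], "load_ratio": [0.4, 0.6, 0.9]}

def toy_model(x):
    return "lifting" if x["load_ratio"] > 0.8 else "installation"

phase = predict_with_imputation({"wind_speed": None, "load_ratio": 0.9},
                                observed, toy_model)
```

Full multiple imputation draws the fills from a model of the missing variable given the others. The random draw above only illustrates why repeating the fill and pooling the predictions reduces dependence on any single guess.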

Findings

The results show that multiple imputation RF (MIRF) can effectively predict tower crane accidents when the data are incomplete. This research provides an importance ranking of tower crane safety factors. The critical factors deserve attention on site, because missing data seriously affect the prediction results. The values of the critical factors also influence tower crane safety.

Practical implication

This research promotes the application of machine learning methods for accident prediction in actual projects. Using onsite data, the authors can predict the accident phase of a tower crane. The results can be used for tower crane accident prevention.

Originality/value

Previous studies have seldom predicted tower crane accidents, especially the phase of accident. This research uses tower crane data collected on site to predict the phase of the tower crane accident. The incomplete data collection is considered in this research according to the actual situation.

Details

Engineering, Construction and Architectural Management, vol. 30 no. 3
Type: Research Article
ISSN: 0969-9988

Book part
Publication date: 5 April 2024

Christine Amsler, Robert James, Artem Prokhorov and Peter Schmidt

Abstract

The traditional predictor of technical inefficiency proposed by Jondrow, Lovell, Materov, and Schmidt (1982) is a conditional expectation. This chapter explores whether, and by how much, the predictor can be improved by using auxiliary information in the conditioning set. It considers two types of stochastic frontier models. The first type is a panel data model where composed errors from past and future time periods contain information about contemporaneous technical inefficiency. The second type is when the stochastic frontier model is augmented by input ratio equations in which allocative inefficiency is correlated with technical inefficiency. Compared to the standard kernel-smoothing estimator, a newer estimator based on a local linear random forest helps mitigate the curse of dimensionality when the conditioning set is large. Besides numerous simulations, there is an illustrative empirical example.
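
For reference, the conditional-expectation predictor has a closed form in the textbook normal/half-normal production frontier, where the composed error is eps = v - u. The sketch below assumes that specification, which the abstract does not state. The function and parameter names are illustrative.

```python
import math

def jlms_inefficiency(eps, sigma_u, sigma_v):
    """E[u | eps] for the normal/half-normal production frontier, eps = v - u."""
    sigma2 = sigma_u ** 2 + sigma_v ** 2
    # Given eps, u follows a normal distribution truncated at zero
    # with these location and scale parameters.
    mu_star = -eps * sigma_u ** 2 / sigma2
    sigma_star = sigma_u * sigma_v / math.sqrt(sigma2)
    z = mu_star / sigma_star
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Mean of the truncated normal distribution.
    return mu_star + sigma_star * pdf / cdf

# Hypothetical composed error and standard deviations.
u_hat = jlms_inefficiency(eps=0.0, sigma_u=1.0, sigma_v=1.0)
```

In this form the predictor conditions on a single composed error; the chapter's question is how much it improves when the conditioning set also includes other periods' errors or input ratio equations.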