Search results

1 – 10 of over 3000
Article
Publication date: 21 December 2021

Laouni Djafri


Abstract

Purpose

This work can be used as a building block in other settings such as GPU, Map-Reduce, Spark or any other framework. Also, DDPML can be deployed on other distributed systems such as P2P networks, clusters, cloud computing or other technologies.

Design/methodology/approach

In the age of Big Data, all companies want to benefit from large amounts of data. These data can help them understand their internal and external environment and anticipate associated phenomena, as the data turn into knowledge that can be used for later prediction. This knowledge thus becomes a great asset in companies' hands, which is precisely the objective of data mining. But with large amounts of data and knowledge being produced at a faster pace, the authors now speak of Big Data mining. For this reason, the authors' proposed work mainly aims at solving the problems of volume, veracity, validity and velocity when classifying Big Data using distributed and parallel processing techniques. The problem the authors raise in this work is how to make machine learning algorithms work in a distributed and parallel way at the same time without losing classification accuracy. To solve this problem, the authors propose a system called Dynamic Distributed and Parallel Machine Learning (DDPML). To build it, the authors divided their work into two parts. In the first, the authors propose a distributed architecture controlled by a Map-Reduce algorithm, which in turn depends on a random sampling technique. The distributed architecture the authors designed is specially directed at big data processing and operates coherently and efficiently with the sampling strategy proposed in this work. This architecture also helps the authors verify the classification results obtained using the representative learning base (RLB). In the second part, the authors extract the representative learning base by sampling at two levels using the stratified random sampling method. This sampling method is also applied to extract the shared learning base (SLB) and the partial learning bases for the first level (PLBL1) and the second level (PLBL2).
The experimental results show the efficiency of the authors' solution without significant loss in classification results. In practical terms, the DDPML system is generally dedicated to big data mining and works effectively in distributed systems with a simple structure, such as client-server networks.
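The two-level stratified sampling described above can be illustrated without any library. A minimal sketch, assuming each record carries a partition key (level 1) and a class label (level 2); the field names, fractions and toy data below are hypothetical, not taken from the paper:

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Stratified random sampling: group records by `key`, then draw the
    same fraction from every stratum so each is proportionally represented."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Toy data: 4 partitions x 2 class labels x 50 records each.
data = [{"partition": p, "label": lab, "x": i}
        for i, (p, lab) in enumerate(
            (p, lab) for p in range(4) for lab in "AB" for _ in range(50))]

# Level 1: sample within each data partition; level 2: sample the result
# again, stratified by class label, to form the representative learning base.
level1 = stratified_sample(data, key="partition", fraction=0.5)
rlb = stratified_sample(level1, key="label", fraction=0.2, seed=1)
```

Sampling by partition first and by class label second keeps every partition and every class represented in the final base, which is the property the RLB relies on.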

Findings

The authors obtained very satisfactory classification results.

Originality/value

The DDPML system is specially designed to handle big data mining classification smoothly.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 19 August 2021

Hendrik Kohrs, Benjamin Rainer Auer and Frank Schuhmacher


Abstract

Purpose

In short-term forecasting of day-ahead electricity prices, incorporating intraday dependencies is vital for accurate predictions. However, it quickly leads to dimensionality problems, i.e. ill-defined models with too many parameters, which require an adequate remedy. This study addresses this issue.

Design/methodology/approach

In an application for the German/Austrian market, this study derives variable importance scores from a random forest algorithm, feeds the identified variables into a support vector machine and compares the resulting forecasting technique to other approaches (such as dynamic factor models, penalized regressions or Bayesian shrinkage) that are commonly used to resolve dimensionality problems.
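The study derives its importance scores from a random forest; as a library-free stand-in, the same "rank variables, keep the top ones for the next model" idea can be sketched with model-agnostic permutation importance. The toy model and data below are hypothetical illustrations, not the paper's setup:

```python
import random

def permutation_importance(predict, X, y, n_features, seed=0):
    """Importance of feature j = drop in accuracy after shuffling column j
    (a model-agnostic stand-in for random forest importance scores)."""
    rng = random.Random(seed)
    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)
    base = accuracy(X)
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        rng.shuffle(col)
        Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(base - accuracy(Xp))
    return scores

# Toy data: the target depends only on feature 0; feature 1 is pure noise.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(500)]
y = [int(row[0] > 0.5) for row in X]
model = lambda row: int(row[0] > 0.5)  # stands in for any fitted classifier
scores = permutation_importance(model, X, y, n_features=2)
# Feature 0 scores high, feature 1 near zero; only the top-ranked
# variables would then be fed into the SVM stage.
```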

Findings

This study develops full importance profiles stating which hours of which past days have the highest predictive power for specific hours in the future. Using the profile information in the forecasting setup leads to very promising results compared to the alternatives. Furthermore, the importance profiles provide a possible explanation why some forecasting methods are more accurate for certain hours of the day than others. They also help to explain why simple forecast combination schemes tend to outperform the full battery of models considered in the comprehensive comparative study.

Originality/value

With the information contained in the variable importance scores and the results of the extensive model comparison, this study essentially provides guidelines for variable and model selection in future electricity market research.

Open Access
Article
Publication date: 28 July 2020

R. Shashikant and P. Chetankumar


Abstract

Cardiac arrest is a severe heart anomaly that results in millions of deaths annually. Smoking is a specific hazard factor for cardiovascular pathology, including coronary heart disease, but data on smoking and cardiac death have not previously been reviewed. In this paper, heart rate variability (HRV) parameters are used to predict cardiac arrest in smokers with machine learning techniques. Machine learning is a method of computing that builds on experience through automatic learning and enhances performance to improve prognosis. This study intends to compare the performance of logistic regression, decision tree and random forest models for predicting cardiac arrest in smokers. The machine learning techniques are implemented on a dataset received from the data science research group MITU Skillogies Pune, India. To determine whether a patient has a chance of cardiac arrest or not, three predictive models were developed with 19 input features of HRV indices and two output classes. These models were evaluated based on their accuracy, precision, sensitivity, specificity, F1 score and area under the curve (AUC). The logistic regression model achieved an accuracy of 88.50%, precision of 83.11%, sensitivity of 91.79%, specificity of 86.03%, F1 score of 0.87 and AUC of 0.88. The decision tree model achieved an accuracy of 92.59%, precision of 97.29%, sensitivity of 90.11%, specificity of 97.38%, F1 score of 0.93 and AUC of 0.94. The random forest model achieved an accuracy of 93.61%, precision of 94.59%, sensitivity of 92.11%, specificity of 95.03%, F1 score of 0.93 and AUC of 0.95. The random forest model achieved the best classification accuracy, followed by the decision tree, while logistic regression showed the lowest classification accuracy.
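All of the reported metrics derive from the binary confusion matrix. A minimal sketch with hypothetical toy labels (not the study's data), taking 1 as the positive "cardiac arrest" class:

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall / true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity, "f1": f1}

m = classification_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                           [1, 1, 0, 0, 0, 1, 1, 0])
```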

Details

Applied Computing and Informatics, vol. 19 no. 3/4
Type: Research Article
ISSN: 2634-1964


Article
Publication date: 23 March 2021

Mostafa El Habib Daho, Nesma Settouti, Mohammed El Amine Bechar, Amina Boublenza and Mohammed Amine Chikh


Abstract

Purpose

Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of problems. Despite the effectiveness of these techniques, studies have shown that ensemble methods generate a large number of hypotheses that, in most cases, contain redundant classifiers. Several works in the state of the art attempt to prune these hypotheses without affecting performance.

Design/methodology/approach

In this work, the authors propose a pruning method that takes into consideration the correlation between each classifier and the classes, and between each classifier and the rest of the set. The authors use the random forest algorithm as a tree-based ensemble classifier, and pruning is performed with a technique inspired by the CFS (correlation feature selection) algorithm.
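A rough sketch of such CFS-inspired forward selection, assuming each classifier is represented by its prediction vector on a validation set. The merit formula follows the standard CFS form (reward classifier-class correlation, penalize classifier-classifier redundancy); the toy "forest" below is hypothetical, not the paper's:

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0

def cfs_merit(subset, preds, y):
    """CFS-style merit: k*r_cf / sqrt(k + k(k-1)*r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(preds[i], y)) for i in subset) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(preds[i], preds[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_prune(preds, y, size):
    """Forward selection of `size` classifiers maximizing the merit."""
    chosen = []
    while len(chosen) < size:
        best = max((i for i in range(len(preds)) if i not in chosen),
                   key=lambda i: cfs_merit(chosen + [i], preds, y))
        chosen.append(best)
    return chosen

# Toy forest of 4 "trees": 0 and 1 are accurate but identical (redundant),
# 2 is accurate and diverse, 3 is mostly noise.
y     = [0, 1, 0, 1, 0, 1, 0, 1]
preds = [[0, 1, 0, 1, 0, 1, 1, 1],
         [0, 1, 0, 1, 0, 1, 1, 1],
         [0, 1, 0, 1, 1, 1, 0, 1],
         [1, 0, 1, 1, 0, 0, 1, 0]]
subset = greedy_prune(preds, y, size=2)
```

The redundancy term in the denominator is what makes the selection skip the duplicate tree 1 in favor of the diverse tree 2.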

Findings

The proposed method, CES (Correlation-based Ensemble Selection), was evaluated on ten datasets from the UCI machine learning repository, and its performance was compared to six ensemble pruning techniques. The results showed that the proposed pruning method selects a small ensemble in a smaller amount of time while improving classification rates compared to the state-of-the-art methods.

Originality/value

CES is a new ordering-based method that uses the CFS algorithm. CES selects, in a short time, a small sub-ensemble that outperforms results obtained from the whole forest and the other state-of-the-art techniques used in this study.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X


Article
Publication date: 30 December 2020

Suraj Kulkarni, Suhas Suresh Ambekar and Manoj Hudnurkar


Abstract

Purpose

Increasing health-care costs are a major concern, especially in the USA. The purpose of this paper is to predict the hospital charges of a patient before admission. This will help a patient being admitted "electively" plan his or her finances. It can also be used as a tool by payers (insurance companies) to better forecast the amount a patient might claim.

Design/methodology/approach

This research uses secondary data collected from New York state's patient discharges of 2017. A stratified sampling technique is used to sample the data from the population, and feature engineering is performed on the categorical variables. Different regression techniques are used to predict the target value, "total charges."

Findings

Total cost varies linearly with the length of stay. Among all the machine learning algorithms considered, namely, random forest, stochastic gradient descent (SGD) regressor, K nearest neighbors regressor, extreme gradient boosting regressor and gradient boosting regressor, the random forest regressor had the best accuracy, with an R2 value of 0.7753. "Age group" was the most important predictor among all the features.

Practical implications

This model can be helpful for patients who want to compare the cost at different hospitals and can plan their finances accordingly in case of “elective” admission. Insurance companies can predict how much a patient with a particular medical condition might claim by getting admitted to the hospital.

Originality/value

Health care can be a costly affair if not planned properly. This research gives patients and insurance companies a better prediction of the total cost that they might incur.

Details

International Journal of Innovation Science, vol. 13 no. 1
Type: Research Article
ISSN: 1757-2223


Article
Publication date: 1 October 2018

Vinod Nistane and Suraj Harsha


Abstract

Purpose

In rotary machines, bearing failure is one of the major causes of machinery breakdown, and monitoring bearing degradation is a major concern for preventing such failures. This paper aims to present a combination of stationary wavelet decomposition and extra-trees regression (ETR) for the evaluation of bearing degradation.

Design/methodology/approach

Higher-order cumulant features are extracted from the bearing vibration signals using stationary wavelet decomposition (stationary wavelet transform [SWT]). The extracted features are then fed to the ETR to obtain the normal and failure states. A dominance-level curve is built using the dissimilarity data of the test object and retained as a health-degradation indicator for the evaluation of bearing health.
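As a rough illustration of the pipeline's first stage, one level of an undecimated (à trous) Haar transform followed by a fourth-order cumulant feature. The Haar filter and the toy signal are stand-ins for whatever wavelet and vibration data the paper actually uses:

```python
import math

def haar_swt_level(x, level=1):
    """One level of a stationary (undecimated) Haar wavelet transform:
    à trous scheme, so outputs keep the input length (no downsampling)."""
    n, step = len(x), 2 ** (level - 1)
    approx = [(x[i] + x[(i + step) % n]) / 2 for i in range(n)]
    detail = [(x[i] - x[(i + step) % n]) / 2 for i in range(n)]
    return approx, detail

def kurtosis(x):
    """Fourth-order cumulant-based feature (excess kurtosis)."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / var ** 2 - 3 if var else 0.0

# Toy vibration signal: a sine carrier plus a repeating impulse, a crude
# stand-in for a bearing-fault signature.
sig = [math.sin(2 * math.pi * i / 16) + (2.0 if i % 32 == 0 else 0.0)
       for i in range(256)]
approx, detail = haar_swt_level(sig, level=1)
feature = kurtosis(detail)  # impulsiveness shows up as high kurtosis
```

Features such as this, computed per decomposition level, would then feed the ETR stage.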

Findings

Experiments were conducted to verify and assess the effectiveness of ETR for evaluating bearing degradation. To justify the preeminence of the recommended approach, it is compared with the performance of random forest regression and multi-layer perceptron regression.

Originality/value

The experimental results indicated that the adopted method detects degradation more accurately at an early stage. Furthermore, diagnostics and prognostics have been receiving much attention in the field of vibration analysis, where they play a significant role in avoiding accidents.

Open Access
Article
Publication date: 8 July 2019

Daniel Abreu Vasconcellos de Paula, Rinaldo Artes, Fabio Ayres and Andrea Maria Accioly Fonseca Minardi


Abstract

Purpose

Although credit unions are nonprofit organizations, their objectives depend on the efficient management of their resources and credit risk aligned with the principles of the cooperative doctrine. This paper aims to propose the combined use of credit scoring and profit scoring to increase the effectiveness of the loan-granting process in credit unions.

Design/methodology/approach

The sample is composed of data on personal loan transactions of a Brazilian credit union.

Findings

The analysis reveals that the use of statistical methods significantly improves the predictability of default compared to the use of subjective techniques, and shows the superiority of the random forest model in estimating credit scoring and profit scoring compared to logit and ordinary least squares (OLS) regression. The study also illustrates how both analyses can be used jointly for more effective decision-making.

Originality/value

Replacing subjective analysis with objective credit analysis using deterministic models will benefit Brazilian credit unions. The credit decision will be based on the input variables and on clear criteria, making the decision-making process impartial. The joint use of credit scoring and profit scoring allows granting credit to the clients with the highest potential to pay their debt obligations and, at the same time, certifying that the transaction's profitability meets the goals of the organization: to be sustainable and to provide loans and investment opportunities at attractive rates to members.

Details

RAUSP Management Journal, vol. 54 no. 3
Type: Research Article
ISSN: 2531-0488


Article
Publication date: 5 June 2017

Hao Wu


Abstract

Purpose

This paper aims to inspect defects of solder joints on printed circuit boards in a real-time production line; simple computation and high accuracy are the primary considerations for the feature extraction and classification algorithms.

Design/methodology/approach

In this study, the author presents an ensemble method for the classification of solder joint defects. The new method is based on extracting color and geometry features after solder image acquisition and using decision trees to guarantee the algorithm's execution efficiency. To improve accuracy, the author proposes a random forest ensemble that combines several trees for the classification of solder joints.

Findings

The proposed method has been tested using 280 samples of solder joints, including good and various defect types, for experiments. The results show that the proposed method has a high accuracy.

Originality/value

The author extracted color and geometry features and used decision trees to guarantee the algorithm's execution efficiency, then improved accuracy with a random forest ensemble that combines several trees for the classification of solder joints.

Details

Soldering & Surface Mount Technology, vol. 29 no. 3
Type: Research Article
ISSN: 0954-0911


Article
Publication date: 3 April 2024

Samar Shilbayeh and Rihab Grassa


Abstract

Purpose

Bank creditworthiness refers to the evaluation of a bank's ability to meet its financial obligations: an assessment of the bank's financial health, stability and capacity to manage risks. This paper aims to investigate the credit rating patterns that are crucial for assessing the creditworthiness of Islamic banks, thereby evaluating the stability of their industry.

Design/methodology/approach

Three distinct machine learning algorithms are exploited and evaluated for the desired objective. This research initially uses the decision tree machine learning algorithm as a base learner, conducting an in-depth comparison with the ensemble decision tree and random forest. Subsequently, the Apriori algorithm is deployed to uncover the most significant attributes impacting a bank's credit rating. To appraise the previously elucidated models, a ten-fold cross-validation method is applied. This method segments the data set into ten folds, with nine used for training and one for testing, rotating through all ten folds. This approach aims to mitigate any potential biases that could arise during the learning and training phases. Following this process, the accuracy is assessed and depicted in a confusion matrix, as outlined in the methodology section.
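The ten-fold procedure described above can be sketched without any ML library. The majority-label "model" below is a hypothetical stand-in for the decision tree and random forest learners compared in the paper:

```python
import random

def ten_fold_indices(n, seed=0):
    """Shuffle n sample indices and split them into 10 folds; each fold
    serves once as the test set while the remaining nine train the model."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::10] for i in range(10)]

def cross_validate(fit, score, X, y):
    """Mean test score over the ten train/test rotations."""
    folds = ten_fold_indices(len(X))
    accs = []
    for k, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != k for i in fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        accs.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(accs) / len(accs)

# Toy learner: always predict the majority training label.
fit = lambda X, y: round(sum(y) / len(y))
score = lambda m, X, y: sum(int(m == t) for t in y) / len(y)
X = [[i] for i in range(100)]
y = [0] * 70 + [1] * 30
mean_acc = cross_validate(fit, score, X, y)
```

Because every sample lands in exactly one test fold, the mean accuracy here simply recovers the majority-class rate of 0.7.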

Findings

The findings of this investigation reveal that the random forest algorithm outperforms the others, achieving an impressive 90.5% accuracy in predicting credit ratings. Notably, the research sheds light on the significance of the loan-to-deposit ratio as a primary attribute affecting credit rating predictions; the findings provide evidence that it is the single bank attribute with the strongest effect on prediction. Moreover, this study uncovers additional pivotal banking features that strongly impact the measurements under study: the deposit-to-assets ratio and the profit-sharing investment account ratio are found to be effective in credit rating prediction, and the ownership structure criterion emerges as one of the essential bank attributes.

Originality/value

These findings contribute significant evidence to the understanding of attributes that strongly influence credit rating predictions within the banking sector. This study uniquely contributes by uncovering patterns that have not been previously documented in the literature, broadening our understanding in this field.

Details

International Journal of Islamic and Middle Eastern Finance and Management, vol. 17 no. 2
Type: Research Article
ISSN: 1753-8394


Article
Publication date: 20 December 2022

Ganisha N.P. Athaudage, H. Niles Perera, P.T. Ranil S. Sugathadasa, M. Mavin De Silva and Oshadhi K. Herath


Abstract

Purpose

The crude oil supply chain (COSC) is one of the most complex and largest supply chains in the world. It is easily vulnerable to extreme events. Recently, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (often known as COVID-19) pandemic created a massive imbalance between supply and demand which caused significant price fluctuations. The purpose of this study is to explore the influential factors affecting the international COSC in terms of consumption, production and price. Furthermore, it develops a model to predict the international crude oil price during disease outbreaks using Random Forest (RF) regression.

Design/methodology/approach

This study uses both qualitative and quantitative approaches. A qualitative study is conducted using a literature review to explore the influential factors on COSC. All the data are extracted from Web sources. In addition to COVID-19, four other diseases are considered to optimize the accuracy of predictive results. A principal component analysis is deployed to reduce the number of variables. A forecasting model is developed using RF regression.

Findings

The findings of the qualitative analysis characterize the factors that influence the international COSC. The findings of the quantitative analysis emphasize that production and consumption contribute most to the variance of the data set. This study also found that the impact on crude oil prices varies by region. Most importantly, the model introduced using the RF technique provides high predictive ability over short horizons such as disease outbreaks. The study delivers future directions and insights for researchers and practitioners to expand the work further.

Originality/value

This is one of the few available pieces of research that uses the RF method in the context of crude oil price forecasting. Additionally, this study examines the international COSC during emergencies, specifically disease outbreaks, using machine learning techniques.

Details

International Journal of Energy Sector Management, vol. 17 no. 6
Type: Research Article
ISSN: 1750-6220

