TY  - CHAP
AB  - In recent years, the problem of classification with imbalanced data has been growing in popularity in the data-mining and machine-learning communities due to the emergence of an abundance of imbalanced data in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbor (KNN), the decision tree, and AdaBoost. Our study shows that all of the classification methods have difficulty when working with the imbalanced data, with KNN performing the worst, detecting only 27.4% of the minority class. However, with the help of resampling techniques, all of the classification methods improve in overall performance. In particular, the random forest, in combination with the random over-sampling technique, performs the best, achieving 82.8% balanced accuracy (the average of the true-positive rate and true-negative rate). We then propose a new procedure to resample the data. Our method is based on the idea of eliminating “easy” majority observations before under-sampling them. It further improves the balanced accuracy of the random forest to 83.7%, making it the best approach for the imbalanced data.
VL  - 13
SN  - 978-1-78754-290-7, 978-1-78754-289-1/1477-4070
DO  - 10.1108/S1477-407020190000013011
UR  - https://doi.org/10.1108/S1477-407020190000013011
AU  - Nguyen Son
AU  - Niu Gao
AU  - Quinn John
AU  - Olinsky Alan
AU  - Ormsbee Jonathan
AU  - Smith Richard M.
AU  - Bishop James
PY  - 2019
Y1  - 2019/01/01
TI  - Detecting Non-injured Passengers and Drivers in Car Accidents: A New Under-resampling Method for Imbalanced Classification
T2  - Advances in Business and Management Forecasting
T3  - Advances in Business and Management Forecasting
PB  - Emerald Publishing Limited
SP  - 93
EP  - 105
Y2  - 2024/04/26
ER  -