TY  - CHAP
AB  - In recent years, the problem of classification with imbalanced data has been growing in popularity in the data-mining and machine-learning communities due to the emergence of an abundance of imbalanced data in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under the influence of four resampling techniques. These classification methods are the random forest, the support vector machine, logistic regression, k-nearest neighbor (KNN), the decision tree, and AdaBoost. Our study has shown that all of the classification methods have difficulty when working with the imbalanced data, with the KNN performing the worst, detecting only 27.4% of the minority class. However, with the help of resampling techniques, all of the classification methods improve in overall performance. In particular, the random forest, in combination with the random over-sampling technique, performs the best, achieving 82.8% balanced accuracy (the average of the true-positive rate and true-negative rate). We then propose a new procedure to resample the data. Our method is based on the idea of eliminating "easy" majority observations before under-sampling them. It further improves the balanced accuracy of the random forest to 83.7%, making it the best approach for the imbalanced data.
VL  - 13
SN  - 978-1-78754-290-7, 978-1-78754-289-1/1477-4070
DO  - 10.1108/S1477-407020190000013011
UR  - https://doi.org/10.1108/S1477-407020190000013011
AU  - Nguyen, Son
AU  - Niu, Gao
AU  - Quinn, John
AU  - Olinsky, Alan
AU  - Ormsbee, Jonathan
AU  - Smith, Richard M.
AU  - Bishop, James
PY  - 2019
Y1  - 2019/01/01
TI  - Detecting Non-injured Passengers and Drivers in Car Accidents: A New Under-resampling Method for Imbalanced Classification
T2  - Advances in Business and Management Forecasting
T3  - Advances in Business and Management Forecasting
PB  - Emerald Publishing Limited
SP  - 93
EP  - 105
Y2  - 2024/04/26
ER  - 