
Detecting Non-injured Passengers and Drivers in Car Accidents: A New Under-resampling Method for Imbalanced Classification

Advances in Business and Management Forecasting

ISBN: 978-1-78754-290-7, eISBN: 978-1-78754-289-1

ISSN: 1477-4070

Publication date: 6 September 2019

Abstract

In recent years, classification with imbalanced data has attracted growing attention in the data-mining and machine-learning communities, driven by the abundance of imbalanced data in many fields. In this chapter, we compare the performance of six classification methods on an imbalanced dataset under four resampling techniques. The classification methods are random forest, support vector machine, logistic regression, k-nearest neighbor (KNN), decision tree, and AdaBoost. Our study shows that all of the classification methods struggle with the imbalanced data; KNN performs worst, detecting only 27.4% of the minority class. With resampling, however, every classification method improves in overall performance. In particular, random forest combined with random over-sampling performs best, achieving 82.8% balanced accuracy (the average of the true-positive rate and the true-negative rate).
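As an illustration only, the balanced-accuracy metric and the random over-sampling technique mentioned above can be sketched in a few lines of Python. The function names `balanced_accuracy` and `random_oversample` are hypothetical, binary labels are assumed, and the minority class is treated as the positive class; this is not the chapter's own code.

```python
import random
from typing import List, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Mean of the true-positive rate and the true-negative rate (binary labels)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tpr = tp / (tp + fn)  # share of positives (assumed minority) detected
    tnr = tn / (tn + fp)  # share of negatives (assumed majority) detected
    return (tpr + tnr) / 2


def random_oversample(X: Sequence, y: Sequence[int], minority: int, seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate randomly chosen minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_majority = len(y) - len(minority_idx)
    # Draw with replacement; range() is empty if the minority is not smaller
    extra = [rng.choice(minority_idx) for _ in range(n_majority - len(minority_idx))]
    return list(X) + [X[i] for i in extra], list(y) + [y[i] for i in extra]


# Always predicting the majority class (0) on a 2:8 split
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.8
```

The usage line shows why the metric suits imbalanced data: a classifier that ignores the minority class still reaches 80% plain accuracy on this split, but its balanced accuracy is only 0.5.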

We then propose a new resampling procedure. Our method is based on the idea of eliminating “easy” majority observations before under-sampling the majority class. This further improved the balanced accuracy of random forest to 83.7%, making it the best-performing approach for this imbalanced dataset.
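The abstract does not specify how an observation is judged “easy”. Purely as a hypothetical reading, the sketch below calls a majority observation easy when all of its k nearest neighbors (Euclidean distance) also belong to the majority class, drops those, and then randomly under-samples the remaining majority observations to the size of the minority class. The criterion, the value of k, and the names `easy_then_undersample` and `k_nearest_labels` are assumptions, not the authors' procedure.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_nearest_labels(X: Sequence[Sequence[float]], y: Sequence[int], i: int, k: int) -> List[int]:
    """Labels of the k nearest neighbors of row i (brute-force Euclidean distance)."""
    dists = sorted((math.dist(X[i], X[j]), y[j]) for j in range(len(X)) if j != i)
    return [label for _, label in dists[:k]]


def easy_then_undersample(X, y, majority: int, k: int = 3, seed: int = 0) -> Tuple[list, List[int]]:
    """HYPOTHETICAL: drop 'easy' majority rows, then randomly under-sample the rest."""
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    # ASSUMED criterion: a majority row is "hard" (kept) if at least one of its
    # k nearest neighbors is a minority row; otherwise it is "easy" and dropped.
    hard_majority = [
        i for i in majority_idx
        if any(label != majority for label in k_nearest_labels(X, y, i, k))
    ]
    # Random under-sampling of the remaining majority rows down to the minority size
    rng = random.Random(seed)
    kept_majority = rng.sample(hard_majority, min(len(hard_majority), len(minority_idx)))
    keep = sorted(minority_idx + kept_majority)  # preserve original row order
    return [X[i] for i in keep], [y[i] for i in keep]


X = [(0, 0), (0, 1), (1, 0),                   # minority (label 1)
     (1, 1), (2, 0), (2, 1),                   # majority near the minority
     (10, 10), (10, 11), (11, 10), (11, 11)]   # majority far away ("easy")
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
X_res, y_res = easy_then_undersample(X, y, majority=0, k=3)
print(X_res)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

The brute-force neighbor search is quadratic in the number of rows; on a dataset of realistic size, a spatial index such as a k-d tree would be the natural substitute.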


Citation

Nguyen, S., Niu, G., Quinn, J., Olinsky, A., Ormsbee, J., Smith, R.M. and Bishop, J. (2019), "Detecting Non-injured Passengers and Drivers in Car Accidents: A New Under-resampling Method for Imbalanced Classification", Advances in Business and Management Forecasting (Advances in Business and Management Forecasting, Vol. 13), Emerald Publishing Limited, Bingley, pp. 93-105. https://doi.org/10.1108/S1477-407020190000013011

Publisher: Emerald Publishing Limited

Copyright © 2019 Emerald Publishing Limited