
A methodology for classification and validation of customer datasets

Dongyun Nie (Insight Centre for Data Analytics, School of Computing, Dublin City University, Dublin, Ireland)
Paolo Cappellari (College of Staten Island, City University of New York, New York, USA)
Mark Roantree (Insight Centre for Data Analytics, School of Computing, Dublin City University, Dublin, Ireland)

Journal of Business & Industrial Marketing

ISSN: 0885-8624

Article publication date: 28 September 2020

Issue publication date: 25 May 2021




The purpose of this paper is to develop a method to classify customers according to their value to an organization. This process is complicated by the disconnected nature of a customer record in an industry such as insurance. With large numbers of customers, it is of significant benefit to managers and company analysts to create a broad classification for all customers.


The initial step is to construct a full customer history and extract a feature set suited to customer lifetime value calculations. This feature set must then be validated to determine its ability to classify customers in broad terms.


The method successfully classifies customer data sets with an accuracy of 90%. This study also discovered that by examining the average value for key variables in each customer segment, an algorithm can label the group of clusters with an accuracy of 99.3%.
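The labelling idea above can be illustrated with a minimal sketch. It is not the authors' algorithm: the key variable (`premium`), the label names and the rule of ranking clusters by their mean value are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def label_clusters(rows, key, labels=("low", "medium", "high")):
    """Label each cluster by the rank of its mean `key` value.

    rows   : iterable of dicts, each with a "cluster" id and a numeric `key`
    labels : value labels ordered from lowest to highest (hypothetical names)
    """
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row["cluster"]].append(row[key])

    # Average of the key variable within each cluster.
    averages = {c: mean(values) for c, values in by_cluster.items()}

    # Rank clusters from lowest to highest average and spread the labels
    # evenly over that ranking (an assumed rule, not taken from the paper).
    ranked = sorted(averages, key=averages.get)
    n = len(ranked)
    return {c: labels[i * len(labels) // n] for i, c in enumerate(ranked)}


# Hypothetical data: three clusters described by one key variable.
rows = [
    {"cluster": 0, "premium": 200.0}, {"cluster": 0, "premium": 300.0},
    {"cluster": 1, "premium": 1500.0}, {"cluster": 1, "premium": 1700.0},
    {"cluster": 2, "premium": 700.0}, {"cluster": 2, "premium": 900.0},
]
cluster_labels = label_clusters(rows, "premium")
```

Here cluster 0 has the lowest average and cluster 1 the highest, so they receive the first and last labels respectively. With several key variables, the same ranking could be applied to a combined score, which would be a further assumption.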

Research limitations/implications

In a real-world data set, some features are invariably unavailable because they were never recorded. This can impair the algorithm’s ability to classify customers well in all cases.


This research makes a novel contribution: it automates the classification of customers and, in addition, provides both a high-level classification result (recall and precision identify the best cluster configuration) and detailed insights, through two validation metrics, into how each customer is classified. This supports managers in deciding market spend on new and existing customers.
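As a rough illustration of how precision and recall might be used to compare cluster configurations, the sketch below scores each candidate against reference labels. The configuration names (`k=2`, `k=3`), the reference labels and the selection rule (highest sum of macro-averaged precision and recall) are assumptions and are not taken from the paper.

```python
def precision_recall(true, pred, label):
    """Precision and recall for one class label."""
    pairs = list(zip(true, pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def macro_scores(true, pred):
    """Unweighted mean of precision and of recall over the true classes."""
    classes = sorted(set(true))
    scores = [precision_recall(true, pred, c) for c in classes]
    macro_p = sum(p for p, _ in scores) / len(scores)
    macro_r = sum(r for _, r in scores) / len(scores)
    return macro_p, macro_r


def best_configuration(true, candidates):
    """Pick the candidate with the highest macro precision + recall.

    candidates maps a configuration name to its predicted labels.
    The selection rule is an assumption made for this sketch.
    """
    return max(candidates, key=lambda name: sum(macro_scores(true, candidates[name])))


# Hypothetical reference labels and two candidate configurations.
true = ["high", "high", "low", "low"]
candidates = {
    "k=2": ["high", "low", "low", "low"],
    "k=3": ["high", "high", "low", "low"],
}
best = best_configuration(true, candidates)
```

In this toy data the `k=3` candidate reproduces the reference labels exactly, so it is selected; the `k=2` candidate misses one high-value customer, which lowers its recall for that class.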



This research work was funded by Science Foundation Ireland under grant numbers: SFI/12/RC/2289 and SFI/12/RC/2289-P2.

The authors would also like to acknowledge the insightful feedback and level of detail provided by the anonymous reviewers.


Nie, D., Cappellari, P. and Roantree, M. (2021), "A methodology for classification and validation of customer datasets", Journal of Business & Industrial Marketing, Vol. 36 No. 5, pp. 821-833.



Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
