WisdomModel: convert data into wisdom

Purpose – Traditional classification algorithms always produce some incorrect predictions. As the misclassification rate increases, the usefulness of the learning model decreases. This paper presents the development of a wisdom framework that reduces the error rate to less than 3% without human intervention. Design/methodology/approach – The proposed WisdomModel consists of four stages: build a classifier, isolate the misclassified instances, construct an automated knowledge base for the misclassified instances and rectify incorrect predictions. This approach identifies misclassified instances by comparing them against the knowledge base. If an instance is close to a rule in the knowledge base by a certain threshold, then this instance is considered misclassified. Findings – The authors have evaluated the WisdomModel using different measures such as accuracy, recall, precision, F-measure, receiver operating characteristic (ROC) curve, area under the curve (AUC) and error rate with various datasets to prove its ability to generalize without human involvement. The results show that the proposed model reduces the number of misclassified instances by at least 70% and increases the accuracy of the model by at least 7%. Originality/value – This research focuses on defining wisdom in practical applications. Despite the development in information systems, there is still no framework or algorithm that can be used to extract wisdom from data. This research builds a general wisdom framework that can be used in any domain to reach wisdom.


Introduction
The misclassification rate is a fundamental issue in many domains, where wrong predictions are costly. In real life, a high error rate in applications such as churn prediction [1], fraud detection [2], flood forecasting [3] and medical diagnosis [4] can lead to a costly outcome. Classification algorithms are often used to convert the information into knowledge that can be used by specialists to facilitate their work. This knowledge is represented as models. A model with high-performance metrics means that the classification algorithm was successfully able to extract knowledge and distinguish between different patterns of the data [5]. However, the extracted knowledge may not always provide correct predictions.
In some domains, even a model with high accuracy (e.g. >80%) may not satisfy specialists, because each misclassification has a great impact. Nevertheless, eliminating incorrect predictions is not possible with traditional classification algorithms unless a domain specialist examines all the data to identify misclassified samples, which would make the model pointless.
The main problem is that traditional classification algorithms will always have incorrect predictions. Eliminating misclassification error requires human intervention to cross-check the result of the classifier and identify the misclassified instances, which defeats the purpose of using machine learning algorithms. The goal of this study is to build a WisdomModel that achieves a minimum error rate (less than 3%) without human intervention. Our approach identifies misclassified instances and corrects them. In contrast to classical classification algorithms, which can suffer from overfitting when the accuracy is too high, the WisdomModel is unlikely to face such an issue. Moreover, the WisdomModel is designed for binary classification.
In this research, we develop a WisdomModel that uses the knowledge gained from traditional classification algorithms to make a better judgment. The model builds a knowledge base (KB) for misclassified instances of the trained data. The KB is used by the WisdomModel to find data with incorrect predictions.
The main contributions of this study are listed below:
(1) WisdomModel is a general framework that converts the knowledge gained from machine learning models into wisdom, which can be used to make a good decision regardless of the data domain and without human intervention.
(3) WisdomModel can be applied to formerly developed models.
The remainder of this paper is organized as follows: Section 2 presents the related work. Section 3 describes the architecture of the WisdomModel. Section 4 presents the experiment and results of the proposed system, and Section 5 provides the conclusion and the direction of future work.

Related work
In recent years, there has been a trend toward machine learning algorithms in different fields such as medical diagnosis [6,7], earthquake prediction [8], churn prediction [9], voice recognition [10], scientific research [11,12] and risk estimation [13]. The purpose of using these algorithms is to avoid human interference and minimize the error rate. Many studies have aimed to eliminate the error rate by refusing to predict doubtful instances [14]. Instead, these methods defer the decision on these suspicious instances to specialists, who have to examine the data manually. Alpaydin [15] introduced several ideas regarding the doubt region in classification problems. Alpaydin defined the doubt region as the data that is covered by the general hypothesis but not included in the specific hypothesis. The general hypothesis contains the positive instances and covers as large a region of the feature space as possible without covering negative instances. The specific hypothesis contains only positive instances and covers as small a region of the feature space as possible.
Amin et al. [16] developed a prudent churn prediction model that will generate an alert to the decision-makers whenever there is a new case that is not covered by the KB system. The developed approach uses ripple down rule (RDR) classifier to build the KB.
Tran and Aygun [17] proposed a WisdomNet model that adds an extra neuron to the output layer of the neural network. The additional neuron is used to recognize the rejected data. The trustable learning model refuses to decide for suspicious instances. The WisdomNet model minimizes the error rate to 0% after deferring more than 10% of the data to the specialists.
Although these models achieve zero misclassification error, the rejection rate may exceed 10%. As the rejection rate increases, the cost associated with scanning the data increases. When dealing with big-data applications, a 10% rejection rate means millions of instances that need to be scanned manually by the experts. Such a process is infeasible and defeats the purpose of using classification algorithms. Our approach eliminates the need for human experts and focuses on achieving a minimum error rate.
In the literature, there is very little research on converting data into wisdom in practical applications. However, there are many discussions about the definition of wisdom in information systems. Jankowski and Skowron [18] describe wisdom as the capability to recognize critical issues and solve them using accumulated knowledge and experience to achieve a target. Wognin et al. [19] define wisdom as the capability to utilize knowledge to act correctly in a certain context to achieve a goal. Liew [20] lists the dimensions of wisdom: perspicacity, the quick use of information, judgment, learning from mistakes, sagacity and the ability to reason. Similarly, Ermine [21] divides the concept of wisdom into two types: individual wisdom, which means the ability to use skills and knowledge to enhance performance, and organizational wisdom, which can be defined as the capability to perform actions. Van Meter [22] states several characteristics of wisdom: it does not mean excellence or infallibility, it can be constructed by combining information, knowledge and experience, and it is used to improve the environment.
The WisdomModel presented in this paper is a general framework that can be used in any field to convert data into wisdom. The proposed model aims to minimize the error rate without human intervention by building an automated KB for misclassified instances and using it to reduce the error rate. Wisdom as it relates to machine learning is defined as the ability to reduce the misclassification rate without increasing the cost.

Architecture of WisdomModel
This section is organized as follows. Section 3.1 discusses the data, information, knowledge and wisdom (DIKW) pyramid. Section 3.2 presents the datasets used in this research. Section 3.3 describes the architecture of the WisdomModel.

DIKW model
The DIKW hierarchy is very popular in information and knowledge systems. DIKW is often represented as a pyramid, with data as the foundation and wisdom at the top [23]. This representation means that wisdom cannot be reached without processing the lower elements of the pyramid. Our definitions for each element of the DIKW pyramid are as follows:
(1) Data is the untreated material that is gathered by sensors or humans.
(2) Information is the interpretation of data that makes it easier to present and evaluate.
(3) Knowledge is the ability to distinguish between various patterns of information.
(4) Wisdom is the capability to use knowledge, experience and fine judgment to achieve the required goals.

Datasets
In this study, we tested our approach on multiple datasets in different areas such as telecommunication, financial services and the medical field. The datasets are described below:
(1) Churn prediction dataset that belongs to a telecom operator.
(2) Credit card fraud detection dataset, which can be found on Kaggle [24]. Because this dataset is imbalanced, we balanced it using a Python implementation of the SMOTE technique.
We used 80% of the data for training and 20% for testing. The number of instances used in training and testing is illustrated in Table 1.
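The 80/20 partition described above can be illustrated with a minimal sketch. The function name `train_test_split`, the fixed seed and the toy rows are assumptions made for illustration only; the text does not say how the split was actually performed (for example, whether it was shuffled or stratified).

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test).

    Hypothetical helper: a plain random split, not necessarily the
    procedure used in the experiments.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: 100 (feature, label) rows standing in for a real dataset.
rows = [(i, i % 2) for i in range(100)]
train, test = train_test_split(rows)
print(len(train), len(test))  # 80 20
```

For an imbalanced dataset, a stratified split that preserves the class ratio in both parts may be a safer choice than the plain shuffle shown here.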

WisdomModel
The WisdomModel is a general structure that is used to convert data into wisdom. Its name is inspired by Benjamin Franklin's quote, "The doorstep to the temple of wisdom is a knowledge of our own ignorance." Intelligent systems should be able to identify the ignorance regions of the data and deal with them wisely [27]. The ignorance region of the data is the region in which the classifier fails to predict data correctly. The proposed WisdomModel finds the weaknesses of the classifier by recognizing, analyzing and correcting the misclassified instances. The model consists of four stages: (1) build a classifier using the training dataset, (2) isolate misclassified instances, (3) construct an automated KB and (4) rectify incorrect predictions. Our approach is designed to work on binary classification problems for structured datasets.

Build a classifier.
The architecture of the deep learning model D is specified by the number of neurons in each hidden layer, where nn_0 represents the number of neurons at the first hidden layer, nn_1 represents the number of neurons at the second hidden layer and so on. We train the deep learning model D using the training dataset. The architectures of the deep learning models used in churn prediction, credit card fraud detection, lung cancer prediction and diabetes detection are expressed in D1, D2, D3 and D4, respectively.

Isolate misclassified instances.
The trained deep learning model D is used to classify the training dataset. We obtain the model predictions and compare them against the actual outputs to identify incorrect predictions. The misclassified instances are isolated in a new dataset called R.
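The isolation step can be sketched as a simple comparison of predictions against actual labels. The function name `isolate_misclassified` and the toy values are hypothetical, and `y_pred` stands in for the output of the trained model D on the training set.

```python
def isolate_misclassified(features, labels, predictions):
    """Return R: the (features, label) pairs that the classifier got wrong."""
    return [
        (x, y)
        for x, y, y_hat in zip(features, labels, predictions)
        if y != y_hat
    ]


# Toy training data with binary labels.
X_train = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.1]]
y_train = [1, 0, 1, 0]
y_pred = [1, 0, 0, 1]  # stand-in for D's predictions on the training set

R = isolate_misclassified(X_train, y_train, y_pred)
print(R)  # [([0.5, 0.5], 1), ([0.9, 0.1], 0)]
```

Each entry of R keeps the actual label, so later stages know which class the classifier should have predicted.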

Construct an automated knowledge base.
The R dataset is used to build the KB of misclassified instances. Building the KB from incorrect predictions rather than correct predictions should make the WisdomModel faster, because the misclassified instances are only a small fraction of the training data. We developed an algorithm to generate the KB in an automated manner without human intervention. The algorithm obtains rules that describe the general characteristics of misclassified data. The steps of building the KB are illustrated in Algorithm 1.
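Algorithm 1 is not reproduced in this text, so the following is only one possible reading of "rules that describe the general characteristics of misclassified data", not the authors' algorithm. It assumes that a rule is the centroid of a group of nearby misclassified instances that share the same actual label. The function name `build_kb`, the `radius` parameter and the greedy grouping strategy are all assumptions.

```python
import math


def build_kb(misclassified, radius=0.2):
    """Greedily group misclassified instances into rules (assumed rule form).

    Each rule is a dict holding a centroid, the actual label of its members
    and a member count. An instance joins the first rule with the same label
    whose centroid lies within `radius`; otherwise it starts a new rule.
    """
    kb = []
    for x, y in misclassified:
        for rule in kb:
            if rule["label"] == y and math.dist(rule["centroid"], x) <= radius:
                n = rule["count"]
                # Update the running mean of the rule's members.
                rule["centroid"] = [
                    (c * n + v) / (n + 1) for c, v in zip(rule["centroid"], x)
                ]
                rule["count"] = n + 1
                break
        else:
            kb.append({"centroid": list(x), "label": y, "count": 1})
    return kb


# Toy set of misclassified (features, actual label) pairs.
toy_misclassified = [([0.5, 0.5], 1), ([0.55, 0.45], 1), ([0.9, 0.1], 0)]
kb = build_kb(toy_misclassified)
print(len(kb))  # 2
```

In this toy run the first two instances merge into a single rule and the third forms its own rule because its label differs. The choice of `radius` controls how general each rule is.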

Rectify incorrect prediction.
The WisdomModel uses the knowledge and experience gained from the previous stages to provide good judgment. The KB and the deep learning model predictions are used in this stage to identify incorrect predictions in the test dataset and rectify them. Our approach first uses the model D to obtain the predictions for the test dataset and then identifies misclassified instances by comparing them against the KB. If an instance is close to a rule in the KB by a certain threshold, then this instance is considered misclassified. The process is summarized in Algorithm 2.
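Since Algorithm 2 is not reproduced here, the rectification step can only be sketched under stated assumptions: rules are represented as plain feature vectors, closeness is measured with Euclidean distance, and a prediction flagged as misclassified is corrected by flipping the binary label. The function name `rectify`, the threshold value and the toy data are hypothetical.

```python
import math


def rectify(instances, predictions, kb_rules, threshold=0.1):
    """Flip a binary prediction when the instance is close to a KB rule.

    Assumptions: Euclidean distance as the closeness measure, and label
    flipping (0 <-> 1) as the correction for a flagged instance.
    """
    corrected = []
    for x, y_hat in zip(instances, predictions):
        near_rule = any(math.dist(x, rule) <= threshold for rule in kb_rules)
        corrected.append(1 - y_hat if near_rule else y_hat)
    return corrected


kb_rules = [[0.5, 0.5], [0.9, 0.1]]  # hypothetical rules from the KB
X_test = [[0.52, 0.49], [0.1, 0.9], [0.88, 0.12]]
d_pred = [0, 1, 1]  # stand-in for D's predictions on the test set

final = rectify(X_test, d_pred, kb_rules)
print(final)  # [1, 1, 0]
```

The first and third test instances lie within the threshold of a rule, so their predictions are flipped, while the second is left unchanged. Flipping is only meaningful in the binary setting that the WisdomModel targets.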

Results and discussion
The performance of the WisdomModel is evaluated on various datasets using diverse measures. Different neural network architectures were used to demonstrate the ability of the WisdomModel to generalize. We developed our model in Python 3. The performance measures used in this paper are provided in Section 4.1. The results of the WisdomModel are discussed in Section 4.2.

Evaluation measures
Various measures are used to evaluate the performance of traditional classification algorithms such as accuracy, precision, recall, F-measure, receiver operating characteristic (ROC) curve and area under the ROC curve (AUC) [28,29]. These measures are derived from the confusion matrix that is used to depict the performance of the classifier. The confusion matrix consists of four items: true positive (TP), true negative (TN), false positive (FP) and false negative (FN) [30,31]. The definition of each measure is illustrated in Table 2.
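As a brief illustration, the threshold-based measures can be computed directly from the four confusion matrix counts using their standard definitions. The function name `classification_metrics` and the counts in the example are illustrative only and are not results from this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard measures from confusion matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Illustrative counts only.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(m["accuracy"], m["error_rate"])  # 0.85 0.15
```

The ROC curve and AUC are not covered by this sketch because they depend on the classifier scores across all decision thresholds rather than on a single confusion matrix.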

Analysis and discussion of WisdomModel
We first tested the performance of the WisdomModel against a big data telecom dataset that consists of millions of records. The deep learning model D1 was used to train the classifier and achieved a 90.2% accuracy rate on the test dataset. Although a 10% error rate is considered acceptable in many domains, when dealing with big data such as our churn dataset, it means that more than 260,000 subscribers were misclassified. Such a large number of misclassified instances strongly affects the performance and the revenue of the company. Table 3 shows the results of the WisdomModel and the deep learning model. As shown in Figure 1, the ROC curve of the WisdomModel is closer to the top left corner than that of the deep learning model. We also measured the ability of the WisdomModel to separate the classes using the AUC measure. The WisdomModel surpassed the deep learning model with 0.97 AUC, whereas the deep learning model attained 0.89. Moreover, the error rate was reduced to 3%, and the number of misclassified instances was reduced by 70%.
We also applied the framework to other datasets to demonstrate the ability of the WisdomModel to generalize. We used a deep learning model D2 to predict credit card fraud. The results are reported in Table 3. Furthermore, the WisdomModel obtained 0.97 AUC while the deep learning model achieved 0.89 AUC. The error rate was reduced from 15.84% to 2.74%, and the number of misclassified instances was reduced by 82%.
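The relation between the two error rates and the reduction in misclassified instances can be checked with a short calculation. It assumes that both models are evaluated on the same test set, so the ratio of error rates equals the ratio of misclassified counts; the function name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Fraction by which a quantity shrinks when going from before to after."""
    return (before - after) / before


# Credit card fraud dataset: error rate before and after the WisdomModel.
reduction = relative_reduction(15.84, 2.74)
print(f"{reduction:.1%}")  # 82.7%
```

The computed value of about 82.7% is consistent with the reported reduction of 82% in misclassified instances.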
Later on, we applied the model to two medical datasets. The results of the lung cancer prediction dataset showed that it is possible to reduce the error rate to 0% using the WisdomModel framework, as depicted in Figure 2(a).
Additionally, the results of the diabetes dataset revealed that the WisdomModel could increase the accuracy by about 20 percentage points, attaining a 98.51% accuracy rate compared to the 78.33% achieved by the deep learning model D4, as presented in Figure 2(b). The error rate was reduced from 21.67% to 1.49%. The WisdomModel also attained 0.982 AUC compared to the deep learning model, which obtained 0.841.

Conclusion
Classical machine learning algorithms are always negatively affected by incorrect predictions. A high error rate often means that there are certain patterns in the data that the model is unable to classify. The benefit of applying learning models decreases when the misclassification rate increases. This research aims to develop a general framework that can convert data into wisdom in practical applications. The WisdomModel uses the knowledge and experience gathered from the deep learning model and the KB to identify misclassified instances and rectify them without human intervention. The proposed model was able to reduce the error rate to less than 3% in different domains. It is also possible to reach a zero error rate if the model continues to update the KB. Our approach can be applied to any classification algorithm and is not limited to neural networks. However, the WisdomModel cannot work on unstructured datasets such as images and videos. Future work will focus on developing a wisdom model that can handle multiclass classification problems.