Building a novel classifier based on teaching learning based optimization and radial basis function neural networks for non-imputed database with irrelevant features

This work presents a novel approach that combines teaching learning based optimization (TLBO) and radial basis function neural networks (RBFNs) to build a classifier for databases with missing values and irrelevant features. A least square estimator and the Relief algorithm are used to impute the database and to evaluate the relevance of features, respectively. The preprocessed dataset is then used to develop a classifier based on TLBO-trained RBFNs that generates a concise and meaningful description of each class, which can be used to classify subsequent instances with no known class label. The method is evaluated on several benchmark datasets obtained from the UCI repository. The experimental results suggest that the approach is a promising tool for constructing a classifier from databases with missing values and irrelevant attributes.


Introduction
Missing values and irrelevant features are not uncommon in real data, whereas data mining algorithms are designed for quality data [1]. Hence, building a classifier on a dataset that contains missing values and many irrelevant attributes leads to results that are not useful [2]. Therefore, to derive novel and useful results for the decision maker, imputing missing values and identifying relevant features are highly recommended. For decades these two problems have been treated as important in object detection and recognition (pattern recognition) [11] and data mining [3] in general, and in ECG signal diagnosis [13], power flow calculation [14], simulation and control of dynamic systems [15], magnetic modeling [16], identification and classification of plant leaf diseases [17], and discrimination of low and full fat yogurts [19,20,22] in particular.
There are several approaches to imputing missing values, of which we concentrate on the least square estimation method [2,3]. A large variety of feature selection techniques have been developed under the umbrella of filter, wrapper, and embedded methods, with the goal of selecting a relevant subset of features [4]. In this work a filter-style approach known as the "Relief" method is used to select a subset of attributes that preserves the relevant information found in the entire set of attributes [5]. After imputing missing values and selecting the relevant set of features, we develop a classifier based on TLBO and RBFNs that inherits their best features [6,7,8]. The RBFN, a member of the family of artificial neural networks (ANNs) [21,22], has good generalization, a simple structure, and strong tolerance to noise, which motivated us to consider it a suitable method of classification. Many methods have been developed for training RBFNs [12,17,18]; however, to the best of our knowledge, training RBFNs using TLBO is new. TLBO is a population based optimization algorithm motivated by the influence of a teacher on the output of learners in a classroom environment, where learners first obtain knowledge from the teacher and subsequently from classmates. Moreover, a new improved TLBO (iTLBO) is proposed to train the RBFNs.
In a nutshell, this work proceeds through three phases in a pipeline: imputation of missing values by the least square estimation approach, feature selection through Relief, and classification by iTLBO-trained RBFNs.

Background
The background of this research work, namely missing value imputation, feature selection, RBFNs, and TLBO, is discussed here.

Imputation of missing values and feature selection
The problem of classification is basically that of dividing the feature space into sections, one section for each category of inputs. Classifiers are usually designed with labeled data, which is sometimes referred to as supervised classification. In general, classification with missing data and irrelevant features focuses on three distinct tasks: handling missing values [1] (i.e., imputing values), feature selection, and pattern classification. Let $D = [x_{ij}]_{N \times d}$, where $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, d$, be the dataset containing $N$ samples and $d$ features. In $D$, each sample is assigned a class label from the set $C = \{c_1, c_2, \ldots, c_M\}$, where $|C| = M$. Let each $x_{ij}$ be represented as a tuple $(x_{ij}, y_{ij})$, in which $y_{ij}$ can take only two values, either 0 or 1. If $y_{ij} = 0$, then its associated $x_{ij}$ value is missing; otherwise it is present. Input data has quantitative and qualitative variables. Quantitative or continuous data is measured on a numerical scale. Non-numerical data (e.g., colors, names, opinions) is called qualitative data, which can be discrete or categorical.
The overall goal of handling missing values is to map the value of $y_{ij}$ from 0 to 1 by substituting an appropriate value for $x_{ij}$ with little bias.
Alongside, the feature selection problem is defined as selecting a subset of features from the given set of features, whereby the dataset is mapped from $(x_{ij})_{N \times d}$ to $(x_{ij})_{N \times k}$, where $k \ll d$. With this intention, a filter method selects the most relevant features; however, a predefined quality measure is necessary to establish the level of relevance of the features. A filter method is not able to identify correlation among the features. Unlike a filter, a wrapper is able to address correlation among features because it uses the performance of the classifier to optimize the subset. This, however, leads to the problem of intractability. Moreover, the wrapper method has the additional cost of reconstructing the classifier with each modified feature subset. Hence, to avoid these issues, a filter-like algorithm known as the Relief method is employed here.

Radial basis function networks
The RBF network [8] is a topology with three layers: an input layer, a hidden layer, and a linear output layer (see Figure 1). The input can be modeled as an n-dimensional input vector. The hidden layer implements a radial activation function that carries out a non-linear transformation from the input space to the hidden space. A center and a width are the two parameters associated with each hidden node. Usually, the nonlinear transformation from the input space to the hidden space is based on a Gaussian kernel, as described in Eq. (1).
In Eq. (1), $\|\cdot\|$ represents the Euclidean norm, and $\mu_i$, $\sigma_i$, and $w_i$ are the center, spread, and output weight of the $i$th hidden unit, respectively. The hidden and output layers are interconnected through weighted connections $w_i$. The output layer, a summation unit, supplies the response of the network to the outside world.
The radial basis function is so named because the value of the function is the same for all points that are at the same distance from the center.
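As a minimal sketch of the forward pass described above, the following assumes the common Gaussian form $\phi_i(x) = \exp(-\|x - \mu_i\|^2 / (2\sigma_i^2))$ and a single summation output with a bias term. The exact scaling used in Eq. (1) may differ, and the function names `gaussian_kernel` and `rbfn_output` are illustrative only.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], center: Sequence[float], spread: float) -> float:
    """Gaussian radial basis value (assumed form: exp(-||x - mu||^2 / (2 sigma^2)))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * spread ** 2))


def rbfn_output(x: Sequence[float],
                centers: Sequence[Sequence[float]],
                spreads: Sequence[float],
                weights: Sequence[float],
                bias: float = 0.0) -> float:
    """Weighted sum of the hidden-unit activations plus a bias (single output)."""
    hidden = [gaussian_kernel(x, c, s) for c, s in zip(centers, spreads)]
    return bias + sum(w * h for w, h in zip(weights, hidden))
```

Because the kernel depends on the input only through its distance to the center, two inputs at the same distance from $\mu_i$ produce the same activation, which is the radial property noted above.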

In the literature, radial basis function networks [6] have many uses, including classification, time series prediction, and function approximation. Training RBF networks is normally faster than training multi-layer perceptron (MLP) networks. Training an RBF network [9,11] involves two steps: (1) the kernel parameters of the hidden neurons are determined by an unsupervised or heuristic method; (2) the weights of the output layer are determined by the pseudo-inverse method.

Teaching learning based optimization
Teaching learning based optimization is a population based, nature inspired algorithm introduced by Rao et al. [6,9]. It is inspired by the teaching-learning process, specifically the influence of a teacher on the output of learners in a classroom environment, where learners first obtain knowledge from the teacher and subsequently from classmates. In the first phase, a teacher imparts knowledge directly to his/her students. In practice, the likelihood that a teacher's teaching is successful follows a Gaussian distribution. Overall, how much knowledge is transferred to a student depends not only on his/her teacher but also on interactions among the students through peer learning. A basic algorithm of TLBO is presented below.
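The two phases can be sketched as follows for a minimization problem. This is a minimal illustration of canonical TLBO as it is commonly described, not the authors' implementation: the function name `tlbo`, the bound clipping, the default population size, and the seeding are assumptions made for the example.

```python
import random
from typing import Callable, List, Tuple


def tlbo(objective: Callable[[List[float]], float], dim: int,
         bounds: Tuple[float, float], pop_size: int = 20,
         iterations: int = 100, seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `objective` with a teacher phase followed by a learner phase."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v: float) -> float:
        return min(hi, max(lo, v))

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(p) for p in pop]

    def try_replace(i: int, cand: List[float]) -> None:
        # Greedy acceptance: keep the candidate only if it improves learner i.
        f = objective(cand)
        if f < fit[i]:
            pop[i], fit[i] = cand, f

    for _ in range(iterations):
        # Teacher phase: move each learner towards the best individual.
        teacher = list(pop[min(range(pop_size), key=fit.__getitem__)])
        mean = [sum(p[j] for p in pop) / pop_size for j in range(dim)]
        for i in range(pop_size):
            tf = rng.choice((1, 2))  # canonical teaching factor, either 1 or 2
            cand = [clip(pop[i][j] + rng.random() * (teacher[j] - tf * mean[j]))
                    for j in range(dim)]
            try_replace(i, cand)
        # Learner phase: interact with one randomly chosen classmate.
        for i in range(pop_size):
            k = rng.choice([m for m in range(pop_size) if m != i])
            sign = 1.0 if fit[i] < fit[k] else -1.0  # move away from a worse peer
            cand = [clip(pop[i][j] + sign * rng.random() * (pop[i][j] - pop[k][j]))
                    for j in range(dim)]
            try_replace(i, cand)

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

For instance, `tlbo(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` searches for the minimum of a two-dimensional sphere function.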

Proposed method
Our integrated approach proceeds through three phases in a pipeline. In the first phase, the missing values are imputed by a least square estimator; in the second phase, the Relief algorithm is used for feature selection; and finally, our improved TLBO based RBFN is used to build the classifier for the preprocessed database. Figure 2 illustrates our approach.

Missing value imputation using least-square estimator
In this phase, we estimate the missing values in $D$ by formulating a matrix $A$ in which all the attribute values are known. In the least-square problem, the output of a model is given by a linearly parameterized expression. If the target system has $q$ outputs, expressed as $y = [y_1, \ldots, y_q]^T$ with $q > 1$, then we have a set of linear equations in matrix form, $A\Theta + E = Y$, where $A$ is an $m \times n$ matrix, $\Theta$ is an $n \times q$ unknown parameter matrix, and $Y$ is an $m \times q$ output matrix with $y_{ij}$ denoting the $j$th output value in the $i$th data pair.
After obtaining the value of $\Theta$, we impute all the missing values in the dataset $D$.
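A minimal sketch of this idea for a single output column is given below. It assumes that missing entries are marked with `None`, that $\Theta$ is estimated from the fully observed rows by solving the normal equations $A^T A\,\Theta = A^T Y$, and that an intercept column is added to $A$. These choices and the helper names are illustrative assumptions, and a non-singular $A^T A$ is required.

```python
from typing import List, Optional

Row = List[Optional[float]]


def solve_linear(m: List[List[float]], b: List[float]) -> List[float]:
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def least_squares(a: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares estimate of theta from the normal equations."""
    n = len(a[0])
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    aty = [sum(row[i] * yi for row, yi in zip(a, y)) for i in range(n)]
    return solve_linear(ata, aty)


def impute_column(data: List[Row], target: int) -> List[Row]:
    """Fill missing values of column `target` using a fit on the complete rows."""
    complete = [r for r in data if all(v is not None for v in r)]
    design = [[1.0] + [v for j, v in enumerate(r) if j != target] for r in complete]
    theta = least_squares(design, [r[target] for r in complete])
    result = []
    for r in data:
        row = list(r)
        others = [v for j, v in enumerate(row) if j != target]
        if row[target] is None and all(v is not None for v in others):
            row[target] = sum(t * v for t, v in zip(theta, [1.0] + others))
        result.append(row)
    return result
```

Rows with more than one missing entry are left untouched in this sketch; handling them would require fitting on the attributes that are actually observed in each row.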

Relief algorithm for feature selection
In this second phase of our work, we discuss the Relief algorithm, which is inspired by instance based learning. It is a filter method for individual feature selection. It calculates a proxy statistic for each feature that can be used to estimate the feature's quality or relevance to the target concept. The pseudocode of this method is given below.
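The following is a minimal sketch of a Relief-style weight update for numeric features and class labels, written under stated assumptions: feature differences are normalized by each feature's range, nearest hits and misses are found with the resulting Manhattan distance, and instances are sampled with a seeded generator. The function name `relief` and these choices are illustrative and may differ from the variant used in this work.

```python
import random
from typing import List, Sequence


def relief(X: Sequence[Sequence[float]], y: Sequence[int],
           n_samples: int, seed: int = 0) -> List[float]:
    """Return one relevance weight per feature (higher means more relevant)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Range of each feature, used to scale differences into [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / span[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for _ in range(n_samples):
        i = rng.randrange(n)
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))     # nearest same-class
        miss = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class
        for j in range(d):
            # Reward features that separate classes, penalise those that
            # differ between neighbours of the same class.
            w[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n_samples
    return w
```

Features can then be ranked by weight and the top $k$ retained; the sketch assumes every class has at least two instances so that a nearest hit always exists.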

Improved TLBO based RBFN
In the third phase, we build an RBFN classifier that is trained by TLBO and by improved TLBO. We first provide a detailed introduction to the improved TLBO; the improved TLBO + RBFN network is then developed with the aim of achieving better classification accuracy.

Improved TLBO (iTLBO).
In the canonical TLBO, during the learner phase the learner is exposed to the entire population of the class. However, it has been observed that if the learner is restricted to a peer team instead of all individuals of the population, then he/she can raise his/her level of knowledge acquisition. With this idea, we introduce a neighborhood structure of learners as the peer learners group from which a learner learns. Hence, in the learner phase, we adopt a square topology as the peer learners group for a learner. That means a student not only acquires knowledge from the best of all individuals (i.e., the teacher) but also improves his/her standard through his/her neighborhood of fellow learners. In that context, the learner phase of TLBO has been modified as given below.
Here nearest_neighbor( ) finds a group of peer learners for a learner. The size of the neighborhood can be treated as a parameter of the learner phase. Alongside, we have also made the teaching factor ($T_F$) adaptive by considering the individual fitness values and the population diversity. Recall that the teaching factor decides how much the mean is changed. In the canonical TLBO, the value of $T_F$ is either 1 or 2, whereby learners learn either nothing or everything from the teacher. But in real practice, the value of $T_F$ may lie anywhere between 1 and 2, both inclusive. Hence, to make this idea fruitful, fitness values are selected as inputs for choosing $T_F$. BS contains the global best solution found so far, i.e., up to the $k$th iteration, denoted $X_g^k$, which is simply the position of one individual and corresponds to the best fitness $F_g^k$. From these, the fitness differential $\Delta F_k$ of the global best solution between the $k$th and $(k-1)$th iterations is defined, and the convergence speed function follows from it, where $\Delta_1 = \max\{\Delta F_1, \Delta F_2, \ldots, \Delta F_k\}$. Eq. (5) gives the convergence speed, which is less than or equal to 1.
In the evolution process of TLBO, population diversity is a major factor. The standard deviation of the individual fitness values of the population can be used to compute the diversity of the population. In this paper, we present a new strategy for calculating the population position diversity from fitness values. The population position diversity is obtained using the deviation-based approach defined in Eqs. (6)-(8).

In Eqs. (6)-(8), $F_{avg}^k$ is the average population fitness at the current $k$th iteration; $F^k(i)$ stands for the fitness of the $i$th individual; $\Delta_2$ is the normalization factor; $|P|$ is the population size; and $\sigma^2$ represents the population diversity. Evidently, the larger $\sigma^2$ is, the larger the population diversity.
To adapt the teaching factor $T_F$, we use the index $C\_S$ to represent the convergence speed with respect to the best solution fitness found up to the current iteration, and the index $\sigma^2$ to represent diversity with regard to the population fitness deviation. Hence we compute $T_F$ adaptively from $C\_S$ and $\sigma^2$, where $\alpha$ and $\beta$ are factors. Both $\sigma^2$ and $C\_S$ are less than or equal to 1 and greater than 0, and [6] suggested that the value of $T_F$ can be either 1 or 2. Hence, we set $\alpha + \beta + 1 \le 2$. The proposed adaptive teaching factor ($T_F$) is applied for better local searching ability, which improves the accuracy and convergence speed.
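One way to read the description above is sketched below. Every formula in it is an assumption made for illustration, since the defining equations are not reproduced in this text: the fitness differential is taken as $\Delta F_k = |F_g^k - F_g^{k-1}|$, the convergence speed as $C\_S = \Delta F_k / \Delta_1$, the diversity as the mean squared fitness deviation normalized by $\Delta_2 = \max_i |F^k(i) - F_{avg}^k|$, and the teaching factor as $T_F = 1 + \alpha\,C\_S + \beta\,\sigma^2$, which stays within $[1, 2]$ when $\alpha + \beta + 1 \le 2$.

```python
from typing import Sequence


def convergence_speed(best_history: Sequence[float]) -> float:
    """Assumed C_S: latest best-fitness change divided by the largest change so far."""
    deltas = [abs(b - a) for a, b in zip(best_history, best_history[1:])]
    if not deltas:
        return 0.0
    delta_1 = max(deltas)
    return deltas[-1] / delta_1 if delta_1 > 0 else 0.0


def diversity(fitness: Sequence[float]) -> float:
    """Assumed sigma^2: mean squared deviation from the average, normalised to [0, 1]."""
    avg = sum(fitness) / len(fitness)
    delta_2 = max(abs(f - avg) for f in fitness)
    if delta_2 == 0:
        return 0.0
    return sum(((f - avg) / delta_2) ** 2 for f in fitness) / len(fitness)


def teaching_factor(best_history: Sequence[float], fitness: Sequence[float],
                    alpha: float = 0.5, beta: float = 0.5) -> float:
    """Assumed adaptive rule: T_F = 1 + alpha * C_S + beta * sigma^2."""
    if alpha + beta + 1 > 2:
        raise ValueError("alpha + beta + 1 must not exceed 2")
    return 1.0 + alpha * convergence_speed(best_history) + beta * diversity(fitness)
```

Under these assumptions, a population that is still diverse and still improving quickly receives a teaching factor closer to 2, while a converged, stagnant population receives a value closer to 1.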
iTLBO + RBFN.
This section describes the iTLBO + RBFN, which can adjust the network parameters during the training process. In the initialization stage, let the position of the $i$th individual be represented as shown in Figure 3. RBFNs mainly depend on the centers and widths of the kernels in addition to the weights and bias. Here, however, we encode only the centers, widths, and bias into an individual for stochastic search using iTLBO.
Suppose the maximum number of kernel nodes is set to $K_{\max}$; then the structure of the individual is as shown in Figure 3. In other words, each individual has three constituent parts: center, width, and bias. The length of the individual is $2K_{\max} + 1$.
The fitness function used to guide the search process is defined in Eq. (10).
In Eq. (10), $N$ is the total number of training instances, $t_i$ is the actual output, and $\hat{\Phi}(\vec{x}_i)$ is the estimated output of the RBFN. Initially, the centers, widths, and bias are computed using the training vectors, and the weights are computed using the pseudo-inverse method.
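As a hedged illustration of how such a fitness value can be computed, the sketch below assumes a mean squared error between the actual outputs $t_i$ and the network estimates over the $N$ training instances. The exact form of Eq. (10) may differ (for example, a root or a different scaling), and `mse_fitness` with its `predict` callable is a hypothetical helper, not part of the original method.

```python
from typing import Callable, Sequence


def mse_fitness(predict: Callable[[Sequence[float]], float],
                inputs: Sequence[Sequence[float]],
                targets: Sequence[float]) -> float:
    """Assumed fitness: mean squared error of the model over the training set."""
    n = len(targets)
    return sum((t - predict(x)) ** 2 for x, t in zip(inputs, targets)) / n
```

In a search loop, `predict` would be the RBFN decoded from one individual, so that a lower fitness indicates centers, widths, and bias that fit the training data more closely.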

Experimental study
In the experimental study, we start with a brief description of the datasets, their characteristics regarding missing information, and the parameters used for simulation. We then present the results obtained by two methods, TLBO + RBFN and iTLBO + RBFN, along with a detailed analysis.

Description of datasets and parameters
The datasets used in this work were obtained from the UCI machine learning repository [10]. Seven datasets were chosen to validate the proposed method, i.e., iTLBO + RBFN. Details of the seven datasets are given in Table 1. The algorithmic parameters, such as population size and number of iterations, are fixed based on empirical analysis as follows.
The population size is 100, the number of iterations is fixed at 300, the size of the neighborhood is restricted to 10% of the population size, and the value of $T_F$ is adapted as suggested in sub-section 3.3.1, with $\alpha$ and $\beta$ each taken from (0, 1).
The parameters of multi-layer perceptron (MLP) along with training algorithms and Simple Logistic are defined as prescribed in [3].

Results and analysis
The average results of the experiment, obtained from 10-fold cross-validation over 30 independent runs, are given in Tables 2-7.
From Table 2 it is found that for all seven datasets iTLBO + RBFN gives better accuracy than TLBO + RBFN, MLP, and Simple Logistic. To support the above results of TLBO + RBFN, a statistical analysis based on measures derived from the confusion matrix is presented in Tables 3 and 4.
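One measure commonly derived from a confusion matrix, and the one referred to in the statistical analysis, is the Kappa value. The sketch below computes Cohen's kappa, $\kappa = (p_o - p_e) / (1 - p_e)$, from a square confusion matrix whose rows are actual classes and whose columns are predicted classes. The function name `cohen_kappa` is illustrative, and the example matrix in the usage note is made up for demonstration; it is not taken from the reported results.

```python
from typing import Sequence


def cohen_kappa(cm: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows: actual, cols: predicted)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    # Observed agreement: fraction of instances on the diagonal.
    p_o = sum(cm[i][i] for i in range(k)) / total
    # Expected agreement by chance, from the row and column marginals.
    p_e = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)) / total ** 2
    return (p_o - p_e) / (1.0 - p_e)
```

For a hypothetical matrix `[[20, 5], [10, 15]]`, the observed agreement is 0.7 and the chance agreement is 0.5, giving a kappa of 0.4.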

Conclusions
An integrated approach of iTLBO and RBFN has been proposed for building a classifier that classifies unseen data while carefully considering issues such as missing values and dimensionality reduction. The approach proceeds through three phases. In the first phase, the preprocessing task of missing value imputation is carried out by a least square estimator. In the second phase, the relevant attributes are selected by Relief. Finally, in the third phase, a classifier is built by integrating iTLBO and RBFN, with iTLBO adopted to determine the optimum values of the key RBFN parameters. After careful training, the model was tested, and it was observed that on all datasets iTLBO + RBFN performs better than TLBO + RBFN in the case of the complete dataset. Our future research includes applications in big data and further parametric analysis of iTLBO in correspondence with the natural teaching-learning process.

Table 2. Classification
From the statistical analysis it can be observed that the calculated Kappa values for TLBO + RBFN with feature selection are much better than those for TLBO + RBFN without feature selection.