Design of ensemble recurrent model with stacked fuzzy ARTMAP for breast cancer detection

Purpose – Timely and accurate detection of cancer can save the life of the person affected. According to the World Health Organization (WHO), breast cancer has the highest incidence among all cancers, while it ranks fifth in terms of mortality. Among the many image processing techniques, several works have focused on convolutional neural networks (CNNs) for processing these images. However, deep learning models remain to be explored more thoroughly. Design/methodology/approach – In this work, multivariate statistics-based kernel principal component analysis (KPCA) is used to extract the essential features. KPCA simultaneously helps in denoising the data. These features are processed through a heterogeneous ensemble model that consists of three base models: a recurrent neural network (RNN), long short-term memory (LSTM) and a gated recurrent unit (GRU). The outcomes of these base learners are fed to a fuzzy adaptive resonance theory mapping (ARTMAP) model for decision making; nodes are added to the F_2^a layer when the winning criteria are fulfilled, which makes the ARTMAP model more robust. Findings – The proposed model is verified using the breast histopathology image dataset publicly available on Kaggle. The model provides 99.36% training accuracy and 98.72% validation accuracy. The proposed model utilizes data processing in all aspects: image denoising to reduce data redundancy and training by ensemble learning to provide higher results than those of single models. The final classification by a fuzzy ARTMAP model, which controls the number of nodes depending on performance, makes the classification robust and accurate. Research limitations/implications – Research in the field of medical applications is an ongoing process. More advanced algorithms are being developed for better classification. Still, there is scope to design models with better performance, practicability and cost efficiency in the future. Also, the ensemble models may be chosen with different combinations and characteristics. Signals instead of images may also be verified with this proposed model. Experimental analysis shows the improved performance of the proposed model. This method needs to be verified using practical models. Also, the practical implementation will be carried out to assess its real-time performance and cost efficiency. Originality/value – The proposed model is utilized for denoising and reducing data redundancy, after which feature selection is done using KPCA. Training and classification are performed using a heterogeneous ensemble model designed with RNN, LSTM and GRU as base classifiers to provide higher results than those of single models. The use of an adaptive fuzzy mapping model makes the final classification accurate. The effectiveness of combining these methods into a single model is analyzed in this work.


Introduction
Cancer is triggered by various genetic and environmental factors. In most cases, cancers originate as malignant tumors, and rapid growth in these tumors leads to cancer. Breast cancer comes under this category and is deadly. Non-invasive kinds of cancer are observed within the milk ducts with no growth and no spread to nearby tissues, whereas invasive ductal carcinoma (IDC) is the opposite of the previous type and appears in 80% of all breast cancer cases. Breast cancer has caused the highest mortality rate among all other cancers as per the report shown in Ref. [1] till 2020. Timely detection of breast cancer may increase the life span of the affected person. Cancer detection involves various imaging methods to picture the location, shape and size of the affected tissue. Manual observation of the scanning reports by physicians can be a time-consuming process. Automatic and accurate detection of breast cancer from images obtained through various means is therefore preferred in the era of artificial intelligence.
Different deep learning models are preferred in various fields such as medical image processing, signal processing, speech enhancement, speech recognition and image generation, among many other areas. Researchers are attracted to these methods due to their human-like training algorithms as well as their high performance. Initially, single models such as the convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM) and gated recurrent unit (GRU) were designed for specific purposes. Then came the era of concatenating these models to study their performance. A few concatenated models were designed by combining LSTM layers with convolution layers [2], and even the traditional machine learning model support vector machine (SVM) [3] has been attached as the classifier to convolutional layers.
Ensemble learning emerged as the highly improved performing method as compared to single deep learning models. Ensemble learning models are broadly classified as homogeneous and heterogeneous depending upon the types of base learners chosen for initial stage processing. An ensemble model designed with the same base models is termed homogeneous, whereas different base models lead to a heterogeneous ensemble model [4]. The heterogeneous ensemble learning method has been mostly preferred over homogeneous ones. A deep ensemble learning-based approach for accurate detection of breast cancer is used. The main contributions of this work are summarized in the points as follows: (1) The proposed model is utilized for denoising and to reduce the data redundancy so that the feature selection is done using KPCA; (2) Training and classification are performed using a heterogeneous ensemble model designed using RNN, LSTM and GRU as base classifiers to provide higher results than that of single models and (3) The final classification by a fuzzy adaptive resonance theory mapping (ARTMAP) model that controls the number of nodes depending upon the performance makes it robust for accurate classification.
The rest of the paper is organized as follows: The second section describes the various works proposed in the field of breast cancer detection. The third section provides a detailed description of the proposed technique designed for breast cancer detection. The fourth section delivers the outcomes obtained from the proposed model. In the fifth section, conclusions are drawn along with the future scope of this work, followed by the references.

Literature survey
The deep learning algorithm has emerged as the most chosen method for the detection of breast cancer. Deep learning performs with high accuracy on image and signal data in single [5,6] and hybrid forms [7,8]. Principal component analysis (PCA) was used to reduce the data size, followed by feature extraction using a multilayer perceptron (MLP) model; the final classification was done using SVM [9]. Semantic segmentation before detection using SVM and MLP has been used for breast cancer detection [10] from histopathology images, with color normalization followed by enhancement carried out as a preprocessing step. Refinement (R), correlation (C) and adaptive (A) algorithms were used for fine-tuning the features, with classification by an AdaBoost-oriented tree model [11]. Deep learning techniques are gaining the interest of researchers these days. Mitosis count-based cancer detection from breast histopathology images has been proposed using the Atrous Fully Connected Neural Network (A-FCNN) for segmentation and a multi-scale and region-based CNN (MS-RCNN) model for detection [12]. A multi-instance pooling layer-based CNN (MI-CNN) model [13] has been suggested for breast cancer detection from histopathology images. Mislabeled patch correction and classification of histopathology images have been proposed for breast cancer detection [14]; anomaly detection was performed using a generative adversarial network (Ano-GAN), whereas classification was done using DenseNet-121. A deep learning framework has been proposed for mitotic cell count-based cancer detection from histopathology images using the Recurrent Residual U-Net (R2U-Net) for segmentation and the Inception Recurrent Residual CNN (IRRCNN) for classification [15]. An auto-encoder designed with the concept of the residual-CNN model [16] has been proposed for cancer detection from hematoxylin and eosin-stained histopathology images.
The use of CNN as a feature extractor and the Extreme Learning Machine (ELM) as a classifier has been reported in Ref. [17] for malignant versus benign classification. A combination of SVM as an anomaly detector and FCNN as a classifier has been proposed in Ref. [18] for breast cancer detection from histopathology images. Labeling the data followed by classification has been proposed for cancer detection from histopathology images of the BreakHis dataset using a deep active learning model [19]. Mitotic cell detection for breast cancer classification has been proposed using two CNNs connected in parallel [20]. The application of the empirical wavelet transform (EWT) as well as variational mode decomposition (VMD) as preprocessors, followed by an ensemble of three CNNs and an MLP [21], has been proposed for breast cancer detection from histopathology images. That work also used gene data from breast cancer for image conversion using the DeepInsight framework [22]. For bioimage classification, an ensemble model containing CNNs as base classifiers and a sum rule for final decision making has been proposed [23]. A modified residual neural network [24] has been proposed for breast cancer detection from histopathology images using modified ResNet34 and modified ResNet50 models. A Stochastic Dilated Residual Ghost (SDRG) method [25] has been proposed for cancer detection from breast histopathology images. In recent work, a combination of the attention technique and a residual CNN model [26] has been utilized for breast cancer detection. A combination of the Xception model as feature extractor and a radial basis function (RBF) kernel-based SVM as classifier [27] has been proposed for breast cancer detection from histopathology images; the authors also studied the effect of magnification factors on performance. A combined deep learning model [28] designed using inception and residual blocks has been proposed for cancer detection from histopathology images.
The magnification values of the images are also considered as a preprocessing step.
From the literature analysis, it is found that ensemble models in this field have not yet been explored. In this work, we have considered the detection of breast cancer from breast histopathology images using data denoising by kernel principal component analysis (KPCA) followed by ensemble recurrent models stacked with fuzzy ARTMAP.

Proposed method
It has been observed in various literature that the heterogeneous form of ensemble learning provides better performance than homogeneous ensemble models and single models due to the use of differently fine-tuned algorithms [29][30][31]. KPCA [32] is used to denoise the normalized raw signal and reduce redundant information. The ensemble model consists of three recurrent models, i.e. RNN, LSTM and GRU. The outcomes are fed to the ARTMAP model for final detection. The workflow diagram of the proposed model is shown in Figure 1.
The parameters that are set before training to obtain better performance and accuracy are termed hyper-parameters. In this work, each image of the dataset has 2,500 pixels. Such a large amount of data causes higher processing time, and the presence of noise in the image data also increases the data complexity. To lower this effect, KPCA is used to select the 500 highest eigenvalues; the data are mapped into a higher-dimensional space by KPCA. These features are used to train each base model for further processing and feature extraction. To receive the 500 features provided by KPCA, each neural network model has 500 nodes in its input layer. Each base model has two hidden layers with 32 and 64 nodes for deep feature extraction. Each model provides two predictions on the same data through two nodes in the output layer. The prediction values are the class probabilities generated by the base models along with the labels of the data. These values are concatenated to form six values (two values from each of the three base classifiers) as metadata used to train the meta classifier, fuzzy ARTMAP, for final classification.
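The metadata construction described above can be sketched as follows; the helper name `build_metadata` and the dummy probability values are hypothetical, used only to illustrate how the six-value vectors (two softmax outputs from each of the three base learners) are formed.

```python
import numpy as np

# Hypothetical illustration: each base learner (RNN, LSTM, GRU) emits a
# 2-value softmax probability vector per image; concatenating the three
# yields the 6-value metadata vector that trains the fuzzy ARTMAP meta classifier.
def build_metadata(rnn_probs, lstm_probs, gru_probs):
    """Stack base-learner probabilities column-wise: shape (n_samples, 6)."""
    return np.concatenate([rnn_probs, lstm_probs, gru_probs], axis=1)

# Dummy predictions for two images (values are illustrative only).
rnn_p  = np.array([[0.90, 0.10], [0.20, 0.80]])
lstm_p = np.array([[0.80, 0.20], [0.30, 0.70]])
gru_p  = np.array([[0.85, 0.15], [0.25, 0.75]])
meta = build_metadata(rnn_p, lstm_p, gru_p)
print(meta.shape)  # (2, 6)
```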

Dataset
The proposed model is evaluated on the dataset [33]. The dataset contains 198,738 non-IDC images and 78,786 images of the IDC category; class value 0 represents non-IDC, and IDC is labeled 1. Sample images from each category are shown in Figure 2. While 80% of the total dataset is used for training, the remaining 20% is used for testing. The training set is again divided into train and validation subsets using an 80:20 ratio. Validation data are used during training, whereas test data are neither used for training nor for validation; they are kept separate and meant for testing only. The accuracy along with the robustness is explained in the results section.
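The two-stage 80:20 split described above can be sketched as follows; the use of scikit-learn's `train_test_split`, the stratification option and the placeholder arrays are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the flattened 50x50 grayscale patches;
# labels: 0 = non-IDC, 1 = IDC.
X = np.random.rand(1000, 2500)
y = np.random.randint(0, 2, size=1000)

# First 80:20 split separates the held-out test set ...
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)
# ... then the training portion is split 80:20 into train and validation.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.20, stratify=y_train, random_state=0)

print(len(X_tr), len(X_val), len(X_test))  # 640 160 200
```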

Preprocessing
The color breast histopathology images are converted to grayscale to reduce the computational requirements. The height and width of each image remain the same; only the depth is reduced from 3 to 1, i.e. the 50 × 50 × 3 images are converted to 50 × 50 × 1 images. Figure 3 shows a sample image after conversion from color to grayscale.
The grayscale images with attributes (d) are then normalized using the mean normalization technique given in Eq. (1):

d′ = (d − mean(d)) / (max(d) − min(d)).  (1)
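The conversion and normalization steps above can be sketched as follows; the standard luminance weights for grayscale conversion and the usual definition of mean normalization are assumptions, as neither constant set is stated explicitly in the paper.

```python
import numpy as np

# Hedged sketch: grayscale via the common luminance weights, then mean
# normalization d' = (d - mean(d)) / (max(d) - min(d)).
def to_grayscale(rgb):
    """Collapse a (H, W, 3) image to (H, W) with luminance weighting."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def mean_normalize(d):
    return (d - d.mean()) / (d.max() - d.min())

img = np.random.rand(50, 50, 3)   # placeholder 50x50x3 color patch
gray = to_grayscale(img)          # 50x50x1-equivalent grayscale patch
norm = mean_normalize(gray)       # zero-mean, unit-range features
print(gray.shape)  # (50, 50)
```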
Image denoising and feature selection
The RNN model may suffer from vanishing gradient problems due to the large size of the data, whereas the LSTM and GRU models are free from this problem [34]. Each image of the dataset has 2,500 pixels. Such a large amount of data causes higher processing time.
The presence of noise in the image data also increases the data complexity. To lower this effect, KPCA is used to select the 500 highest eigenvalues. A cosine kernel is considered as it measures similarity between samples irrespective of their magnitude. Let the data {d_i} ∈ ℝ^D, i = 1, ..., n, of high dimension D be mapped to a higher-dimensional feature space through φ(d_i). φ(d_i) is the kernelized version of the input data space and is able to capture the reduced form of the data. It is common to consider the feature space to have zero mean, which is given by Eq. (2):

(1/n) Σ_{i=1}^{n} φ(d_i) = 0.  (2)

The covariance matrix is then computed using Eq. (3):

C = (1/n) Σ_{i=1}^{n} φ(d_i) φ(d_i)ᵀ.  (3)

The corresponding eigenvalues λ_k and eigenvectors v_k are given by Eq. (4):

C v_k = λ_k v_k,  (4)

where the eigenvectors are expanded in terms of the mapped data as in Eq. (5):

v_k = Σ_{i=1}^{n} a_{ki} φ(d_i).  (5)

According to Mercer's theorem, the kernel κ is represented by Eq. (6):

κ(d_l, d_i) = φ(d_l)ᵀ φ(d_i).  (6)

Substituting Eq. (5) into Eq. (4), multiplying both sides by φ(d_l)ᵀ and rewriting in terms of the kernel yields

K a_k = n λ_k a_k,  (10)

where K = [κ(d_l, d_i)] and a_k represents the n-dimensional column eigenvector with components a_{ki}. The denoised form obtained using KPCA is the projection of a sample d onto the leading eigenvectors,

y_k(d) = φ(d)ᵀ v_k = Σ_{i=1}^{n} a_{ki} κ(d, d_i).

After centering the kernel to zero mean, we obtain

K̂ = K − 1_n K − K 1_n + 1_n K 1_n,  (14)

where 1_n is the n × n matrix with every entry equal to 1/n; K̂ is known as the centered Gram matrix. In this work, we have used a cosine kernel in the KPCA, given by Eq. (15):

κ(d_i, d_j) = (d_iᵀ d_j) / (‖d_i‖ ‖d_j‖).  (15)
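The denoising and feature-selection step can be sketched with scikit-learn's `KernelPCA` and a cosine kernel, reducing the 2,500 pixel values to 500 components; the choice of scikit-learn is an assumption, as the paper does not name an implementation.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Placeholder data: 600 flattened 50x50 grayscale images (2,500 features each).
X = np.random.rand(600, 2500)

# Cosine-kernel KPCA keeping the 500 components with the largest eigenvalues,
# matching the feature count described in the text.
kpca = KernelPCA(n_components=500, kernel="cosine")
X_reduced = kpca.fit_transform(X)
print(X_reduced.shape)  # (600, 500)
```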

Training
The denoised data are used to train the three base learners connected in parallel. All base models are designed with four layers. The output layer has two nodes activated with the softmax activation function, as there are two classes for classification. These deep learning models are evaluated using the binary cross-entropy (BCE) loss, mathematically represented by Eq. (16):

BCE = −(1/N) Σ_{i=1}^{N} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ],  (16)

where y_i represents the actual target and ŷ_i represents the prediction made by the model.
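The BCE loss of Eq. (16) can be implemented directly; the clipping constant `eps` below is an added numerical safeguard against log(0), not part of the paper's formulation.

```python
import numpy as np

# Binary cross-entropy: BCE = -(1/N) * sum( y*log(p) + (1-y)*log(1-p) )
def bce(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.9, 0.8, 0.7])  # predicted probability of class 1
loss = bce(y_true, y_pred)
print(round(loss, 4))
```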
From Figure 1, it can be seen that the base classifiers are indirectly creating the features for ARTMAP. Majority voting, MLP [35] and fuzzy min-max [36] models have been used as meta classifiers in various works. In those models, the number of nodes in each hidden layer is fixed before training and cannot be updated according to the training requirements. In fuzzy ARTMAP, however, nodes are added to the F_2^a layer when the winning criteria are fulfilled, which makes the ARTMAP model more robust than the earlier ones. The ARTMAP structure is shown in Figure 4.
Fuzzy ARTMAP is designed with two ART modules, ART_a and ART_b, connected by a map field [37] that develops maps between the clusters generated in the input domain ART_a and the output domain ART_b. In Figure 4, F_0, F_1 and F_2 represent the normalization layer, input layer and recognition layer, respectively, for both domains.
The ART_a module receives the concatenated outputs a of dimension M from the ensemble recurrent models and forms a 2M-dimensional complement-coded vector A as given in Eq. (17):

A = (a, 1 − a) = (a_1, ..., a_M, 1 − a_1, ..., 1 − a_M).  (17)
The category nodes in the network are selected using the category choice function (CCF) given by Eq. (18):

T_j(A) = |A ∧ w_j| / (α + |w_j|),  (18)

where ∧ denotes the element-wise fuzzy minimum, |·| the L1 norm, w_j the weight vector of node j and α > 0 the choice parameter.
Resonance occurs when the winning node J also satisfies the vigilance criterion given in Eq. (19),

|A ∧ w_J| / |A| ≥ ρ_a,  (19)

and the weights are then updated using the weight update equation given in Eq. (20):

w_J^(new) = β (A ∧ w_J^(old)) + (1 − β) w_J^(old),  (20)

where β is the learning rate and ρ_a is the vigilance parameter.
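A minimal sketch of one fuzzy ART learning step follows, assuming the standard complement-coding, category-choice, vigilance and update equations; the function names and the parameter values (`alpha`, `rho`, `beta`) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def complement_code(a):
    """Eq. (17): A = (a, 1 - a)."""
    return np.concatenate([a, 1.0 - a])

def fuzzy_art_step(A, W, alpha=0.001, rho=0.75, beta=1.0):
    """Present complement-coded input A to weight list W; return (W, winner)."""
    if not W:                      # empty network: commit the first node
        return [A.copy()], 0
    # Category choice function, Eq. (18): T_j = |A ^ w_j| / (alpha + |w_j|)
    T = [np.minimum(A, w).sum() / (alpha + w.sum()) for w in W]
    for j in np.argsort(T)[::-1]:  # try categories in order of choice value
        w = W[j]
        # Vigilance test, Eq. (19): |A ^ w_J| / |A| >= rho
        if np.minimum(A, w).sum() / A.sum() >= rho:
            # Weight update, Eq. (20): w = beta*(A ^ w) + (1 - beta)*w
            W[j] = beta * np.minimum(A, w) + (1 - beta) * w
            return W, j
    W.append(A.copy())             # no resonance: add a new F2 node
    return W, len(W) - 1

A = complement_code(np.array([0.2, 0.7, 0.5]))
W, j = fuzzy_art_step(A, [])
print(len(W), j)  # 1 0
```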
A new node is generated when the winning criterion is fulfilled in the F_2^a layer. The following algorithm is utilized in training the proposed model:

Step 1. Preprocessing
a. Resize the images to 50 × 50
b. Denoise the data using KPCA
c.

Denoised features for training and testing
Each image of the dataset is denoised, and its 2,500 pixel values are reduced to the 500 highest-priority values using KPCA. This number of features is maintained because too low a number of features degrades the performance of deep learning models.

Training, validation and classification results
The three recurrent models are trained with the denoised data, and the well-fitted models are stored for stacking with each other. The performance of each model is represented in terms of accuracy in Figure 5. The overall performance of the stacked ensemble model is also shown in the same figure to highlight the improvement. The details of the accuracies are provided in Table 1. Figure 6 provides the confusion matrix for each base model and for the whole proposed model on the validation data. The performance of the whole model improves over the base models due to the second-stage training by the ARTMAP, which helps in decision making.
The evaluation parameters such as accuracy, F1-score, recall, precision, sensitivity and specificity are calculated using the mathematical expressions given in Eq. (21) to Eq. (26):

Accuracy = (TP + TN) / (TP + TN + FP + FN)  (21)
F1-score = 2TP / (2TP + FP + FN)  (22)
Recall = TP / (TP + FN)  (23)
Precision = TP / (TP + FP)  (24)
Sensitivity = TP / (TP + FN)  (25)
Specificity = TN / (TN + FP)  (26)

The proposed model is also compared with a few state-of-the-art models in Table 2; the results in italics show the highest values. The training and validation loss graphs of the proposed stacked ensemble model are shown in Figure 7.
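As a worked check, the standard metric definitions can be applied to the confusion-matrix counts reported in the Discussion (113 TP and 41 TN on 156 test images with 114 non-IDC and 42 IDC samples, from which FP = 1 and FN = 1 are inferred; the inference is ours, not stated in the paper).

```python
# Confusion-matrix counts taken from the Discussion section.
TP, FN, TN, FP = 113, 1, 41, 1

accuracy    = (TP + TN) / (TP + TN + FP + FN)
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)          # equals sensitivity
specificity = TN / (TN + FP)
f1          = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} recall={recall:.4f} "
      f"specificity={specificity:.4f} f1={f1:.4f}")
```

The resulting accuracy of 154/156 ≈ 98.72% agrees with the validation accuracy reported in the text.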

Discussion
The number of base learners is chosen as three for specific reasons. Two or four base models may affect the performance of the meta classifier if half of them provide results opposite to those of the other half. Three base learners are chosen so that the training of the meta classifier is not adversely affected and the computation time is lower than with five base classifiers. Figure 5 shows the accuracy of the base models alongside the proposed stacked ensemble model. As observed, the accuracy of each individual model starts from 40% and increases gradually, while the stacked model provides 99.36% accuracy, as given in Table 1. This is also evident in the confusion matrix given in Figure 6. A total of 156 images were taken from the test data, including both categories: 114 non-IDC images and 42 IDC images. From this figure, it is observed that the three base learners detected at most 105 images as true positive (TP), whereas the stacked ensemble model detected 113 images as TP. Similarly, the proposed model predicted 41 images as true negative (TN) out of the 42 IDC images, whereas this count is 33, 36 and 35, respectively, for the RNN, LSTM and GRU models. Very few samples are classified as false positive (FP) or false negative (FN) by the proposed model. Data overfitting is analyzed from the validation loss plot shown in Figure 7. The proposed model is free from data overfitting, which is verified through convergence: within 200 epochs, convergence is found for both training and validation, proving the efficacy. Table 1 and Table 2 show the improvement in performance of the proposed stacked ensemble method in comparison with the base models and state-of-the-art methods, respectively.
The proper utilization of data preprocessing to avoid data redundancy and a suitable number of base classifiers in the proposed ensemble model together improved the performance.

Conclusion
In this work, we have utilized data denoising by KPCA to reduce the data complexity burden on the classification model for faster processing. Features are extracted using deep ensemble learning developed with three recurrent models. The final classification is done with the stacked fuzzy ARTMAP. The proposed model is also free from data overfitting, achieved by considering a suitable number of iterations. The results show that the histopathology images are efficiently classified into IDC and non-IDC with 99.36% training and 98.72% validation accuracy.
Research in the field of medical applications is an ongoing process. More advanced algorithms are being developed for better classification. Still, there is scope to design models with better performance, practicability and cost efficiency in the future. Also, the ensemble models may be chosen with different combinations and characteristics. Signals instead of images may also be verified with this proposed model. Experimental analysis shows the improved performance of the proposed model. This method needs to be verified using practical models, which is kept for future work. Also, practical implementation will be carried out to assess real-time performance and cost efficiency.