Brain tumor classification using ResNet50-convolutional block attention module

Oladosu Oyebisi Oladimeji (Faculty of Engineering and Design, Atlantic Technological University, Sligo, Ireland)

Ayodeji Olusegun J. Ibitoye (School of Computing and Mathematical Science, University of Greenwich, Greenwich, UK)

Applied Computing and Informatics

ISSN: 2634-1964

Article publication date: 21 December 2023

Abstract

Purpose

Diagnosing brain tumors is time-consuming and depends heavily on the proficiency and accumulated knowledge of radiologists. Deep learning approaches have gained popularity over traditional methods for automating brain tumor diagnosis, offering the potential for more accurate and efficient results. Notably, attention-based models have emerged as an advanced approach that dynamically refines and amplifies model features to further improve diagnostic capability. However, the specific impact of using the channel, spatial or combined attention methods of the convolutional block attention module (CBAM) for brain tumor classification has not been fully investigated.

Design/methodology/approach

To selectively emphasize relevant features while suppressing noise, ResNet50 coupled with the CBAM (ResNet50-CBAM) was used for the classification of brain tumors in this research.

Findings

The ResNet50-CBAM outperformed existing deep learning classification methods such as the convolutional neural network (CNN). It achieved 99.43%, 99.01%, 98.7% and 99.25% in accuracy, recall, precision and AUC, respectively, surpassing existing classification methods evaluated on the same dataset.

Practical implications

Since the ResNet50-CBAM fusion can capture spatial context while enhancing feature representation, it can be integrated into brain tumor classification software platforms to support physicians in clinical decision-making and to improve brain tumor classification.

Originality/value

This research has not been published anywhere else.

Citation

Oladimeji, O.O. and Ibitoye, A.O.J. (2023), "Brain tumor classification using ResNet50-convolutional block attention module", Applied Computing and Informatics, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/ACI-09-2023-0022

Publisher

Emerald Publishing Limited

License

Published in Applied Computing and Informatics. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode

1. Introduction

All functions of the body are regulated by the brain, which also acts as the central nervous system’s command hub [1]. Hence, any brain anomaly poses a risk to an individual’s health [2]. Among the anomalies that can occur is a brain tumor, which is a deformed mass of tissue. Brain tumors can be broadly categorized into two types: malignant tumors, in which the cells of the brain tissue multiply quickly and unceasingly, and benign tumors, which grow relatively slowly and are non-invasive [3]. The World Health Organization (WHO) classification defines four grades of brain tumors: Grade I and Grade II tumors are designated as lower-grade tumors, whereas Grade III and Grade IV tumors are more serious [4].

A brain tumor is a life-threatening condition that can lead to death [5]. Hence, a timely and accurate diagnosis is necessary for effective treatment [6]. Magnetic resonance imaging (MRI) and computerized tomography (CT) are used for the diagnosis, while a biopsy and pathological examination are then carried out to confirm it. MRI is the most desirable of all the imaging modalities since it is the only non-invasive and non-ionizing modality [7]. Manual examination of medical images for diagnosis has been found to be time-consuming [8], demanding and potentially error-prone as a result of patient flow [2]. Therefore, to alleviate this challenge, computer-aided diagnosis (CAD) methods have been helping neuro-oncologists to detect, classify and grade tumors.

Current efforts in computer-aided medical diagnosis have achieved enhanced performance owing to the development of deep learning [9]. Deep learning approaches have been used to detect and classify brain tumors; one such work is [1]. Recently, deep transfer learning, a branch of artificial intelligence, has taken the lead in studies on visual categorization, object detection and image classification [10]. Transfer learning has demonstrated potential in the CAD of medical conditions. Its use in neuro-oncology has been gaining the attention of researchers, and several works have extracted features from brain MRI using pre-trained networks [11]. Transfer learning has also been shown to be effective with smaller datasets. Özkaraca et al. [2] used DenseNet to classify brain MRI images. Tariq and Naqvi [12] adopted EfficientNetB4 to classify brain MRI images into four classes, achieving 98.58% accuracy. In the same vein, Al-Ani and Al-Shamma [13] used four common CNN architectures, AlexNet, VGG-16, GoogLeNet and ResNet-50, of which AlexNet performed best. Similarly, Ali et al. [14] adopted the GoogLeNet, Shuffle-Net and NasNet-Mobile architectures for feature extraction, after which supervised machine-learning algorithms were used for classification; the combination of Shuffle-Net and SVM performed best.

It has been found that convolutional neural networks (CNNs) learn many features, some of which are vital to the prediction task while others are irrelevant [15], since CNNs rely mainly on convolution and pooling layers for feature extraction [16]. Hence, the vital features deserve more attention. Attention-based models for brain tumor classification are scarce in the literature; the existing models are mostly based on CNNs and transfer learning. Ref. [17] employed 3D-CNNs, introducing a novel network architecture designed to harness multi-channel data while enabling the acquisition of supervised features for brain tumor classification, with an accuracy of 89.9%. Ref. [18] used a fully convolutional neural network to segment brain tumors in MRI scans and demonstrated its effectiveness in accurately segmenting tumors. By merging CNN principles with classical architectural elements, Ref. [19] introduced a correlation learning mechanism (CLM) designed for DNN architectures for CT brain tumor detection, with 96% accuracy. Brain tumor image classification was carried out using the AlexNet, GoogLeNet and ResNet50 architectures [20]; among these, ResNet50 achieved the highest accuracy of 85.71%. Ref. [21] proposed two deep-learning models for detecting binary and multiclass brain tumors, a 23-layer CNN and a VGG16-based architecture, evaluated on publicly available datasets comprising 3,064 and 152 MRI images, with classification accuracies of about 97.8% and 100%, respectively.

Hence, this research aims to incorporate an attention mechanism into the brain tumor classification task for improved performance. Attention mechanisms have proven effective in improving the identification of relevant features. Shaikh et al. [22] adapted the recurrent attention mechanism (RAM) model proposed by Mnih et al. [23] for enhanced classification of biomedical images, and the results showed better performance than CNNs. Similarly, Liu and Yang [24] applied the channel attention mechanism to concentrate on the position of the brain tissue in the image for the brain tumor classification task. In this research, however, the convolutional block attention module (CBAM) of Ref. [25] was adapted to give priority to the vital features.

The rest of this paper is organized as follows: the second section describes the dataset used in this research and the complete structure of the proposed classification algorithm. The third section presents the experimental results. The fourth section draws the conclusion.

2. Materials and methods

Brain tumor classification using deep learning entails employing sophisticated neural network architectures to autonomously categorize medical images of brain scans into distinct tumor types. This approach capitalizes on the ability of deep-learning models to extract complex patterns and features from raw image data, enabling precise and efficient classification. To achieve a higher level of discrimination between the different brain tumor classes and thereby improve diagnostic outcomes, the ResNet50-CBAM fusion aims to capture both intricate features within the brain images and their contextual relationships, ultimately enhancing the model’s ability to classify various brain conditions accurately. The procedure, discussed briefly below, comprises data gathering and preprocessing, model selection, and training and testing of the deep learning model.

2.1 Data gathering and preprocessing

The dataset used in this research is publicly available on Kaggle and was provided by Nickparvar [26]. It contains 7,023 brain MR images in four classes: glioma, meningioma, no tumor and pituitary. Table 1 summarizes the dataset.

To give the model inputs of equal and compatible size, the images were resized to 256 × 256 pixels. Additionally, to prevent overfitting and keep the computation well behaved, the pixel values were normalized using the min–max normalization technique. The quality of the medical images was then improved using the dynamic histogram equalization (DHE) algorithm.
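The resizing and min–max normalization steps can be illustrated with a minimal NumPy sketch. The function names, the nearest-neighbour interpolation and the synthetic input are assumptions made for illustration; the paper does not state which interpolation method or library was used.

```python
import numpy as np

def resize_nearest(image, size=256):
    """Resize a 2D grayscale image to size x size (nearest-neighbour, assumed here)."""
    h, w = image.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows[:, None], cols[None, :]]

def min_max_normalize(image):
    """Scale pixel values linearly into the range [0, 1]."""
    image = image.astype(np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:                         # constant image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

# Synthetic stand-in for one MR slice (hypothetical data, not from the dataset).
rng = np.random.default_rng(0)
scan = rng.integers(0, 256, size=(512, 400), dtype=np.uint8)
prepared = min_max_normalize(resize_nearest(scan))
```

After this step every image has the same shape and value range, which is what the model input requires.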

2.2 Dynamic histogram equalization (DHE)

The contrast of an image is a crucial factor in determining its quality [27]. Contrast enhancement is a technique used to improve the visual quality of an image, making it more suitable for either human visual analysis or subsequent machine analysis. In this work, DHE [28], an algorithm that adjusts images that are too bright or too dark, was used for contrast enhancement. Figure 1 depicts the classes of the dataset before and after the application of DHE.
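To give a feel for histogram-based contrast enhancement, the sketch below implements plain global histogram equalization in NumPy. This is a simplified stand-in and not the DHE algorithm of [28], which additionally partitions the histogram before equalizing each part; the function name and the synthetic dark image are illustrative assumptions.

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Global histogram equalization for an 8-bit grayscale image (not DHE)."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(cdf)[0][0]]   # CDF value at the lowest occupied gray level
    total = image.size
    if total == cdf_min:                   # only one gray level present
        return image.copy()
    # Map each gray level through the normalized cumulative distribution.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[image]

# Synthetic "too dark" image whose values occupy only the lowest quarter of the range.
rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(64, 64)).astype(np.uint8)
enhanced = equalize_histogram(dark)
```

The equalized image spans the full gray-level range, which is the effect contrast enhancement aims for on under- or over-exposed scans.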

After the images were preprocessed by resizing, normalization and histogram equalization, the model was built using the training set and tested with the testing set.

2.3 ResNet50-CBAM model development

In this research, the residual network ResNet50 [29], with weights pre-trained on ImageNet [30], was used to extract features from the preprocessed images. To prevent modification of the weights in the convolutional and max-pooling layers, we froze them during training. ResNet was chosen over other pre-trained networks because of its superior performance and because it addresses the vanishing gradient problem [31]. The extracted feature map F from ResNet50 was fed into the CBAM (dashed lines in Figure 2), which leverages both spatial and channel-wise attention mechanisms [32, 33]. Channel attention focuses on the importance of individual channels within the feature map, allowing the model to adaptively weigh the significance of different features. Spatial attention, on the other hand, concentrates on the relevance of spatial locations within the feature map, enabling the model to attend to specific regions of interest. The two mechanisms work together to enhance the model’s ability to capture and leverage meaningful information from the input data.

The feature extraction process yields the output of the ResNet50 architecture, denoted as F. This feature map has dimensions C × H × W, where C represents the number of channels, while H and W represent the height and width of the feature map, respectively.

The CBAM module incorporates both spatial and channel-wise attention mechanisms to refine the extracted features. To reduce the spatial dimensionality, max pooling and average pooling layers are applied to the input feature map. The global average pooling layer computes the average value of each channel across the spatial dimensions, while the global max pooling layer selects the maximum value for each channel. This process aggregates spatial information and captures unique object attributes, respectively. The channel attention map (CAM) is computed using shared dense layers, reflecting the importance of each channel in the feature map. The CAM is then element-wise multiplied with the original feature map F, resulting in a channel-refined feature map denoted as R, where each element is weighted based on its channel importance.

$$R = \mathrm{CAM} \odot F \tag{1}$$
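The channel attention step and Equation (1) can be sketched in NumPy as follows. The reduction ratio r, the ReLU between the two shared dense layers, the summation of the two pooled branches and the sigmoid are taken from the original CBAM design and are assumptions about this paper's configuration; the toy shapes and random weights are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, W1, W2):
    """Channel attention map (CAM) of shape (C,) for a feature map F of shape (C, H, W)."""
    avg = F.mean(axis=(1, 2))   # global average pooling over H and W
    mx = F.max(axis=(1, 2))     # global max pooling over H and W

    def shared_mlp(v):
        # The same two dense layers are applied to both pooled descriptors.
        return W2 @ np.maximum(W1 @ v, 0.0)

    return sigmoid(shared_mlp(avg) + shared_mlp(mx))

# Toy sizes (hypothetical); the real ResNet50 output has far more channels.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
F = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((C // r, C)) * 0.1   # first shared dense layer
W2 = rng.standard_normal((C, C // r)) * 0.1   # second shared dense layer

cam = channel_attention(F, W1, W2)
R = cam[:, None, None] * F                    # Equation (1): R = CAM ⊙ F
```

Broadcasting the per-channel weights over H and W is what makes the element-wise product in Equation (1) scale every spatial position of a channel by the same factor.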

This refined feature map enhances the model’s ability to emphasize relevant features within the channels. The spatial attention module focuses on specific regions of the feature map by compressing the channel-refined feature map into two 2D feature maps through maximum and average pooling operations along the channel axis. The spatial attention map is obtained by combining these 2D feature maps and is subsequently multiplied with the channel-refined feature map R. The final output of the CBAM is thus generated by combining both spatial and channel-wise attention. This output undergoes global average pooling, followed by a fully connected layer with SoftMax activation, which produces the final class prediction of the ResNet50-CBAM model.
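A minimal NumPy sketch of the spatial attention step and the classification head is given below. The 7 × 7 convolution and the sigmoid used to combine the two pooled maps follow the original CBAM design and are assumptions here, since the text only says the maps are combined; all shapes, weights and function names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention(R, kernel):
    """Spatial attention map of shape (H, W) for a channel-refined map R of shape (C, H, W)."""
    pooled = np.stack([R.mean(axis=0), R.max(axis=0)])   # (2, H, W): pooling along channels
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    H, W = R.shape[1:]
    out = np.empty((H, W))
    for i in range(H):                                    # naive "same" convolution
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def classify(refined, W_fc, b_fc):
    """Global average pooling, one dense layer and softmax."""
    pooled = refined.mean(axis=(1, 2))
    logits = W_fc @ pooled + b_fc
    e = np.exp(logits - logits.max())                     # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
C, H, W = 8, 6, 6                                         # toy sizes (hypothetical)
R = rng.standard_normal((C, H, W))
kernel = rng.standard_normal((2, 7, 7)) * 0.1
sam = spatial_attention(R, kernel)
refined = sam[None, :, :] * R                             # CBAM output
probs = classify(refined, rng.standard_normal((4, C)), np.zeros(4))  # four tumor classes
```

The four output probabilities correspond to the glioma, meningioma, no tumor and pituitary classes; the ordering is arbitrary in this sketch.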

3. Results and discussion

In this research, 80% of the training dataset was used for training while the remaining 20% was used for validation, and the testing dataset was used to test the ResNet50-CBAM model. A five-fold cross-validation approach was also used to develop and validate the model. The performance of the model was evaluated based on the accuracy, precision, recall and AUC metrics. Table 2 gives the details of the hyperparameters of the network. Two optimizers were used for the model, Adam and stochastic gradient descent (SGD), as shown in Table 2. Adam was chosen for Model A to harness its adaptive learning rate, which is beneficial for handling complex loss landscapes and non-stationary gradients and leads to faster convergence and enhanced generalization [34], while SGD was adopted for Model B because of its simplicity, resource efficiency and proven effectiveness [35]. A learning rate of 0.001 was chosen to strike a balance between convergence speed and stability during training. Finally, a batch size of 32 for Model A pairs well with the efficiency and adaptiveness of the Adam optimizer, while a smaller batch size of 16 for Model B complements the simplicity and resource efficiency of SGD. These choices align with the strengths of the respective optimizers.
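The two evaluation protocols can be sketched as index-splitting routines. The function names, the random seed and the sample count of 1,000 are hypothetical, and the sketch uses plain random shuffling because the text does not say whether the splits were stratified by class.

```python
import numpy as np

def train_val_split(n_samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return order[n_val:], order[:n_val]

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index arrays for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

# Hypothetical sample count, used only to show the split sizes.
train_idx, val_idx = train_val_split(1000)
folds = list(k_fold_indices(1000, k=5))
```

In five-fold cross-validation every sample is used for validation exactly once, which is why the reported metrics can be averaged across folds.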

Table 3 showcases the results obtained from the experiments based on the hyperparameters defined in Table 2; results of both the train-test split and the five-fold cross-validation are given. Figure 3 shows the training and testing accuracy and loss for the models.
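For reference, accuracy, precision and recall for a four-class problem can be computed from a confusion matrix as sketched below. Macro-averaging over the classes is an assumption, as the paper does not state which averaging scheme was used, and the labels are made up for illustration.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def summary_metrics(cm):
    """Accuracy plus macro-averaged precision and recall."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # column sums: how often each class was predicted
    actual = cm.sum(axis=1)      # row sums: how often each class truly occurs
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
    }

# Made-up labels for the four classes (0..3), two samples per class.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
metrics = summary_metrics(confusion_matrix(y_true, y_pred, 4))
```

With these made-up labels, six of the eight predictions are correct, so the accuracy is 0.75.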

As the accuracy and loss plots of the models show, Adam (a), which is generally known for faster initial convergence, can also exhibit rapid fluctuations in the early stages of training [36]. SGD (b), on the other hand, may converge more gradually but with smoother progress. Additionally, the low standard deviation in the performance metrics for both optimizers indicates that the model is stable and performs consistently regardless of the optimization algorithm used.

Figure 4 shows the receiver operating characteristic (ROC) curve plots for the models together with the AUC score of each class.
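A per-class AUC score can be obtained with a one-vs-rest scheme, sketched below using the rank-based definition of AUC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one). The one-vs-rest treatment and the toy probabilities are assumptions for illustration; the paper does not describe how the per-class scores were computed.

```python
import numpy as np

def binary_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, probs):
    """One AUC per class, treating that class as positive and all others as negative."""
    y_true = np.asarray(y_true)
    return [binary_auc(y_true == c, probs[:, c]) for c in range(probs.shape[1])]

# Toy predicted probabilities for six samples and four classes (hypothetical).
y_true = [0, 1, 2, 3, 0, 1]
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.40, 0.50, 0.05, 0.05],
    [0.30, 0.40, 0.20, 0.10],
])
aucs = one_vs_rest_auc(y_true, probs)
```

In this toy example, class 1 scores 0.875 because one negative sample receives a higher class-1 probability than one of the true class-1 samples; the other classes are perfectly separated.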

To affirm that the vital image features and their contextual relationships are learnt by these models, the feature maps are visualized as shown in Figures 5–7 (larger versions are available at https://github.com/OladosuO/AI-for-Brain-Tumor-Classification). The feature maps of the first, mid and last three layers are visualized.

As seen in the feature maps of the first three layers of the model, the early layers usually tend to capture low-level features like edges, textures and simple shapes. They respond to basic patterns in the input.

As evident in Figures 6 and 7, as the network goes deeper, the feature maps become more abstract and represent complex patterns and parts of the brain MRIs. The deeper part of the network responds to higher-level features like textures or object-specific shapes. Particularly in Figure 6, the CBAM module has emphasized regions with important spatial and channel-wise information. These regions and channels are expected to be more informative for making predictions, while the less relevant areas and channels are de-emphasized.

Based on the impressive performance obtained, the approach was compared with existing state-of-the-art methods in the literature that used the same dataset as this research. The comparative results demonstrated that the ResNet50-CBAM outperformed the other techniques. Table 4 gives the details of the comparison. It is important to note that the ResNet50-CBAM model was evaluated with the same training and evaluation methods used in these existing works, as shown in Table 4.

3.1 Ablation study

Furthermore, an ablation study was conducted on the model using the Model A parameter setting and the 80%/20% train-test split evaluation approach. Table 5 presents the results of the ablation study. In summary, the removal of each module resulted in a decline in brain tumor prediction performance. When all components are employed together, the proposed method achieves the best performance, underscoring the essential role of combining all components in predicting brain tumors.

Given that Model A outperformed Model B in both the test split and cross-validation, the results suggest that the combination of parameters in Model A led to a more effective learning process. The use of the Adam optimizer in Model A could have played a crucial role in its superior performance. Adam adapts the learning rate individually for each parameter, which can be advantageous in optimizing complex models [36], while SGD uses a fixed learning rate for all parameters during each iteration of model training. Brain tumor classification tasks involve intricate and high-dimensional feature spaces where certain features may require more nuanced adjustments during training. The adaptability of the learning rate in the Adam optimizer addresses this challenge by adjusting the learning rate for each parameter individually and dynamically throughout the training process. It might be worthwhile to explore the impact of changing the batch size or using a different optimizer to see if the performance can be improved further.
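The difference between a fixed and a per-parameter adaptive step can be seen in a single update, sketched below. The learning rate of 0.001 is the one reported above; the Adam coefficients (0.9, 0.999, 1e-8) are the commonly used defaults and are assumed here, and the two-parameter example is purely illustrative.

```python
import numpy as np

def sgd_step(params, grads, lr=0.001):
    """Plain SGD: every parameter moves by lr times its own gradient."""
    return params - lr * grads

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grads
    v = beta2 * v + (1 - beta2) * grads ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return params - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Two parameters whose gradients differ by four orders of magnitude.
params = np.array([1.0, 1.0])
grads = np.array([100.0, 0.01])

sgd_params = sgd_step(params, grads)
adam_params, m, v = adam_step(params, grads, np.zeros(2), np.zeros(2), t=1)
```

With SGD the two parameters move by 0.1 and 0.00001, whereas the first Adam step moves both by roughly the learning rate, because each gradient is rescaled by its own running magnitude.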

Notably, the ablation study showed that the model’s attention mechanisms enable it to selectively emphasize relevant features while suppressing noise, contributing to its exceptional performance. However, the observation that ResNet + Channel attention outperformed ResNet + Spatial attention introduces an interesting dimension to the study. It suggests that, in the context of brain tumor classification, attending to features at the channel level might be more beneficial than focusing on spatial relationships. This finding emphasizes the importance of carefully selecting and tuning attention mechanisms based on the specific characteristics of the task. It is worth noting that although ResNet alone had the lowest performance in the ablation study, it still performed better than some of the existing works, such as [37], as shown in Table 4.

In this research, the focus was on multiclass brain tumor classification of MR images using the ResNet50-CBAM model. The experimental results show that our approach outperforms the state-of-the-art CNN models. Additionally, because MRI images have distinct features and different imaging modalities, it is challenging for the pretrained models mostly used in previous works to learn the pertinent medical brain MRI features effectively [40]. The CBAM module, which adds an attention mechanism, has helped to overcome this challenge by focusing on relevant features, as shown in Figures 5–7, which improved the performance of the model.

In the context of clinical application, our results suggest that implementing the ResNet50-CBAM model in real-world settings could lead to more accurate and timely diagnoses of brain tumors. This is particularly significant in cases where early detection is crucial for treatment planning and patient outcomes. Healthcare professionals could leverage the model’s enhanced performance to streamline diagnostic processes and improve overall patient care. However, when applied in real-world clinical settings, challenges such as explainability and data privacy arise. Clinicians seek to comprehend the model’s decision-making process, making subsequent clinical validation crucial for ensuring efficacy, reliability and ethical integrity. Addressing data privacy concerns, further evaluation across diverse demographics and adopting federated learning approaches are imperative for enhancing the model’s generalizability.

As future research directions, exploring model-agnostic explanation techniques, other forms of attention mechanisms and data preprocessing techniques will contribute to the ongoing advancement of brain tumor classification models. Additionally, extending this work to 3D MRI using volumetric attention mechanisms opens avenues for more comprehensive and nuanced feature capture.

4. Conclusions

Deep learning has been playing a vital role in the accurate classification of medical images. In this research, we have developed a deep learning-based approach for the classification of brain tumors in medical imaging. The proposed approach leveraged the convolutional block attention mechanism to accurately classify brain MRI into the glioma, meningioma, no tumor and pituitary classes. The experimental results showed the superior performance of the convolutional block attention mechanism framework in brain tumor classification. With an accuracy of 99.43%, the model outperforms baseline methods, highlighting its effectiveness in accurately diagnosing and classifying brain tumors. The high accuracy of the proposed method can be attributed to effective data preprocessing, transfer learning and the attention mechanism. Given the performance obtained in this research, the model can be integrated into the software platforms used by physicians for enhanced clinical decision-making and improved patient care. In future research, we plan to utilize additional brain tumor datasets and explore different deep learning techniques to further improve the diagnosis of brain tumors. The limitation of this model is its computational complexity: adding CBAM attention modules to the ResNet50 architecture introduces additional parameters and increases the model size, and hence requires more memory for model development. Additionally, CBAM modules perform operations such as global pooling, convolution and element-wise multiplication, all of which contribute to increased computational demand. Therefore, it would be interesting in future research to develop a lightweight deep learning model with attention mechanisms for brain tumor classification.
In conclusion, in a clinical setting the ResNet50-CBAM model, with its ability to capture relevant features in brain MRI, would provide more timely and accurate diagnoses, which can lead to more effective treatment planning and increase patients’ chances of survival. Additionally, the reduced likelihood of false positives and false negatives could alleviate patient anxiety.
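The parameter overhead mentioned among the limitations can be estimated with a short calculation. The sketch assumes a single CBAM block applied to the 2,048-channel output of ResNet50, a reduction ratio of 16, a 7 × 7 spatial convolution and no bias terms, which are the defaults of the original CBAM design; the configuration actually used in this work may differ.

```python
def cbam_extra_parameters(channels, reduction=16, kernel_size=7, use_bias=False):
    """Count the trainable parameters one CBAM block adds (under the stated assumptions)."""
    hidden = channels // reduction
    mlp = 2 * channels * hidden                      # two shared dense layers
    if use_bias:
        mlp += hidden + channels
    spatial = 2 * kernel_size * kernel_size          # conv over the 2 pooled maps
    if use_bias:
        spatial += 1
    return mlp + spatial

extra = cbam_extra_parameters(2048)
print(extra)  # 524386
```

Almost all of the added parameters come from the shared dense layers of the channel attention branch, so a larger reduction ratio is one possible route toward a lighter model.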

Figures

Figure 1. Visualization of before and after the application of dynamic histogram equalization (DHE)

Figure 2. The ResNet50-CBAM model architecture

Figure 3. The training and testing accuracy and loss plot for the models

Figure 4. The receiver operating characteristic (ROC) curves of the models

Figure 5. Feature maps of the first three layers

Figure 6. Feature maps of the mid three layers

Figure 7. Feature maps of the last three layers