Search results
1 – 10 of 765

Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán
Abstract
Purpose
COVID-19 continues to spread and cause increasing deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest X-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed to assist in this respect, and vision transformers are currently state-of-the-art methods, but most techniques currently focus on only one modality (CXR).
Design/methodology/approach
This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. This paper studies the differences between using convolutional MobileNetV2, ViT DeiT and Swin Transformer models when training from scratch and when pretraining on the MedNIST medical dataset rather than on the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, the Wilcoxon statistical test and the Borda Count method. The Grad-CAM algorithm is also used to study model interpretability. Finally, robustness is tested by evaluating the models on Gaussian-noised images.
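The robustness protocol described above can be illustrated with a minimal numpy sketch. This is not the authors' code; the function names and the accuracy-drop measure are illustrative, assuming images scaled to [0, 1]:

```python
import numpy as np

def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Corrupt a [0, 1]-scaled image with additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def robustness_gap(accuracy_fn, images, labels, sigma=0.1):
    """Robustness measured as the accuracy drop on noised inputs."""
    clean_acc = accuracy_fn(images, labels)
    noised = np.stack([add_gaussian_noise(im, sigma) for im in images])
    return clean_acc - accuracy_fn(noised, labels)
```

A smaller gap at a given sigma indicates a model whose predictions degrade less under noise.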
Findings
Although the pretrained MobileNetV2 performed best on raw metrics, the best model overall in terms of performance, interpretability and robustness to noise is the Swin Transformer trained from scratch, using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.
Originality/value
Models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.
Details
Keywords
Weixin Zhang, Zhao Liu, Yu Song, Yixuan Lu and Zhenping Feng
Abstract
Purpose
To improve the speed and accuracy of the turbine blade film cooling design process, the most advanced deep learning models were introduced in this study to investigate the most suitable architecture for the prediction task. This paper aims to create a generative surrogate model that can be applied to multi-objective optimization problems.
Design/methodology/approach
The latest backbone in the field of computer vision (Swin-Transformer, 2021) was introduced and improved as the surrogate function for prediction of the multi-physics field distribution (film cooling effectiveness, pressure, density and velocity). The basic samples were generated by the Latin hypercube sampling method, and the numerical method adopted for the calculation was first validated experimentally. The training and testing samples were calculated at experimental conditions. Finally, the surrogate model's predictions were verified by experiment in a linear cascade.
Findings
The results indicated that, compared with the Multi-Scale Pix2Pix model, the Swin-Transformer U-Net model presented higher accuracy and computing speed in the prediction of contour results. The computation time for each step of the Swin-Transformer U-Net model is one-third that of the original model, especially in the case of multi-physics field prediction. The correlation index reached more than 99.2% and the first-order error was lower than 0.3% for the multi-physics field. The predictions of the data-driven surrogate model are consistent with the computational fluid dynamics results, and both are very close to the experimental results. Applying the Swin-Transformer model to enlarge the set of different structure samples will reduce the cost of numerical calculations as well as experiments.
Research limitations/implications
The number of U-Net layers and the sample scale have a proper relationship according to equation (8). Too many U-Net layers will lead to unnecessary nonlinear variation, whereas too few will lead to insufficient feature extraction. In the case of the Swin-Transformer U-Net model, an incorrect number of U-Net layers will reduce the prediction accuracy. The Multi-Scale Pix2Pix model has higher accuracy in predicting a single physical field, but its calculation speed is too slow. The Swin-Transformer model is fast in prediction and training (nearly three times faster than the Multi-Scale Pix2Pix model), but the predicted contours have more noise. The neural network predictions and numerical calculations are consistent with the experimental distribution.
Originality/value
This paper creates a generative surrogate model that can be applied to multi-objective optimization problems. A generative adversarial network using the new backbone is chosen to extend the output from a single contour to multi-physics fields, which generates more results simultaneously than traditional surrogate models and reduces the time cost, making it more applicable to multi-objective spatial optimization algorithms. The Swin-Transformer surrogate model is three times faster in computation than the Multi-Scale Pix2Pix model, and its predictions of multi-physics fields are more accurate.
Ankang Ji, Xiaolong Xue, Limao Zhang, Xiaowei Luo and Qingpeng Man
Abstract
Purpose
Crack detection of pavement is a critical task in periodic surveys. Efficient, effective and consistent tracking of road conditions by identifying and locating cracks helps promptly informed managers establish an appropriate road maintenance and repair strategy, but it remains a significant challenge. This research seeks to propose practical solutions for automatic crack detection from images with efficient productivity and cost-effectiveness, thereby improving pavement performance.
Design/methodology/approach
This research applies a novel deep learning method named TransUnet for crack detection, which is structured on the Transformer combined with convolutional neural networks as the encoder, leveraging a global self-attention mechanism to better extract features and enhance automatic identification. Afterward, the detected cracks are used to quantify morphological features through five indicators: length, mean width, maximum width, area and ratio. These analyses can provide valuable information for engineers to assess pavement condition with efficient productivity.
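The five indicators can be approximated directly from a binary segmentation mask. The numpy sketch below is a simplified illustration, not the paper's quantification algorithm: it uses the larger bounding-box extent as a proxy for skeleton length and per-row (or per-column) pixel counts as widths:

```python
import numpy as np

def crack_indicators(mask):
    """Toy versions of the five indicators from a binary crack mask (1 = crack)."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return {"length": 0, "mean_width": 0.0, "max_width": 0,
                "area": 0, "ratio": 0.0}
    h = int(ys.max() - ys.min() + 1)          # vertical bounding-box extent
    w = int(xs.max() - xs.min() + 1)          # horizontal bounding-box extent
    axis = 1 if h >= w else 0                 # measure width across the long axis
    counts = mask.sum(axis=axis)
    counts = counts[counts > 0]               # widths of the occupied slices
    return {"length": max(h, w),
            "mean_width": float(counts.mean()),
            "max_width": int(counts.max()),
            "area": int(mask.sum()),
            "ratio": float(mask.sum() / mask.size)}
```

A skeleton-based measurement (as real crack-quantification pipelines use) would be more accurate for curved cracks; the proxy above is exact only for axis-aligned ones.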
Findings
In the training process, the TransUnet is fed a crack dataset generated by data augmentation with a resolution of 224 × 224 pixels. Subsequently, a test set containing 80 new images is used for the crack detection task with the best selected TransUnet (learning rate 0.01, batch size 1), achieving an accuracy of 0.8927, a precision of 0.8813, a recall of 0.8904, an F1-measure and Dice of 0.8813 and a mean intersection over union of 0.8082. Comparisons with several state-of-the-art methods indicate that the developed approach outperforms them with greater efficiency and higher reliability.
Originality/value
The developed approach combines TransUnet with an integrated quantification algorithm for crack detection and quantification, performs excellently on the evaluation metrics and comparisons, and provides solutions that could potentially serve as the basis for an automated, cost-effective pavement condition assessment scheme.
Abstract
Purpose
This paper aims to present two different methods to speed up a test used in the sanitary ware industry that requires counting the number of granules that remain in the commodity after flushing. The test requires that 2,500 granules be added to the lavatory and that fewer than 125 remain.
Design/methodology/approach
The problem is approached using two deep learning computer vision (CV) models. The first is a Vision Transformer (ViT) classification approach and the second is a U-Net paired with a connected-components algorithm. Both models are trained and evaluated using a proprietary dataset of 3,518 labeled images, and their performance is compared.
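A connected-components pass over the segmentation mask is what turns pixels into a granule count. The pure-Python flood-fill sketch below is illustrative only (the paper's implementation is not specified); it counts 4-connected blobs, with an optional minimum size to suppress speckle noise:

```python
import numpy as np
from collections import deque

def count_components(mask, min_size=1):
    """Count 4-connected foreground blobs (e.g. granules) in a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # breadth-first flood fill of one component
                size, q = 0, deque([(sy, sx)])
                seen[sy, sx] = True
                while q:
                    y, x = q.popleft()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if size >= min_size:
                    count += 1
    return count
```

In practice a vectorized labeling routine such as `scipy.ndimage.label` would be used on full-resolution images; the logic is the same.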
Findings
It was found that both algorithms are able to produce competitive solutions. The U-Net algorithm achieves accuracy levels above 94% and the ViT model reaches accuracy levels above 97%. At this time, the U-Net algorithm is being piloted and the ViT pilot is at the planning stage.
Originality/value
To the best of the authors’ knowledge, this is the first approach using CV to solve the granules problem applying ViT. In addition, this work updates the U-Net-Connected components algorithm and compares the results of both algorithms.
Fei Xie and Haijun Wei
Abstract
Purpose
Using computer technology to realize intelligent ferrographic fault diagnosis is fundamental research for inspecting the operation of mechanical equipment. This study aims to improve the application of deep learning technology in the field of ferrographic image recognition.
Design/methodology/approach
This paper proposes a binocular image classification model to solve ferrographic image classification problems.
Findings
This paper proposes a novel binocular model (BesNet). The model exhibits extreme behavior: on the one hand, it is almost unable to identify cutting wear particles; on the other, it achieves 100% accuracy in identifying chunky and nonferrous wear particles. The BesNet model is a bionic model of the human eye, and the training images used are specially processed parallax images. After being combined with the MCECNN model, it becomes the BMECNN model, whose classification accuracy has reached the highest level in the industry.
Originality/value
The BesNet model developed in this article is a brand-new system for ferrographic image recognition. It adopts a method of imitating the eyes to view ferrography images, and its image-processing method is also unique. After being combined with the MCECNN model, it becomes the BMECNN model, whose classification accuracy has reached the highest level in the industry.
Peer review
The peer review history for this article is available at: https://publons.com/publon/10.1108/ILT-05-2023-0150/
Jiqian Dong, Sikai Chen, Mohammad Miralinaghi, Tiantian Chen and Samuel Labi
Abstract
Purpose
Perception failures have been identified as the main cause underlying most autonomous vehicle-related accidents. As the key technology in perception, deep learning (DL)-based computer vision models are generally considered black boxes due to poor interpretability, which has exacerbated user distrust and further forestalled widespread deployment in practice. This paper aims to develop explainable DL models for autonomous driving by jointly predicting potential driving actions and corresponding explanations. Explainable DL models can not only boost user trust in autonomy but also serve as a diagnostic approach for identifying model deficiencies or limitations during the system development phase.
Design/methodology/approach
This paper proposes an explainable end-to-end autonomous driving system based on the Transformer, a state-of-the-art self-attention (SA)-based model. The model maps visual features from images collected by onboard cameras to potential driving actions with corresponding explanations, and aims to achieve soft attention over the image's global features.
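The soft attention the model computes over global image features follows standard scaled dot-product self-attention, which can be sketched in numpy as follows (illustrative shapes and names, not the paper's code):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over feature tokens X.

    X: (n_tokens, d_model); Wq/Wk/Wv: (d_model, d_head) projections.
    Returns the attended features and the (row-stochastic) attention map."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])              # similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V, weights
```

Each row of the returned attention map sums to one, i.e. each output token is a convex combination of the value vectors, which is what makes the attention weights interpretable as soft feature importance.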
Findings
The results demonstrate the efficacy of the proposed model as it exhibits superior performance (in terms of correct prediction of actions and explanations) compared to the benchmark model by a significant margin with much lower computational cost on a public data set (BDD-OIA). From the ablation studies, the proposed SA module also outperforms other attention mechanisms in feature fusion and can generate meaningful representations for downstream prediction.
Originality/value
In the contexts of situational awareness and driver assistance, the proposed model can perform as a driving alarm system for both human-driven vehicles and autonomous vehicles because it is capable of quickly understanding/characterizing the environment and identifying any infeasible driving actions. In addition, the extra explanation head of the proposed model provides an extra channel for sanity checks to guarantee that the model learns the ideal causal relationships. This provision is critical in the development of autonomous systems.
Reema Khaled AlRowais and Duaa Alsaeed
Abstract
Purpose
Automatically extracting stance information from natural language texts is a significant research problem with various applications, particularly after the recent explosion of data on the internet via platforms such as social media sites. A stance detection system helps determine whether the author agrees with, is against or holds a neutral opinion toward a given target. Most research on stance detection focuses on the English language, while little has been conducted on Arabic.
Design/methodology/approach
This paper aims to address stance detection on Arabic tweets by building and comparing different stance detection models using four transformers, namely Araelectra, MARBERT, AraBERT and Qarib. Using different weights for these transformers, the authors performed extensive experiments fine-tuning them for the task of stance detection on Arabic tweets.
Findings
The results showed that the AraBERT model learned better than the other three models with a 70% F1 score followed by the Qarib model with a 68% F1 score.
Research limitations/implications
A limitation of this study is the imbalanced dataset and the limited availability of annotated datasets of SD in Arabic.
Originality/value
This paper provides a comprehensive overview of the current resources for stance detection in the literature, including the datasets and machine learning methods used. The authors also examined the models to analyze and comprehend the obtained findings in order to recommend the best-performing models for the stance detection task.
Minh Thanh Vo, Anh H. Vo and Tuong Le
Abstract
Purpose
Medical images are increasingly common; therefore, deep learning-based analysis of these images to help diagnose diseases has become more and more essential. Recently, the shoulder implant X-ray image classification (SIXIC) dataset, which includes X-ray images of implanted shoulder prostheses produced by four manufacturers, was released. Detecting the implant's model helps select the correct equipment and procedures for the upcoming surgery.
Design/methodology/approach
This study proposes a robust model named X-Net to improve predictive performance for shoulder implant X-ray image classification on the SIXIC dataset. The X-Net model integrates Squeeze-and-Excitation (SE) blocks into the Residual Network (ResNet) module. The SE module weighs each feature map extracted by ResNet, which aids in improving performance. Feature extraction in X-Net is performed by both modules, ResNet and SE, and the final feature is obtained by incorporating the features extracted in the above steps, capturing more important characteristics of the X-ray images in the input dataset. X-Net then uses this fine-grained feature to classify the input images into the four classes of the SIXIC dataset (Cofield, Depuy, Zimmer and Tornier).
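The channel-reweighting idea of an SE block can be sketched in a few lines of numpy. This is an illustrative toy, not the X-Net implementation; `w1` and `w2` stand in for the two fully connected layers of the excitation step:

```python
import numpy as np

def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation: reweight the channels of a (C, H, W) tensor.

    w1: (C, C//r) and w2: (C//r, C) are the excitation FC layers, where r
    is the reduction ratio."""
    squeeze = feature_maps.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)            # FC + ReLU (bottleneck)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))      # FC + sigmoid -> (C,) in (0, 1)
    return feature_maps * scale[:, None, None]        # channel-wise reweighting
```

The sigmoid keeps each channel's scale in (0, 1), so the block learns to suppress uninformative feature maps rather than amplify them unboundedly.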
Findings
Experiments are conducted to show the proposed approach's effectiveness compared with other state-of-the-art methods for SIXIC. The experimental results indicate that the approach outperforms the other experimental methods on several performance metrics. In addition, the proposed approach achieves new state-of-the-art results on all performance metrics, including accuracy, precision, recall, F1-score and area under the curve (AUC), for the experimental dataset.
Originality/value
The proposed method with high predictive performance can be used to assist in the treatment of injured shoulder joints.
Abstract
Purpose
Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot for its great potential to boost performance. However, RSI have unique imaging conditions and cluttered scenes with complicated backgrounds. This large difference from natural images has meant that previous feature fusion methods yield insignificant performance improvements.
Design/methodology/approach
This work proposed a two-convolutional neural network (CNN) fusion method named main and branch CNN fusion network (MBC-Net) as an improved solution for classifying RSI. In detail, MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. In particular, MBC-Net includes a long-range derivation (LRD) module, specially designed to learn the dependence between different features. Meanwhile, MBC-Net also uses some unique ideas to tackle the problems arising from two-CNN fusion and from the inherent nature of RSI.
Findings
Extensive experiments on three RSI sets prove that MBC-Net outperforms 38 other state-of-the-art (SOTA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA). MBC-Net not only presents a 0.7% higher OA value on the most confusing NWPU set but also has 62% fewer parameters than the leading approach ranked first in the literature.
Originality/value
MBC-Net is a more effective and efficient feature fusion approach than other SOTA methods in the literature. Visualizations of gradient-weighted class activation mapping (Grad-CAM) reveal that MBC-Net can learn long-range dependences between features that a single CNN cannot. The t-distributed stochastic neighbor embedding (t-SNE) results demonstrate that MBC-Net's feature representation is more effective than that of other methods. In addition, the ablation tests indicate that MBC-Net is effective and efficient at fusing features from two CNNs.
Huaxiang Song, Chai Wei and Zhou Yong
Abstract
Purpose
The paper aims to tackle the classification of remote sensing images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger-volume models to achieve advanced performance. However, remote sensing operating environments commonly cannot provide unconstrained computational and storage resources, which calls for lightweight algorithms with exceptional generalization capabilities.
Design/methodology/approach
This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.
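Logit-based KD of the kind this study builds on matches the student's temperature-softened predictions to the teacher's. The numpy sketch below shows the classic Hinton-style KD loss as a baseline; the study's modified loss, which avoids suppressing non-target category predictions, is not reproduced here:

```python
import numpy as np

def softened(logits, T=1.0):
    """Temperature-softened softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation loss: T^2 * KL(teacher || student).

    Softening with T > 1 spreads probability mass onto non-target classes,
    so the student also learns the teacher's inter-class similarities."""
    p = softened(teacher_logits, T)
    q = softened(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows with their divergence; in training it is typically mixed with the ordinary cross-entropy on the hard labels.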
Findings
This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.
Originality/value
This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.