Search results
1 – 10 of 765

Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán
Abstract
Purpose
COVID-19 continues to spread and cause increasing deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest X-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed to assist in this respect, and vision transformers are currently state-of-the-art methods, but most techniques currently focus on only one modality (CXR).
Design/methodology/approach
This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. This paper studies the differences between using convolutional MobileNetV2, ViT DeiT and Swin Transformer models when training from scratch and when pretraining on the MedNIST medical dataset rather than on the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, the Wilcoxon statistical test and the Borda Count method. The Grad-CAM algorithm is also used to study model interpretability. Finally, robustness is tested by evaluating the models on Gaussian-noised images.
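The robustness protocol described above can be illustrated with a minimal numpy sketch. This is not the authors' code; the function names and the accuracy-drop measure are illustrative, assuming images scaled to [0, 1]:

```python
import numpy as np

def add_gaussian_noise(image, sigma=0.1, seed=0):
    """Corrupt a [0, 1]-scaled image with additive Gaussian noise."""
    rng = np.random.default_rng(seed)
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def robustness_gap(accuracy_fn, images, labels, sigma=0.1):
    """Robustness measured as the accuracy drop on noised inputs."""
    clean_acc = accuracy_fn(images, labels)
    noised = np.stack([add_gaussian_noise(im, sigma) for im in images])
    return clean_acc - accuracy_fn(noised, labels)
```

A smaller gap at a given sigma indicates a model whose predictions degrade less under noise.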
Findings
Although the pretrained MobileNetV2 performed best on raw metrics, the best model overall in terms of performance, interpretability and robustness to noise is the Swin Transformer trained from scratch, using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.
Originality/value
Models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.
Details
Keywords
Weixin Zhang, Zhao Liu, Yu Song, Yixuan Lu and Zhenping Feng
Abstract
Purpose
To improve the speed and accuracy of the turbine blade film cooling design process, the most advanced deep learning models were introduced in this study to investigate the most suitable architecture for the prediction task. This paper aims to create a generative surrogate model that can be applied to multi-objective optimization problems.
Design/methodology/approach
The latest backbone in the field of computer vision (Swin-Transformer, 2021) was introduced and improved as the surrogate function for prediction of the multi-physics field distribution (film cooling effectiveness, pressure, density and velocity). The basic samples were generated by the Latin hypercube sampling method, and the numerical method adopted for the calculation was first validated experimentally. The training and testing samples were calculated at experimental conditions. Finally, the surrogate model's predictions were verified by experiment in a linear cascade.
Findings
The results indicated that, compared with the Multi-Scale Pix2Pix model, the Swin-Transformer U-Net model presented higher accuracy and computing speed in the prediction of contour results. The computation time for each step of the Swin-Transformer U-Net model is one-third that of the original model, especially in the case of multi-physics field prediction. The correlation index reached more than 99.2% and the first-order error was lower than 0.3% for the multi-physics field. The predictions of the data-driven surrogate model are consistent with the computational fluid dynamics results, and both are very close to the experimental results. Applying the Swin-Transformer model to enlarge the set of different structure samples will reduce the cost of numerical calculations as well as experiments.
Research limitations/implications
The number of U-Net layers and the sample scale have a proper relationship according to equation (8). Too many U-Net layers will lead to unnecessary nonlinear variation, whereas too few will lead to insufficient feature extraction. In the case of the Swin-Transformer U-Net model, an incorrect number of U-Net layers will reduce the prediction accuracy. The Multi-Scale Pix2Pix model has higher accuracy in predicting a single physical field, but its calculation speed is too slow. The Swin-Transformer model is fast in prediction and training (nearly three times faster than the Multi-Scale Pix2Pix model), but the predicted contours have more noise. The neural network predictions and numerical calculations are consistent with the experimental distribution.
Originality/value
This paper creates a generative surrogate model that can be applied to multi-objective optimization problems. A generative adversarial network using the new backbone is chosen to extend the output from a single contour to multi-physics fields, which generates more results simultaneously than traditional surrogate models and reduces the time cost, making it more applicable to multi-objective spatial optimization algorithms. The Swin-Transformer surrogate model is three times faster in computation than the Multi-Scale Pix2Pix model, and its predictions of multi-physics fields are more accurate.
Ankang Ji, Xiaolong Xue, Limao Zhang, Xiaowei Luo and Qingpeng Man
Abstract
Purpose
Crack detection of pavement is a critical task in periodic surveys. Efficient, effective and consistent tracking of road conditions by identifying and locating cracks helps promptly informed managers establish an appropriate road maintenance and repair strategy, but it remains a significant challenge. This research seeks to propose practical solutions for automatic crack detection from images with efficient productivity and cost-effectiveness, thereby improving pavement performance.
Design/methodology/approach
This research applies a novel deep learning method named TransUnet for crack detection, which is structured on the Transformer combined with convolutional neural networks as the encoder, leveraging a global self-attention mechanism to better extract features and enhance automatic identification. Afterward, the detected cracks are used to quantify morphological features through five indicators: length, mean width, maximum width, area and ratio. These analyses can provide valuable information for engineers to assess pavement condition with efficient productivity.
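The five indicators can be approximated directly from a binary segmentation mask. The numpy sketch below is a simplified illustration, not the paper's quantification algorithm: it uses the larger bounding-box extent as a proxy for skeleton length and per-row (or per-column) pixel counts as widths:

```python
import numpy as np

def crack_indicators(mask):
    """Toy versions of the five indicators from a binary crack mask (1 = crack)."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return {"length": 0, "mean_width": 0.0, "max_width": 0,
                "area": 0, "ratio": 0.0}
    h = int(ys.max() - ys.min() + 1)          # vertical bounding-box extent
    w = int(xs.max() - xs.min() + 1)          # horizontal bounding-box extent
    axis = 1 if h >= w else 0                 # measure width across the long axis
    counts = mask.sum(axis=axis)
    counts = counts[counts > 0]               # widths of the occupied slices
    return {"length": max(h, w),
            "mean_width": float(counts.mean()),
            "max_width": int(counts.max()),
            "area": int(mask.sum()),
            "ratio": float(mask.sum() / mask.size)}
```

A skeleton-based measurement (as real crack-quantification pipelines use) would be more accurate for curved cracks; the proxy above is exact only for axis-aligned ones.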
Findings
In the training process, the TransUnet is fed a crack dataset generated by data augmentation with a resolution of 224 × 224 pixels. Subsequently, a test set containing 80 new images is used for the crack detection task with the best selected TransUnet (learning rate 0.01, batch size 1), achieving an accuracy of 0.8927, a precision of 0.8813, a recall of 0.8904, an F1-measure and Dice of 0.8813 and a mean intersection over union of 0.8082. Comparisons with several state-of-the-art methods indicate that the developed approach outperforms them with greater efficiency and higher reliability.
Originality/value
The developed approach combines TransUnet with an integrated quantification algorithm for crack detection and quantification, performs excellently on the evaluation metrics and comparisons, and provides solutions that could potentially serve as the basis for an automated, cost-effective pavement condition assessment scheme.
Abstract
Purpose
This paper aims to present two different methods to speed up a test used in the sanitary ware industry that requires counting the number of granules that remain in the commodity after flushing. The test requires that 2,500 granules be added to the lavatory and that fewer than 125 remain.
Design/methodology/approach
The problem is approached using two deep learning computer vision (CV) models. The first is a Vision Transformer (ViT) classification approach and the second is a U-Net paired with a connected-components algorithm. Both models are trained and evaluated using a proprietary dataset of 3,518 labeled images, and their performance is compared.
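A connected-components pass over the segmentation mask is what turns pixels into a granule count. The pure-Python flood-fill sketch below is illustrative only (the paper's implementation is not specified); it counts 4-connected blobs, with an optional minimum size to suppress speckle noise:

```python
import numpy as np
from collections import deque

def count_components(mask, min_size=1):
    """Count 4-connected foreground blobs (e.g. granules) in a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # breadth-first flood fill of one component
                size, q = 0, deque([(sy, sx)])
                seen[sy, sx] = True
                while q:
                    y, x = q.popleft()
                    size += 1
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if size >= min_size:
                    count += 1
    return count
```

In practice a vectorized labeling routine such as `scipy.ndimage.label` would be used on full-resolution images; the logic is the same.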
Findings
It was found that both algorithms are able to produce competitive solutions. The U-Net algorithm achieves accuracy levels above 94% and the ViT model reaches accuracy levels above 97%. At this time, the U-Net algorithm is being piloted and the ViT pilot is at the planning stage.
Originality/value
To the best of the authors’ knowledge, this is the first approach using CV to solve the granules problem applying ViT. In addition, this work updates the U-Net-Connected components algorithm and compares the results of both algorithms.
Fei Xie and Haijun Wei
Abstract
Purpose
Using computer technology to realize intelligent ferrographic fault diagnosis is fundamental research for inspecting the operation of mechanical equipment. This study aims to improve the application of deep learning technology in the field of ferrographic image recognition.
Design/methodology/approach
This paper proposes a binocular image classification model to solve ferrographic image classification problems.
Findings
This paper proposes a novel binocular model (BesNet). The model exhibits extreme behavior: on the one hand, it is almost unable to identify cutting wear particles; on the other, it achieves 100% accuracy in identifying chunky and nonferrous wear particles. The BesNet model is a bionic model of the human eye, and the training images used are specially processed parallax images. After being combined with the MCECNN model, it becomes the BMECNN model, whose classification accuracy has reached the highest level in the industry.
Originality/value
The BesNet model developed in this article is a brand-new system for ferrographic image recognition. It adopts a method of imitating the eyes to view ferrography images, and its image-processing method is also unique. After being combined with the MCECNN model, it becomes the BMECNN model, whose classification accuracy has reached the highest level in the industry.
Peer review
The peer review history for this article is available at: https://publons.com/publon/10.1108/ILT-05-2023-0150/
Jiqian Dong, Sikai Chen, Mohammad Miralinaghi, Tiantian Chen and Samuel Labi
Abstract
Purpose
Perception failures have been identified as the main cause underlying most autonomous vehicle-related accidents. As the key technology in perception, deep learning (DL)-based computer vision models are generally considered black boxes due to poor interpretability, which has exacerbated user distrust and further forestalled widespread deployment in practice. This paper aims to develop explainable DL models for autonomous driving by jointly predicting potential driving actions and corresponding explanations. Explainable DL models can not only boost user trust in autonomy but also serve as a diagnostic approach for identifying model deficiencies or limitations during the system development phase.
Design/methodology/approach
This paper proposes an explainable end-to-end autonomous driving system based on the Transformer, a state-of-the-art self-attention (SA)-based model. The model maps visual features from images collected by onboard cameras to potential driving actions with corresponding explanations, and aims to achieve soft attention over the image's global features.
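The soft attention the model computes over global image features follows standard scaled dot-product self-attention, which can be sketched in numpy as follows (illustrative shapes and names, not the paper's code):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over feature tokens X.

    X: (n_tokens, d_model); Wq/Wk/Wv: (d_model, d_head) projections.
    Returns the attended features and the (row-stochastic) attention map."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])              # similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V, weights
```

Each row of the returned attention map sums to one, i.e. each output token is a convex combination of the value vectors, which is what makes the attention weights interpretable as soft feature importance.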
Findings
The results demonstrate the efficacy of the proposed model as it exhibits superior performance (in terms of correct prediction of actions and explanations) compared to the benchmark model by a significant margin with much lower computational cost on a public data set (BDD-OIA). From the ablation studies, the proposed SA module also outperforms other attention mechanisms in feature fusion and can generate meaningful representations for downstream prediction.
Originality/value
In the contexts of situational awareness and driver assistance, the proposed model can perform as a driving alarm system for both human-driven vehicles and autonomous vehicles because it is capable of quickly understanding/characterizing the environment and identifying any infeasible driving actions. In addition, the extra explanation head of the proposed model provides an extra channel for sanity checks to guarantee that the model learns the ideal causal relationships. This provision is critical in the development of autonomous systems.
Reema Khaled AlRowais and Duaa Alsaeed
Abstract
Purpose
Automatically extracting stance information from natural language texts is a significant research problem with various applications, particularly after the recent explosion of data on the internet via platforms such as social media sites. A stance detection system helps determine whether the author agrees with, is against or holds a neutral opinion toward a given target. Most research on stance detection focuses on the English language, while little has been conducted on Arabic.
Design/methodology/approach
This paper aims to address stance detection on Arabic tweets by building and comparing different stance detection models using four transformers, namely Araelectra, MARBERT, AraBERT and Qarib. Using different weights for these transformers, the authors performed extensive experiments fine-tuning them for the task of stance detection on Arabic tweets.
Findings
The results showed that the AraBERT model learned better than the other three models with a 70% F1 score followed by the Qarib model with a 68% F1 score.
Research limitations/implications
A limitation of this study is the imbalanced dataset and the limited availability of annotated datasets of SD in Arabic.
Originality/value
This paper provides a comprehensive overview of the current resources for stance detection in the literature, including the datasets and machine learning methods used. The authors also examined the models to analyze and comprehend the obtained findings in order to recommend the best-performing models for the stance detection task.
Minh Thanh Vo, Anh H. Vo and Tuong Le
Abstract
Purpose
Medical images are increasingly common; therefore, deep learning-based analysis of these images to help diagnose diseases has become more and more essential. Recently, the shoulder implant X-ray image classification (SIXIC) dataset, which includes X-ray images of implanted shoulder prostheses produced by four manufacturers, was released. Detecting the implant's model helps select the correct equipment and procedures for the upcoming surgery.
Design/methodology/approach
This study proposes a robust model named X-Net to improve predictive performance for shoulder implant X-ray image classification on the SIXIC dataset. The X-Net model integrates Squeeze-and-Excitation (SE) blocks into the Residual Network (ResNet) module. The SE module weighs each feature map extracted by ResNet, which aids in improving performance. Feature extraction in X-Net is performed by both modules, ResNet and SE, and the final feature is obtained by incorporating the features extracted in the above steps, capturing more important characteristics of the X-ray images in the input dataset. X-Net then uses this fine-grained feature to classify the input images into the four classes of the SIXIC dataset (Cofield, Depuy, Zimmer and Tornier).
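The channel-reweighting idea of an SE block can be sketched in a few lines of numpy. This is an illustrative toy, not the X-Net implementation; `w1` and `w2` stand in for the two fully connected layers of the excitation step:

```python
import numpy as np

def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation: reweight the channels of a (C, H, W) tensor.

    w1: (C, C//r) and w2: (C//r, C) are the excitation FC layers, where r
    is the reduction ratio."""
    squeeze = feature_maps.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)            # FC + ReLU (bottleneck)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))      # FC + sigmoid -> (C,) in (0, 1)
    return feature_maps * scale[:, None, None]        # channel-wise reweighting
```

The sigmoid keeps each channel's scale in (0, 1), so the block learns to suppress uninformative feature maps rather than amplify them unboundedly.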
Findings
Experiments are conducted to show the proposed approach's effectiveness compared with other state-of-the-art methods for SIXIC. The experimental results indicate that the approach outperforms the other experimental methods on several performance metrics. In addition, the proposed approach achieves new state-of-the-art results on all performance metrics, including accuracy, precision, recall, F1-score and area under the curve (AUC), for the experimental dataset.
Originality/value
The proposed method with high predictive performance can be used to assist in the treatment of injured shoulder joints.
Abstract
Purpose
Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot for its great potential to boost performance. However, RSI have unique imaging conditions and cluttered scenes with complicated backgrounds. This large difference from natural images has meant that previous feature fusion methods yield insignificant performance improvements.
Design/methodology/approach
This work proposed a two-convolutional neural network (CNN) fusion method named main and branch CNN fusion network (MBC-Net) as an improved solution for classifying RSI. In detail, MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. In particular, MBC-Net includes a long-range derivation (LRD) module, specially designed to learn the dependence between different features. Meanwhile, MBC-Net also uses some unique ideas to tackle the problems arising from two-CNN fusion and from the inherent nature of RSI.
Findings
Extensive experiments on three RSI sets prove that MBC-Net outperforms 38 other state-of-the-art (SOTA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA). MBC-Net not only presents a 0.7% higher OA value on the most confusing NWPU set but also has 62% fewer parameters than the leading approach ranked first in the literature.
Originality/value
MBC-Net is a more effective and efficient feature fusion approach than other SOTA methods in the literature. Visualizations of gradient-weighted class activation mapping (Grad-CAM) reveal that MBC-Net can learn long-range dependences between features that a single CNN cannot. The t-distributed stochastic neighbor embedding (t-SNE) results demonstrate that MBC-Net's feature representation is more effective than that of other methods. In addition, the ablation tests indicate that MBC-Net is effective and efficient at fusing features from two CNNs.
Huaxiang Song, Chai Wei and Zhou Yong
Abstract
Purpose
The paper aims to tackle the classification of remote sensing images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger-volume models to achieve advanced performance. However, remote sensing operating environments commonly cannot provide unconstrained computational and storage resources, which calls for lightweight algorithms with exceptional generalization capabilities.
Design/methodology/approach
This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.
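Logit-based KD of the kind this study builds on matches the student's temperature-softened predictions to the teacher's. The numpy sketch below shows the classic Hinton-style KD loss as a baseline; the study's modified loss, which avoids suppressing non-target category predictions, is not reproduced here:

```python
import numpy as np

def softened(logits, T=1.0):
    """Temperature-softened softmax over a 1-D logit vector."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation loss: T^2 * KL(teacher || student).

    Softening with T > 1 spreads probability mass onto non-target classes,
    so the student also learns the teacher's inter-class similarities."""
    p = softened(teacher_logits, T)
    q = softened(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

The loss is zero when the student reproduces the teacher's logits exactly and grows with their divergence; in training it is typically mixed with the ordinary cross-entropy on the hard labels.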
Findings
This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.
Originality/value
This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.