Search results

1 – 10 of 26
Article
Publication date: 10 January 2024

Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán

COVID-19 continues to spread and cause an increasing number of deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT…

Abstract

Purpose

COVID-19 continues to spread and cause an increasing number of deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest X-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep up with the disease. Deep learning models have been developed to assist in this respect, and vision transformers are currently the state-of-the-art methods, but most techniques focus on only one modality (CXR).

Design/methodology/approach

This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. This paper studies the differences between the convolutional MobileNetV2, ViT DeiT and Swin Transformer models when trained from scratch and when pretrained on the MedNIST medical dataset rather than the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, the Wilcoxon statistical test and the Borda Count method. We also use the Grad-CAM algorithm to study the models' interpretability. Finally, robustness is tested by evaluating the models on images corrupted with Gaussian noise.
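
The article itself does not include code; as a rough illustration of the kind of comparison described (MobileNetV2 initialised from pretrained weights versus trained from scratch, followed by a Gaussian-noise robustness check), a minimal PyTorch sketch is given below. The class count, noise level and data loaders are placeholders, and torchvision only ships ImageNet weights, so the pretrained flag here merely stands in for the MedNIST pretraining used in the paper.

```python
# Minimal sketch (not the authors' code): MobileNetV2 built either from
# pretrained weights or from scratch, adapted to a small COVID-19 task, plus
# a Gaussian-noise robustness check. Class count and sigma are placeholders.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 3  # e.g. COVID-19 / other pneumonia / normal (assumed)

def build_mobilenet_v2(pretrained: bool) -> nn.Module:
    weights = models.MobileNet_V2_Weights.IMAGENET1K_V1 if pretrained else None
    model = models.mobilenet_v2(weights=weights)
    model.classifier[1] = nn.Linear(model.last_channel, NUM_CLASSES)
    return model

@torch.no_grad()
def accuracy(model: nn.Module, loader, sigma: float = 0.0) -> float:
    """Top-1 accuracy; sigma > 0 adds Gaussian noise to probe robustness."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        if sigma > 0:
            images = images + sigma * torch.randn_like(images)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / max(total, 1)

# Usage sketch: after training, compare clean and noisy accuracy for both
# initialisations on the CXR or CT test loader:
#   acc_clean = accuracy(model, test_loader)
#   acc_noisy = accuracy(model, test_loader, sigma=0.1)
```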

Findings

Although the pretrained MobileNetV2 was the best model in terms of raw performance, the best model in terms of performance, interpretability and robustness to noise is the Swin Transformer trained from scratch, using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.

Originality/value

Models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 16 December 2022

Kinjal Bhargavkumar Mistree, Devendra Thakor and Brijesh Bhatt

According to the Indian Sign Language Research and Training Centre (ISLRTC), India has approximately 300 certified human interpreters to help people with hearing loss. This paper…

Abstract

Purpose

According to the Indian Sign Language Research and Training Centre (ISLRTC), India has approximately 300 certified human interpreters to help people with hearing loss. This paper aims to address the issue of Indian Sign Language (ISL) sentence recognition and translation into semantically equivalent English text in a signer-independent mode.

Design/methodology/approach

This study presents an approach that translates ISL sentences into English text using the MobileNetV2 model and Neural Machine Translation (NMT). The authors created an ISL corpus from the Brown corpus using ISL grammar rules to perform machine translation. Their approach converts ISL videos from the newly created dataset into ISL gloss sequences using the MobileNetV2 model, and each recognized ISL gloss sequence is then fed to a machine translation module that generates an English sentence for the corresponding ISL sentence.
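
Purely as an illustration of the two-stage pipeline sketched above (a MobileNetV2 classifier that maps each sign clip to a gloss, followed by a separate machine translation step), a hypothetical Python sketch follows. The gloss vocabulary, the assumption of pre-segmented clips and the final translation step are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch of the recognition stage: MobileNetV2 assigns a gloss to
# each (pre-segmented) sign clip, and the resulting gloss sequence would then
# be passed to a separately trained NMT model. Vocabulary and clip handling
# are illustrative only.
import torch
import torch.nn as nn
from torchvision import models

GLOSSES = ["I", "GO", "SCHOOL"]  # toy gloss vocabulary (assumed)

class GlossRecognizer(nn.Module):
    def __init__(self, num_glosses: int):
        super().__init__()
        self.backbone = models.mobilenet_v2(weights=None)
        self.backbone.classifier[1] = nn.Linear(self.backbone.last_channel, num_glosses)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, 3, H, W) sampled from one sign clip; average frame logits.
        return self.backbone(frames).mean(dim=0)

@torch.no_grad()
def video_to_glosses(model: GlossRecognizer, clips: list) -> list:
    model.eval()
    return [GLOSSES[model(clip).argmax().item()] for clip in clips]

# A recognised sequence such as ["I", "SCHOOL", "GO"] would then be fed to the
# NMT module, which reorders it into English, e.g. "I go to school".
```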

Findings

As per the experimental results, the pretrained MobileNetV2 model proved best suited to the recognition of ISL sentences, and NMT provided better results than Statistical Machine Translation (SMT) for converting ISL text into English text. The automatic and human evaluations of the proposed approach yielded accuracies of 83.3% and 86.1%, respectively.

Research limitations/implications

The neural machine translation system occasionally produced translations with repeated words, strange translations when the number of words per sentence increased, and one or more unexpected terms that bore no relation to the source text. The most common type of error is the mistranslation of places, numbers and dates. Although this has little effect on the overall structure of the translated sentence, it indicates that the embeddings learned for these few words could be improved.

Originality/value

Sign language recognition and translation is a crucial step toward improving communication between the deaf and the rest of society. Because of the shortage of human interpreters, an alternative approach is needed to help people communicate smoothly with the deaf. To motivate research in this field, the authors generated an ISL corpus of 13,720 sentences and a video dataset of 47,880 ISL videos. As there is no public dataset available for ISL videos incorporating signs released by ISLRTC, the authors created a new video dataset and ISL corpus.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3
Type: Research Article
ISSN: 1756-378X

Open Access
Article
Publication date: 5 December 2022

Kittisak Chotikkakamthorn, Panrasee Ritthipravat, Worapan Kusakunniran, Pimchanok Tuakta and Paitoon Benjapornlert

Mouth segmentation is a challenging task in the development of lip-reading applications due to illumination, low chromatic contrast and complex mouth appearance. Recently…

Abstract

Purpose

Mouth segmentation is a challenging task in the development of lip-reading applications due to illumination, low chromatic contrast and complex mouth appearance. Recently, deep learning methods have effectively solved mouth segmentation problems with state-of-the-art performance. This study presents a modified Mobile DeepLabV3-based technique with a comprehensive evaluation on mouth datasets.

Design/methodology/approach

This paper presents a novel approach to mouth segmentation using the Mobile DeepLabV3 technique with integrated decode and auxiliary heads. Extensive data augmentation, online hard example mining (OHEM) and transfer learning have been applied. CelebAMask-HQ and a mouth dataset from 15 healthy subjects in the Department of Rehabilitation Medicine, Ramathibodi Hospital, are used to validate mouth segmentation performance.
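
As a rough, non-authoritative sketch of the ingredients named above, the snippet below pairs torchvision's MobileNet-backed DeepLabV3 (standing in for the paper's modified Mobile DeepLabV3) with an auxiliary head and an OHEM-style pixel loss; the keep ratio, class count and auxiliary-loss weight are assumptions.

```python
# Illustrative sketch only: MobileNet-backed DeepLabV3 with an auxiliary head
# plus an OHEM-style loss that averages cross-entropy over the hardest pixels.
# Keep ratio, class count and aux weight are assumed values.
import torch.nn.functional as F
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

def build_model(num_classes: int = 2):
    # num_classes = 2: mouth vs. background (assumed)
    return deeplabv3_mobilenet_v3_large(weights=None, num_classes=num_classes, aux_loss=True)

def ohem_cross_entropy(logits, target, keep_ratio: float = 0.25):
    """Cross-entropy averaged over only the hardest fraction of pixels."""
    per_pixel = F.cross_entropy(logits, target, reduction="none").flatten()
    k = max(1, int(keep_ratio * per_pixel.numel()))
    hardest, _ = per_pixel.topk(k)
    return hardest.mean()

def training_loss(model, images, masks, aux_weight: float = 0.4):
    out = model(images)                      # dict with "out" and "aux" logits
    return (ohem_cross_entropy(out["out"], masks)
            + aux_weight * ohem_cross_entropy(out["aux"], masks))
```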

Findings

Extensive data augmentation, OHEM and transfer learning were performed in this study. The technique achieved better performance on CelebAMask-HQ than existing segmentation techniques, with a mean Jaccard similarity coefficient (JSC), mean classification accuracy and mean Dice similarity coefficient (DSC) of 0.8640, 93.34% and 0.9267, respectively. It also achieved better performance on the mouth dataset, with a mean JSC, mean classification accuracy and mean DSC of 0.8834, 94.87% and 0.9367, respectively. The proposed technique has an inference time of 48.12 ms per image.

Originality/value

The modified Mobile DeepLabV3 technique was developed with extensive data augmentation, OHEM and transfer learning. It achieved better mouth segmentation performance than existing techniques, which makes it suitable for implementation in further lip-reading applications.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 28 February 2022

Rui Zhang, Na Zhao, Liuhu Fu, Lihu Pan, Xiaolu Bai and Renwang Song

This paper aims to propose a new ultrasonic diagnosis method for stainless steel weld defects based on multi-domain feature fusion to solve two problems in the ultrasonic…

Abstract

Purpose

This paper aims to propose a new ultrasonic diagnosis method for stainless steel weld defects based on multi-domain feature fusion, addressing two problems in the ultrasonic diagnosis of austenitic stainless steel weld defects: insufficient feature extraction and the subjective dependence of diagnostic model parameters.

Design/methodology/approach

To express the richness of the one-dimensional (1D) signal information, the 1D ultrasonic testing signal was transformed into the two-dimensional (2D) time-frequency domain. A multi-scale depthwise separable convolution was also designed to optimize the MobileNetV3 network and obtain deep convolutional feature information under different receptive fields. At the same time, time- and frequency-domain features of the defect signals were extracted based on statistical analysis. Defect-sensitive features were screened out through visual analysis, and the defect feature set was constructed by cascading fusion with the deep convolutional feature information. To improve the adaptability and generalization of the diagnostic model, the authors designed and carried out hyperparameter self-optimization of the diagnostic model based on the sparrow search strategy and constructed the optimal hyperparameter combination of the model. Finally, the performance of ultrasonic diagnosis of stainless steel weld defects was improved comprehensively through the multi-domain feature characterization model of the defect data and the diagnosis optimization model.
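
The abstract gives no implementation details; the following sketch only illustrates the first step it describes, turning a 1D ultrasonic A-scan into a 2D time-frequency image that a CNN such as MobileNetV3 can consume. The sampling rate, window length and synthetic signal are assumptions, and the multi-scale depthwise separable convolutions and sparrow search optimization are not shown.

```python
# Rough sketch: map a 1D ultrasonic signal to a 2D time-frequency image via a
# short-time Fourier transform. All signal parameters here are placeholders.
import numpy as np
from scipy import signal

def to_time_frequency_image(x: np.ndarray, fs: float, nperseg: int = 128) -> np.ndarray:
    """Return a log-magnitude spectrogram normalised to [0, 1]."""
    _, _, Zxx = signal.stft(x, fs=fs, nperseg=nperseg)
    mag = np.log1p(np.abs(Zxx))
    return (mag - mag.min()) / (mag.max() - mag.min() + 1e-12)

# Synthetic A-scan: a 5 MHz echo burst embedded in noise (for demonstration).
fs = 25e6
t = np.arange(4096) / fs
echo = np.sin(2 * np.pi * 5e6 * t) * np.exp(-((t - 8e-5) ** 2) / (2 * (5e-6) ** 2))
scan = echo + 0.05 * np.random.randn(t.size)
image = to_time_frequency_image(scan, fs)
print(image.shape)  # 2D array that can be fed to the CNN after resizing
```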

Findings

The experimental results show that the diagnostic accuracy of the lightweight diagnosis model constructed in this paper can reach 96.55% for the five types of stainless steel weld defects, including cracks, porosity, inclusion, lack of fusion and incomplete penetration. These can meet the needs of practical engineering applications.

Originality/value

This method provides a theoretical basis and technical reference for developing and applying intelligent, efficient and accurate ultrasonic defect diagnosis technology.

Open Access
Article
Publication date: 4 August 2020

Alessandra Lumini, Loris Nanni and Gianluca Maguolo

In this paper, we present a study of an automated system for monitoring underwater ecosystems. The proposed system is based on the fusion of different deep learning…

Abstract

In this paper, we present a study of an automated system for monitoring underwater ecosystems. The proposed system is based on the fusion of different deep learning methods. We study how to create an ensemble of different Convolutional Neural Network (CNN) models, fine-tuned on several datasets with the aim of exploiting their diversity. The aim of our study is to investigate the feasibility of fine-tuning CNNs for underwater imagery analysis, the opportunity of using different datasets for pre-training models, and the possibility of designing an ensemble using the same architecture with small variations in the training procedure.

Our experiments, performed on five well-known datasets (three plankton and two coral datasets), show that the combination of such different CNN models in a heterogeneous ensemble grants a substantial performance improvement with respect to other state-of-the-art approaches in all the tested problems. One of the main contributions of this work is a wide experimental evaluation of well-known CNN architectures, reporting the performance of both single CNNs and ensembles of CNNs on different problems. Moreover, we show how to create an ensemble that improves on the performance of the best single model. A link to the MATLAB source code is freely provided on the title page.
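
The authors provide MATLAB code; purely to illustrate the fusion idea described above (averaging the class probabilities of several independently fine-tuned CNNs), a small framework-agnostic sketch in Python follows, with toy data standing in for real model outputs.

```python
# Illustration of probability-level ensemble fusion, not the authors' MATLAB
# implementation: per-model class probabilities are averaged before argmax.
import numpy as np

def ensemble_predict(prob_list):
    """prob_list: one (n_samples, n_classes) array per fine-tuned CNN."""
    mean_probs = np.stack(prob_list).mean(axis=0)   # simple average fusion
    return mean_probs.argmax(axis=1)

# Toy usage: three models, four samples, two classes.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(2), size=4) for _ in range(3)]
print(ensemble_predict(probs))
```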

Details

Applied Computing and Informatics, vol. 19 no. 3/4
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 14 December 2023

Huaxiang Song, Chai Wei and Zhou Yong

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of…

Abstract

Purpose

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger-volume models to achieve advanced performance. However, the operating environments of remote sensing commonly cannot provide unconstrained computational and storage resources, which calls for lightweight algorithms with exceptional generalization capabilities.

Design/methodology/approach

This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.
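
The paper's specific loss modification is not reproduced here; as a baseline reference point, a standard logit-based KD loss (temperature-softened KL divergence between teacher and student, combined with cross-entropy on the labels) can be sketched as follows, with the temperature and weighting factor as arbitrary placeholders.

```python
# Standard logit-based knowledge distillation loss, shown only as a baseline;
# the paper's modified loss and ensemble teacher are not reproduced here.
# T and alpha are placeholder hyperparameters.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    distill = F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * hard

# Toy check on a 45-class problem (e.g. the NWPU45 benchmark mentioned below).
s, t = torch.randn(8, 45, requires_grad=True), torch.randn(8, 45)
y = torch.randint(0, 45, (8,))
print(kd_loss(s, t, y).item())
```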

Findings

This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.

Originality/value

This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.

Details

International Journal of Web Information Systems, vol. 20 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 13 February 2024

Wenzhen Yang, Shuo Shan, Mengting Jin, Yu Liu, Yang Zhang and Dongya Li

This paper aims to rapidly realize an in-situ quality inspection system for new injection molding (IM) tasks via a transfer learning (TL) approach and automation technology.

Abstract

Purpose

This paper aims to rapidly realize an in-situ quality inspection system for new injection molding (IM) tasks via a transfer learning (TL) approach and automation technology.

Design/methodology/approach

The proposed in-situ quality inspection system consists of an injection machine, USB camera, programmable logic controller and personal computer, interconnected via OPC or USB communication interfaces. This configuration enables seamless automation of the IM process, real-time quality inspection and automated decision-making. In addition, a MobileNet-based deep learning (DL) model is proposed for quality inspection of injection parts, fine-tuned using the TL approach.
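
The concrete model code is not given in the abstract; a minimal transfer-learning sketch in PyTorch, using MobileNetV2 as a stand-in for the paper's MobileNet-based model, might look as follows. The defect categories and the freezing policy are assumptions.

```python
# Minimal TL sketch (not the authors' implementation): freeze a pretrained
# MobileNetV2 backbone and fine-tune only a new head on a small defect set.
import torch.nn as nn
from torchvision import models

CLASSES = ["good", "short_shot", "flash"]  # illustrative defect categories

def build_inspection_model() -> nn.Module:
    model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
    for p in model.features.parameters():
        p.requires_grad = False              # keep pretrained features fixed
    model.classifier[1] = nn.Linear(model.last_channel, len(CLASSES))
    return model

# With roughly 50 labelled images per category (as reported in the paper),
# only the new classification head needs to be trained.
```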

Findings

Using the TL approach, the MobileNet-based DL model demonstrates exceptional performance, achieving a validation accuracy of 99.1% with merely 50 images per category. Its detection speed and accuracy surpass those of DenseNet121-based, VGG16-based, ResNet50-based and Xception-based convolutional neural networks. Further evaluation on a random data set of 120 images, assessed through the confusion matrix, yields an accuracy of 96.67%.

Originality/value

The proposed MobileNet-based DL model achieves higher accuracy with less resource consumption using the TL approach. It is integrated with automation technologies to build the in-situ quality inspection system for injection parts, which improves cost-efficiency by facilitating the acquisition and labeling of task-specific images and by enabling automatic defect detection and decision-making online. This holds profound significance for the IM industry and its pursuit of enhanced quality inspection measures.

Details

Robotic Intelligence and Automation, vol. 44 no. 1
Type: Research Article
ISSN: 2754-6969

Article
Publication date: 24 October 2023

Calvin Ling, Muhammad Taufik Azahari, Mohamad Aizat Abas and Fei Chong Ng

This paper aims to study the relationship between the ball grid array (BGA) flip-chip underfilling process parameter and its void formation region.

Abstract

Purpose

This paper aims to study the relationship between the ball grid array (BGA) flip-chip underfilling process parameter and its void formation region.

Design/methodology/approach

A set of top-down scanning acoustic microscope images of BGA underfill is collected and void-labelled. The labelled images are used to train a convolutional neural network model, and its performance is evaluated. The model is then tested with new images, and the void area and its region are analysed against the dispensing parameters.
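
The abstract does not state how the Gini index reported in the findings was obtained; one speculative way to produce a Gini-based rating of dispensing parameters against the detected void region is a decision tree on tabular process data, sketched below with entirely synthetic data and placeholder feature names.

```python
# Speculative illustration only: a Gini-criterion decision tree rating
# dispensing parameters against the void-formation region. Data, feature
# names and tree settings are invented for the demo.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Features: valve pressure (arbitrary units) and dispensing pattern (0/1).
X = np.column_stack([rng.uniform(40, 90, 200), rng.integers(0, 2, 200)])
# Toy label: 1 if the void formed in the emptier BGA region.
y = ((X[:, 0] > 65) & (X[:, 1] == 0)).astype(int)

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)
for name, score in zip(["valve_pressure", "dispense_pattern"], tree.feature_importances_):
    print(f"{name}: {score:.3f}")
```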

Findings

All findings were validated against past experimental results regarding dispensing parameters and their quantitative regional void formation. As the BGA is non-uniform, 85% of the test samples have voids formed in the emptier region. Furthermore, valve dispensing pressure was the highest-rated factor, with a Gini index of 0.219, and the U-type dispensing pattern parameter set generally produces a lower void percentage within the underfill, although its consistency is difficult to maintain.

Practical implications

This study enables manufacturers to forecast regional void formation from the filling parameters and array pattern. The relations between filling pressure, dispensing pattern and BGA layout provide qualitative insights into where voids form in a flip-chip, enabling countermeasures to be formulated promptly to reduce voiding in specific areas of the underfill.

Originality/value

The regional void formation in a flip-chip underfilling process can be explained quantitatively with indicative parameters such as valve pressure, dispensing pattern and BGA arrangement.

Details

Soldering & Surface Mount Technology, vol. 36 no. 1
Type: Research Article
ISSN: 0954-0911

Article
Publication date: 27 January 2023

Yawen Li, Guangming Song, Shuang Hao, Juzheng Mao and Aiguo Song

The prerequisite for most traditional visual simultaneous localization and mapping (V-SLAM) algorithms is that most objects in the environment should be static or in low-speed…

Abstract

Purpose

The prerequisite for most traditional visual simultaneous localization and mapping (V-SLAM) algorithms is that most objects in the environment should be static or in low-speed locomotion. These algorithms rely on geometric information about the environment, which restricts their application in scenarios with dynamic objects. Semantic segmentation can be used to extract deep features from images to identify dynamic objects in the real world. Therefore, V-SLAM fused with semantic information can reduce the influence of dynamic objects and achieve higher accuracy. This paper aims to present a new semantic stereo V-SLAM method toward outdoor dynamic environments for more accurate pose estimation.

Design/methodology/approach

First, the Deeplabv3+ semantic segmentation model is adopted to recognize semantic information about dynamic objects in outdoor scenes. Second, an approach that combines prior knowledge with pixel movement between frames is proposed to determine the dynamic hierarchy of movable objects. Finally, a semantic stereo V-SLAM based on ORB-SLAM2 is presented to calculate accurate trajectories in dynamic environments; it selects corresponding feature points in static regions and eliminates useless feature points in dynamic regions.
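
As a rough illustration of that last step (keeping only feature points that fall on static regions), the snippet below filters ORB keypoints with a per-pixel semantic mask. The label id, mask and image are synthetic placeholders, and the full ORB-SLAM2 pipeline is of course not shown.

```python
# Illustrative only: discard ORB keypoints that land on pixels labelled as
# dynamic objects, keeping the rest for pose estimation.
import cv2
import numpy as np

DYNAMIC = 1  # label id assigned to dynamic classes (assumed)

def keep_static_keypoints(gray, semantic_mask):
    orb = cv2.ORB_create(nfeatures=500)
    keypoints = orb.detect(gray, None)
    return [kp for kp in keypoints
            if semantic_mask[int(kp.pt[1]), int(kp.pt[0])] != DYNAMIC]

# Toy example: random texture with the left half marked as dynamic.
gray = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
mask = np.zeros((240, 320), dtype=np.uint8)
mask[:, :160] = DYNAMIC
print(len(keep_static_keypoints(gray, mask)), "static keypoints kept")
```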

Findings

The proposed method is successfully verified on the public KITTI data set and a ZED2 self-collected data set from the real world. The proposed V-SLAM system can extract semantic information and track feature points steadily in dynamic environments. Absolute pose error and relative pose error are used to evaluate the feasibility of the proposed method. Experimental results show significant improvements in root mean square error and standard deviation error on both the KITTI data set and an unmanned aerial vehicle, indicating that this method can be effectively applied to outdoor environments.

Originality/value

The main contribution of this study is that a new semantic stereo V-SLAM method is proposed with greater robustness and stability, which reduces the impact of moving objects in dynamic scenes.

Details

Industrial Robot: the international journal of robotics research and application, vol. 50 no. 3
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 15 January 2024

Faris Elghaish, Sandra Matarneh, Essam Abdellatef, Farzad Rahimian, M. Reza Hosseini and Ahmed Farouk Kineber

Cracks are prevalent signs of pavement distress found on highways globally. The use of artificial intelligence (AI) and deep learning (DL) for crack detection is increasingly…

Abstract

Purpose

Cracks are prevalent signs of pavement distress found on highways globally. The use of artificial intelligence (AI) and deep learning (DL) for crack detection is increasingly considered as an optimal solution. Consequently, this paper introduces a novel, fully connected, optimised convolutional neural network (CNN) model using feature selection algorithms for the purpose of detecting cracks in highway pavements.

Design/methodology/approach

To enhance the accuracy of the CNN model for crack detection, the authors employed a CNN model with fully connected deep learning layers along with several optimisation techniques. Specifically, three optimisation algorithms, namely adaptive moment estimation (ADAM), stochastic gradient descent with momentum (SGDM) and RMSProp, were utilised to fine-tune the CNN model and enhance its overall performance. Subsequently, the authors implemented eight feature selection algorithms to further improve the accuracy of the optimised CNN model. These feature selection techniques were thoughtfully selected and systematically applied to identify the most relevant features contributing to crack detection in the given dataset. Finally, the authors tested the proposed model against seven pre-trained models.
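
The authors' network and training details are not given here; the sketch below only illustrates the optimiser comparison itself, running the same (toy) CNN under ADAM, SGDM and RMSProp. The architecture, learning rates and class count are placeholders, not the paper's five-layer model.

```python
# Placeholder sketch of comparing optimisers on one CNN; not the authors'
# five-layer crack-detection model.
import torch
import torch.nn as nn

def make_cnn(num_classes: int = 2) -> nn.Module:  # crack / no-crack (assumed)
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(32, num_classes),
    )

def make_optimizer(name: str, params, lr: float = 1e-3):
    if name == "adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "sgdm":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "rmsprop":
        return torch.optim.RMSprop(params, lr=lr)
    raise ValueError(name)

# The same training loop would be run once per optimiser before the feature
# selection stage (e.g. PSO) is applied to the best configuration.
for name in ("adam", "sgdm", "rmsprop"):
    model = make_cnn()
    optimizer = make_optimizer(name, model.parameters())
    print(name, type(optimizer).__name__)
```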

Findings

The study's results show that the accuracy of the three optimisers (ADAM, SGDM and RMSProp) with the five deep learning layers model is 97.4%, 98.2% and 96.09%, respectively. Following this, the eight feature selection algorithms were applied to the five deep learning layers to enhance accuracy, with particle swarm optimisation (PSO) achieving the highest F-score of 98.72%. The model was then compared with other pre-trained models and exhibited the highest performance.

Practical implications

With an achieved precision of 98.19% and F-score of 98.72% using PSO, the developed model is highly accurate and effective in detecting and evaluating the condition of cracks in pavements. As a result, the model has the potential to significantly reduce the effort required for crack detection and evaluation.

Originality/value

The proposed method for enhancing CNN model accuracy in crack detection stands out for its unique combination of optimisation algorithms (ADAM, SGDM, and RMSProp) with systematic application of multiple feature selection techniques to identify relevant crack detection features and comparing results with existing pre-trained models.

Details

Smart and Sustainable Built Environment, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2046-6099

1 – 10 of 26