Search results

1 – 10 of 646
Article
Publication date: 21 August 2023

Minghao Wang, Ming Cong, Yu Du, Dong Liu and Xiaojing Tian

Abstract

Purpose

The purpose of this study is to solve the problem of an unknown initial position in multi-robot raster map fusion. The method covers both two-dimensional (2D) raster maps and three-dimensional (3D) point cloud maps.

Design/methodology/approach

A fusion method combining multiple algorithms is proposed. For 2D raster maps, accelerated robust feature detection extracts feature points from the individual maps, which are then matched using a two-step algorithm of minimum Euclidean distance and adjacent feature relations. Finally, the random sample consensus (RANSAC) algorithm fuses the redundant features. Building on the 2D raster map fusion, coordinate alignment is used for 3D point cloud map fusion.
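
As an illustration of the matching-plus-RANSAC stage, here is a minimal Python sketch using OpenCV. It is not the authors' implementation: ORB stands in for the SURF-style detector the abstract alludes to, plain descriptor distance replaces the paper's two-step matching, and the file names and parameters are hypothetical.

```python
# Minimal sketch: align two occupancy-grid map images via feature matching
# plus RANSAC. ORB stands in for the SURF-style detector; filenames and
# parameters are hypothetical.
import cv2
import numpy as np

map_a = cv2.imread("map_a.png", cv2.IMREAD_GRAYSCALE)
map_b = cv2.imread("map_b.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp_a, des_a = orb.detectAndCompute(map_a, None)
kp_b, des_b = orb.detectAndCompute(map_b, None)

# Step 1: nearest-neighbour matching by descriptor distance.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

# Step 2: RANSAC rejects outlier correspondences while estimating the
# similarity transform (rotation + translation + scale) between the maps.
src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
transform, inliers = cv2.estimateAffinePartial2D(
    src, dst, method=cv2.RANSAC, ransacReprojThreshold=3.0)

# Warp map A into map B's frame and overlay to form the fused grid.
fused = cv2.warpAffine(map_a, transform, (map_b.shape[1], map_b.shape[0]))
fused = np.maximum(fused, map_b)
```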

Findings

The algorithm was verified experimentally using both segmentation mapping (2D raster maps) and actual robot mapping (2D raster maps and 3D point cloud maps). The experiments demonstrated the stability and reliability of the proposed algorithm.

Originality/value

This algorithm processes raster maps with a new visual method combined with coordinate alignment, which removes the requirement of traditional methods for a known initial relative position between robots and adapts better to the fusion of 3D maps. In addition, the original map data can come from different types of robots, which greatly improves the universality of the algorithm.

Details

Robotic Intelligence and Automation, vol. 43 no. 5
Type: Research Article
ISSN: 2754-6969

Article
Publication date: 9 July 2024

Zengkun Liu and Justine Hui

Abstract

Purpose

This study aims to introduce an innovative approach to predictive maintenance by integrating time-series sensor data with event logs, leveraging the synergistic potential of deep learning models. The primary goal is to enhance the accuracy of equipment failure predictions, thereby minimizing operational downtime.

Design/methodology/approach

The methodology uses a dual-model architecture, combining the patch time series transformer (PatchTST) model for analyzing time-series sensor data with bidirectional encoder representations from transformers (BERT) for processing textual event log data. Two distinct fusion strategies, namely, early and late fusion, are explored to integrate these data sources effectively. The early fusion approach merges data at the initial stages of processing, while late fusion combines model outputs toward the end. Thorough experiments on real-world data from wind turbines validate the approach.
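
To make the two strategies concrete, here is a schematic PyTorch sketch, not the authors' code: generic pooled embeddings stand in for the PatchTST and BERT outputs, and all dimensions are illustrative.

```python
# Schematic sketch of early vs late fusion with placeholder encoders;
# the paper's PatchTST and BERT backbones are replaced by generic inputs.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Merge sensor and event-log embeddings before a shared predictor."""
    def __init__(self, sensor_dim=64, text_dim=768, hidden=128, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(sensor_dim + text_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes))

    def forward(self, sensor_emb, text_emb):
        return self.head(torch.cat([sensor_emb, text_emb], dim=-1))

class LateFusion(nn.Module):
    """Run separate predictors per modality, then average their logits."""
    def __init__(self, sensor_dim=64, text_dim=768, n_classes=2):
        super().__init__()
        self.sensor_head = nn.Linear(sensor_dim, n_classes)
        self.text_head = nn.Linear(text_dim, n_classes)

    def forward(self, sensor_emb, text_emb):
        return 0.5 * (self.sensor_head(sensor_emb) + self.text_head(text_emb))

sensor_emb = torch.randn(8, 64)   # e.g. pooled time-series encoder output
text_emb = torch.randn(8, 768)    # e.g. pooled BERT [CLS] output
print(EarlyFusion()(sensor_emb, text_emb).shape)  # torch.Size([8, 2])
print(LateFusion()(sensor_emb, text_emb).shape)   # torch.Size([8, 2])
```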

Findings

The results demonstrate a significant improvement in fault prediction accuracy, with early fusion strategies outperforming traditional methods by 2.6% to 16.9%. Late fusion strategies, while more stable, underscore the benefit of integrating diverse data types for predictive maintenance. The study provides empirical evidence of the superiority of the fusion-based methodology over single-data-source approaches.

Originality/value

This research is distinguished by its novel fusion-based approach to predictive maintenance, marking a departure from conventional single-source data analysis methods. By incorporating both time-series sensor data and textual event logs, the study unveils a comprehensive and effective strategy for fault prediction, paving the way for future advancements in the field.

Details

Sensor Review, vol. 44 no. 5
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 19 October 2023

Huaxiang Song

Abstract

Purpose

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot for its great potential to boost performance. However, RSI involves unique imaging conditions and cluttered scenes with complicated backgrounds. This large difference from natural images has meant that previous feature fusion methods yield only insignificant performance improvements.

Design/methodology/approach

This work proposes a two-convolutional neural network (CNN) fusion method, the main and branch CNN fusion network (MBC-Net), as an improved solution for classifying RSI. MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. In particular, MBC-Net includes a long-range derivation (LRD) module, specially designed to learn the dependence between different features. MBC-Net also uses several unique ideas to tackle the problems arising from two-CNN fusion and the inherent nature of RSI.
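
The two-backbone idea can be sketched as below, assuming torchvision's EfficientNet implementations; the paper's LRD module and fusion details are not described here, so a simple concatenation head stands in for them, and the class count follows the 45-class NWPU benchmark mentioned in the findings.

```python
# Hedged sketch of a two-backbone fusion classifier in the spirit of MBC-Net:
# EfficientNet-B3 as the main stream, EfficientNet-B0 as the branch. A plain
# concatenation head replaces the paper's LRD module and fusion tricks.
import torch
import torch.nn as nn
from torchvision import models

class TwoCnnFusion(nn.Module):
    def __init__(self, n_classes=45):
        super().__init__()
        self.main = models.efficientnet_b3(weights=None).features    # MC-B3
        self.branch = models.efficientnet_b0(weights=None).features  # BC-B0
        self.pool = nn.AdaptiveAvgPool2d(1)
        # B3 features end at 1536 channels, B0 at 1280.
        self.head = nn.Linear(1536 + 1280, n_classes)

    def forward(self, x):
        f_main = self.pool(self.main(x)).flatten(1)
        f_branch = self.pool(self.branch(x)).flatten(1)
        return self.head(torch.cat([f_main, f_branch], dim=1))

print(TwoCnnFusion()(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 45])
```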

Findings

Extensive experiments on three RSI data sets show that MBC-Net outperforms 38 state-of-the-art (SOTA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA). MBC-Net not only improves OA by 0.7% on the most challenging NWPU set but also has 62% fewer parameters than the leading approach in the literature.

Originality/value

MBC-Net is a more effective and efficient feature fusion approach than other SOTA methods in the literature. Visualizations from gradient-weighted class activation mapping (Grad-CAM) reveal that MBC-Net can learn long-range dependences between features that a single CNN cannot. The t-distributed stochastic neighbor embedding (t-SNE) results demonstrate that MBC-Net's feature representation is more effective than that of other methods. In addition, ablation tests indicate that MBC-Net fuses features from two CNNs effectively and efficiently.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 1 November 2023

Juan Yang, Zhenkun Li and Xu Du

Abstract

Purpose

Although numerous signal modalities are available for emotion recognition, audio and visual modalities are the most common and predominant forms through which human beings express their emotional states in daily communication. Automatic and accurate audiovisual emotion recognition is therefore of great importance for developing engaging and empathetic human–computer interaction environments. However, two major challenges exist in the field: (1) how to effectively capture representations of each single modality and eliminate redundant features and (2) how to efficiently integrate information from the two modalities to generate discriminative representations.

Design/methodology/approach

A novel key-frame extraction-based attention fusion network (KE-AFN) is proposed for audiovisual emotion recognition. KE-AFN integrates key-frame extraction with multimodal interaction and fusion to enhance audiovisual representations and reduce redundant computation, filling the research gaps of existing approaches. Specifically, a local-maximum-based content analysis is designed to extract key-frames from videos in order to eliminate data redundancy. Two modules, a “Multi-head Attention-based Intra-modality Interaction Module” and a “Multi-head Attention-based Cross-modality Interaction Module”, are proposed to mine and capture intra- and cross-modality interactions, further reducing data redundancy and producing more powerful multimodal representations.
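
The cross-modality interaction idea can be sketched with standard multi-head attention, as below. This is an illustrative reading, not the published KE-AFN code, and all dimensions are assumptions.

```python
# Sketch of the cross-modality interaction idea: one modality's features
# attend over the other modality's sequence via multi-head attention.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_mod, context_mod):
        # Each query-modality step attends over the other modality's sequence.
        attended, _ = self.attn(query_mod, context_mod, context_mod)
        return self.norm(query_mod + attended)  # residual connection

audio = torch.randn(8, 50, 256)   # 50 audio frames per clip
visual = torch.randn(8, 16, 256)  # 16 key-frames selected from the video
audio_enriched = CrossModalAttention()(audio, visual)
visual_enriched = CrossModalAttention()(visual, audio)
```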

Findings

Extensive experiments on two benchmark datasets (i.e. RAVDESS and CMU-MOSEI) demonstrate the effectiveness and rationality of KE-AFN. Specifically, (1) KE-AFN is superior to state-of-the-art baselines for audiovisual emotion recognition. (2) Exploring the supplementary and complementary information of different modalities provides more emotional clues for better emotion recognition. (3) The proposed key-frame extraction strategy enhances performance by more than 2.79 per cent in accuracy. (4) Both exploring intra- and cross-modality interactions and employing attention-based audiovisual fusion lead to better prediction performance.

Originality/value

The proposed KE-AFN can support the development of engaging and empathetic human–computer interaction environments.

Article
Publication date: 9 July 2024

Zengrui Zheng, Kainan Su, Shifeng Lin, Zhiquan Fu and Chenguang Yang

Abstract

Purpose

Visual simultaneous localization and mapping (SLAM) has limitations such as sensitivity to lighting changes and limited measurement accuracy. The effective fusion of information from multiple modalities to address these limitations has emerged as a key research focus. This study aims to provide a comprehensive review of the development of vision-based SLAM (including visual SLAM) for navigation and pose estimation, with a specific focus on techniques for integrating multiple modalities.

Design/methodology/approach

This paper first introduces the mathematical models and framework development of visual SLAM. It then presents various methods for improving accuracy in visual SLAM by fusing different spatial and semantic features, and examines research advancements in vision-based SLAM with respect to multi-sensor fusion in both loosely coupled and tightly coupled approaches. Finally, it analyzes the limitations of current vision-based SLAM and offers predictions for future developments.

Findings

The combination of vision-based SLAM and deep learning has significant potential for development. Both loosely coupled and tightly coupled approaches to multi-sensor fusion have advantages and disadvantages, and the most suitable algorithm should be chosen based on the specific application scenario. Vision-based SLAM is evolving toward better addressing challenges such as resource-limited platforms and long-term mapping.

Originality/value

This review introduces the development of vision-based SLAM and focuses on the advancements in multimodal fusion. It allows readers to quickly understand the progress and current status of research in this field.

Details

Robotic Intelligence and Automation, vol. 44 no. 4
Type: Research Article
ISSN: 2754-6969

Article
Publication date: 10 January 2024

Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán

Abstract

Purpose

COVID-19 continues to spread, causing increasing numbers of deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest x-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed to assist in this respect, and vision transformers are currently state-of-the-art methods, but most techniques focus only on one modality (CXR).

Design/methodology/approach

This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. It studies the differences between the convolutional MobileNetV2, ViT DeiT and Swin Transformer models when trained from scratch and when pretrained on the MedNIST medical dataset rather than the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, the Wilcoxon statistical test and the Borda Count method. The Grad-CAM algorithm is also used to study model interpretability. Finally, robustness is tested by evaluating the models on Gaussian-noised images.
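
The robustness check lends itself to a short sketch: evaluate a trained classifier on Gaussian-noised copies of its test images. This is a generic illustration; `model`, `test_loader` and the sigma values are hypothetical.

```python
# Minimal sketch of the Gaussian-noise robustness test described above.
# `model` and `test_loader` are assumed to exist; sigmas are illustrative.
import torch

def accuracy_under_noise(model, test_loader, sigma, device="cpu"):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in test_loader:
            noisy = images + sigma * torch.randn_like(images)
            preds = model(noisy.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.numel()
    return correct / total

# for sigma in (0.0, 0.05, 0.1, 0.2):
#     print(sigma, accuracy_under_noise(model, test_loader, sigma))
```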

Findings

Although the pretrained MobileNetV2 performed best on the raw metrics, the model that best balances performance, interpretability and robustness to noise is the Swin Transformer trained from scratch, using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.

Originality/value

The models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.

Details

Data Technologies and Applications, vol. 58 no. 3
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 2 November 2023

Khaled Hamed Alyoubi, Fahd Saleh Alotaibi, Akhil Kumar, Vishal Gupta and Akashdeep Sharma

Abstract

Purpose

The purpose of this paper is to describe a new approach to sentence representation learning leading to text classification using Bidirectional Encoder Representations from Transformers (BERT) embeddings. This work proposes a novel BERT-convolutional neural network (CNN)-based model for sentence representation learning and text classification. The proposed model can be used by industries working on text-similarity classification and on sentiment and opinion analysis.

Design/methodology/approach

The approach feeds distinct features from the BERT model's transformer encoder layers to CNNs to achieve multi-layer feature fusion: the feature vectors of BERT's last three layers are passed to three separate CNN layers to generate a rich feature representation that can be used for extracting the keywords in the sentences. For sentence representation learning and text classification, the proposed model is trained and tested on the Stanford Sentiment Treebank-2 (SST-2) data set for sentiment analysis and the Quora Question Pairs (QQP) data set for sentence classification. To obtain benchmark results, a selective training approach is applied with the proposed model.
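
A minimal sketch of the last-three-layers fusion idea follows, using the Hugging Face transformers API; the kernel sizes, channel counts and pooling choice are illustrative assumptions rather than the authors' configuration.

```python
# Hedged sketch: give each of the last three BERT encoder layers its own
# 1-D convolution, pool over tokens, then fuse by concatenation.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased",
                                 output_hidden_states=True)

convs = nn.ModuleList(nn.Conv1d(768, 128, kernel_size=3, padding=1)
                      for _ in range(3))
classifier = nn.Linear(3 * 128, 2)

inputs = tokenizer("great movie, would watch again", return_tensors="pt")
hidden_states = bert(**inputs).hidden_states  # tuple: embeddings + 12 layers

pooled = []
for conv, layer in zip(convs, hidden_states[-3:]):  # last three layers
    feats = conv(layer.transpose(1, 2))             # (batch, 128, seq_len)
    pooled.append(feats.max(dim=2).values)          # max-pool over tokens
logits = classifier(torch.cat(pooled, dim=1))       # fused prediction
```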

Findings

On the SST-2 data set, the proposed model achieved an accuracy of 92.90%, and on the QQP data set an accuracy of 91.51%. Results for the other evaluation metrics, such as precision, recall and F1 score, are similarly strong. The proposed model is 1.17%–1.2% better than the original BERT model on the SST-2 and QQP data sets.

Originality/value

The novelty of the proposed model lies in the multi-layer fusion of the last three BERT layers through CNN layers and in the selective training approach based on gated pruning, which together achieve benchmark results.

Details

Robotic Intelligence and Automation, vol. 43 no. 6
Type: Research Article
ISSN: 2754-6969

Article
Publication date: 3 August 2023

Yandong Hou, Zhengbo Wu, Xinghua Ren, Kaiwen Liu and Zhengquan Chen

Abstract

Purpose

High-resolution remote sensing images possess a wealth of semantic information. However, these images often contain objects of different sizes and distributions, which make the semantic segmentation task challenging. In this paper, a bidirectional feature fusion network (BFFNet) is designed to address this challenge, with the aim of recognizing surface objects more accurately and classifying special features effectively.

Design/methodology/approach

BFFNet has two crucial elements. First, a mean-weighted module (MWM) is used to obtain the key features in the main network. Second, the proposed polarization-enhanced branch network performs feature extraction in parallel with the main network to obtain complementary feature information. The two sets of features are then fused in both directions, with a cross-entropy loss function monitoring the training process. Finally, BFFNet is validated on two publicly available datasets, Potsdam and Vaihingen.
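
The abstract does not define the MWM precisely, so the following PyTorch sketch is only one plausible reading: channel weights derived from each feature map's spatial mean, in the spirit of squeeze-and-excitation.

```python
# Speculative sketch of a "mean-weighted module": reweight channels by
# weights computed from each feature map's spatial mean. This is an
# assumption, not the paper's published design.
import torch
import torch.nn as nn

class MeanWeightedModule(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        weights = self.fc(x.mean(dim=(2, 3)))           # spatial mean per channel
        return x * weights.unsqueeze(-1).unsqueeze(-1)  # emphasize key features

features = torch.randn(2, 64, 128, 128)
print(MeanWeightedModule(64)(features).shape)  # torch.Size([2, 64, 128, 128])
```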

Findings

Quantitative analysis of the experimental results on the two datasets shows that the proposed network outperforms other mainstream segmentation networks by 2–6%. Complete ablation experiments are also conducted to demonstrate the effectiveness of each element of the network. In summary, BFFNet has proven effective in accurately identifying small objects and in reducing the effect of shadows on the segmentation process.

Originality/value

The originality of the paper lies in the proposal of BFFNet, based on multi-scale and multi-attention strategies, to improve the accurate segmentation of high-resolution and complex remote sensing images, especially small objects and shadow-obscured objects.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Open Access
Article
Publication date: 16 January 2024

Pengyue Guo, Tianyun Shi, Zhen Ma and Jing Wang

Abstract

Purpose

The paper aims to solve the problem of personnel intrusion identification within the limits of high-speed railways. It adopts a millimeter-wave radar and camera fusion method to improve the accuracy of object recognition in darkness and harsh weather conditions.

Design/methodology/approach

This paper adopts a radar-camera linkage fusion strategy to zoom in on long-distance targets and solves the low-illumination problem by laser fill lighting at the focus point. To improve recognition, the YOLOv8 algorithm is used for multi-scale target recognition. In addition, to handle the image distortion caused by bad weather, a linkage-and-tracking fusion strategy is proposed to output correct alarm results.
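
The camera-side detection step can be sketched with the Ultralytics YOLOv8 API, as below; the radar-to-image projection and the linkage logic are simplified placeholders, not the paper's algorithm.

```python
# Sketch of radar-camera linkage: run YOLOv8 person detection on the camera
# frame and confirm detections near the radar-reported target location.
# The projection of radar coordinates into the image is assumed done.
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained multi-scale detector

def check_intrusion(frame, radar_xy_in_image, radius_px=150):
    """Flag persons detected near a radar-reported target location."""
    results = model(frame, classes=[0])  # class 0 = person in COCO
    for box in results[0].boxes.xyxy.cpu().numpy():
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        if np.hypot(cx - radar_xy_in_image[0],
                    cy - radar_xy_in_image[1]) < radius_px:
            return True  # camera confirms the radar track: raise an alarm
    return False
```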

Findings

Simulated intrusion tests show that the proposed method can effectively detect human intrusion within 0–200 m, day and night, in clear weather, and can achieve more than 80% recognition accuracy under extremely severe weather conditions.

Originality/value

(1) The authors propose a personnel intrusion monitoring scheme based on the fusion of millimeter wave radar and camera, achieving all-weather intrusion monitoring; (2) The authors propose a new multi-level fusion algorithm based on linkage and tracking to achieve intrusion target monitoring under adverse weather conditions; (3) The authors have conducted a large number of innovative simulation experiments to verify the effectiveness of the method proposed in this article.

Details

Railway Sciences, vol. 3 no. 1
Type: Research Article
ISSN: 2755-0907

Article
Publication date: 7 May 2024

Xinzhe Li, Qinglong Li, Dasom Jeong and Jaekyeong Kim

Abstract

Purpose

Most previous studies predicting review helpfulness ignored the significance of deep features embedded in review text and instead relied on hand-crafted features. Hand-crafted and deep features offer the respective advantages of high interpretability and high predictive accuracy. This study aims to propose a novel review helpfulness prediction model that uses deep learning (DL) techniques to exploit the complementarity between hand-crafted and deep features.

Design/methodology/approach

First, an advanced convolutional neural network was applied to extract deep features from unstructured review text. Second, hand-crafted features shown in previous studies to affect review helpfulness were extracted to enhance interpretability. Third, the deep and hand-crafted features were incorporated into a review helpfulness prediction model whose performance was evaluated on the Yelp.com data set, using 2,417,796 restaurant reviews.
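
A schematic sketch of the fusion idea follows, assuming a simple text CNN for the deep features; the actual network, the hand-crafted feature set and all dimensions are illustrative assumptions.

```python
# Schematic sketch: a text CNN supplies deep features, which are concatenated
# with a small vector of hand-crafted review features (e.g. length, star
# rating) before the helpfulness prediction head.
import torch
import torch.nn as nn

class HelpfulnessModel(nn.Module):
    def __init__(self, vocab=30000, emb=128, n_handcrafted=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, 100, kernel_size=3, padding=1)
        self.head = nn.Linear(100 + n_handcrafted, 1)

    def forward(self, token_ids, handcrafted):
        x = self.embed(token_ids).transpose(1, 2)       # (batch, emb, seq)
        deep = torch.relu(self.conv(x)).max(dim=2).values
        return self.head(torch.cat([deep, handcrafted], dim=1))

tokens = torch.randint(0, 30000, (4, 200))  # 4 reviews, 200 tokens each
handcrafted = torch.randn(4, 8)             # e.g. length, rating, readability
print(HelpfulnessModel()(tokens, handcrafted).shape)  # torch.Size([4, 1])
```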

Findings

Extensive experiments confirmed that the proposed methodology performs better than traditional machine learning methods. Moreover, an empirical analysis confirms that combining hand-crafted and deep features yields better prediction performance.

Originality/value

To the best of the authors’ knowledge, this is one of the first studies to apply DL techniques and use structured and unstructured data to predict review helpfulness in the restaurant context. In addition, an advanced feature-fusion method was adopted to better use the extracted feature information and identify the complementarity between features.
