Search results

1 – 10 of over 5000
Article
Publication date: 21 August 2023

Minghao Wang, Ming Cong, Yu Du, Dong Liu and Xiaojing Tian


Abstract

Purpose

The purpose of this study is to solve the problem of an unknown initial position in a multi-robot raster map fusion. The method includes two-dimensional (2D) raster maps and three-dimensional (3D) point cloud maps.

Design/methodology/approach

A fusion method using multiple algorithms was proposed. For 2D raster maps, the method uses accelerated robust feature detection to extract feature points from multiple raster maps; the feature points are then matched with a two-step algorithm combining minimum Euclidean distance and adjacent feature relations. Finally, the random sample consensus (RANSAC) algorithm is used for redundant feature fusion. Building on the 2D raster map fusion, coordinate alignment is used for 3D point cloud map fusion.
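
To make the pipeline concrete, here is a minimal sketch of this style of 2D map fusion under stated assumptions: OpenCV's ORB detector stands in for the accelerated robust feature detector, and Lowe's ratio test replaces the paper's two-step minimum-distance/adjacent-feature matching; the merge rule at the end is illustrative.

```python
# Sketch: feature-based fusion of two occupancy-grid maps with unknown
# relative initial pose. ORB substitutes for the paper's detector; RANSAC
# rejects redundant/outlier correspondences as in the described method.
import cv2
import numpy as np

def fuse_grid_maps(map_a: np.ndarray, map_b: np.ndarray) -> np.ndarray:
    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(map_a, None)
    kp_b, des_b = orb.detectAndCompute(map_b, None)

    # Match descriptors by minimum distance; ratio test removes ambiguous pairs.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = [m for m, n in matcher.knnMatch(des_b, des_a, k=2)
               if m.distance < 0.75 * n.distance]

    src = np.float32([kp_b[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC estimates the rigid transform between the two map frames.
    T, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)

    # Warp map B into map A's frame and merge occupancy values.
    warped = cv2.warpAffine(map_b, T, (map_a.shape[1], map_a.shape[0]))
    return np.maximum(map_a, warped)
```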

Findings

To verify the effectiveness of the algorithm, the segmentation mapping method (2D raster map) and the actual robot mapping method (2D raster map and 3D point cloud map) were used for experimental verification. The experiments demonstrated the stability and reliability of the proposed algorithm.

Originality/value

This algorithm uses a new visual method with coordinate alignment to process the raster map, which effectively removes the traditional requirement for a known initial relative position of the robots and adapts better to the fusion of 3D maps. In addition, the original map data can come from different types of robots, which greatly improves the universality of the algorithm.

Details

Robotic Intelligence and Automation, vol. 43 no. 5
Type: Research Article
ISSN: 2754-6969


Article
Publication date: 14 August 2017

Sudeep Thepade, Rik Das and Saurav Ghosh


Abstract

Purpose

Current practices in data classification and retrieval have experienced a surge in the use of multimedia content. Identifying desired information in huge image databases has become increasingly complex, which complicates the design of an efficient feature extraction process. Conventional approaches to image classification with text-based image annotation have faced assorted limitations due to erroneous interpretation of vocabulary and the huge time consumed by manual annotation. Content-based image recognition has emerged as an alternative to combat these limitations. However, exploring the rich feature content of an image with a single technique is less likely to extract meaningful signatures than multi-technique feature extraction. Therefore, the purpose of this paper is to explore the possibilities of enhanced content-based image recognition by fusing the classification decisions obtained using diverse feature extraction techniques.

Design/methodology/approach

Three novel techniques of feature extraction have been introduced in this paper and tested with four different classifiers individually: the K nearest neighbor (KNN) classifier, the RIDOR classifier, an artificial neural network classifier and a support vector machine classifier. Thereafter, the classification decisions obtained using the KNN classifier with the different feature extraction techniques have been integrated by Z-score normalization and feature scaling to create a fusion-based framework for image recognition. This is followed by the introduction of a fusion-based retrieval model to validate retrieval performance with a classified query. Earlier works on content-based image identification have adopted a fusion-based approach; however, to the best of the authors’ knowledge, fusion-based query classification is addressed here for the first time as a precursor of retrieval.
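
As a rough illustration of the decision-fusion step, the sketch below applies Z-score normalization to the per-class scores produced for each feature extraction technique before summing them, so that no single technique's score scale dominates; the array shapes and names are assumptions rather than the authors' exact formulation.

```python
# Sketch: late fusion of classifier scores by Z-score normalization.
import numpy as np

def zscore_fuse(score_lists):
    """score_lists: list of (n_samples, n_classes) score arrays, one per
    feature extraction technique; returns fused class predictions."""
    fused = np.zeros_like(score_lists[0], dtype=float)
    for scores in score_lists:
        mu = scores.mean(axis=0)
        sigma = scores.std(axis=0) + 1e-12   # guard against zero variance
        fused += (scores - mu) / sigma       # Z-score per class column
    return fused.argmax(axis=1)              # highest fused score wins

# Example: fuse three techniques' scores for 5 samples over 4 classes.
rng = np.random.default_rng(0)
predictions = zscore_fuse([rng.random((5, 4)) for _ in range(3)])
```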

Findings

The proposed fusion techniques have successfully outclassed state-of-the-art techniques in both classification and retrieval performance. Four public data sets, namely the Wang, Oliva and Torralba (OT-scene), Corel and Caltech data sets, comprising 22,615 images in total, are used for evaluation.

Originality/value

To the best of the authors’ knowledge, fusion-based query classification has been addressed for the first time as a precursor of retrieval in this work. The novel idea of exploring rich image features by fusion of multiple feature extraction techniques has also encouraged further research on dimensionality reduction of feature vectors for enhanced classification results.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 10 no. 3
Type: Research Article
ISSN: 1756-378X


Article
Publication date: 18 November 2021

Yingjie Zhang, Wentao Yan, Geok Soon Hong, Jerry Fuh Hsi Fuh, Di Wang, Xin Lin and Dongsen Ye


Abstract

Purpose

This study aims to develop a data fusion method for powder-bed fusion (PBF) process monitoring based on process image information. The data fusion method can help improve process condition identification performance, which can provide guidance for further PBF process monitoring and control system development.

Design/methodology/approach

Designing reliable process monitoring systems is essential to ensuring PBF build quality. A data fusion framework based on a support vector machine (SVM), a convolutional neural network (CNN) and Dempster-Shafer (D-S) evidence theory is proposed in the study. Process images containing information on the melt pool, plume and spatters were acquired by a high-speed camera, and features were extracted with an appropriate image processing method. The three feature vectors, corresponding to the three objects, were used as the inputs of SVM classifiers for process condition identification. Raw images were also used as the input of a CNN classifier for the same task. The information from the three SVM classifiers and the CNN classifier was then fused by an improved D-S evidence theory.
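
For orientation, the sketch below implements the classic Dempster rule of combination over singleton hypotheses, treating each classifier's per-condition probabilities as a mass function; the paper's improved D-S rule is not reproduced here, and the numbers are purely illustrative.

```python
# Sketch: fusing SVM and CNN condition probabilities with Dempster's rule.
import numpy as np

def dempster_combine(m1: np.ndarray, m2: np.ndarray) -> np.ndarray:
    """Combine two mass vectors defined over the same singleton hypotheses."""
    joint = np.outer(m1, m2)
    agreement = np.trace(joint)            # mass where both sources agree
    if agreement <= 0.0:
        raise ValueError("total conflict: sources are incompatible")
    return np.diag(joint) / agreement      # renormalize the agreeing mass

# Illustrative masses over three process conditions from the four classifiers.
sources = [np.array([0.7, 0.2, 0.1]),      # SVM on melt-pool features
           np.array([0.6, 0.3, 0.1]),      # SVM on plume features
           np.array([0.5, 0.3, 0.2]),      # SVM on spatter features
           np.array([0.8, 0.1, 0.1])]      # CNN on raw images
fused = sources[0]
for m in sources[1:]:
    fused = dempster_combine(fused, m)
```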

Findings

The results demonstrate that the sensitivity of the information sources differs across condition identification tasks. Feature fusion based on D-S evidence theory improves classification performance; with both feature fusion and classifier fusion, the accuracy of condition identification improves by more than 20%.

Originality/value

An improved D-S evidence theory is proposed for PBF process data fusion monitoring, which is promising for the development of reliable PBF process monitoring systems.

Details

Rapid Prototyping Journal, vol. 28 no. 5
Type: Research Article
ISSN: 1355-2546


Article
Publication date: 19 October 2023

Huaxiang Song


Abstract

Purpose

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot for its great potential to boost performance. However, RSI involves unique imaging conditions and cluttered scenes with complicated backgrounds. This large difference from natural images has meant that previous feature fusion methods yield insignificant performance improvements.

Design/methodology/approach

This work proposed a two-convolutional neural network (CNN) fusion method named main and branch CNN fusion network (MBC-Net) as an improved solution for classifying RSI. In detail, MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. In particular, MBC-Net includes a long-range derivation (LRD) module, specially designed to learn the dependence among different features. MBC-Net also uses some unique ideas to tackle the problems arising from two-CNN fusion and the inherent nature of RSI.
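
As a minimal sketch of the two-stream skeleton only (the LRD module and training details are omitted), the code below pairs the torchvision EfficientNet-B3 and EfficientNet-B0 trunks and concatenates their pooled features before a shared classifier head; this is an assumption-laden illustration, not the authors' implementation.

```python
# Sketch: main-and-branch two-CNN feature fusion in the spirit of MBC-Net.
import torch
import torch.nn as nn
from torchvision import models

class TwoStreamFusion(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.main = models.efficientnet_b3(weights=None).features    # 1536-d
        self.branch = models.efficientnet_b0(weights=None).features  # 1280-d
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(1536 + 1280, num_classes)

    def forward(self, x):
        f_main = self.pool(self.main(x)).flatten(1)
        f_branch = self.pool(self.branch(x)).flatten(1)
        return self.head(torch.cat([f_main, f_branch], dim=1))

logits = TwoStreamFusion(num_classes=45)(torch.randn(2, 3, 224, 224))
```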

Findings

Extensive experiments on three RSI sets prove that MBC-Net outperforms 38 other state-of-the-art (SOTA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA). MBC-Net not only presents a 0.7% higher OA value on the most confusing NWPU set but also has 62% fewer parameters than the leading approach in the literature.

Originality/value

MBC-Net is a more effective and efficient feature fusion approach than other SOTA methods in the literature. Gradient-weighted class activation mapping (Grad-CAM) visualizations reveal that MBC-Net can learn long-range dependences among features that a single CNN cannot. The t-distributed stochastic neighbor embedding (t-SNE) results demonstrate that the feature representation of MBC-Net is more effective than that of other methods. In addition, the ablation tests indicate that MBC-Net fuses features from two CNNs effectively and efficiently.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X


Article
Publication date: 1 November 2023

Juan Yang, Zhenkun Li and Xu Du


Abstract

Purpose

Although numerous signal modalities are available for emotion recognition, audio and visual modalities are the most common and predominant forms through which human beings express their emotional states in daily communication. Therefore, achieving automatic and accurate audiovisual emotion recognition is critically important for developing an engaging and empathetic human–computer interaction environment. However, two major challenges exist in the field of audiovisual emotion recognition: (1) how to effectively capture representations of each single modality and eliminate redundant features and (2) how to efficiently integrate information from the two modalities to generate discriminative representations.

Design/methodology/approach

A novel key-frame extraction-based attention fusion network (KE-AFN) is proposed for audiovisual emotion recognition. KE-AFN integrates key-frame extraction with multimodal interaction and fusion to enhance audiovisual representations and reduce redundant computation, filling the research gaps of existing approaches. Specifically, a local maximum–based content analysis is designed to extract key-frames from videos and thereby eliminate data redundancy. Two modules, a “Multi-head Attention-based Intra-modality Interaction Module” and a “Multi-head Attention-based Cross-modality Interaction Module”, are proposed to mine and capture intra- and cross-modality interactions, further reducing data redundancy and producing more powerful multimodal representations.
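
As a hedged illustration of the key-frame step, the sketch below scores each frame by its mean absolute difference from the previous frame and keeps the local maxima of that curve; the paper's content-analysis measure may differ, so the scoring function here is an assumption.

```python
# Sketch: local-maximum key-frame selection from a decoded video.
import numpy as np
from scipy.signal import argrelextrema

def keyframe_indices(frames: np.ndarray) -> np.ndarray:
    """frames: (n_frames, H, W) grayscale video; returns key-frame indices."""
    diffs = np.abs(np.diff(frames.astype(float), axis=0)).mean(axis=(1, 2))
    peaks = argrelextrema(diffs, np.greater)[0] + 1  # +1: diff is offset by one
    return peaks

video = np.random.rand(60, 64, 64)   # stand-in for decoded grayscale frames
print(keyframe_indices(video))
```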

Findings

Extensive experiments on two benchmark datasets (i.e. RAVDESS and CMU-MOSEI) demonstrate the effectiveness and rationality of KE-AFN. Specifically, (1) KE-AFN is superior to state-of-the-art baselines for audiovisual emotion recognition. (2) Exploring the supplementary and complementary information of different modalities provides more emotional clues for better emotion recognition. (3) The proposed key-frame extraction strategy enhances accuracy by more than 2.79 per cent. (4) Both exploring intra- and cross-modality interactions and employing attention-based audiovisual fusion lead to better prediction performance.

Originality/value

The proposed KE-AFN can support the development of engaging and empathetic human–computer interaction environment.

Article
Publication date: 12 September 2023

Wei Shi, Jing Zhang and Shaoyi He



Abstract

Purpose

With the rapid development of short videos in China, the public has become accustomed to using short videos to express their opinions. This paper aims to solve problems such as how to represent the features of different modalities and how to achieve effective cross-modal feature fusion when analyzing the multi-modal sentiment of Chinese short videos (CSVs).

Design/methodology/approach

This paper proposes a sentiment analysis model, MSCNN-CPL-CAFF, that uses a multi-scale convolutional neural network and a cross-attention fusion mechanism to analyze CSVs. The audio-visual and textual data of CSVs themed on “COVID-19, catering industry” are first collected from the CSV platform Douyin, and a comparative analysis is then conducted against advanced baseline models.
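
To give a feel for the cross-attention fusion ingredient, the sketch below lets text tokens attend over audio-visual tokens with a residual connection; the dimensions, token counts and topology are illustrative assumptions, and the multi-scale CNN front end is not reproduced.

```python
# Sketch: one cross-attention fusion step between two modalities.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, audiovisual):
        # Queries come from text; keys and values from the other modality.
        fused, _ = self.attn(query=text, key=audiovisual, value=audiovisual)
        return self.norm(text + fused)     # residual connection

text = torch.randn(8, 20, 128)             # (batch, text tokens, dim)
av = torch.randn(8, 50, 128)               # (batch, audio-visual tokens, dim)
out = CrossAttentionFusion()(text, av)
```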

Findings

The weak negative and neutral sentiment classes contain the most samples, while the positive and weak positive classes are relatively small, accounting for only about 11% of the total. The MSCNN-CPL-CAFF model achieves Acc-2, Acc-3 and F1 scores of 85.01%, 74.16% and 84.84%, respectively, outperforming the best baseline in accuracy while achieving competitive computation speed.

Practical implications

This research offers some implications regarding the impact of COVID-19 on the catering industry in China by focusing on the multi-modal sentiment of CSVs. The methodology can be used to analyze the opinions of the general public on social media platforms and to categorize them accordingly.

Originality/value

This paper presents a novel deep-learning multimodal sentiment analysis model, which provides a new perspective for public opinion research on the short video platform.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X


Article
Publication date: 17 September 2019

Chérif Taouche and Hacene Belhadef



Abstract

Purpose

Palmprint recognition is a very interesting and promising area of research. Much work has already been done in this area, but much more is needed to make the systems more efficient. In this paper, a multimodal biometrics system based on the fusion of a person’s left and right palmprints is proposed to overcome the limitations of unimodal systems.

Design/methodology/approach

Features are extracted using some proposed multi-block local descriptors in addition to the multi-block local binary pattern (MBLBP). Fusion of the extracted features is done at the feature level by a simple concatenation of feature vectors. Feature selection is then performed on the resulting global feature vector using evolutionary algorithms, namely genetic algorithms and the backtracking search algorithm, for comparison purposes. The benefits of such a feature selection step are well known in the literature: it increases recognition accuracy and reduces the feature set size, which saves runtime. In the matching step, the Chi-square similarity measure is used.
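
For the matching step, a minimal sketch of Chi-square scoring against a gallery is given below; the fused descriptors are assumed to be non-negative histogram-style vectors produced upstream, and all names are illustrative.

```python
# Sketch: Chi-square matching of a probe descriptor against a gallery.
import numpy as np

def chi_square(h1: np.ndarray, h2: np.ndarray) -> float:
    denom = h1 + h2
    mask = denom > 0                # skip empty bins to avoid division by zero
    return 0.5 * np.sum((h1[mask] - h2[mask]) ** 2 / denom[mask])

def match(probe: np.ndarray, gallery: dict) -> str:
    """gallery: identity -> fused (left + right palmprint) feature vector."""
    return min(gallery, key=lambda ident: chi_square(probe, gallery[ident]))

gallery = {"id_01": np.random.rand(256), "id_02": np.random.rand(256)}
print(match(np.random.rand(256), gallery))
```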

Findings

The resulting feature vector representing a person is compact, and the runtime is reduced.

Originality/value

Intensive experiments were done on the publicly available IITD database. Experimental results show a recognition accuracy of 99.17%, which proves the effectiveness and robustness of the proposed multimodal biometrics system compared with other unimodal and multimodal biometrics systems.

Details

Information Discovery and Delivery, vol. 48 no. 1
Type: Research Article
ISSN: 2398-6247


Article
Publication date: 28 February 2022

Rui Zhang, Na Zhao, Liuhu Fu, Lihu Pan, Xiaolu Bai and Renwang Song


Abstract

Purpose

This paper aims to propose a new ultrasonic diagnosis method for stainless steel weld defects based on multi-domain feature fusion, addressing two problems in the ultrasonic diagnosis of austenitic stainless steel weld defects: insufficient feature extraction and the subjective dependence of diagnosis model parameters.

Design/methodology/approach

To capture the richness of the one-dimensional (1D) signal information, the 1D ultrasonic testing signal was transformed into the two-dimensional (2D) time-frequency domain. Multi-scale depthwise separable convolution was also designed to optimize the MobileNetV3 network and obtain deep convolution feature information under different receptive fields. At the same time, time/frequency-domain features of the defect signals were extracted based on statistical analysis. Defect-sensitive features were screened out through visual analysis, and the defect feature set was constructed by cascading fusion with the deep convolution feature information. To improve the adaptability and generalization of the diagnostic model, the authors designed hyperparameter self-optimization of the diagnostic model based on the sparrow search strategy and constructed the optimal hyperparameter combination of the model. Finally, the performance of ultrasonic diagnosis of stainless steel weld defects was improved comprehensively through the multi-domain feature characterization model of the defect data and the diagnosis optimization model.
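
Two of these ingredients are easy to sketch: lifting the 1D signal into a 2D time-frequency image (here with a short-time Fourier transform, as one plausible choice) and a depthwise separable convolution block of the kind used to slim MobileNet-style networks. The sampling parameters are assumptions, not the paper's acquisition settings.

```python
# Sketch: 1D-to-2D time-frequency lifting plus a depthwise separable conv.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft

fs = 5_000_000                              # assumed sampling rate (5 MHz)
signal = np.random.randn(4096)              # stand-in for an ultrasonic A-scan
_, _, Z = stft(signal, fs=fs, nperseg=256)
tf_image = np.abs(Z)                        # 2D time-frequency representation

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        # Depthwise: one filter per channel; pointwise: 1x1 channel mixing.
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.from_numpy(tf_image).float()[None, None]  # (1, 1, freq, time)
features = DepthwiseSeparableConv(1, 16)(x)
```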

Findings

The experimental results show that the diagnostic accuracy of the lightweight diagnosis model constructed in this paper reaches 96.55% for the five types of stainless steel weld defects: cracks, porosity, inclusion, lack of fusion and incomplete penetration. This can meet the needs of practical engineering applications.

Originality/value

This method provides a theoretical basis and technical reference for developing and applying intelligent, efficient and accurate ultrasonic defect diagnosis technology.

Article
Publication date: 10 January 2024

Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán


Abstract

Purpose

COVID-19 continues to spread and cause increasing deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest X-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed to assist in this respect, and vision transformers are currently state-of-the-art methods, but most techniques currently focus on only one modality (CXR).

Design/methodology/approach

This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. The paper studies the differences between convolutional MobileNetV2, ViT DeiT and Swin Transformer models when trained from scratch and when pretrained on the MedNIST medical dataset rather than the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, the Wilcoxon statistical test and the Borda Count method. The Grad-CAM algorithm is also used to study model interpretability. Finally, model robustness is tested by evaluating the models on Gaussian-noised images.
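
The robustness check lends itself to a short sketch: evaluate a trained model on Gaussian-noised copies of the test images at increasing noise levels. Here `model` and `test_loader` are assumed to exist, and the sigma values are illustrative.

```python
# Sketch: accuracy of a trained classifier under additive Gaussian noise.
import torch

@torch.no_grad()
def accuracy_under_noise(model, test_loader, sigmas=(0.0, 0.05, 0.1, 0.2)):
    model.eval()
    results = {}
    for sigma in sigmas:
        correct = total = 0
        for images, labels in test_loader:
            noisy = (images + sigma * torch.randn_like(images)).clamp(0, 1)
            correct += (model(noisy).argmax(dim=1) == labels).sum().item()
            total += labels.numel()
        results[sigma] = correct / total   # accuracy at this noise level
    return results
```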

Findings

Although the pretrained MobileNetV2 was the best model in terms of raw performance, the best model in terms of performance, interpretability and robustness to noise is the Swin Transformer trained from scratch, using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.

Originality/value

The models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 7 August 2017

Shenglan Liu, Muxin Sun, Xiaodong Huang, Wei Wang and Feilong Wang


Abstract

Purpose

Robot vision is a fundamental capability for human–robot interaction and complex robot tasks. In this paper, the authors aim to use Kinect and propose a feature graph fusion (FGF) method for robot recognition.

Design/methodology/approach

The feature fusion utilizes red-green-blue (RGB) and depth information from Kinect to construct a fused feature. FGF uses multi-Jaccard similarity to compute a robust graph and a word embedding method to enhance the recognition results.
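
As a loose illustration of the graph-building flavour (the exact multi-Jaccard formulation and the word embedding step are not reproduced), the sketch below reduces each image to the set of its nearest neighbours under a base descriptor and weights edges by the Jaccard overlap of those sets.

```python
# Sketch: a Jaccard-similarity graph over images from neighbour-set overlap.
import numpy as np

def jaccard_graph(features: np.ndarray, k: int = 5) -> np.ndarray:
    n = len(features)
    dists = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    nbrs = [set(np.argsort(row)[1:k + 1]) for row in dists]  # skip self
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = nbrs[i] | nbrs[j]
            W[i, j] = W[j, i] = len(nbrs[i] & nbrs[j]) / len(union)
    return W

graph = jaccard_graph(np.random.rand(20, 64))  # 20 images, 64-d descriptors
```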

Findings

The authors also collect a DUT RGB-Depth (RGB-D) face data set and a benchmark data set to evaluate the effectiveness and efficiency of the method. The experimental results illustrate that FGF is robust and effective on face and object data sets in robot applications.

Originality/value

The authors first utilize Jaccard similarity to construct a graph of RGB and depth images, which indicates the similarity of pair-wise images. The fused feature of RGB and depth images can then be computed from the Extended Jaccard Graph using a word embedding method. FGF achieves better performance and efficiency with RGB-D sensors for robots.

Details

Assembly Automation, vol. 37 no. 3
Type: Research Article
ISSN: 0144-5154

