Search results

1 – 10 of over 5000
Article
Publication date: 19 October 2023

Huaxiang Song

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition…

Abstract

Purpose

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot for its great potential to boost performance. However, RSI is captured under unique imaging conditions and contains cluttered scenes with complicated backgrounds. This large difference from natural images has meant that previous feature fusion methods deliver only insignificant performance improvements.

Design/methodology/approach

This work proposes a two-convolutional neural network (CNN) fusion method named main and branch CNN fusion network (MBC-Net) as an improved solution for classifying RSI. In detail, MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. In particular, MBC-Net includes a long-range derivation (LRD) module, which is specially designed to learn the dependence of different features. Meanwhile, MBC-Net also uses several unique ideas to tackle the problems arising from the two-CNN fusion and the inherent nature of RSI.
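
A minimal sketch of the two-stream fusion idea, assuming PyTorch and the timm library for the EfficientNet backbones. The attention block standing in for the LRD module is a hypothetical illustration of learning cross-stream feature dependence, not the authors' exact design, and the class count is only an example.

```python
# Two-stream CNN fusion sketch: EfficientNet-B3 main stream plus
# EfficientNet-B0 branch, fused by a hypothetical attention block.
import torch
import torch.nn as nn
import timm

class TwoStreamFusionNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Both backbones truncated to pooled feature vectors (num_classes=0).
        self.main_cnn = timm.create_model("efficientnet_b3", num_classes=0)
        self.branch_cnn = timm.create_model("efficientnet_b0", num_classes=0)
        d = 512
        self.proj_main = nn.Linear(self.main_cnn.num_features, d)
        self.proj_branch = nn.Linear(self.branch_cnn.num_features, d)
        # Hypothetical stand-in for the LRD module: main-stream features
        # query branch-stream features to model their dependence.
        self.attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
        self.head = nn.Linear(2 * d, num_classes)

    def forward(self, x):
        f_main = self.proj_main(self.main_cnn(x)).unsqueeze(1)      # (B, 1, d)
        f_branch = self.proj_branch(self.branch_cnn(x)).unsqueeze(1)
        fused, _ = self.attn(f_main, f_branch, f_branch)
        return self.head(torch.cat([f_main, fused], dim=-1).squeeze(1))

model = TwoStreamFusionNet(num_classes=45)   # e.g. 45 NWPU scene classes
logits = model(torch.randn(2, 3, 300, 300))  # B3's native 300x300 input
```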

Findings

Extensive experiments on three RSI datasets prove that MBC-Net outperforms 38 other state-of-the-art (SOTA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA). MBC-Net not only delivers a 0.7% higher OA value on the most confusing NWPU dataset but also has 62% fewer parameters than the approach that ranks first in the literature.

Originality/value

MBC-Net is a more effective and efficient feature fusion approach than other SOTA methods in the literature. Visualizations of gradient-weighted class activation mapping (Grad-CAM) reveal that MBC-Net can learn the long-range dependence of features that a single CNN cannot. The t-distributed stochastic neighbor embedding (t-SNE) results demonstrate that the feature representation of MBC-Net is more effective than that of other methods. In addition, the ablation tests indicate that MBC-Net is effective and efficient at fusing features from two CNNs.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 21 August 2023

Minghao Wang, Ming Cong, Yu Du, Dong Liu and Xiaojing Tian

The purpose of this study is to solve the problem of an unknown initial position in multi-robot raster map fusion. The method covers two-dimensional (2D) raster maps and…

Abstract

Purpose

The purpose of this study is to solve the problem of an unknown initial position in multi-robot raster map fusion. The method covers both two-dimensional (2D) raster maps and three-dimensional (3D) point cloud maps.

Design/methodology/approach

A fusion method using multiple algorithms is proposed. For 2D raster maps, the method uses accelerated robust feature detection to extract feature points from the multiple raster maps; the feature points are then matched using a two-step algorithm of minimum Euclidean distance and adjacent feature relations. Finally, the random sample consensus (RANSAC) algorithm is used for redundant feature fusion. Building on the 2D raster map fusion, coordinate alignment is used for 3D point cloud map fusion.
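
An illustrative sketch of the 2D matching steps in plain NumPy, under the assumption that feature descriptors have already been computed: nearest-neighbour (minimum Euclidean distance) matching followed by a basic RANSAC estimate of the rigid transform between two maps. The adjacent-feature check and the redundant-feature fusion are omitted.

```python
# Minimum-Euclidean-distance matching and RANSAC rigid alignment sketch.
import numpy as np

def match_min_euclidean(desc_a, desc_b):
    """Step 1: match each descriptor in A to its nearest neighbour in B."""
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    return np.stack([np.arange(len(desc_a)), d.argmin(axis=1)], axis=1)

def ransac_rigid(pts_a, pts_b, iters=500, tol=2.0, seed=0):
    """Estimate R, t so that pts_b ~= pts_a @ R.T + t from matched points."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts_a), dtype=bool)
    best_R, best_t = np.eye(2), np.zeros(2)
    for _ in range(iters):
        i, j = rng.choice(len(pts_a), size=2, replace=False)
        # Rotation angle from the direction between the two sampled points.
        va, vb = pts_a[j] - pts_a[i], pts_b[j] - pts_b[i]
        theta = np.arctan2(vb[1], vb[0]) - np.arctan2(va[1], va[0])
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        t = pts_b[i] - pts_a[i] @ R.T
        inliers = np.linalg.norm(pts_a @ R.T + t - pts_b, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_R, best_t = inliers, R, t
    return best_R, best_t, best_inliers
```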

Findings

To verify the effectiveness of the algorithm, experiments were conducted with a segmentation mapping method (2D raster maps) and with maps built by actual robots (2D raster maps and 3D point cloud maps). The experiments demonstrated the stability and reliability of the proposed algorithm.

Originality/value

This algorithm uses a new visual method with coordinate alignment to process raster maps, which effectively removes the requirement of traditional methods for a known initial relative position of the robots and adapts better to the fusion of 3D maps. In addition, the original map data can come from different types of robots, which greatly improves the universality of the algorithm.

Details

Robotic Intelligence and Automation, vol. 43 no. 5
Type: Research Article
ISSN: 2754-6969

Keywords

Article
Publication date: 1 November 2023

Juan Yang, Zhenkun Li and Xu Du

Although numerous signal modalities are available for emotion recognition, audio and visual modalities are the most common and predominant forms for human beings to express their…

Abstract

Purpose

Although numerous signal modalities are available for emotion recognition, audio and visual modalities are the most common and predominant forms for human beings to express their emotional states in daily communication. Therefore, achieving automatic and accurate audiovisual emotion recognition is critically important for developing engaging and empathetic human–computer interaction environments. However, two major challenges exist in the field of audiovisual emotion recognition: (1) how to effectively capture representations of each single modality and eliminate redundant features and (2) how to efficiently integrate information from these two modalities to generate discriminative representations.

Design/methodology/approach

A novel key-frame extraction-based attention fusion network (KE-AFN) is proposed for audiovisual emotion recognition. KE-AFN integrates key-frame extraction with multimodal interaction and fusion to enhance audiovisual representations and reduce redundant computation, filling the research gaps of existing approaches. Specifically, local maximum-based content analysis is designed to extract key-frames from videos to eliminate data redundancy. Two modules, a multi-head attention-based intra-modality interaction module and a multi-head attention-based cross-modality interaction module, are proposed to mine and capture intra- and cross-modality interactions, further reducing data redundancy and producing more powerful multimodal representations.
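
A minimal sketch of the intra- and cross-modality multi-head attention idea, assuming PyTorch; the sequence lengths, feature dimensions and single cross-attention direction shown here are illustrative assumptions, not the exact KE-AFN design.

```python
# Intra-modality self-attention per stream, then cross-modality attention.
import torch
import torch.nn as nn

d, heads = 256, 4
intra_audio = nn.MultiheadAttention(d, heads, batch_first=True)
intra_video = nn.MultiheadAttention(d, heads, batch_first=True)
cross_av = nn.MultiheadAttention(d, heads, batch_first=True)

audio = torch.randn(8, 50, d)   # (batch, audio frames, features)
video = torch.randn(8, 16, d)   # (batch, extracted key-frames, features)

# Intra-modality: each modality attends over its own sequence.
a, _ = intra_audio(audio, audio, audio)
v, _ = intra_video(video, video, video)

# Cross-modality: audio queries the video key-frames (the full model
# would also attend in the reverse direction), borrowing complementary cues.
av, _ = cross_av(a, v, v)
fused = torch.cat([a.mean(dim=1), av.mean(dim=1)], dim=-1)  # (8, 2*d)
```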

Findings

Extensive experiments on two benchmark datasets (i.e. RAVDESS and CMU-MOSEI) demonstrate the effectiveness and rationality of KE-AFN. Specifically, (1) KE-AFN is superior to state-of-the-art baselines for audiovisual emotion recognition. (2) Exploring the supplementary and complementary information of different modalities can provide more emotional clues for better emotion recognition. (3) The proposed key-frame extraction strategy can improve accuracy by more than 2.79 per cent. (4) Both exploring intra- and cross-modality interactions and employing attention-based audiovisual fusion lead to better prediction performance.

Originality/value

The proposed KE-AFN can support the development of engaging and empathetic human–computer interaction environments.

Article
Publication date: 18 November 2021

Yingjie Zhang, Wentao Yan, Geok Soon Hong, Jerry Fuh Hsi Fuh, Di Wang, Xin Lin and Dongsen Ye

This study aims to develop a data fusion method for powder-bed fusion (PBF) process monitoring based on process image information. The data fusion method can help improve process…

Abstract

Purpose

This study aims to develop a data fusion method for powder-bed fusion (PBF) process monitoring based on process image information. The data fusion method can help improve process condition identification performance, which can provide guidance for further PBF process monitoring and control system development.

Design/methodology/approach

The design of reliable process monitoring systems is essential to ensuring PBF build quality. A data fusion framework based on a support vector machine (SVM), a convolutional neural network (CNN) and Dempster-Shafer (D-S) evidence theory is proposed in the study. Process images containing information on the melt pool, plume and spatter were acquired by a high-speed camera, and features were extracted with an appropriate image processing method. The three feature vectors corresponding to the three objects were used as inputs to SVM classifiers for process condition identification. Moreover, raw images were also used as the input of a CNN classifier for process condition identification. Then, fusion of the three SVM classifiers and the CNN classifier through an improved D-S evidence theory was studied.
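
A small sketch of classical Dempster-Shafer combination for two classifiers' mass functions; the paper's improved D-S rule is not reproduced, and the condition names and mass values below are made up for illustration.

```python
# Dempster's rule of combination: intersect focal elements, accumulate
# products of masses, and renormalise by the non-conflicting mass (1 - K).
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions given as {frozenset(hypotheses): mass}."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # K: mass assigned to the empty set
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Toy masses from an SVM and a CNN over two process conditions, with
# residual ignorance assigned to the whole frame of discernment.
m_svm = {frozenset({"normal"}): 0.6, frozenset({"defect"}): 0.3,
         frozenset({"normal", "defect"}): 0.1}
m_cnn = {frozenset({"normal"}): 0.7, frozenset({"defect"}): 0.2,
         frozenset({"normal", "defect"}): 0.1}
print(dempster_combine(m_svm, m_cnn))
```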

Findings

The results demonstrate that the sensitivity of the information sources differs across condition identification tasks. Feature fusion based on D-S evidence theory improves classification performance; with both feature fusion and classifier fusion, the accuracy of condition identification is improved by more than 20%.

Originality/value

An improved D-S evidence theory is proposed for PBF process data fusion monitoring, which is promising for the development of reliable PBF process monitoring systems.

Details

Rapid Prototyping Journal, vol. 28 no. 5
Type: Research Article
ISSN: 1355-2546

Keywords

Article
Publication date: 19 January 2024

Meng Zhu and Xiaolong Xu

Intent detection (ID) and slot filling (SF) are two important tasks in natural language understanding. ID is to identify the main intent of a paragraph of text. The goal of SF is…

Abstract

Purpose

Intent detection (ID) and slot filling (SF) are two important tasks in natural language understanding. ID identifies the main intent of a passage of text, while the goal of SF is to extract the information relevant to that intent from the input sentence. However, most existing methods use sentence-level intent recognition, which risks error propagation, and the relationship between ID and SF is not explicitly modeled. To address this problem, this paper proposes a collaborative model of ID and SF for intelligent spoken language understanding called ID-SF-Fusion.

Design/methodology/approach

ID-SF-Fusion uses Bidirectional Encoder Representations from Transformers (BERT) and a Bidirectional Long Short-Term Memory (BiLSTM) network to extract effective word embeddings and context vectors containing whole-sentence information, respectively. A fusion layer provides intent–slot fusion information for the SF task, so the relationship between the ID and SF tasks is fully and explicitly modeled. This layer takes the ID result and the slot context vectors as input to obtain fusion information that contains both the ID result and slot information. Meanwhile, to further reduce error propagation, word-level ID is used in the ID-SF-Fusion model. Finally, the ID and SF tasks are realized by joint optimization training.
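
A compact sketch of the joint ID/SF idea, assuming PyTorch; a plain embedding stands in for BERT, the fusion layer is a simple concatenation of intent evidence with the BiLSTM states, and all sizes are arbitrary examples.

```python
# Joint intent-detection/slot-filling sketch with word-level intents.
import torch
import torch.nn as nn

class JointIdSf(nn.Module):
    def __init__(self, vocab, n_intents, n_slots, d=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)   # stand-in for BERT embeddings
        self.lstm = nn.LSTM(d, d, bidirectional=True, batch_first=True)
        self.intent_head = nn.Linear(2 * d, n_intents)   # word-level ID
        self.slot_head = nn.Linear(2 * d + n_intents, n_slots)

    def forward(self, tokens):
        h, _ = self.lstm(self.emb(tokens))            # (B, T, 2d)
        intent_logits = self.intent_head(h)           # per-word intents
        # Fusion layer: feed intent evidence back into slot filling.
        fused = torch.cat([h, intent_logits.softmax(-1)], dim=-1)
        slot_logits = self.slot_head(fused)
        # Sentence intent as the average of word-level predictions.
        return intent_logits.mean(dim=1), slot_logits

model = JointIdSf(vocab=5000, n_intents=21, n_slots=120)
intent, slots = model(torch.randint(0, 5000, (4, 16)))
```

In training, a joint objective would typically sum the cross-entropy losses of the intent and slot predictions so both tasks are optimized together.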

Findings

We conducted experiments on two public datasets, Airline Travel Information Systems (ATIS) and Snips. The results show that the Intent ACC and Slot F1 scores of ID-SF-Fusion on ATIS are 98.0 per cent and 95.8 per cent, respectively, and the two indicators on the Snips dataset are 98.6 per cent and 96.7 per cent, respectively. The model is superior to Slot-Gated, SF-ID Network, Stack-Prop and other models. In addition, ablation experiments were performed to further analyze and discuss the proposed model.

Originality/value

This paper uses word-level intent recognition and introduces intent information into the SF process, which yields a significant improvement on both datasets.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 8 March 2010

Bo Chen, Jifeng Wang and Shanben Chen

Welding sensor technology is a key technology in the welding process, but a single sensor cannot acquire adequate information to describe the welding status. This paper uses an arc…

Abstract

Purpose

Welding sensor technology is a key technology in the welding process, but a single sensor cannot acquire adequate information to describe the welding status. This paper uses an arc sensor and a sound sensor to acquire the voltage and sound information of pulsed gas tungsten arc welding (GTAW) simultaneously, and applies multi-sensor information fusion technology to fuse the information acquired by the two sensors. The purpose of this paper is to explore the feasibility and effectiveness of multi-sensor information fusion in pulsed GTAW.

Design/methodology/approach

The weld voltage and weld sound information are first acquired by the arc sensor and the sound sensor; features of the two signals are then extracted and fused by a weighted-mean method to predict changes in arc length. The weight of each feature is determined by an optimal distribution method.
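
An illustrative sketch of weighted-mean feature fusion; the weights and feature values below are invented numbers for illustration, not the paper's distribution-derived weights.

```python
# Weighted-mean fusion of per-pulse features from two welding sensors.
import numpy as np

def fuse_features(voltage_feat, sound_feat, w_voltage=0.6, w_sound=0.4):
    """Weighted mean of normalised features from the two sensors."""
    return w_voltage * voltage_feat + w_sound * sound_feat

voltage_feat = np.array([0.82, 0.79, 0.91])  # normalised voltage features
sound_feat = np.array([0.75, 0.88, 0.86])    # normalised sound features
print(fuse_features(voltage_feat, sound_feat))  # fused arc-length indicator
```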

Findings

The research findings show that multi-sensor information fusion technology can effectively utilize the information from different sensors and achieves better results than a single sensor.

Originality/value

This is the first time an arc sensor and a sound sensor have been used simultaneously to acquire information about pulsed GTAW, and the fusion results show advantages over a single sensor; this reveals that multi-sensor fusion technology is a valuable research area for the welding process.

Details

Industrial Robot: An International Journal, vol. 37 no. 2
Type: Research Article
ISSN: 0143-991X

Keywords

Article
Publication date: 14 August 2017

Sudeep Thepade, Rik Das and Saurav Ghosh

Current practices in data classification and retrieval have experienced a surge in the use of multimedia content. Identification of desired information from the huge image…

Abstract

Purpose

Current practices in data classification and retrieval have experienced a surge in the use of multimedia content, and identifying desired information in huge image databases has made the design of an efficient feature extraction process increasingly complex. Conventional approaches to image classification with text-based image annotation face assorted limitations due to erroneous interpretation of vocabulary and the huge time consumption of manual annotation. Content-based image recognition has emerged as an alternative to combat these limitations. However, exploring the rich feature content of an image with a single technique is less likely to extract meaningful signatures than multi-technique feature extraction. Therefore, the purpose of this paper is to explore the possibilities of enhanced content-based image recognition by fusing the classification decisions obtained from diverse feature extraction techniques.

Design/methodology/approach

Three novel feature extraction techniques are introduced in this paper and tested with four different classifiers individually: a K nearest neighbor (KNN) classifier, a RIDOR classifier, an artificial neural network classifier and a support vector machine classifier. Thereafter, the classification decisions obtained with the KNN classifier for the different feature extraction techniques are integrated by Z-score normalization and feature scaling to create a fusion-based framework for image recognition. This is followed by a fusion-based retrieval model to validate retrieval performance with classified queries. Earlier works on content-based image identification have adopted fusion-based approaches; however, to the best of the authors' knowledge, this work is the first to address fusion-based query classification as a precursor of retrieval.
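
A sketch of the decision-fusion step in NumPy: per-class scores from different feature extraction pipelines are Z-score normalized and summed before the final decision. The score shapes and the simple sum rule are assumptions for illustration.

```python
# Z-score normalization and fusion of classifier decision scores.
import numpy as np

def zscore(scores):
    """Standardise each sample's class scores to zero mean, unit variance."""
    return ((scores - scores.mean(axis=1, keepdims=True))
            / scores.std(axis=1, keepdims=True))

# KNN decision scores from three feature extraction techniques,
# each of shape (n_samples, n_classes).
rng = np.random.default_rng(0)
s1, s2, s3 = (rng.random((5, 4)) for _ in range(3))

fused = zscore(s1) + zscore(s2) + zscore(s3)
predicted_class = fused.argmax(axis=1)   # fused classification decision
```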

Findings

The proposed fusion techniques have successfully outclassed state-of-the-art techniques in classification and retrieval performance. Four public datasets, namely the Wang, Oliva and Torralba (OT-scene), Corel and Caltech datasets, comprising 22,615 images in total, are used for evaluation.

Originality/value

To the best of the authors' knowledge, fusion-based query classification is addressed for the first time as a precursor of retrieval in this work. The novel idea of exploring rich image features by fusing multiple feature extraction techniques has also encouraged further research on dimensionality reduction of feature vectors for enhanced classification results.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 10 no. 3
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 9 February 2021

Yaolin Zhu, Jiayi Huang, Tong Wu and Xueqin Ren

The purpose of this paper is to select the optimal feature parameters to further improve the identification accuracy of cashmere and wool.

Abstract

Purpose

The purpose of this paper is to select the optimal feature parameters to further improve the identification accuracy of cashmere and wool.

Design/methodology/approach

To increase accuracy, the authors put forward a method for selecting optimal parameters based on the fusion of a morphological feature and texture features. The first step is to acquire the fiber diameter, measured by the central-axis algorithm. The second step is to acquire the optimal texture feature parameters, mainly by using the variance of the secondary statistics of two texture features to obtain four statistics and then finding the impact factors of the gray-level co-occurrence matrix (GLCM) from the relationship between the secondary statistic values and the pixel pitch. Finally, the five-dimensional feature vectors extracted from the sample images are fed into a Fisher classifier.
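
A rough sketch of the GLCM texture statistics and Fisher (linear discriminant) classification, assuming scikit-image and scikit-learn; the diameter value, pixel-pitch choice and toy labels are invented, and the paper's impact-factor selection is not reproduced.

```python
# GLCM texture statistics plus fiber diameter -> 5-D vector -> Fisher classifier.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def texture_features(gray_img, distance=1):
    """Four GLCM statistics at a given pixel pitch (distance)."""
    glcm = graycomatrix(gray_img, distances=[distance],
                        angles=[0, np.pi / 2], levels=256,
                        symmetric=True, normed=True)
    return [float(graycoprops(glcm, p).mean())
            for p in ("contrast", "correlation", "energy", "homogeneity")]

rng = np.random.default_rng(0)
# Fiber diameter (toy value) + four texture statistics per sample image.
X = np.array([[15.2] + texture_features(
        rng.integers(0, 256, (64, 64), dtype=np.uint8)) for _ in range(20)])
y = np.repeat([0, 1], 10)        # 0 = cashmere, 1 = wool (toy labels)
clf = LinearDiscriminantAnalysis().fit(X, y)   # Fisher discriminant
```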

Findings

The improvement in identification accuracy is achieved by determining the optimal feature parameters and fusing the two texture features. The average identification accuracy is 96.713% in this paper, which is very helpful for improving the efficiency of detection in the textile industry.

Originality/value

In this paper, a novel identification method that extracts the optimal feature parameters is proposed.

Details

International Journal of Clothing Science and Technology, vol. 34 no. 1
Type: Research Article
ISSN: 0955-6222

Keywords

Article
Publication date: 12 September 2023

Wei Shi, Jing Zhang and Shaoyi He

With the rapid development of short videos in China, the public has become accustomed to using short videos to express their opinions. This paper aims to solve problems such as…

Abstract

Purpose

With the rapid development of short videos in China, the public has become accustomed to using short videos to express their opinions. This paper aims to solve problems such as how to represent the features of different modalities and achieve effective cross-modal feature fusion when analyzing the multi-modal sentiment of Chinese short videos (CSVs).

Design/methodology/approach

This paper proposes a sentiment analysis model, MSCNN-CPL-CAFF, which uses a multi-scale convolutional neural network and a cross-attention fusion mechanism to analyze CSVs. The audio-visual and textual data of CSVs themed on “COVID-19, catering industry” are first collected from the CSV platform Douyin, and a comparative analysis is then conducted against advanced baseline models.
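
A loose sketch of the two named ingredients, assuming PyTorch: multi-scale 1D convolutions over a feature sequence, and cross-attention fusion between the textual and audio-visual streams. All sizes, and the five sentiment levels, are assumptions for illustration.

```python
# Multi-scale convolution plus cross-attention fusion sketch.
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    def __init__(self, d=128, scales=(3, 5, 7)):
        super().__init__()
        # Parallel 1-D convolutions with different receptive fields.
        self.convs = nn.ModuleList(
            nn.Conv1d(d, d, k, padding=k // 2) for k in scales)

    def forward(self, x):                 # x: (B, T, d)
        x = x.transpose(1, 2)             # Conv1d expects (B, d, T)
        out = torch.stack([c(x) for c in self.convs]).sum(0)
        return out.transpose(1, 2)        # back to (B, T, d)

d = 128
text = MultiScaleConv(d)(torch.randn(2, 30, d))   # textual stream
av = torch.randn(2, 40, d)                        # audio-visual stream

cross = nn.MultiheadAttention(d, 4, batch_first=True)
fused, _ = cross(text, av, av)           # text queries the audio-visual cues
sentiment_logits = nn.Linear(d, 5)(fused.mean(dim=1))  # 5 sentiment levels
```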

Findings

The number of samples with weak negative and neutral sentiment is the largest, and the number with positive and weak positive sentiment is relatively small, accounting for only about 11% of the total. The MSCNN-CPL-CAFF model achieved Acc-2, Acc-3 and F1 scores of 85.01%, 74.16% and 84.84%, respectively, outperforming the best baseline methods in accuracy while achieving competitive computation speed.

Practical implications

This research offers some implications regarding the impact of COVID-19 on the catering industry in China by focusing on the multi-modal sentiment of CSVs. The methodology can be used to analyze the opinions of the general public on social media platforms and to categorize them accordingly.

Originality/value

This paper presents a novel deep-learning multimodal sentiment analysis model, which provides a new perspective for public opinion research on the short video platform.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Keywords

Article
Publication date: 17 September 2019

Chérif Taouche and Hacene Belhadef

Palmprint recognition is a very interesting and promising area of research. Much work has already been done in this area, but much more needs to be done to make the systems more…

Abstract

Purpose

Palmprint recognition is a very interesting and promising area of research. Much work has already been done in this area, but much more is needed to make the systems more efficient. In this paper, a multimodal biometrics system based on the fusion of the left and right palmprints of a person is proposed to overcome the limitations of unimodal systems.

Design/methodology/approach

Features are extracted using several proposed multi-block local descriptors in addition to multi-block local binary patterns (MB-LBP). The extracted features are fused at the feature level by simple concatenation of the feature vectors. Feature selection is then performed on the resulting global feature vector using evolutionary algorithms, namely genetic algorithms and the backtracking search algorithm, for comparison purposes. The benefits of selecting the relevant features are well known in the literature: increased recognition accuracy and a reduced feature set size, which saves runtime. In the matching step, the Chi-square similarity measure is used.
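
An illustrative sketch of block-wise LBP histogram extraction, feature-level fusion of the left and right palmprints by concatenation, and Chi-square matching, assuming scikit-image; the proposed multi-block descriptors and the evolutionary feature selection are omitted.

```python
# Block LBP histograms, concatenation-level fusion, Chi-square matching.
import numpy as np
from skimage.feature import local_binary_pattern

def block_lbp_hist(img, blocks=4, P=8, R=1):
    """Concatenated LBP histograms over a blocks x blocks grid."""
    lbp = local_binary_pattern(img, P, R, method="uniform")
    h, w = lbp.shape
    feats = []
    for i in range(blocks):
        for j in range(blocks):
            patch = lbp[i*h//blocks:(i+1)*h//blocks,
                        j*w//blocks:(j+1)*w//blocks]
            hist, _ = np.histogram(patch, bins=P + 2, range=(0, P + 2),
                                   density=True)
            feats.append(hist)
    return np.concatenate(feats)

def chi_square(a, b, eps=1e-10):
    """Chi-square distance between two feature histograms."""
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

rng = np.random.default_rng(0)
left, right = ((rng.random((64, 64)) * 255).astype(np.uint8) for _ in range(2))
# Feature-level fusion: concatenate left and right palmprint features.
probe = np.concatenate([block_lbp_hist(left), block_lbp_hist(right)])
# A gallery template would then be matched via chi_square(probe, template).
```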

Findings

The resulting feature vector length representing a person is compact and the runtime is reduced.

Originality/value

Intensive experiments were conducted on the publicly available IITD database. The experimental results show a recognition accuracy of 99.17%, which proves the effectiveness and robustness of the proposed multimodal biometrics system compared with other unimodal and multimodal biometrics systems.

Details

Information Discovery and Delivery, vol. 48 no. 1
Type: Research Article
ISSN: 2398-6247

Keywords
