Search results

1 – 10 of 184
Article
Publication date: 19 October 2023

Huaxiang Song

Abstract

Purpose

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot because of its great potential to boost performance. However, RSI has unique imaging conditions and cluttered scenes with complicated backgrounds. Because of this large difference from natural images, previous feature fusion methods have delivered only insignificant performance improvements.

Design/methodology/approach

This work proposes a two-convolutional neural network (CNN) fusion method, named main and branch CNN fusion network (MBC-Net), as an improved solution for classifying RSI. MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. It includes a long-range derivation (LRD) module, which is specially designed to learn the dependence between different features. MBC-Net also uses some unique ideas to tackle the problems arising from the two-CNN fusion and the inherent nature of RSI.

Findings

Extensive experiments on three RSI sets show that MBC-Net outperforms 38 other state-of-the-art (SOTA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA) values. MBC-Net not only achieves a 0.7% higher OA value on the most confusing NWPU set but also has 62% fewer parameters than the leading approach that ranks first in the literature.

Originality/value

MBC-Net is a more effective and efficient feature fusion approach than other SOTA methods in the literature. The gradient-weighted class activation mapping (Grad-CAM) visualizations reveal that MBC-Net can learn the long-range dependence of features that a single CNN cannot. The t-distributed stochastic neighbor embedding (t-SNE) results demonstrate that the feature representation of MBC-Net is more effective than that of other methods. In addition, the ablation tests indicate that MBC-Net is effective and efficient for fusing features from two CNNs.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 1 November 2023

Juan Yang, Zhenkun Li and Xu Du

Abstract

Purpose

Although numerous signal modalities are available for emotion recognition, audio and visual modalities are the most common and predominant forms for human beings to express their emotional states in daily communication. Achieving automatic and accurate audiovisual emotion recognition is therefore important for developing engaging and empathetic human–computer interaction environments. However, two major challenges exist in the field of audiovisual emotion recognition: (1) how to effectively capture representations of each single modality and eliminate redundant features and (2) how to efficiently integrate information from these two modalities to generate discriminative representations.

Design/methodology/approach

A novel key-frame extraction-based attention fusion network (KE-AFN) is proposed for audiovisual emotion recognition. KE-AFN attempts to integrate key-frame extraction with multimodal interaction and fusion to enhance audiovisual representations and reduce redundant computation, filling the research gaps of existing approaches. Specifically, the local maximum–based content analysis is designed to extract key-frames from videos for the purpose of eliminating data redundancy. Two modules, including “Multi-head Attention-based Intra-modality Interaction Module” and “Multi-head Attention-based Cross-modality Interaction Module”, are proposed to mine and capture intra- and cross-modality interactions for further reducing data redundancy and producing more powerful multimodal representations.
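
The local maximum–based key-frame selection described above can be sketched roughly as follows. This is a minimal illustration, not the KE-AFN implementation: the abstract does not say which content score is used, so the mean absolute difference between consecutive frames is an assumption, and the function names `frame_difference_scores`, `local_maxima` and `extract_key_frames` are hypothetical.

```python
def frame_difference_scores(frames):
    """Score each frame by its mean absolute pixel difference from the previous frame.

    `frames` is a list of flat pixel lists of equal length. The first frame has no
    predecessor, so its score is 0.0. This score is an assumed stand-in for the
    content analysis mentioned in the abstract.
    """
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        scores.append(diff)
    return scores


def local_maxima(scores):
    """Return interior indices whose score exceeds the left neighbour
    and is at least as large as the right neighbour."""
    return [
        i
        for i in range(1, len(scores) - 1)
        if scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]
    ]


def extract_key_frames(frames):
    """Indices of frames selected as key-frames (local maxima of the content score)."""
    return local_maxima(frame_difference_scores(frames))


# Toy video: seven frames of two pixels each.
toy_frames = [[0, 0], [0, 0], [10, 10], [10, 10], [10, 10], [0, 0], [1, 1]]
key_frames = extract_key_frames(toy_frames)
```

Only the selected frames would then be passed to the downstream attention modules, which is where the reduction in redundant computation comes from.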

Findings

Extensive experiments on two benchmark datasets (i.e. RAVDESS and CMU-MOSEI) demonstrate the effectiveness and rationality of KE-AFN. Specifically, (1) KE-AFN is superior to state-of-the-art baselines for audiovisual emotion recognition. (2) Exploring the supplementary and complementary information of different modalities can provide more emotional clues for better emotion recognition. (3) The proposed key-frame extraction strategy can improve accuracy by more than 2.79 per cent. (4) Both exploring intra- and cross-modality interactions and employing attention-based audiovisual fusion can lead to better prediction performance.

Originality/value

The proposed KE-AFN can support the development of engaging and empathetic human–computer interaction environments.

Article
Publication date: 3 August 2023

Yandong Hou, Zhengbo Wu, Xinghua Ren, Kaiwen Liu and Zhengquan Chen

Abstract

Purpose

High-resolution remote sensing images possess a wealth of semantic information. However, these images often contain objects of different sizes and distributions, which makes the semantic segmentation task challenging. In this paper, a bidirectional feature fusion network (BFFNet) is designed to address this challenge, with the aim of recognizing surface objects more accurately and classifying special features effectively.

Design/methodology/approach

There are two crucial elements in BFFNet. Firstly, the mean-weighted module (MWM) is used to obtain the key features in the main network. Secondly, the proposed polarization enhanced branch network performs feature extraction simultaneously with the main network to obtain different feature information. The authors then fuse these two features in both directions while applying a cross-entropy loss function to monitor the network training process. Finally, BFFNet is validated on two publicly available datasets, Potsdam and Vaihingen.
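
As a reminder of what the cross-entropy loss mentioned above measures in a segmentation setting, here is a minimal pure-Python sketch. It shows the generic pixel-wise form only; the BFFNet architecture and its exact loss configuration are not given in the abstract, and the names `softmax` and `pixel_cross_entropy` are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_cross_entropy(logit_map, label_map):
    """Mean of -log p(true class) over all pixels.

    `logit_map` is a list with one list of class logits per pixel;
    `label_map` holds the integer ground-truth class of each pixel.
    """
    losses = [
        -math.log(softmax(logits)[label])
        for logits, label in zip(logit_map, label_map)
    ]
    return sum(losses) / len(losses)


# Two pixels, two classes: one uninformative prediction, one confident and correct.
loss = pixel_cross_entropy([[0.0, 0.0], [5.0, -5.0]], [0, 0])
```

A falling value of this loss during training indicates that the fused features assign higher probability to the correct class at each pixel.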

Findings

A quantitative analysis of the experimental results on the two datasets shows that the proposed network outperforms other mainstream segmentation networks by 2–6%. Complete ablation experiments are also conducted to demonstrate the effectiveness of the elements in the network. In summary, BFFNet has proven to be effective in achieving accurate identification of small objects and in reducing the effect of shadows on the segmentation process.

Originality/value

The originality of the paper is the proposal of a BFFNet based on multi-scale and multi-attention strategies to improve the ability to accurately segment high-resolution and complex remote sensing images, especially for small objects and shadow-obscured objects.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Open Access
Article
Publication date: 16 January 2024

Pengyue Guo, Tianyun Shi, Zhen Ma and Jing Wang

Abstract

Purpose

The paper aims to solve the problem of personnel intrusion identification within the limits of high-speed railways. It adopts the fusion method of millimeter wave radar and camera to improve the accuracy of object recognition in dark and harsh weather conditions.

Design/methodology/approach

This paper adopts the fusion strategy of radar and camera linkage to achieve focus amplification of long-distance targets and solves the problem of low illumination by laser light filling of the focus point. In order to improve the recognition effect, this paper adopts the YOLOv8 algorithm for multi-scale target recognition. In addition, for the image distortion caused by bad weather, this paper proposes a linkage and tracking fusion strategy to output the correct alarm results.

Findings

Simulated intrusion tests show that the proposed method can effectively detect human intrusion within 0–200 m during the day and night in sunny weather and can achieve more than 80% recognition accuracy in extremely severe weather conditions.

Originality/value

(1) The authors propose a personnel intrusion monitoring scheme based on the fusion of millimeter wave radar and camera, achieving all-weather intrusion monitoring; (2) The authors propose a new multi-level fusion algorithm based on linkage and tracking to achieve intrusion target monitoring under adverse weather conditions; (3) The authors have conducted a large number of innovative simulation experiments to verify the effectiveness of the method proposed in this article.

Details

Railway Sciences, vol. 3 no. 1
Type: Research Article
ISSN: 2755-0907

Article
Publication date: 22 January 2024

Jun Liu, Junyuan Dong, Mingming Hu and Xu Lu

Abstract

Purpose

Existing Simultaneous Localization and Mapping (SLAM) algorithms are relatively well developed. However, in complex dynamic environments, feature points on moving objects in the image can affect the system's observations, leading to biases and errors in position estimation and in the creation of map points. The aim of this paper is to achieve higher accuracy in SLAM algorithms than traditional methods through semantic approaches.

Design/methodology/approach

In this paper, semantic segmentation of dynamic objects is realized with a U-Net semantic segmentation network. Motion consistency detection then determines whether the segmented objects are moving in the current scene, and a motion compensation method is used to eliminate dynamic points and compensate for the current local image, making the system robust.
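
The elimination step can be pictured with a small sketch: feature points are dropped only when they fall on a segmented object that the motion check has flagged as moving. This is an assumed simplification, not the authors' pipeline; the mask layout, the `object_is_moving` dictionary and the function name `filter_dynamic_points` are hypothetical, and motion compensation is not covered.

```python
def filter_dynamic_points(keypoints, mask, object_is_moving):
    """Keep keypoints that lie on the background or on objects judged static.

    keypoints: list of (x, y) integer pixel coordinates.
    mask: 2D list indexed as mask[y][x]; 0 means background, any other value
          is the id of a segmented object (assumed output of a segmentation network).
    object_is_moving: dict mapping object id to True/False (assumed output of a
          motion consistency check).
    """
    kept = []
    for x, y in keypoints:
        obj = mask[y][x]
        if obj != 0 and object_is_moving.get(obj, False):
            continue  # point sits on a moving object: discard it
        kept.append((x, y))
    return kept


# 2x2 toy mask: object 1 (moving) at the top right, object 2 (static) at the bottom left.
toy_mask = [[0, 1],
            [2, 0]]
static_points = filter_dynamic_points(
    [(0, 0), (1, 0), (0, 1)], toy_mask, {1: True, 2: False}
)
```

Points on segmented but stationary objects are kept, which is the reason for checking motion consistency instead of discarding every segmented region.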

Findings

Experiments comparing the effect of detecting dynamic points and removing outliers are conducted on a dynamic data set of Technische Universität München, and the results show that the absolute trajectory accuracy of this paper's method is significantly improved compared with ORB-SLAM3 and DS-SLAM.
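
Trajectory accuracy on this kind of benchmark is commonly summarized as the root-mean-square absolute trajectory error (ATE). The sketch below assumes that this is the metric meant here, and that the estimated and ground-truth positions have already been time-associated and aligned in a common frame; the function name `ate_rmse` is illustrative.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square translational error between two aligned trajectories.

    Both arguments are equal-length lists of (x, y, z) positions that are
    assumed to be matched by timestamp and expressed in the same frame.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p, q))
        for p, q in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two poses; the second estimate is off by 2 m along z.
error = ate_rmse([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                 [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

A lower value means the estimated camera path stays closer to the ground truth over the whole sequence.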

Originality/value

In the semantic segmentation network part of this paper, the segmentation mask is combined with dynamic point detection, elimination and compensation, which reduces the influence of dynamic objects and thus effectively improves the accuracy of localization in dynamic environments.

Details

Industrial Robot: the international journal of robotics research and application, vol. 51 no. 2
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 3 November 2022

Vinod Nistane

Abstract

Purpose

Rolling element bearings (REBs) are commonly used in rotating machinery such as pumps, motors and fans. REBs deteriorate over their life cycle. To estimate the amount of deterioration at any time, this paper presents a prognostics approach that integrates an optimized health indicator (OHI) with a machine learning algorithm.

Design/methodology/approach

The proposed optimum prediction model is used to evaluate the remaining useful life (RUL) of REBs. Initially, the raw signal data are preprocessed through a mother wavelet transform, after which the primary fault features are extracted. These features are then processed with the random forest algorithm to improve their clarity. Based on the variable importance of the features, the best representation of the fault features is selected. The selected features are optimized by adjusting the weight vector using optimization techniques such as genetic algorithm (GA), sequential quadratic optimization (SQO) and multiobjective optimization (MOO). New OHIs are determined and applied to train the network. Finally, optimum predictive models are developed by integrating OHI with an artificial neural network (ANN) or K-means clustering (KMC) (i.e. OHI–GA–ANN, OHI–SQO–ANN, OHI–MOO–ANN, OHI–GA–KMC, OHI–SQO–KMC and OHI–MOO–KMC).
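
One way to picture the weight-vector step is a health indicator formed as a weighted sum of the selected features, scored by how monotonically it evolves over time. The sketch below is an assumption made for illustration: the abstract does not state the objective the optimizers use, the monotonicity measure is a common prognostics metric chosen here as a stand-in, and no optimizer (GA, SQO or MOO) is implemented. The names `health_indicator` and `monotonicity` are hypothetical.

```python
def health_indicator(feature_series, weights):
    """Weighted sum of features at each time step.

    feature_series: list of per-feature time series, all of equal length.
    weights: one weight per feature (the vector an optimizer would adjust).
    """
    n_steps = len(feature_series[0])
    return [
        sum(w * series[t] for w, series in zip(weights, feature_series))
        for t in range(n_steps)
    ]


def monotonicity(indicator):
    """|#increases - #decreases| / (n - 1); 1.0 means a strictly monotonic trend."""
    diffs = [b - a for a, b in zip(indicator, indicator[1:])]
    increases = sum(1 for d in diffs if d > 0)
    decreases = sum(1 for d in diffs if d < 0)
    return abs(increases - decreases) / len(diffs)


# Two toy features over four time steps: one trending, one noisy.
features = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 5.0, 1.0]]
score_trending = monotonicity(health_indicator(features, [1.0, 0.0]))
score_noisy = monotonicity(health_indicator(features, [0.0, 1.0]))
```

An optimizer would search over `weights` to raise a score of this kind, so that the resulting indicator tracks degradation more cleanly before it is passed to the predictor.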

Findings

The performance of the optimum prediction models is recorded and compared with the actual values. Finally, based on the error term values, the best optimum prediction model is proposed for evaluating the RUL of REBs.

Originality/value

The proposed OHI–GA–KMC model is compared, in terms of error values, with previously published work. The RUL predicted by the OHI–GA–KMC model is smaller, which the author presents as an advantage of this method.

Article
Publication date: 28 March 2023

John Millar, Frank Mueller and Chris Carter

Abstract

Purpose

The paper provides a theoretical framework for interdisciplinary accounting scholars interested in performances of accountability in front of live audiences.

Design/methodology/approach

This is a processual case study of “Falkirk in crisis” that covers the period from September 2021 to September 2022. The focus of this paper is two fan Q&A sessions held in October 2021 and June 2022. Both are naturally occurring discussions between two groups, such as are found in previous research on routine events and accountability. This is a theoretically consequential case study.

Findings

A key insight of the paper is to identify the practical and symbolic dimensions of accountability. The paper demonstrates the need to align these two dimensions when responding to questions: a practical question demands a practical answer and a symbolic question requires a symbolic answer. Second, the paper argues that most fields contain conflicting logics and highlights that a complete performance of accountability needs to cover the different conflicting logics within the field. In this case, this means paying full attention to both the communitarian and results logics. A third finding is that a performance of accountability cannot succeed if the audience rejects attempts to impose an unpalatable definition of the situation. If these three conditions are not met, the performance is bound to fail.

Research limitations/implications

An important theoretical contribution of the study is the application of Jeffrey Alexander’s work on political performance to public performances of accountability.

Practical implications

The phenomenon explored in the paper (what the authors term “grassroots accountability”) has broad applicability to any situation in organizational or civic life where the power apex of an organization is required to engage with a group of informed and committed stakeholders – the “community”. For those who find themselves in the position of the fans in this study, the observations set out in the empirical narrative can serve as a useful practical guide. Attempts to answer a practical complaint with a symbolic answer (or vice versa) should be challenged as evasive.

Social implications

This paper studies an engagement of elite actors with ordinary (or grassroots) actors. The study shows important rules of engagement, including the importance of respecting the power of practical questions and the need to engage with these questions appropriately.

Originality/value

This paper offers a new vista for interdisciplinary accounting by synthesizing the accountability literature with the political performance literature. Specifically, the paper employs Jeffrey Alexander’s work on practical and symbolic performance to study the microprocesses underpinning successful and unsuccessful performances of accountability.

Details

Accounting, Auditing & Accountability Journal, vol. 37 no. 2
Type: Research Article
ISSN: 0951-3574

Article
Publication date: 14 December 2023

Huaxiang Song, Chai Wei and Zhou Yong

Abstract

Purpose

The paper aims to tackle the classification of Remote Sensing Images (RSIs), which presents a significant challenge for computer algorithms due to the inherent characteristics of clustered ground objects and noisy backgrounds. Recent research typically leverages larger models to achieve advanced performance. However, the operating environments of remote sensing commonly cannot provide unconstrained computational and storage resources. Such environments require lightweight algorithms with exceptional generalization capabilities.

Design/methodology/approach

This study introduces an efficient knowledge distillation (KD) method to build a lightweight yet precise convolutional neural network (CNN) classifier. This method also aims to substantially decrease the training time expenses commonly linked with traditional KD techniques. This approach entails extensive alterations to both the model training framework and the distillation process, each tailored to the unique characteristics of RSIs. In particular, this study establishes a robust ensemble teacher by independently training two CNN models using a customized, efficient training algorithm. Following this, this study modifies a KD loss function to mitigate the suppression of non-target category predictions, which are essential for capturing the inter- and intra-similarity of RSIs.
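
For orientation, the classic logit-based KD loss with a two-model ensemble teacher can be sketched as below. This shows the standard temperature-scaled form only; the modified loss that the study proposes is not reproduced, and averaging the two teachers' softened probabilities is an assumption about how the ensemble is formed. The function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_teacher_probs(logits_a, logits_b, temperature):
    """Average the softened predictions of two teacher models (assumed ensemble rule)."""
    probs_a = softmax(logits_a, temperature)
    probs_b = softmax(logits_b, temperature)
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]


def kd_loss(student_logits, teacher_probs, temperature):
    """Classic distillation term: T^2 * KL(teacher || student) on softened outputs."""
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0.0
    )
    return temperature ** 2 * kl


# Both teachers agree here; the student disagrees, so the loss is positive.
teacher = ensemble_teacher_probs([2.0, 1.0, 0.1], [2.0, 1.0, 0.1], 4.0)
loss = kd_loss([0.1, 1.0, 2.0], teacher, 4.0)
```

A higher temperature flattens the teacher distribution, which exposes the relative probabilities of non-target classes to the student; that is the information the abstract says should not be suppressed.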

Findings

This study validated the student model, termed KD-enhanced network (KDE-Net), obtained through the KD process on three benchmark RSI data sets. The KDE-Net surpasses 42 other state-of-the-art methods in the literature published from 2020 to 2023. Compared to the top-ranked method’s performance on the challenging NWPU45 data set, KDE-Net demonstrated a noticeable 0.4% increase in overall accuracy with a significant 88% reduction in parameters. Meanwhile, this study’s reformed KD framework significantly enhances the knowledge transfer speed by at least three times.

Originality/value

This study illustrates that the logit-based KD technique can effectively develop lightweight CNN classifiers for RSI classification without substantial sacrifices in computation and storage costs. Compared to neural architecture search or other methods aiming to provide lightweight solutions, this study’s KDE-Net, based on the inherent characteristics of RSIs, is currently more efficient in constructing accurate yet lightweight classifiers for RSI classification.

Details

International Journal of Web Information Systems, vol. 20 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 March 2024

Wei-Zhen Wang, Hong-Mei Xiao and Yuan Fang

Abstract

Purpose

Artificial intelligence (AI) technology has found extensive applications in the field of art design. Attribute editing is an important means of realizing clothing style and color design via computer language; it aims to edit and control the garment image based on the specified target attributes while preserving other details from the original image. Current image attribute editing models often generate images with missing or redundant attributes. To address the problem, this paper proposes a novel design method that uses the Fashion-attribute generative adversarial network (AttGAN) model for image attribute editing specifically tailored to women’s blouses.

Design/methodology/approach

The proposed design method primarily focuses on optimizing the feature extraction network and loss function. To enhance the feature extraction capability of the model, the number of layers in the feature extraction network was increased, and the structure similarity index measure (SSIM) loss function was employed to ensure the independent attributes of the original image were consistent. The characteristic-preserving virtual try-on network (CP_VTON) dataset was used for training to enable the editing of sleeve length and color specifically for women’s blouses.

Findings

The experimental results demonstrate that the optimization model’s generated outputs have significantly reduced problems related to missing attributes or visual redundancy. Through a comparative analysis of the numerical changes in the SSIM and peak signal-to-noise ratio (PSNR) before and after the model refinement, it was observed that the improved SSIM increased substantially by 27.4%, and the PSNR increased by 2.8%, serving as empirical evidence of the effectiveness of incorporating the SSIM loss function.
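
The two metrics reported above can be computed as in the following sketch. It assumes 8-bit images given as flat pixel lists and evaluates SSIM once over the whole image with population statistics, whereas standard SSIM averages the same formula over local windows; the names `psnr`, `global_ssim` and `ssim_loss` are illustrative and this is not the paper's implementation.

```python
import math


def mean(values):
    return sum(values) / len(values)


def psnr(original, edited, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = mean([(a - b) ** 2 for a, b in zip(original, edited)])
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(x, y, max_value=255.0):
    """SSIM formula evaluated over the whole image as a single window."""
    c1 = (0.01 * max_value) ** 2  # stabilizer for the luminance term
    c2 = (0.03 * max_value) ** 2  # stabilizer for the contrast/structure term
    mu_x, mu_y = mean(x), mean(y)
    var_x = mean([(a - mu_x) ** 2 for a in x])
    var_y = mean([(b - mu_y) ** 2 for b in y])
    cov = mean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(x, y):
    """A loss that falls to zero as the two images become structurally identical."""
    return 1.0 - global_ssim(x, y)


image = [10.0, 20.0, 30.0]
reversed_image = [30.0, 20.0, 10.0]
similarity = global_ssim(image, reversed_image)
```

Using `1 - SSIM` as a training term penalizes structural changes in regions that the edit is not supposed to touch, which matches the role the abstract gives to the SSIM loss.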

Originality/value

The proposed algorithm provides a promising tool for precise image editing of women’s blouses based on the GAN. This introduces a new approach to eliminate semantic expression errors in image editing, thereby contributing to the development of AI in clothing design.

Details

International Journal of Clothing Science and Technology, vol. 36 no. 2
Type: Research Article
ISSN: 0955-6222

Article
Publication date: 23 January 2024

Guoyang Wan, Yaocong Hu, Bingyou Liu, Shoujun Bai, Kaisheng Xing and Xiuwen Tao

Abstract

Purpose

Presently, six-degree-of-freedom (6DOF) visual pose measurement methods enjoy popularity in the industrial sector. However, challenges persist in accurately measuring the visual pose of blank and rough metal casts. Therefore, this paper introduces a 6DOF pose measurement method that uses stereo vision and targets blank and rough metal casts.

Design/methodology/approach

This paper studies the 6DOF pose measurement of metal casts from three aspects: sample enhancement of industrial objects, optimization of detector and attention mechanism. Virtual reality technology is used for sample enhancement of metal casts, which solves the problem of large-scale sample sampling in industrial application. The method also includes a novel deep learning detector that uses multiple key points on the object surface as regression objects to detect industrial objects with rotation characteristics. By introducing a mixed paths attention module, the detection accuracy of the detector and the convergence speed of the training are improved.

Findings

The experimental results show that the proposed method has a better detection effect for metal casts with smaller size scaling and rotation characteristics.

Originality/value

A method for 6DOF pose measurement of industrial objects is proposed, which realizes the pose measurement and grasping of metal blanks and rough machined casts by industrial robots.

Details

Sensor Review, vol. 44 no. 1
Type: Research Article
ISSN: 0260-2288
