Search results

1 – 10 of 314
Article
Publication date: 19 October 2023

Huaxiang Song

Abstract

Purpose

Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot because of its great potential to boost performance. However, RSI has unique imaging conditions and cluttered scenes with complicated backgrounds. Because of this large difference from natural images, previous feature fusion methods have yielded only insignificant performance improvements.

Design/methodology/approach

This work proposes a two-convolutional neural network (CNN) fusion method named main and branch CNN fusion network (MBC-Net) as an improved solution for classifying RSI. In detail, MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. In particular, MBC-Net includes a long-range derivation (LRD) module, which is specially designed to learn the dependence between different features. MBC-Net also uses some unique ideas to tackle the problems arising from the two-CNN fusion and the inherent nature of RSI.

Findings

Extensive experiments on three RSI sets show that MBC-Net outperforms 38 other state-of-the-art (SOTA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA) values. MBC-Net not only achieves a 0.7% higher OA value on the most confusing NWPU set but also has 62% fewer parameters than the leading approach in the literature.
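As a reminder of what the overall accuracy (OA) metric measures, the following is a minimal sketch, not the author's evaluation code. The label lists and the function name `overall_accuracy` are hypothetical and chosen for illustration only.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical scene labels, for illustration only.
truth = ["airport", "beach", "forest", "beach", "airport"]
preds = ["airport", "beach", "beach", "beach", "airport"]
oa = overall_accuracy(truth, preds)  # 4 of 5 predictions are correct
```

OA treats every sample equally, so on class-imbalanced sets it can hide weak performance on rare classes.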

Originality/value

MBC-Net is a more effective and efficient feature fusion approach than other SOTA methods in the literature. Gradient-weighted class activation mapping (Grad-CAM) visualizations reveal that MBC-Net can learn the long-range dependence of features that a single CNN cannot. The t-distributed stochastic neighbor embedding (t-SNE) results demonstrate that the feature representation of MBC-Net is more effective than that of other methods. In addition, the ablation tests indicate that MBC-Net is effective and efficient for fusing features from two CNNs.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 1 November 2023

Juan Yang, Zhenkun Li and Xu Du

Abstract

Purpose

Although numerous signal modalities are available for emotion recognition, audio and visual modalities are the most common and predominant forms for human beings to express their emotional states in daily communication. Achieving automatic and accurate audiovisual emotion recognition is therefore important for developing engaging and empathetic human–computer interaction environments. However, two major challenges exist in the field of audiovisual emotion recognition: (1) how to effectively capture representations of each single modality and eliminate redundant features and (2) how to efficiently integrate information from these two modalities to generate discriminative representations.

Design/methodology/approach

A novel key-frame extraction-based attention fusion network (KE-AFN) is proposed for audiovisual emotion recognition. KE-AFN attempts to integrate key-frame extraction with multimodal interaction and fusion to enhance audiovisual representations and reduce redundant computation, filling the research gaps of existing approaches. Specifically, a local maximum–based content analysis is designed to extract key-frames from videos in order to eliminate data redundancy. Two modules, the “Multi-head Attention-based Intra-modality Interaction Module” and the “Multi-head Attention-based Cross-modality Interaction Module”, are proposed to mine and capture intra- and cross-modality interactions, further reducing data redundancy and producing more powerful multimodal representations.
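The idea of local maximum–based key-frame selection can be sketched as follows. This is an illustration under stated assumptions, not the KE-AFN implementation: it assumes each frame is summarized by a small feature vector (here a hypothetical 3-bin histogram), scores content change as the sum of absolute differences between consecutive frames, and keeps frames where that score peaks. All function names are hypothetical.

```python
def frame_differences(frames):
    """Sum of absolute differences between each frame and its predecessor."""
    return [
        sum(abs(a - b) for a, b in zip(cur, prev))
        for prev, cur in zip(frames, frames[1:])
    ]


def local_maxima(scores):
    """Indices i where scores[i] is strictly greater than both neighbours."""
    return [
        i
        for i in range(1, len(scores) - 1)
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]
    ]


def key_frames(frames):
    """Indices of frames at which the content-change score peaks."""
    diffs = frame_differences(frames)  # diffs[i] compares frame i+1 with frame i
    return [i + 1 for i in local_maxima(diffs)]


# Toy "video": each frame is a hypothetical 3-bin intensity histogram.
video = [
    [10, 0, 0],
    [10, 0, 0],
    [0, 10, 0],  # abrupt change
    [0, 10, 0],
    [0, 9, 1],
    [0, 0, 10],  # abrupt change
    [0, 0, 10],
]
selected = key_frames(video)
```

On this toy input the two abrupt changes (frames 2 and 5) are selected, while the small drift at frame 4 is ignored because it is not a local peak.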

Findings

Extensive experiments on two benchmark datasets (i.e. RAVDESS and CMU-MOSEI) demonstrate the effectiveness and rationality of KE-AFN. Specifically, (1) KE-AFN is superior to state-of-the-art baselines for audiovisual emotion recognition. (2) Exploring the supplementary and complementary information of different modalities can provide more emotional clues for better emotion recognition. (3) The proposed key-frame extraction strategy can enhance the performance by more than 2.79 per cent on accuracy. (4) Both exploring intra- and cross-modality interactions and employing attention-based audiovisual fusion can lead to better prediction performance.

Originality/value

The proposed KE-AFN can support the development of engaging and empathetic human–computer interaction environments.

Article
Publication date: 3 August 2023

Yandong Hou, Zhengbo Wu, Xinghua Ren, Kaiwen Liu and Zhengquan Chen

Abstract

Purpose

High-resolution remote sensing images possess a wealth of semantic information. However, these images often contain objects of different sizes and distributions, which makes the semantic segmentation task challenging. In this paper, a bidirectional feature fusion network (BFFNet) is designed to address this challenge; it aims to improve the recognition accuracy of surface objects so that special features can be classified effectively.

Design/methodology/approach

There are two crucial elements in BFFNet. Firstly, the mean-weighted module (MWM) is used to obtain the key features in the main network. Secondly, the proposed polarization enhanced branch network performs feature extraction in parallel with the main network to obtain different feature information. The authors then fuse these two sets of features in both directions while applying a cross-entropy loss function to monitor the network training process. Finally, BFFNet is validated on two publicly available datasets, Potsdam and Vaihingen.
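For readers unfamiliar with the cross-entropy loss mentioned above, here is a minimal sketch of how it is computed for per-pixel class probabilities. It is a generic illustration, not the BFFNet training code; the probabilities, labels and function names are hypothetical.

```python
import math


def cross_entropy(probabilities, target_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probabilities[target_index])


def mean_cross_entropy(pixel_probs, pixel_labels):
    """Average cross-entropy over all labelled pixels."""
    losses = [cross_entropy(p, y) for p, y in zip(pixel_probs, pixel_labels)]
    return sum(losses) / len(losses)


# Hypothetical softmax outputs for two pixels over three classes.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
]
labels = [0, 1]  # true class index of each pixel
loss = mean_cross_entropy(probs, labels)
```

The loss approaches zero as the probability assigned to the true class approaches one, which is why it is a convenient signal for monitoring training.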

Findings

Quantitative analysis of the experimental results on the two datasets shows that the proposed network outperforms other mainstream segmentation networks by 2–6%. Complete ablation experiments are also conducted to demonstrate the effectiveness of the elements in the network. In summary, BFFNet has proven to be effective in accurately identifying small objects and in reducing the effect of shadows on the segmentation process.

Originality/value

The originality of the paper is the proposal of a BFFNet based on multi-scale and multi-attention strategies to improve the ability to accurately segment high-resolution and complex remote sensing images, especially for small objects and shadow-obscured objects.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 17 no. 1
Type: Research Article
ISSN: 1756-378X

Open Access
Article
Publication date: 16 January 2024

Pengyue Guo, Tianyun Shi, Zhen Ma and Jing Wang

Abstract

Purpose

The paper aims to solve the problem of personnel intrusion identification within the limits of high-speed railways. It adopts the fusion method of millimeter wave radar and camera to improve the accuracy of object recognition in dark and harsh weather conditions.

Design/methodology/approach

This paper adopts a fusion strategy of radar and camera linkage to achieve focus amplification of long-distance targets and solves the problem of low illumination with laser fill light at the focus point. To improve the recognition effect, it adopts the YOLOv8 algorithm for multi-scale target recognition. In addition, to handle the image distortion caused by bad weather, it proposes a linkage and tracking fusion strategy to output the correct alarm results.

Findings

Simulated intrusion tests show that the proposed method can effectively detect human intrusion within 0–200 m during the day and night in sunny weather and can achieve more than 80% recognition accuracy in extremely severe weather conditions.

Originality/value

(1) The authors propose a personnel intrusion monitoring scheme based on the fusion of millimeter wave radar and camera, achieving all-weather intrusion monitoring; (2) The authors propose a new multi-level fusion algorithm based on linkage and tracking to achieve intrusion target monitoring under adverse weather conditions; (3) The authors have conducted a large number of innovative simulation experiments to verify the effectiveness of the method proposed in this article.

Details

Railway Sciences, vol. 3 no. 1
Type: Research Article
ISSN: 2755-0907

Article
Publication date: 22 January 2024

Jun Liu, Junyuan Dong, Mingming Hu and Xu Lu

Abstract

Purpose

Existing Simultaneous Localization and Mapping (SLAM) algorithms are relatively well developed. However, in complex dynamic environments, feature points on moving objects in the image can affect the system's observations, leading to biases and errors in position estimation and in the creation of map points. The aim of this paper is to achieve higher accuracy in SLAM algorithms than traditional methods through semantic approaches.

Design/methodology/approach

In this paper, dynamic objects are semantically segmented using a U-Net semantic segmentation network. Motion consistency detection then determines whether the segmented objects are moving in the current scene, and a motion compensation method is used to eliminate dynamic points and compensate for the current local image, making the system robust.

Findings

Experiments comparing the effect of detecting dynamic points and removing outliers are conducted on a dynamic data set of Technische Universität München, and the results show that the absolute trajectory accuracy of this paper's method is significantly improved compared with ORB-SLAM3 and DS-SLAM.
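Absolute trajectory accuracy is commonly reported as the root-mean-square of the position error between estimated and ground-truth poses. The sketch below illustrates that calculation only; it assumes the two trajectories are already time-associated and expressed in the same frame (real evaluations usually align them first), and the positions and the function name `ate_rmse` are hypothetical.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of the Euclidean position error over matched trajectory poses."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical, already-aligned 3D positions in metres.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.0), (2.0, 0.0, 0.4)]
error = ate_rmse(est, gt)
```

A lower value means the estimated trajectory stays closer to the ground truth over the whole sequence.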

Originality/value

In this paper, in the semantic segmentation network part, the segmentation mask is combined with the method of dynamic point detection, elimination and compensation, which reduces the influence of dynamic objects, thus effectively improving the accuracy of localization in dynamic environments.

Details

Industrial Robot: the international journal of robotics research and application, vol. 51 no. 2
Type: Research Article
ISSN: 0143-991X

Article
Publication date: 3 November 2022

Vinod Nistane

Abstract

Purpose

Rolling element bearings (REBs) are commonly used in rotating machinery such as pumps, motors and fans. REBs deteriorate over their life cycle. To determine the amount of deterioration at any time, this paper presents a prognostics approach that integrates an optimized health indicator (OHI) with machine learning algorithms.

Design/methodology/approach

The proposed optimum prediction model is used to evaluate the remaining useful life (RUL) of REBs. Initially, the raw signal data are preprocessed with a mother wavelet transform, after which the primary fault features are extracted. These features are then processed with the random forest algorithm to improve their clarity. Based on the variable importance of the features, the best representation of fault features is selected. The selected features are optimized by adjusting the weight vector using optimization techniques such as genetic algorithm (GA), sequential quadratic optimization (SQO) and multiobjective optimization (MOO). New OHIs are determined and applied to train the network. Finally, optimum predictive models are developed by integrating the OHI with an artificial neural network (ANN) or K-means clustering (KMC) (i.e. OHI–GA–ANN, OHI–SQO–ANN, OHI–MOO–ANN, OHI–GA–KMC, OHI–SQO–KMC and OHI–MOO–KMC).

Findings

The performance of the optimum prediction models is recorded and compared with the actual values. Finally, based on the error values, the best optimum prediction model is proposed for evaluating the RUL of REBs.

Originality/value

The proposed OHI–GA–KMC model is compared in terms of error values with previously published work. The RUL predicted by the OHI–GA–KMC model is smaller, which is presented as the advantage of this method.

Article
Publication date: 20 March 2024

Gang Yu, Zhiqiang Li, Ruochen Zeng, Yucong Jin, Min Hu and Vijayan Sugumaran

Abstract

Purpose

Accurate prediction of the structural condition of urban critical infrastructure is crucial for predictive maintenance. However, the existing prediction methods lack precision due to limitations in utilizing heterogeneous sensing data and domain knowledge as well as insufficient generalizability resulting from limited data samples. This paper integrates implicit and qualitative expert knowledge into quantifiable values in tunnel condition assessment and proposes a tunnel structure prediction algorithm that augments a state-of-the-art attention-based long short-term memory (LSTM) model with expert rating knowledge to achieve robust prediction results to reasonably allocate maintenance resources.

Design/methodology/approach

Through formalizing domain experts' knowledge into quantitative tunnel condition index (TCI) with analytic hierarchy process (AHP), a fusion approach using sequence smoothing and sliding time window techniques is applied to the TCI and time-series sensing data. By incorporating both sensing data and expert ratings, an attention-based LSTM model is developed to improve prediction accuracy and reduce the uncertainty of structural influencing factors.
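The preprocessing steps named above can be sketched roughly as follows. This is an illustration under assumptions, not the authors' pipeline: AHP weights are approximated with the normalised row geometric-mean method, smoothing is a simple trailing moving average, and the comparison matrix, readings and function names are all hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights via normalised row geometric means."""
    n = len(pairwise)
    gmeans = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gmeans)
    return [g / total for g in gmeans]


def moving_average(series, width):
    """Smooth a series with a trailing moving average of the given width."""
    return [
        sum(series[i - width + 1 : i + 1]) / width
        for i in range(width - 1, len(series))
    ]


def sliding_windows(series, window):
    """Pair each run of `window` values with the value that follows it."""
    return [
        (series[i : i + window], series[i + window])
        for i in range(len(series) - window)
    ]


# Hypothetical expert pairwise comparison of three indicators.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparisons)

# Hypothetical condition-index readings over time.
tci = [90.0, 88.0, 89.0, 85.0, 84.0, 80.0]
smoothed = moving_average(tci, 2)
samples = sliding_windows(smoothed, 3)  # (input window, next value) pairs
```

Each `(window, target)` pair is the kind of supervised sample a sequence model such as an LSTM could be trained on; the window length is a tunable assumption.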

Findings

The empirical experiment in Dalian Road Tunnel in Shanghai, China showcases the effectiveness of the proposed method, which can comprehensively evaluate the tunnel structure condition and significantly improve prediction performance.

Originality/value

This study proposes a novel structure condition prediction algorithm that augments a state-of-the-art attention-based LSTM model with expert rating knowledge for robust prediction of structure condition of complex projects.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 28 March 2023

John Millar, Frank Mueller and Chris Carter

Abstract

Purpose

The paper provides a theoretical framework for interdisciplinary accounting scholars interested in performances of accountability in front of live audiences.

Design/methodology/approach

This is a processual case study of “Falkirk in crisis” that covers the period from September 2021 to September 2022. The focus of this paper is two fan Q&A sessions held in October 2021 and June 2022. Both are naturally occurring discussions between two groups such as are found in previous research on routine events and accountability. This is a theoretically consequential case study.

Findings

A key insight of the paper is to identify the practical and symbolic dimensions of accountability. The paper demonstrates the need to align these two dimensions when responding to questions: a practical question demands a practical answer and a symbolic question requires a symbolic answer. Second, the paper argues that most fields contain conflicting logics and highlights that a complete performance of accountability needs to cover the different conflicting logics within the field. In this case, this means paying full attention to both the communitarian and results logics. A third finding is that a performance of accountability cannot succeed if the audience rejects attempts to impose an unpalatable definition of the situation. If these three conditions are not met, the performance is bound to fail.

Research limitations/implications

An important theoretical contribution of the study is the application of Jeffery Alexander’s work on political performance to public performances of accountability.

Practical implications

The phenomenon explored in the paper (what the authors term “grassroots accountability”) has broad applicability to any situation in organizational or civic life where the power apex of an organization is required to engage with a group of informed and committed stakeholders – the “community”. For those who find themselves in the position of the fans in this study, the observations set out in the empirical narrative can serve as a useful practical guide. Attempts to answer a practical complaint with a symbolic answer (or vice versa) should be challenged as evasive.

Social implications

This paper studies an engagement of elite actors with ordinary (or grassroots) actors. The study shows important rules of engagement, including the importance of respecting the power of practical questions and the need to engage with these questions appropriately.

Originality/value

This paper offers a new vista for interdisciplinary accounting by synthesizing the accountability literature with the political performance literature. Specifically, the paper employs Jeffery Alexander’s work on practical and symbolic performance to study the microprocesses underpinning successful and unsuccessful performances of accountability.

Details

Accounting, Auditing & Accountability Journal, vol. 37 no. 2
Type: Research Article
ISSN: 0951-3574

Article
Publication date: 8 April 2024

Hu Luo, Haobin Ruan and Dawei Tu

Abstract

Purpose

The purpose of this paper is to propose a complete set of methods for underwater target detection, motivated by the fact that most underwater objects have few samples and that low-quality underwater images suffer from problems such as detail loss, low contrast and color distortion, and to verify the feasibility of the proposed methods through experiments.

Design/methodology/approach

An improved RGHS algorithm is proposed to enhance the original underwater target image. The YOLOv4 deep learning network is then improved for underwater small-sample target detection by combining traditional data expansion methods with the Mosaic algorithm and by adding an SPP (Spatial Pyramid Pooling) module after each feature extraction layer, which expands the feature extraction capability and extracts richer feature information.

Findings

The experimental results, using the official dataset, reveal a 3.5% increase in average detection accuracy for three types of underwater biological targets compared to the traditional YOLOv4 algorithm. In underwater robot application testing, the proposed method achieves an impressive 94.73% average detection accuracy for the three types of underwater biological targets.

Originality/value

Underwater target detection is an important task for underwater robot applications. However, most underwater targets have few samples, and the detection of small-sample targets is a comprehensive problem because it is affected by the quality of underwater images. This paper provides a complete set of methods to solve these problems, which is of great significance to underwater robot applications.

Details

Robotic Intelligence and Automation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2754-6969

Article
Publication date: 24 January 2024

Chung-Ming Lo

Abstract

Purpose

An increasing number of images are generated daily, and images are gradually becoming a search target. Content-based image retrieval (CBIR) is helpful for users to express their requirements using an image query. Nevertheless, determining whether the retrieval system can provide convenient operation and relevant retrieval results is challenging. A CBIR system based on deep learning features was proposed in this study to effectively search and navigate images in digital articles.

Design/methodology/approach

Convolutional neural networks (CNNs) were used as the feature extractors in the author's experiments. Using pretrained parameters, the training time and retrieval time were reduced. Different CNN features were extracted from the constructed image databases consisting of images taken from the National Palace Museum Journals Archive and were compared in the CBIR system.

Findings

DenseNet201 achieved the best performance, with a top-10 mAP of 89% and a query time of 0.14 s.
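To make the top-10 mAP figure concrete, the sketch below shows one common way to compute mean average precision over the top-k retrieved results. The abstract does not state the exact definition used, so the normalisation here (dividing by the number of relevant items found in the top k) is an assumption, and the relevance lists and function names are hypothetical.

```python
def average_precision_at_k(relevance, k=10):
    """AP over the top-k results; `relevance` is a ranked list of booleans."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_relevance, k=10):
    """Mean of AP@k over all queries."""
    return sum(average_precision_at_k(r, k) for r in all_relevance) / len(all_relevance)


# Hypothetical ranked relevance judgements for two queries.
queries = [
    [True, True, False, True],    # AP = (1/1 + 2/2 + 3/4) / 3
    [False, True, False, False],  # AP = (1/2) / 1
]
score = mean_average_precision(queries, k=10)
```

Because precision is accumulated only at relevant ranks, the metric rewards systems that place relevant images near the top of the result list.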

Practical implications

The CBIR homepage displayed image categories showing the content of the database and provided the default query images. After retrieval, the result showed the metadata of the retrieved images and links back to the original pages.

Originality/value

With the interface and retrieval demonstration, a novel image-based reading mode can be established via the CBIR and links to the original images and contextual descriptions.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831
