Search results
1 – 10 of 314
Abstract
Purpose
Classification of remote sensing images (RSI) is a challenging task in computer vision. Recently, researchers have proposed a variety of creative methods for automatic recognition of RSI, and feature fusion is a research hotspot because of its great potential to boost performance. However, RSI has unique imaging conditions and cluttered scenes with complicated backgrounds. Because of this large difference from natural images, previous feature fusion methods have yielded only insignificant performance improvements.
Design/methodology/approach
This work proposes a fusion method based on two convolutional neural networks (CNNs), named the main and branch CNN fusion network (MBC-Net), as an improved solution for classifying RSI. In detail, MBC-Net employs an EfficientNet-B3 as its main CNN stream and an EfficientNet-B0 as a branch, named MC-B3 and BC-B0, respectively. In particular, MBC-Net includes a long-range derivation (LRD) module, which is specially designed to learn the dependence of different features. MBC-Net also uses some unique ideas to tackle the problems arising from the two-CNN fusion and the inherent nature of RSI.
Findings
Extensive experiments on three RSI sets show that MBC-Net outperforms 38 other state-of-the-art (SOTA) methods published from 2020 to 2023, with a noticeable increase in overall accuracy (OA) values. MBC-Net not only achieves a 0.7% higher OA value on the most confusing NWPU set but also has 62% fewer parameters than the leading approach in the literature.
Originality/value
MBC-Net is a more effective and efficient feature fusion approach than other SOTA methods in the literature. The gradient-weighted class activation mapping (Grad-CAM) visualizations reveal that MBC-Net can learn the long-range dependence of features that a single CNN cannot. The t-distributed stochastic neighbor embedding (t-SNE) results demonstrate that the feature representation of MBC-Net is more effective than that of other methods. In addition, the ablation tests indicate that MBC-Net is effective and efficient for fusing features from two CNNs.
Juan Yang, Zhenkun Li and Xu Du
Abstract
Purpose
Although numerous signal modalities are available for emotion recognition, audio and visual modalities are the most common and predominant forms for human beings to express their emotional states in daily communication. Therefore, achieving automatic and accurate audiovisual emotion recognition is important for developing engaging and empathetic human–computer interaction environments. However, two major challenges exist in the field of audiovisual emotion recognition: (1) how to effectively capture representations of each single modality and eliminate redundant features and (2) how to efficiently integrate information from these two modalities to generate discriminative representations.
Design/methodology/approach
A novel key-frame extraction-based attention fusion network (KE-AFN) is proposed for audiovisual emotion recognition. KE-AFN attempts to integrate key-frame extraction with multimodal interaction and fusion to enhance audiovisual representations and reduce redundant computation, filling the research gaps of existing approaches. Specifically, the local maximum–based content analysis is designed to extract key-frames from videos for the purpose of eliminating data redundancy. Two modules, including “Multi-head Attention-based Intra-modality Interaction Module” and “Multi-head Attention-based Cross-modality Interaction Module”, are proposed to mine and capture intra- and cross-modality interactions for further reducing data redundancy and producing more powerful multimodal representations.
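The idea of selecting key-frames at local maxima of a content score can be illustrated with a minimal sketch. The abstract does not specify the content measure, so the score used here (mean absolute pixel difference between consecutive frames) and the function names `frame_differences`, `local_maxima` and `extract_key_frames` are assumptions made purely for illustration, not the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical representation: each frame is a flattened list of grayscale values.
Frame = Sequence[float]


def frame_differences(frames: List[Frame]) -> List[float]:
    """Mean absolute pixel difference between each frame and its predecessor."""
    diffs = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        diffs.append(sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur))
    return diffs


def local_maxima(scores: List[float]) -> List[int]:
    """Indices whose score is strictly greater than both neighbours."""
    return [
        i
        for i in range(1, len(scores) - 1)
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]
    ]


def extract_key_frames(frames: List[Frame]) -> List[int]:
    """Assumed rule: a frame is a key-frame if its change score is a local maximum."""
    return local_maxima(frame_differences(frames))


# Toy video with two abrupt content changes (at frames 2 and 5).
video = [[0, 0], [0, 0], [5, 5], [5, 5], [5, 5], [9, 9], [9, 9]]
key_frame_indices = extract_key_frames(video)
```

Only the frames at the returned indices would be passed on to the later interaction and fusion stages, which is how this kind of selection reduces redundant computation.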
Findings
Extensive experiments on two benchmark datasets (i.e. RAVDESS and CMU-MOSEI) demonstrate the effectiveness and rationality of KE-AFN. Specifically, (1) KE-AFN is superior to state-of-the-art baselines for audiovisual emotion recognition. (2) Exploring the supplementary and complementary information of different modalities can provide more emotional clues for better emotion recognition. (3) The proposed key-frame extraction strategy can enhance the performance by more than 2.79 per cent on accuracy. (4) Both exploring intra- and cross-modality interactions and employing attention-based audiovisual fusion can lead to better prediction performance.
Originality/value
The proposed KE-AFN can support the development of engaging and empathetic human–computer interaction environments.
Yandong Hou, Zhengbo Wu, Xinghua Ren, Kaiwen Liu and Zhengquan Chen
Abstract
Purpose
High-resolution remote sensing images possess a wealth of semantic information. However, these images often contain objects of different sizes and distributions, which makes the semantic segmentation task challenging. In this paper, a bidirectional feature fusion network (BFFNet) is designed to address this challenge. It aims to recognize surface objects more accurately so that special features can be classified effectively.
Design/methodology/approach
BFFNet has two crucial elements. Firstly, the mean-weighted module (MWM) is used to obtain the key features in the main network. Secondly, the proposed polarization enhanced branch network performs feature extraction simultaneously with the main network to obtain different feature information. The authors then fuse these two sets of features in both directions while applying a cross-entropy loss function to monitor the network training process. Finally, BFFNet is validated on two publicly available datasets, Potsdam and Vaihingen.
Findings
Quantitative analysis of the experimental results on the two datasets shows that the proposed network outperforms other mainstream segmentation networks by 2–6%. Complete ablation experiments are also conducted to demonstrate the effectiveness of the elements in the network. In summary, BFFNet is shown to be effective in accurately identifying small objects and in reducing the effect of shadows on the segmentation process.
Originality/value
The originality of the paper is the proposal of a BFFNet based on multi-scale and multi-attention strategies to improve the ability to accurately segment high-resolution and complex remote sensing images, especially for small objects and shadow-obscured objects.
Pengyue Guo, Tianyun Shi, Zhen Ma and Jing Wang
Abstract
Purpose
The paper aims to solve the problem of personnel intrusion identification within the limits of high-speed railways. It adopts the fusion method of millimeter wave radar and camera to improve the accuracy of object recognition in dark and harsh weather conditions.
Design/methodology/approach
This paper adopts a fusion strategy of radar and camera linkage to achieve focus amplification of long-distance targets and addresses low illumination by laser fill lighting at the focus point. To improve recognition, the paper adopts the YOLOv8 algorithm for multi-scale target recognition. In addition, to handle the image distortion caused by bad weather, the paper proposes a linkage and tracking fusion strategy to output the correct alarm results.
Findings
Simulated intrusion tests show that the proposed method can effectively detect human intrusion within 0–200 m during the day and night in sunny weather and can achieve more than 80% recognition accuracy in extremely severe weather conditions.
Originality/value
(1) The authors propose a personnel intrusion monitoring scheme based on the fusion of millimeter wave radar and camera, achieving all-weather intrusion monitoring; (2) The authors propose a new multi-level fusion algorithm based on linkage and tracking to achieve intrusion target monitoring under adverse weather conditions; (3) The authors have conducted a large number of innovative simulation experiments to verify the effectiveness of the method proposed in this article.
Jun Liu, Junyuan Dong, Mingming Hu and Xu Lu
Abstract
Purpose
Existing Simultaneous Localization and Mapping (SLAM) algorithms are relatively well developed. However, in complex dynamic environments, the movement of points on dynamic objects in the image can affect the system's observations, introducing biases and errors in position estimation and in the creation of map points. The aim of this paper is to achieve higher accuracy in SLAM algorithms than traditional methods through semantic approaches.
Design/methodology/approach
In this paper, dynamic objects are semantically segmented using a U-Net semantic segmentation network. Motion consistency detection then determines whether the segmented objects are moving in the current scene, and a motion compensation method eliminates the dynamic points and compensates for the current local image, making the system robust.
Findings
Experiments comparing the effect of detecting dynamic points and removing outliers are conducted on a dynamic data set from Technische Universität München, and the results show that the absolute trajectory accuracy of the proposed method is significantly improved compared with ORB-SLAM3 and DS-SLAM.
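Trajectory accuracy on this kind of benchmark is commonly summarized by the root-mean-square absolute trajectory error (ATE). The abstract does not say exactly which statistic was reported, so the sketch below is an assumption. It further assumes that the estimated and ground-truth trajectories are already time-associated and aligned in the same coordinate frame. The function name `ate_rmse` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # camera position (x, y, z)


def ate_rmse(estimated: List[Point], ground_truth: List[Point]) -> float:
    """RMSE of the translational error between paired, pre-aligned positions."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example: the second estimated pose is 5 units away from the ground truth.
est_traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt_traj = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
error = ate_rmse(est_traj, gt_traj)
```

A lower value indicates a trajectory closer to the ground truth, so an improvement over a baseline corresponds to a smaller value of this error.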
Originality/value
In the semantic segmentation part of the proposed method, the segmentation mask is combined with dynamic point detection, elimination and compensation. This reduces the influence of dynamic objects and effectively improves the accuracy of localization in dynamic environments.
Abstract
Purpose
Rolling element bearings (REBs) are commonly used in rotating machinery such as pumps, motors and fans. REBs deteriorate over their life cycle. To determine the amount of deterioration at any time, this paper presents a prognostics approach that integrates an optimized health indicator (OHI) with machine learning algorithms.
Design/methodology/approach
The proposed optimum prediction model is used to evaluate the remaining useful life (RUL) of REBs. Initially, the raw signal data are preprocessed using a mother wavelet transform, after which the primary fault features are extracted. These features are then processed with the random forest algorithm to improve their clarity. Based on the variable importance of the features, the best representation of the fault features is selected. The selected features are optimized by adjusting the weight vector using optimization techniques such as genetic algorithm (GA), sequential quadratic optimization (SQO) and multiobjective optimization (MOO). New OHIs are determined and applied to train the network. Finally, optimum predictive models are developed by integrating the OHI with an artificial neural network (ANN) or K-means clustering (KMC) (i.e. OHI–GA–ANN, OHI–SQO–ANN, OHI–MOO–ANN, OHI–GA–KMC, OHI–SQO–KMC and OHI–MOO–KMC).
Findings
The performance of the optimum prediction models is recorded and compared with the actual values. Finally, based on the error values, the best optimum prediction model is proposed for evaluating the RUL of REBs.
Originality/value
The proposed OHI–GA–KMC model is compared in terms of error values with previously published work. The RUL predicted by the OHI–GA–KMC model is smaller, which shows the advantage of this method.
Gang Yu, Zhiqiang Li, Ruochen Zeng, Yucong Jin, Min Hu and Vijayan Sugumaran
Abstract
Purpose
Accurate prediction of the structural condition of urban critical infrastructure is crucial for predictive maintenance. However, the existing prediction methods lack precision due to limitations in utilizing heterogeneous sensing data and domain knowledge as well as insufficient generalizability resulting from limited data samples. This paper converts implicit, qualitative expert knowledge into quantifiable values for tunnel condition assessment and proposes a tunnel structure prediction algorithm that augments a state-of-the-art attention-based long short-term memory (LSTM) model with expert rating knowledge. The aim is to achieve robust prediction results that support reasonable allocation of maintenance resources.
Design/methodology/approach
Through formalizing domain experts' knowledge into quantitative tunnel condition index (TCI) with analytic hierarchy process (AHP), a fusion approach using sequence smoothing and sliding time window techniques is applied to the TCI and time-series sensing data. By incorporating both sensing data and expert ratings, an attention-based LSTM model is developed to improve prediction accuracy and reduce the uncertainty of structural influencing factors.
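A minimal sketch of how sequence smoothing and a sliding time window could turn sensing data and TCI ratings into training samples for a sequence model is given below. The smoothing method (a trailing moving average), the window layout and the function names `moving_average` and `sliding_windows` are assumptions for illustration; the abstract does not give these details.

```python
from typing import List, Tuple

Row = Tuple[float, float]  # (sensor reading, TCI rating) at one time step


def moving_average(series: List[float], width: int) -> List[float]:
    """Trailing moving average; early points average over the values available."""
    smoothed = []
    for i in range(len(series)):
        window = series[max(0, i - width + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def sliding_windows(
    sensor: List[float], tci: List[float], window: int
) -> List[Tuple[List[Row], float]]:
    """Pair each window of fused (sensor, TCI) rows with the next TCI value."""
    if len(sensor) != len(tci):
        raise ValueError("sensor and TCI series must have the same length")
    samples = []
    for start in range(len(sensor) - window):
        rows = list(zip(sensor[start:start + window], tci[start:start + window]))
        samples.append((rows, tci[start + window]))
    return samples


# Toy data: four time steps of one sensor channel and the expert-derived TCI.
smoothed_sensor = moving_average([1.0, 2.0, 3.0, 4.0], width=2)
samples = sliding_windows([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], window=2)
```

Each sample pairs a short history of fused inputs with the value to be predicted, which is the input shape a recurrent model such as an LSTM typically consumes.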
Findings
The empirical experiment in Dalian Road Tunnel in Shanghai, China showcases the effectiveness of the proposed method, which can comprehensively evaluate the tunnel structure condition and significantly improve prediction performance.
Originality/value
This study proposes a novel structure condition prediction algorithm that augments a state-of-the-art attention-based LSTM model with expert rating knowledge for robust prediction of structure condition of complex projects.
John Millar, Frank Mueller and Chris Carter
Abstract
Purpose
The paper provides a theoretical framework for interdisciplinary accounting scholars interested in performances of accountability in front of live audiences.
Design/methodology/approach
This is a processual case study of “Falkirk in crisis” that covers the period from September 2021 to September 2022. The focus of this paper is two fan Q&A sessions held in October 2021 and June 2022. Both are naturally occurring discussions between two groups, such as are found in previous research on routine events and accountability. This is a theoretically consequential case study.
Findings
A key insight of the paper is to identify the practical and symbolic dimensions of accountability. The paper demonstrates the need to align these two dimensions when responding to questions: a practical question demands a practical answer and a symbolic question requires a symbolic answer. Second, the paper argues that most fields contain conflicting logics and highlights that a complete performance of accountability needs to cover the different conflicting logics within the field. In this case, this means paying full attention to both the communitarian and results logics. A third finding is that a performance of accountability cannot succeed if the audience rejects attempts to impose an unpalatable definition of the situation. If these three conditions are not met, the performance is bound to fail.
Research limitations/implications
An important theoretical contribution of the study is the application of Jeffrey Alexander’s work on political performance to public performances of accountability.
Practical implications
The phenomenon explored in the paper (what the authors term “grassroots accountability”) has broad applicability to any situation in organizational or civic life where the power apex of an organization is required to engage with a group of informed and committed stakeholders – the “community”. For those who find themselves in the position of the fans in this study, the observations set out in the empirical narrative can serve as a useful practical guide. Attempts to answer a practical complaint with a symbolic answer (or vice versa) should be challenged as evasive.
Social implications
This paper studies an engagement of elite actors with ordinary (or grassroots) actors. The study shows important rules of engagement, including the importance of respecting the power of practical questions and the need to engage with these questions appropriately.
Originality/value
This paper offers a new vista for interdisciplinary accounting by synthesizing the accountability literature with the political performance literature. Specifically, the paper employs Jeffrey Alexander’s work on practical and symbolic performance to study the microprocesses underpinning successful and unsuccessful performances of accountability.
Hu Luo, Haobin Ruan and Dawei Tu
Abstract
Purpose
The purpose of this paper is to propose a complete set of methods for underwater target detection, motivated by the fact that most underwater objects have few samples and that low-quality underwater images suffer from problems such as detail loss, low contrast and color distortion, and to verify the feasibility of the proposed methods through experiments.
Design/methodology/approach
An improved RGHS algorithm is proposed to enhance the original underwater target images. The YOLOv4 deep learning network is then improved for detecting underwater small-sample targets: the training data are expanded by combining traditional data expansion methods with the Mosaic algorithm, and an SPP (Spatial Pyramid Pooling) module is added after each feature extraction layer to extract richer feature information.
Findings
The experimental results, using the official dataset, reveal a 3.5% increase in average detection accuracy for three types of underwater biological targets compared to the traditional YOLOv4 algorithm. In underwater robot application testing, the proposed method achieves an impressive 94.73% average detection accuracy for the three types of underwater biological targets.
Originality/value
Underwater target detection is an important task for underwater robot applications. However, most underwater targets have few samples, and the detection of small-sample targets is a compound problem because it is also affected by the quality of underwater images. This paper provides a complete set of methods to solve these problems, which is of great significance to the application of underwater robots.
Abstract
Purpose
An increasing number of images are generated daily, and images are gradually becoming a search target. Content-based image retrieval (CBIR) is helpful for users to express their requirements using an image query. Nevertheless, determining whether the retrieval system can provide convenient operation and relevant retrieval results is challenging. A CBIR system based on deep learning features was proposed in this study to effectively search and navigate images in digital articles.
Design/methodology/approach
Convolutional neural networks (CNNs) were used as the feature extractors in the author's experiments. Using pretrained parameters, the training time and retrieval time were reduced. Different CNN features were extracted from the constructed image databases consisting of images taken from the National Palace Museum Journals Archive and were compared in the CBIR system.
Findings
DenseNet201 achieved the best performance, with a top-10 mAP of 89% and a query time of 0.14 s.
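For readers unfamiliar with the metric, a top-k mean average precision (mAP) can be computed as in the sketch below. Several conventions for average precision at k exist and the abstract does not state which one was used, so the normalization here (dividing by the number of relevant items found in the top k) is an assumption, as are the function names.

```python
from typing import List


def average_precision_at_k(relevant: List[bool], k: int = 10) -> float:
    """AP over the top-k ranked results; relevant[i] flags the i-th result."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: List[List[bool]], k: int = 10) -> float:
    """Mean of the per-query AP@k values."""
    return sum(average_precision_at_k(q, k) for q in queries) / len(queries)


# Two toy queries with ranked relevance judgements.
query_results = [[True, False, True], [False, True]]
map_at_10 = mean_average_precision(query_results, k=10)
```

Under this convention, a score of 1.0 means every retrieved item in the top k was relevant for every query.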
Practical implications
The CBIR homepage displayed image categories showing the content of the database and provided the default query images. After retrieval, the result showed the metadata of the retrieved images and links back to the original pages.
Originality/value
With the interface and retrieval demonstration, a novel image-based reading mode can be established via the CBIR and links to the original images and contextual descriptions.