Search results

1 – 10 of 256

Article

Publication date: 12 September 2023

Understanding public opinions on Chinese short video platform by multimodal sentiment analysis using deep learning-based techniques

Downloads

116

Abstract

Purpose

With the rapid development of short videos in China, the public has become accustomed to using short videos to express their opinions. This paper aims to solve problems such as how to represent the features of different modalities and achieve effective cross-modal feature fusion when analyzing the multi-modal sentiment of Chinese short videos (CSVs).

Design/methodology/approach

This paper proposes a sentiment analysis model, MSCNN-CPL-CAFF, which uses a multi-scale convolutional neural network and a cross-attention fusion mechanism to analyze CSVs. The audio-visual and textual data of CSVs themed on “COVID-19, catering industry” are first collected from the CSV platform Douyin, and a comparative analysis is then conducted against advanced baseline models.

Findings

Samples with weak negative and neutral sentiment are the most numerous, while samples with positive and weak positive sentiment are relatively few, accounting for only about 11% of the total. The MSCNN-CPL-CAFF model achieves Acc-2, Acc-3 and F1 scores of 85.01%, 74.16% and 84.84%, respectively, exceeding the best accuracy among the baseline methods while achieving competitive computation speed.

Practical implications

This research offers some implications regarding the impact of COVID-19 on the catering industry in China by focusing on the multi-modal sentiment of CSVs. The methodology can be used to analyze the opinions of the general public on social media platforms and to categorize them accordingly.

Originality/value

This paper presents a novel deep-learning multimodal sentiment analysis model, which provides a new perspective for public opinion research on the short video platform.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0368-492X

Article

Publication date: 10 January 2024

On the differences between CNNs and vision transformers for COVID-19 diagnosis using CT and chest x-ray mono- and multimodality

Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán

Abstract

Purpose

COVID-19 continues to spread, and cause increasing deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest x-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed in order to assist in this respect, and vision transformers are currently state-of-the-art methods, but most techniques currently focus only on one modality (CXR).

Design/methodology/approach

This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. This paper studies the differences between using convolutional MobileNetV2, ViT DeiT and Swin Transformer models when training from scratch and pretraining on the MedNIST medical dataset rather than the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, Wilcoxon statistical test and the Borda Count method. We also use the Grad-CAM algorithm to study the model's interpretability. Finally, the model's robustness is tested by evaluating it on Gaussian noised images.
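The Borda Count step mentioned above can be illustrated with a minimal sketch. It assumes each metric yields a best-first ranking of the candidate models and that a model in position i of n earns n - 1 - i points; the model names and rankings below are hypothetical placeholders, not results from the paper, and the authors' exact scoring variant may differ.

```python
from typing import Dict, List


def borda_count(rankings: List[List[str]]) -> Dict[str, int]:
    """Aggregate best-first rankings: position i of n earns n - 1 - i points."""
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, name in enumerate(ranking):
            scores[name] = scores.get(name, 0) + (n - 1 - position)
    return scores


# Hypothetical rankings of three placeholder models, one ranking per metric.
rankings = [
    ["model_a", "model_b", "model_c"],
    ["model_b", "model_a", "model_c"],
    ["model_a", "model_c", "model_b"],
]
scores = borda_count(rankings)
winner = max(scores, key=scores.get)  # model with the highest total points
```

Summing positional points in this way rewards models that rank consistently well across metrics rather than those that win on a single one.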

Findings

Although the pretrained MobileNetV2 was the best model in terms of performance alone, the best model in terms of performance, interpretability and robustness to noise combined is the Swin Transformer trained from scratch using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.

Originality/value

Models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9288

Article

Publication date: 31 October 2023

Intelligent inspection of appearance quality for precast concrete components based on improved YOLO model and multi-source data

Yangze Liang and Zhao Xu

Downloads

183

Abstract

Purpose

Monitoring of the quality of precast concrete (PC) components is crucial for the success of prefabricated construction projects. Currently, quality monitoring of PC components during the construction phase is predominantly done manually, resulting in low efficiency and hindering the progress of intelligent construction. This paper presents an intelligent inspection method for assessing the appearance quality of PC components, utilizing an enhanced you only look once (YOLO) model and multi-source data. The aim of this research is to achieve automated management of the appearance quality of precast components in the prefabricated construction process through digital means.

Design/methodology/approach

The paper begins by establishing an improved YOLO model and an image dataset for evaluating appearance quality. Through object detection in the images, a preliminary and efficient assessment of the precast components' appearance quality is achieved. Moreover, the detection results are mapped onto the point cloud for high-precision quality inspection. In the case of precast components with quality defects, precise quality inspection is conducted by combining the three-dimensional model data obtained from forward design conversion with the captured point cloud data through registration. Additionally, the paper proposes a framework for an automated inspection platform dedicated to assessing appearance quality in prefabricated buildings, encompassing the platform's hardware network.

Findings

The improved YOLO model achieved a best mean average precision of 85.02% on the VOC2007 dataset, surpassing the performance of most similar models. After targeted training, the model exhibits excellent recognition capabilities for the four common appearance quality defects. When mapped onto the point cloud, the accuracy of quality inspection based on point cloud data and forward design is within 0.1 mm. The appearance quality inspection platform enables feedback and optimization of quality issues.

Originality/value

The proposed method in this study enables high-precision, visualized and automated detection of the appearance quality of PC components. It effectively meets the demand for quality inspection of precast components on construction sites of prefabricated buildings, providing technological support for the development of intelligent construction. The design of the appearance quality inspection platform's logic and framework facilitates the integration of the method, laying the foundation for efficient quality management in the future.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0969-9988

Article

Publication date: 19 December 2023

Manifold embedded global and local discriminative features selection for single-shot multi-categories clothing recognition and retrieval

Jinchao Huang

Abstract

Purpose

Single-shot multi-category clothing recognition and retrieval play a crucial role in online searching and offline settlement scenarios. Existing clothing recognition methods based on RGBD clothing images often suffer from high-dimensional feature representations, leading to compromised performance and efficiency.

Design/methodology/approach

To address this issue, this paper proposes a novel method called Manifold Embedded Discriminative Feature Selection (MEDFS) to select global and local features, thereby reducing the dimensionality of the feature representation and improving performance. Specifically, by combining three global features and three local features, a low-dimensional embedding is constructed to capture the correlations between features and categories. The MEDFS method designs an optimization framework utilizing manifold mapping and sparse regularization to achieve feature selection. The optimization objective is solved using an alternating iterative strategy, ensuring convergence.

Findings

Empirical studies conducted on a publicly available RGBD clothing image dataset demonstrate that the proposed MEDFS method achieves highly competitive clothing classification performance while maintaining efficiency in clothing recognition and retrieval.

Originality/value

This paper introduces a novel approach for multi-category clothing recognition and retrieval, incorporating the selection of global and local features. The proposed method holds potential for practical applications in real-world clothing scenarios.

Details

International Journal of Intelligent Computing and Cybernetics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1756-378X

Article

Publication date: 28 December 2023

A transformer-based deep learning method for automatic pixel-level crack detection and feature quantification

Ankang Ji, Xiaolong Xue, Limao Zhang, Xiaowei Luo and Qingpeng Man

Downloads

139

Abstract

Purpose

Pavement crack detection is a critical task in periodic surveys. Efficient, effective and consistent tracking of road conditions by identifying and locating cracks helps promptly informed managers establish an appropriate road maintenance and repair strategy, but it remains a significant challenge. This research seeks to propose practical, productive and cost-effective solutions for automatic crack detection from images, thereby improving pavement performance.

Design/methodology/approach

This research applies a novel deep learning method named TransUnet for crack detection. The model is built on the Transformer and combined with convolutional neural networks as the encoder, leveraging a global self-attention mechanism to extract features better and enhance automatic identification. The detected cracks are then used to quantify morphological features through five indicators: length, mean width, maximum width, area and ratio. These analyses can provide valuable information for engineers to assess the pavement condition efficiently.

Findings

In the training process, the TransUnet is fed a crack dataset generated by data augmentation, with a resolution of 224 × 224 pixels. Subsequently, a test set containing 80 new images is used for the crack detection task with the best selected TransUnet (learning rate of 0.01 and batch size of 1), achieving an accuracy of 0.8927, a precision of 0.8813, a recall of 0.8904, an F1-measure and Dice of 0.8813 and a Mean Intersection over Union of 0.8082. Comparisons with several state-of-the-art methods indicate that the developed approach outperforms them, with greater efficiency and higher reliability.
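For readers unfamiliar with the pixel-level metrics reported here, the following is a minimal sketch of how they are commonly computed from a binary predicted mask and a ground-truth mask. It assumes flattened masks with 1 for crack pixels and 0 for background, and takes the mean IoU over those two classes; the function name and the toy masks are hypothetical, and the paper's own evaluation code may differ.

```python
from typing import Dict, Sequence


def pixel_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Compute pixel-level metrics for flattened binary masks (1 = crack)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # For a binary mask, the F1-measure of the crack class equals its Dice score.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou_crack = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    iou_background = tn / (tn + fp + fn) if tn + fp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(truth),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mean_iou": (iou_crack + iou_background) / 2,
    }


# Toy masks with hypothetical values, for illustration only.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = pixel_metrics(pred, truth)
```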

Originality/value

The developed approach combines TransUnet with an integrated quantification algorithm for crack detection and quantification. It performs excellently in comparisons and on evaluation metrics, and could serve as the basis for an automated, cost-effective pavement condition assessment scheme.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0969-9988

Article

Publication date: 28 November 2023

Tourism demand forecasting: a deep learning model based on spatial-temporal transformer

Jiaying Chen, Cheng Li, Liyao Huang and Weimin Zheng

Downloads

174

Abstract

Purpose

Incorporating dynamic spatial effects exhibits considerable potential in improving the accuracy of forecasting tourism demands. This study aims to propose an innovative deep learning model for capturing dynamic spatial effects.

Design/methodology/approach

A novel deep learning model founded on the transformer architecture, called the spatiotemporal transformer network, is presented. This model has three components: the temporal transformer, spatial transformer and spatiotemporal fusion modules. The dynamic temporal dependencies of each attraction are extracted efficiently by the temporal transformer module. The dynamic spatial correlations between attractions are extracted efficiently by the spatial transformer module. The extracted dynamic temporal and spatial features are fused in a learnable manner in the spatiotemporal fusion module. Convolutional operations are implemented to generate the final forecasts.

Findings

The results indicate that the proposed model achieves better forecasting accuracy than some popular benchmark models. Incorporating dynamic spatiotemporal features is an effective strategy for improving forecasting and can provide an important reference for related studies.

Practical implications

The proposed model leverages high-frequency data to achieve accurate predictions at the micro level by incorporating dynamic spatial effects. Destination managers should fully consider the dynamic spatial effects of attractions when planning and marketing to promote tourism resources.

Originality/value

This study incorporates dynamic spatial effects into tourism demand forecasting models by using a transformer neural network. It advances the development of methodologies in related fields.

Details

Tourism Review, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1660-5373

Article

Publication date: 30 April 2024

Managing safety of the human on the factory floor: a computer vision fusion approach

Jacqueline Humphries, Pepijn Van de Ven, Nehal Amer, Nitin Nandeshwar and Alan Ryan

Abstract

Purpose

Maintaining the safety of the human is a major concern in factories where humans co-exist with robots and other physical tools. Typically, the area around the robots is monitored using lasers. However, lasers cannot distinguish between human and non-human objects in the robot’s path. Stopping or slowing down the robot when non-human objects approach is unproductive. This research addresses that inefficiency by showing how computer-vision techniques can be used instead of lasers, which improves the up-time of the robot.

Design/methodology/approach

A computer-vision safety system is presented. Image segmentation, 3D point clouds, face recognition, hand gesture recognition, speed and trajectory tracking and a digital twin are used. Using speed and separation, the robot’s speed is controlled based on the nearest location of humans accurate to their body shape. The computer-vision safety system is compared to a traditional laser measure. The system is evaluated in a controlled test, and in the field.

Findings

Computer-vision and lasers are shown to be equivalent by a measure of relationship and measure of agreement. R² is given as 0.999983. The two methods are systematically producing similar results, as the bias is close to zero, at 0.060 mm. Using Bland–Altman analysis, 95% of the differences lie within the limits of maximum acceptable differences.
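As an illustration of the agreement analysis described above, here is a minimal sketch of a Bland–Altman computation: the bias is the mean of the paired differences, and the limits of agreement are conventionally taken as the bias plus or minus 1.96 sample standard deviations. The 1.96 multiplier and the measurement values are assumptions for illustration; they are not taken from the paper.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired distance readings in millimetres.
vision_mm = [100.1, 200.0, 299.8, 400.2, 500.1]
laser_mm = [100.0, 200.1, 300.0, 400.0, 500.0]

bias, lower, upper = bland_altman(vision_mm, laser_mm)
differences = [v - l for v, l in zip(vision_mm, laser_mm)]
share_within = sum(lower <= d <= upper for d in differences) / len(differences)
```

A bias near zero indicates no systematic offset between the two methods, while the share of differences inside the limits indicates how consistently they agree.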

Originality/value

This paper describes an original model for future computer-vision safety systems that is equivalent to existing laser systems, identifies and adapts to particular humans and reduces the need to slow and stop systems, thereby improving efficiency. The implication is that computer vision can substitute for lasers and permit adaptive robotic control in human–robot collaboration systems.

Details

Technological Sustainability, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2754-1312

Article

Publication date: 20 March 2024

Predicting structure performance of urban critical infrastructure: an augmented attention-based LSTM model

Gang Yu, Zhiqiang Li, Ruochen Zeng, Yucong Jin, Min Hu and Vijayan Sugumaran

Abstract

Purpose

Accurate prediction of the structural condition of urban critical infrastructure is crucial for predictive maintenance. However, the existing prediction methods lack precision due to limitations in utilizing heterogeneous sensing data and domain knowledge as well as insufficient generalizability resulting from limited data samples. This paper integrates implicit and qualitative expert knowledge into quantifiable values in tunnel condition assessment and proposes a tunnel structure prediction algorithm that augments a state-of-the-art attention-based long short-term memory (LSTM) model with expert rating knowledge to achieve robust prediction results to reasonably allocate maintenance resources.

Design/methodology/approach

Domain experts' knowledge is formalized into a quantitative tunnel condition index (TCI) using the analytic hierarchy process (AHP), and a fusion approach using sequence smoothing and sliding time window techniques is applied to the TCI and the time-series sensing data. By incorporating both sensing data and expert ratings, an attention-based LSTM model is developed to improve prediction accuracy and reduce the uncertainty of structural influencing factors.
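The sequence smoothing and sliding time window steps can be sketched as follows. This is a minimal illustration that assumes a simple moving average for smoothing and fixed-length windows paired with the value one step ahead as the prediction target; the function names, window sizes and series values are hypothetical, and the paper's actual preprocessing may differ.

```python
from typing import List, Sequence, Tuple


def moving_average(series: Sequence[float], k: int) -> List[float]:
    """Smooth a series with a simple moving average over k consecutive points."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]


def sliding_windows(
    series: Sequence[float], window: int, horizon: int = 1
) -> List[Tuple[List[float], float]]:
    """Build (input window, target) pairs; the target lies `horizon` steps ahead."""
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = list(series[start:start + window])
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


# Hypothetical sensor readings, for illustration only.
readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
smoothed = moving_average(readings, k=2)
samples = sliding_windows(smoothed, window=3, horizon=1)
```

Each resulting pair could then serve as one training sample for a sequence model such as an LSTM.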

Findings

The empirical experiment in the Dalian Road Tunnel in Shanghai, China demonstrates the effectiveness of the proposed method, which can comprehensively evaluate the tunnel structure condition and significantly improve prediction performance.

Originality/value

This study proposes a novel structure condition prediction algorithm that augments a state-of-the-art attention-based LSTM model with expert rating knowledge for robust prediction of structure condition of complex projects.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0969-9988

Article

Publication date: 15 January 2024

Multi-layers deep learning model with feature selection for automated detection and classification of highway pavement cracks

Faris Elghaish, Sandra Matarneh, Essam Abdellatef, Farzad Rahimian, M. Reza Hosseini and Ahmed Farouk Kineber

Downloads

113

Abstract

Purpose

Cracks are prevalent signs of pavement distress found on highways globally. The use of artificial intelligence (AI) and deep learning (DL) for crack detection is increasingly considered as an optimal solution. Consequently, this paper introduces a novel, fully connected, optimised convolutional neural network (CNN) model using feature selection algorithms for the purpose of detecting cracks in highway pavements.

Design/methodology/approach

To enhance the accuracy of the CNN model for crack detection, the authors employed a fully connected deep learning layers CNN model along with several optimisation techniques. Specifically, three optimisation algorithms, namely adaptive moment estimation (ADAM), stochastic gradient descent with momentum (SGDM), and RMSProp, were utilised to fine-tune the CNN model and enhance its overall performance. Subsequently, the authors implemented eight feature selection algorithms to further improve the accuracy of the optimised CNN model. These feature selection techniques were thoughtfully selected and systematically applied to identify the most relevant features contributing to crack detection in the given dataset. Finally, the authors subjected the proposed model to testing against seven pre-trained models.

Findings

The study's results show that the accuracy of the three optimisers (ADAM, SGDM, and RMSProp) with the five deep learning layers model is 97.4%, 98.2%, and 96.09%, respectively. Following this, eight feature selection algorithms were applied to the five deep learning layers to enhance accuracy, with particle swarm optimisation (PSO) achieving the highest F-score at 98.72%. The model was then compared with other pre-trained models and exhibited the highest performance.

Practical implications

With an achieved precision of 98.19% and F-score of 98.72% using PSO, the developed model is highly accurate and effective in detecting and evaluating the condition of cracks in pavements. As a result, the model has the potential to significantly reduce the effort required for crack detection and evaluation.

Originality/value

The proposed method for enhancing CNN model accuracy in crack detection stands out for its unique combination of optimisation algorithms (ADAM, SGDM, and RMSProp) with the systematic application of multiple feature selection techniques to identify relevant crack detection features, and for its comparison of results with existing pre-trained models.

Details

Smart and Sustainable Built Environment, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2046-6099

Article

Publication date: 24 January 2024

Multimedia information retrieval using content-based image retrieval and context link for Chinese cultural artifacts

Chung-Ming Lo

Abstract

Purpose

An increasing number of images are generated daily, and images are gradually becoming a search target. Content-based image retrieval (CBIR) is helpful for users to express their requirements using an image query. Nevertheless, determining whether the retrieval system can provide convenient operation and relevant retrieval results is challenging. A CBIR system based on deep learning features was proposed in this study to effectively search and navigate images in digital articles.

Design/methodology/approach

Convolutional neural networks (CNNs) were used as the feature extractors in the author's experiments. Using pretrained parameters, the training time and retrieval time were reduced. Different CNN features were extracted from the constructed image databases consisting of images taken from the National Palace Museum Journals Archive and were compared in the CBIR system.

Findings

DenseNet201 achieved the best performance, with a top-10 mAP of 89% and a query time of 0.14 s.
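To make the top-10 mAP figure concrete, the sketch below shows one common way to compute mean average precision at a cut-off k for a retrieval system. It assumes each query yields a best-first list of binary relevance flags and normalises by the number of relevant items found in the top k; other conventions exist, and the study does not state which one it uses. The relevance lists are hypothetical.

```python
from typing import List, Sequence


def average_precision_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Average the precision values at each relevant rank within the top k."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: List[Sequence[int]], k: int = 10) -> float:
    """Mean of the per-query average precision at k."""
    return sum(average_precision_at_k(q, k) for q in queries) / len(queries)


# Hypothetical relevance flags (1 = relevant) for two queries, best-first.
queries = [
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0],
]
map_at_10 = mean_average_precision(queries, k=10)
```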

Practical implications

The CBIR homepage displayed image categories showing the content of the database and provided the default query images. After retrieval, the result showed the metadata of the retrieved images and links back to the original pages.

Originality/value

With the interface and retrieval demonstration, a novel image-based reading mode can be established via the CBIR and links to the original images and contextual descriptions.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0737-8831
