Wei Shi, Jing Zhang and Shaoyi He
Abstract
Purpose
With the rapid development of short videos in China, the public has become accustomed to using short videos to express their opinions. This paper aims to address problems such as how to represent the features of different modalities and how to achieve effective cross-modal feature fusion when analyzing the multimodal sentiment of Chinese short videos (CSVs).
Design/methodology/approach
This paper proposes a sentiment analysis model, MSCNN-CPL-CAFF, which uses a multi-scale convolutional neural network and a cross-attention fusion mechanism to analyze CSVs. Audio-visual and textual data from CSVs themed on “COVID-19, catering industry” are first collected from the CSV platform Douyin, and a comparative analysis against advanced baseline models is then conducted.
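The abstract does not describe how the cross-attention fusion part of the model works internally. The Python sketch below (using NumPy) is only a generic illustration of cross-attention fusion. Text features act as queries over audio-visual features, which supply keys and values, through scaled dot-product attention. Each text token is then concatenated with its attended audio-visual summary. The feature sizes, the random matrices standing in for learned projections, and the concatenation-based fusion are all assumptions, not the authors' design.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries: np.ndarray, context: np.ndarray,
                    w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Scaled dot-product cross-attention: each row of `queries` attends over `context`."""
    q = queries @ w_q                           # (n_q, d)
    k = context @ w_k                           # (n_c, d)
    v = context @ w_v                           # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (n_q, n_c)
    weights = softmax(scores, axis=-1)          # each query's weights sum to 1
    return weights @ v                          # (n_q, d)


def fuse(text: np.ndarray, audio_visual: np.ndarray, d: int = 16, seed: int = 0) -> np.ndarray:
    # Hypothetical random projections stand in for learned parameters.
    rng = np.random.default_rng(seed)
    w_q = rng.normal(size=(text.shape[1], d))
    w_k = rng.normal(size=(audio_visual.shape[1], d))
    w_v = rng.normal(size=(audio_visual.shape[1], d))
    attended = cross_attention(text, audio_visual, w_q, w_k, w_v)
    # Fusion by concatenating each text token with its attended audio-visual summary.
    return np.concatenate([text, attended], axis=-1)


text_tokens = np.random.default_rng(1).normal(size=(5, 32))   # 5 text tokens, 32-dim
av_frames = np.random.default_rng(2).normal(size=(8, 24))     # 8 audio-visual frames, 24-dim
fused = fuse(text_tokens, av_frames)
print(fused.shape)  # (5, 48)
```

In a trained model the projections would be learned, and the fused features would feed a sentiment classifier.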
Findings
The weak negative and neutral sentiment classes contain the largest numbers of samples, while the positive and weak positive classes are relatively small, accounting for only about 11% of the total samples. The MSCNN-CPL-CAFF model achieves an Acc-2, Acc-3 and F1 score of 85.01%, 74.16% and 84.84%, respectively, outperforming the best baseline methods in accuracy while achieving competitive computation speed.
Practical implications
This research offers implications regarding the impact of COVID-19 on the catering industry in China by focusing on the multimodal sentiment of CSVs. The methodology can be used to analyze the opinions of the general public on social media platforms and to categorize them accordingly.
Originality/value
This paper presents a novel deep-learning multimodal sentiment analysis model, which provides a new perspective for public opinion research on short video platforms.
Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán
Abstract
Purpose
COVID-19 continues to spread and to cause increasing numbers of deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also computed tomography (CT) and chest X-ray (CXR) imaging, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed to assist in this respect, and vision transformers are currently the state-of-the-art methods, but most techniques focus on only one modality (CXR).
Design/methodology/approach
This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. This paper compares the convolutional MobileNetV2 model with the vision transformer models DeiT and Swin Transformer, both when trained from scratch and when pretrained on the MedNIST medical dataset rather than on the ImageNet dataset of natural images. The comparison is based on six performance metrics, the Scott–Knott Effect Size Difference test, the Wilcoxon statistical test and the Borda count method. We also use the Grad-CAM algorithm to study the models' interpretability. Finally, the models' robustness is tested by evaluating them on images corrupted with Gaussian noise.
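The Borda count mentioned in the comparison step combines several per-metric rankings into one overall ranking. The minimal sketch below assumes that each metric ranks the candidates from best to worst. It also assumes that a candidate in position i (0-based) among n candidates earns n - 1 - i points. The model names and rankings are hypothetical placeholders, not the paper's results.

```python
from collections import defaultdict


def borda_count(rankings: list[list[str]]) -> list[tuple[str, int]]:
    """Aggregate several best-to-worst rankings with the Borda count.

    In each ranking of n candidates, the candidate in position i (0-based)
    receives n - 1 - i points; points are summed over all rankings.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] += n - 1 - position
    # Sort by total points (descending), breaking ties alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-metric rankings (e.g. accuracy, recall, F1), best first.
per_metric = [
    ["model_b", "model_a", "model_c"],
    ["model_a", "model_b", "model_c"],
    ["model_b", "model_c", "model_a"],
]
print(borda_count(per_metric))  # [('model_b', 5), ('model_a', 3), ('model_c', 1)]
```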
Findings
Although the pretrained MobileNetV2 was the best model in terms of performance alone, the best model when performance, interpretability and robustness to noise are considered together is the Swin Transformer trained from scratch, using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.
Originality/value
The compared models are pretrained on MedNIST and leverage both the CT and CXR modalities.
Yangze Liang and Zhao Xu
Abstract
Purpose
Monitoring the quality of precast concrete (PC) components is crucial for the success of prefabricated construction projects. Currently, quality monitoring of PC components during the construction phase is predominantly done manually, which is inefficient and hinders the progress of intelligent construction. This paper presents an intelligent inspection method for assessing the appearance quality of PC components, utilizing an enhanced You Only Look Once (YOLO) model and multi-source data. The aim of this research is to achieve automated management of the appearance quality of precast components in the prefabricated construction process through digital means.
Design/methodology/approach
The paper begins by establishing an improved YOLO model and an image dataset for evaluating appearance quality. Through object detection in the images, a preliminary and efficient assessment of the precast components' appearance quality is achieved. Moreover, the detection results are mapped onto the point cloud for high-precision quality inspection. In the case of precast components with quality defects, precise quality inspection is conducted by combining the three-dimensional model data obtained from forward design conversion with the captured point cloud data through registration. Additionally, the paper proposes a framework for an automated inspection platform dedicated to assessing appearance quality in prefabricated buildings, encompassing the platform's hardware network.
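The abstract does not say how detection results are mapped onto the point cloud. One common approach, assumed here, is to project each 3D point into the image with a pinhole camera model. Points whose projections fall inside a detected bounding box are kept. The intrinsic matrix `K`, the extrinsics `R` and `t`, the points and the box coordinates below are all hypothetical.

```python
import numpy as np


def points_in_box(points: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray,
                  box: tuple[float, float, float, float]) -> np.ndarray:
    """Return the world-frame `points` (N x 3) whose image projection lies
    inside `box` = (x_min, y_min, x_max, y_max) in pixels."""
    cam = points @ R.T + t                 # world -> camera coordinates
    in_front = cam[:, 2] > 0               # discard points behind the camera
    cam = cam[in_front]
    pix = cam @ K.T                        # apply intrinsics (homogeneous pixels)
    u = pix[:, 0] / pix[:, 2]
    v = pix[:, 1] / pix[:, 2]
    x_min, y_min, x_max, y_max = box
    inside = (u >= x_min) & (u <= x_max) & (v >= y_min) & (v <= y_max)
    return points[in_front][inside]


# Hypothetical camera: focal length 500 px, principal point (320, 240),
# aligned with the world frame (identity rotation, zero translation).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
cloud = np.array([[0.0, 0.0, 2.0],     # projects to (320, 240)
                  [0.5, 0.0, 2.0],     # projects to (445, 240)
                  [0.0, 0.0, -1.0]])   # behind the camera
defect_box = (300.0, 220.0, 340.0, 260.0)
print(points_in_box(cloud, K, R, t, defect_box))   # [[0. 0. 2.]]
```

Only the points selected this way would then need the finer registration against the design model, which keeps the precise inspection local to the detected defect.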
Findings
The improved YOLO model achieved a best mean average precision of 85.02% on the VOC2007 dataset, surpassing the performance of most similar models. After targeted training, the model exhibits excellent recognition capabilities for the four common appearance quality defects. When mapped onto the point cloud, the accuracy of quality inspection based on point cloud data and forward design is within 0.1 mm. The appearance quality inspection platform enables feedback and optimization of quality issues.
Originality/value
The proposed method in this study enables high-precision, visualized and automated detection of the appearance quality of PC components. It effectively meets the demand for quality inspection of precast components on construction sites of prefabricated buildings, providing technological support for the development of intelligent construction. The design of the appearance quality inspection platform's logic and framework facilitates the integration of the method, laying the foundation for efficient quality management in the future.
Abstract
Purpose
Single-shot multi-category clothing recognition and retrieval play a crucial role in online searching and offline settlement scenarios. Existing clothing recognition methods based on RGBD clothing images often suffer from high-dimensional feature representations, leading to compromised performance and efficiency.
Design/methodology/approach
To address this issue, this paper proposes a novel method called Manifold Embedded Discriminative Feature Selection (MEDFS) to select global and local features, thereby reducing the dimensionality of the feature representation and improving performance. Specifically, by combining three global features and three local features, a low-dimensional embedding is constructed to capture the correlations between features and categories. The MEDFS method designs an optimization framework utilizing manifold mapping and sparse regularization to achieve feature selection. The optimization objective is solved using an alternating iterative strategy, ensuring convergence.
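The abstract does not give the MEDFS objective. The sketch below instead illustrates a common stand-in for sparse-regularized feature selection solved by alternating iterations: ℓ2,1-norm regularized least squares, min over W of ||XW − Y||² + λ||W||₂,₁. It alternates a closed-form update of W with a reweighting step, then ranks features by the row norms of W. This is a generic technique without the manifold-embedding term. The value of λ, the iteration count and the toy data are assumptions.

```python
import numpy as np


def l21_feature_selection(X: np.ndarray, Y: np.ndarray, lam: float = 1.0,
                          n_iter: int = 50, eps: float = 1e-8) -> np.ndarray:
    """Rank features by solving min_W ||XW - Y||_F^2 + lam * ||W||_{2,1}.

    Alternates between (1) a closed-form update of W given the diagonal
    reweighting matrix D and (2) updating D from the current row norms of W.
    Returns feature indices sorted from most to least important.
    """
    n_features = X.shape[1]
    d = np.ones(n_features)                       # diagonal of D, start at identity
    XtX, XtY = X.T @ X, X.T @ Y
    for _ in range(n_iter):
        W = np.linalg.solve(XtX + lam * np.diag(d), XtY)
        row_norms = np.linalg.norm(W, axis=1)
        d = 1.0 / (2.0 * row_norms + eps)         # D_ii = 1 / (2 ||w_i||_2)
    return np.argsort(-np.linalg.norm(W, axis=1))


# Toy data: only features 0 and 3 determine the one-hot class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
labels = (X[:, 0] + X[:, 3] > 0).astype(int)
Y = np.eye(2)[labels]                             # one-hot targets
ranking = l21_feature_selection(X, Y, lam=5.0)
print(ranking[:2])                                # the two top-ranked features
```

Selecting the top-ranked rows of W reduces the dimensionality of the combined feature vector, which is the goal the paragraph above describes.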
Findings
Empirical studies conducted on a publicly available RGBD clothing image dataset demonstrate that the proposed MEDFS method achieves highly competitive clothing classification performance while maintaining efficiency in clothing recognition and retrieval.
Originality/value
This paper introduces a novel approach for multi-category clothing recognition and retrieval, incorporating the selection of global and local features. The proposed method holds potential for practical applications in real-world clothing scenarios.
Ankang Ji, Xiaolong Xue, Limao Zhang, Xiaowei Luo and Qingpeng Man
Abstract
Purpose
Pavement crack detection is a critical task in periodic road surveys. Tracking road conditions efficiently, effectively and consistently, by identifying and locating cracks, keeps managers promptly informed and helps them establish appropriate maintenance and repair strategies, but it remains a significant challenge. This research proposes a practical solution for automatic crack detection from images that is productive and cost-effective, thereby helping to improve pavement performance.
Design/methodology/approach
This research applies a novel deep learning method named TransUnet to crack detection. TransUnet is built on the Transformer combined with convolutional neural networks as the encoder, leveraging a global self-attention mechanism to extract features more effectively and improve automatic identification. The detected cracks are then used to quantify morphological features through five indicators: length, mean width, maximum width, area and ratio. These analyses can give engineers valuable information for assessing pavement condition efficiently.
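The abstract names the five indicators but does not define them. The sketch below computes rough versions from a binary crack mask under several simplifying assumptions:

- the crack runs roughly left to right, so length is approximated by the number of image columns that contain crack pixels;
- the width in each column is the count of crack pixels in that column;
- "ratio" means crack area divided by image area.

These definitions and the `pixel_size` parameter are assumptions. Skeleton-based measures are more typical for curved or branching cracks.

```python
import numpy as np


def crack_indicators(mask: np.ndarray, pixel_size: float = 1.0) -> dict[str, float]:
    """Approximate crack indicators from a binary mask (True = crack pixel).

    Assumes a single, roughly horizontal crack, so that each image column
    crossed by the crack contributes one unit of length.
    """
    mask = mask.astype(bool)
    widths = mask.sum(axis=0)                  # crack pixels per column
    crossed = widths[widths > 0]               # columns the crack passes through
    return {
        "length": float(crossed.size * pixel_size),
        "mean_width": float(crossed.mean() * pixel_size) if crossed.size else 0.0,
        "max_width": float(crossed.max() * pixel_size) if crossed.size else 0.0,
        "area": float(mask.sum() * pixel_size ** 2),
        "ratio": float(mask.mean()),           # crack area / image area
    }


# Toy 6 x 8 mask: a crack 2 px wide over columns 1-5, widening to 3 px at column 3.
mask = np.zeros((6, 8), dtype=bool)
mask[2:4, 1:6] = True
mask[4, 3] = True
print(crack_indicators(mask))
```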
Findings
During training, TransUnet is fed a crack dataset generated through data augmentation, with a resolution of 224 × 224 pixels. A test set of 80 new images is then used for the crack detection task with the best-performing TransUnet configuration (learning rate 0.01, batch size 1). This configuration achieves an accuracy of 0.8927, a precision of 0.8813, a recall of 0.8904, an F1-measure and Dice coefficient of 0.8813 and a mean Intersection over Union of 0.8082. Comparisons with several state-of-the-art methods indicate that the developed approach outperforms them, with greater efficiency and higher reliability.
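The reported metrics can be computed from pixel-level confusion counts. The minimal sketch below works on one predicted mask and one ground-truth binary mask. For binary masks, the F1-measure and the Dice coefficient are the same quantity, 2TP/(2TP + FP + FN). Taking the mean IoU over the crack and background classes is one common convention, assumed here. The abstract does not say whether metrics are averaged per image or pooled over the test set, and the toy masks are hypothetical.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict[str, float]:
    """Pixel-level metrics for binary masks (True = crack)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    iou_crack = tp / (tp + fp + fn)
    iou_background = tn / (tn + fn + fp)
    return {
        "accuracy": float((tp + tn) / pred.size),
        "precision": float(tp / (tp + fp)),
        "recall": float(tp / (tp + fn)),
        "f1_dice": float(2 * tp / (2 * tp + fp + fn)),   # identical to F1 for binary masks
        "miou": float((iou_crack + iou_background) / 2),
    }


truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
pred = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0]])
print(segmentation_metrics(pred, truth))
# TP = 3, FP = 1, FN = 1, TN = 3 -> accuracy, precision, recall and F1 are 0.75; mIoU is 0.6
```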
Originality/value
The developed approach combines TransUnet with an integrated quantification algorithm for crack detection and quantification. It performs excellently in the comparisons and on the evaluation metrics, and it could potentially serve as the basis for an automated, cost-effective pavement condition assessment scheme.
Jiaying Chen, Cheng Li, Liyao Huang and Weimin Zheng
Abstract
Purpose
Incorporating dynamic spatial effects shows considerable potential for improving the accuracy of tourism demand forecasting. This study aims to propose an innovative deep learning model for capturing dynamic spatial effects.
Design/methodology/approach
A novel deep learning model founded on the transformer architecture, called the spatiotemporal transformer network, is presented. This model has three components: the temporal transformer, spatial transformer and spatiotemporal fusion modules. The dynamic temporal dependencies of each attraction are extracted efficiently by the temporal transformer module. The dynamic spatial correlations between attractions are extracted efficiently by the spatial transformer module. The extracted dynamic temporal and spatial features are fused in a learnable manner in the spatiotemporal fusion module. Convolutional operations are implemented to generate the final forecasts.
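The three-module structure described above can be sketched for an input tensor of shape (attractions, time steps, features) as follows:

- temporal module: self-attention along the time axis for each attraction;
- spatial module: self-attention across attractions at each time step;
- fusion module: a sigmoid gate that mixes the two outputs, assumed here as one possible form of "learnable" fusion.

To keep the sketch short, queries, keys and values are the inputs themselves (no learned projections), the gate weights are random placeholders for learned parameters, and the final convolutional forecasting layer is omitted. None of this is the authors' exact architecture.

```python
import numpy as np


def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention over the second-to-last axis of `x`.

    x has shape (..., sequence, features); identity projections are used
    for queries, keys and values.
    """
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


def spatiotemporal_block(x: np.ndarray, gate_w: np.ndarray) -> np.ndarray:
    """x: (attractions, time, features) -> fused features of the same shape."""
    temporal = self_attention(x)                           # attend over time, per attraction
    spatial = np.swapaxes(self_attention(np.swapaxes(x, 0, 1)), 0, 1)  # attend over attractions, per step
    # Gated fusion: g in (0, 1) decides, per feature, how much of each view to keep.
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([temporal, spatial], axis=-1) @ gate_w)))
    return g * temporal + (1.0 - g) * spatial


rng = np.random.default_rng(0)
n_attractions, n_steps, n_features = 4, 7, 8
visits = rng.normal(size=(n_attractions, n_steps, n_features))
gate_w = rng.normal(size=(2 * n_features, n_features)) * 0.1   # hypothetical learned weights
out = spatiotemporal_block(visits, gate_w)
print(out.shape)  # (4, 7, 8)
```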
Findings
The results indicate that the proposed model achieves better forecasting accuracy than some popular benchmark models, demonstrating its strong forecasting performance. Incorporating dynamic spatiotemporal features is an effective strategy for improving forecasting, and the model can serve as an important reference for related studies.
Practical implications
The proposed model leverages high-frequency data to achieve accurate predictions at the micro level by incorporating dynamic spatial effects. Destination managers should fully consider the dynamic spatial effects of attractions when planning and marketing to promote tourism resources.
Originality/value
This study incorporates dynamic spatial effects into tourism demand forecasting models by using a transformer neural network. It advances the development of methodologies in related fields.
Jacqueline Humphries, Pepijn Van de Ven, Nehal Amer, Nitin Nandeshwar and Alan Ryan
Abstract
Purpose
Maintaining human safety is a major concern in factories where humans coexist with robots and other physical tools. Typically, the area around the robots is monitored using lasers. However, lasers cannot distinguish between human and non-human objects in the robot’s path, and stopping or slowing down the robot when non-human objects approach is unproductive. This research addresses that inefficiency by showing how computer-vision techniques can be used instead of lasers to improve the robot’s uptime.
Design/methodology/approach
A computer-vision safety system is presented. It uses image segmentation, 3D point clouds, face recognition, hand gesture recognition, speed and trajectory tracking and a digital twin. Using speed and separation monitoring, the robot’s speed is controlled according to the nearest location of humans, measured accurately to their body shape. The computer-vision safety system is compared with a traditional laser measure and is evaluated both in a controlled test and in the field.
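Speed and separation control of this kind scales the robot's speed with its distance to the nearest person. The minimal sketch below rests on three assumptions:

- the vision system supplies human body-surface points, which is what makes the distance "accurate to body shape";
- the robot is approximated by a set of points;
- speed follows a linear ramp between a stop distance and a full-speed distance.

These thresholds and the point coordinates are hypothetical parameters, not values from the paper or from any safety standard.

```python
import numpy as np


def min_separation(human_points: np.ndarray, robot_points: np.ndarray) -> float:
    """Smallest Euclidean distance between any human point and any robot point."""
    diffs = human_points[:, None, :] - robot_points[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).min())


def speed_scale(distance: float, stop_dist: float = 0.5, full_speed_dist: float = 1.5) -> float:
    """Map separation distance (m) to a speed factor in [0, 1].

    Below `stop_dist` the robot stops; beyond `full_speed_dist` it runs at
    full speed; in between the factor rises linearly.
    """
    if distance <= stop_dist:
        return 0.0
    if distance >= full_speed_dist:
        return 1.0
    return (distance - stop_dist) / (full_speed_dist - stop_dist)


human = np.array([[1.0, 0.0, 1.0], [1.0, 0.2, 1.5]])   # hypothetical body-surface points (m)
robot = np.array([[0.0, 0.0, 1.0], [0.2, 0.0, 1.2]])   # hypothetical robot link points (m)
d = min_separation(human, robot)
print(round(d, 3), round(speed_scale(d), 3))
```

Because only people are included in `human_points`, non-human objects near the robot do not reduce its speed. This is the up-time benefit the abstract attributes to vision over lasers.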
Findings
Computer vision and lasers are shown to be equivalent by both a measure of relationship and a measure of agreement. The coefficient of determination R² is 0.999983. The two methods produce systematically similar results, as the bias is close to zero, at 0.060 mm. In the Bland–Altman analysis, 95% of the differences lie within the limits of maximum acceptable difference.
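The agreement statistics above follow the usual Bland–Altman recipe. The bias is the mean of the paired differences, and the 95% limits of agreement are the bias ± 1.96 standard deviations of those differences. R² is taken here as the squared Pearson correlation between the two methods, which is an assumption about how the paper computed it. The paired readings below are hypothetical, not the paper's data.

```python
import statistics


def bland_altman(a: list[float], b: list[float]) -> tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement between methods a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)                 # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def r_squared(a: list[float], b: list[float]) -> float:
    """Squared Pearson correlation between the two measurement series."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov ** 2 / (var_a * var_b)


# Hypothetical paired distance readings in mm (vision vs laser).
vision = [500.1, 750.0, 1000.2, 1250.1, 1500.0]
laser = [500.0, 750.1, 1000.0, 1250.0, 1500.1]
bias, lower, upper = bland_altman(vision, laser)
print(f"bias={bias:.3f} mm, LoA=[{lower:.3f}, {upper:.3f}] mm, R2={r_squared(vision, laser):.6f}")
```

A high R² alone only shows that the two methods are related. The bias and limits of agreement show whether they can be used interchangeably, which is why both kinds of measure are reported.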
Originality/value
This paper describes an original model for future computer-vision safety systems. The model is equivalent to existing laser systems, identifies and adapts to particular humans, and reduces the need to slow and stop systems, thereby improving efficiency. The implication is that computer vision can substitute for lasers and permit adaptive robotic control in human–robot collaboration systems.
Gang Yu, Zhiqiang Li, Ruochen Zeng, Yucong Jin, Min Hu and Vijayan Sugumaran
Abstract
Purpose
Accurate prediction of the structural condition of urban critical infrastructure is crucial for predictive maintenance. However, existing prediction methods lack precision because of limitations in utilizing heterogeneous sensing data and domain knowledge, and they generalize poorly because of limited data samples. This paper integrates implicit and qualitative expert knowledge into quantifiable values in tunnel condition assessment. It proposes a tunnel structure prediction algorithm that augments a state-of-the-art attention-based long short-term memory (LSTM) model with expert rating knowledge, achieving robust predictions that support a reasonable allocation of maintenance resources.
Design/methodology/approach
Domain experts' knowledge is formalized into a quantitative tunnel condition index (TCI) using the analytic hierarchy process (AHP). A fusion approach based on sequence smoothing and sliding time windows is then applied to the TCI and the time-series sensing data. By incorporating both sensing data and expert ratings, an attention-based LSTM model is developed to improve prediction accuracy and reduce the uncertainty of structural influencing factors.
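The AHP and sliding-window steps can be made concrete. AHP derives criterion weights from a pairwise comparison matrix, here via its principal eigenvector. It checks the consistency of the judgements with the consistency ratio CR = CI / RI, where CI = (λmax − n)/(n − 1). A condition index can then be formed as a weighted sum of criterion ratings, and the resulting series is cut into fixed-length windows as LSTM inputs. The three criteria, the comparison values, the ratings and the window length are hypothetical. The random index values are Saaty's standard table, and the weighted-sum form of the TCI is an assumption.

```python
import numpy as np

RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}   # Saaty's random consistency index (n = 3..5 only)


def ahp_weights(pairwise: np.ndarray) -> tuple[np.ndarray, float]:
    """Return (normalized priority weights, consistency ratio) for an AHP matrix."""
    eigvals, eigvecs = np.linalg.eig(pairwise)
    k = np.argmax(eigvals.real)                  # principal eigenvalue lambda_max
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()
    n = pairwise.shape[0]
    ci = (eigvals[k].real - n) / (n - 1)
    return w, ci / RANDOM_INDEX[n]


def sliding_windows(series: np.ndarray, window: int) -> tuple[np.ndarray, np.ndarray]:
    """Split a (time, features) series into `window`-step inputs and next-step targets."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y


# Hypothetical criteria: cracking, leakage, deformation (a_ij = importance of i over j).
A = np.array([[1.0, 3.0, 5.0],
              [1 / 3, 1.0, 3.0],
              [1 / 5, 1 / 3, 1.0]])
weights, cr = ahp_weights(A)                     # CR < 0.1 is conventionally acceptable
# Hypothetical expert ratings of the three criteria over five inspection periods.
ratings = np.array([[80, 70, 90], [75, 72, 88], [70, 65, 85], [68, 60, 84], [65, 58, 80]])
tci = ratings @ weights                          # one condition index value per period
X, y = sliding_windows(tci[:, None], window=3)
print(weights.round(3), round(cr, 3), X.shape, y.shape)
```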
Findings
An empirical experiment in the Dalian Road Tunnel in Shanghai, China, demonstrates the effectiveness of the proposed method, which can comprehensively evaluate the condition of the tunnel structure and significantly improve prediction performance.
Originality/value
This study proposes a novel structural condition prediction algorithm that augments a state-of-the-art attention-based LSTM model with expert rating knowledge for robust prediction of the structural condition of complex projects.
Faris Elghaish, Sandra Matarneh, Essam Abdellatef, Farzad Rahimian, M. Reza Hosseini and Ahmed Farouk Kineber
Abstract
Purpose
Cracks are prevalent signs of pavement distress on highways globally. The use of artificial intelligence (AI) and deep learning (DL) for crack detection is increasingly considered an optimal solution. Consequently, this paper introduces a novel, fully connected, optimised convolutional neural network (CNN) model that uses feature selection algorithms to detect cracks in highway pavements.
Design/methodology/approach
To enhance the accuracy of the CNN model for crack detection, the authors employed a CNN model with fully connected deep learning layers along with several optimisation techniques. Specifically, three optimisation algorithms, namely adaptive moment estimation (ADAM), stochastic gradient descent with momentum (SGDM) and RMSProp, were used to fine-tune the CNN model and enhance its overall performance. Subsequently, the authors implemented eight feature selection algorithms to further improve the accuracy of the optimised CNN model. These feature selection techniques were carefully chosen and systematically applied to identify the features most relevant to crack detection in the given dataset. Finally, the proposed model was tested against seven pre-trained models.
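The three optimisers differ mainly in how each step uses the gradient:

- SGDM accumulates a velocity from past gradients;
- RMSProp divides by a running root-mean-square of the gradients;
- Adam combines bias-corrected first and second moment estimates.

The sketch below applies these textbook update rules to a toy one-dimensional quadratic loss. The learning rates, decay constants and loss are assumptions for illustration. The exact settings used in the paper are not stated in the abstract.

```python
import math


def grad(w: float) -> float:
    # Gradient of the toy loss L(w) = (w - 3)^2, minimized at w = 3.
    return 2.0 * (w - 3.0)


def sgdm(w: float, lr: float = 0.05, beta: float = 0.9, steps: int = 500) -> float:
    v = 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(w)          # velocity accumulates past gradients
        w += v
    return w


def rmsprop(w: float, lr: float = 0.01, rho: float = 0.9, eps: float = 1e-8, steps: int = 2000) -> float:
    s = 0.0
    for _ in range(steps):
        g = grad(w)
        s = rho * s + (1 - rho) * g * g      # running average of squared gradients
        w -= lr * g / (math.sqrt(s) + eps)
    return w


def adam(w: float, lr: float = 0.05, b1: float = 0.9, b2: float = 0.999, eps: float = 1e-8, steps: int = 2000) -> float:
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g            # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g        # second-moment estimate
        m_hat = m / (1 - b1 ** t)            # bias corrections for the zero initialisation
        v_hat = v / (1 - b2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


for name, opt in [("SGDM", sgdm), ("RMSProp", rmsprop), ("Adam", adam)]:
    print(name, round(opt(0.0), 3))
```

All three reach the neighbourhood of the minimum at w = 3. RMSProp and Adam take roughly learning-rate-sized steps regardless of gradient magnitude, which is why their behaviour near the optimum differs from SGDM's.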
Findings
The study's results show that the accuracy of the five-layer deep learning model with the three optimisers (ADAM, SGDM and RMSProp) is 97.4%, 98.2% and 96.09%, respectively. Next, eight feature selection algorithms were applied to the five-layer model to enhance accuracy, with particle swarm optimisation (PSO) achieving the highest F-score, at 98.72%. The model was then compared with other pre-trained models and exhibited the highest performance.
Practical implications
With an achieved precision of 98.19% and F-score of 98.72% using PSO, the developed model is highly accurate and effective in detecting and evaluating the condition of cracks in pavements. As a result, the model has the potential to significantly reduce the effort required for crack detection and evaluation.
Originality/value
The proposed method for enhancing CNN model accuracy in crack detection stands out for its unique combination of optimisation algorithms (ADAM, SGDM and RMSProp) with the systematic application of multiple feature selection techniques to identify relevant crack detection features, and for its comparison of the results with existing pre-trained models.
Abstract
Purpose
An increasing number of images are generated daily, and images are increasingly becoming search targets. Content-based image retrieval (CBIR) helps users express their requirements through an image query. Nevertheless, determining whether a retrieval system can provide convenient operation and relevant retrieval results is challenging. This study proposes a CBIR system based on deep learning features to effectively search and navigate images in digital articles.
Design/methodology/approach
Convolutional neural networks (CNNs) were used as feature extractors in the author's experiments. Using pretrained parameters reduced both training time and retrieval time. Different CNN features were extracted from the constructed image databases, which consist of images taken from the National Palace Museum Journals Archive, and were compared in the CBIR system.
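Once a CNN has turned every image into a feature vector, retrieval reduces to ranking the database vectors by their similarity to the query vector. The minimal sketch below uses cosine similarity. That choice is an assumption, since the abstract does not name the similarity measure. The short vectors are hypothetical stand-ins for pooled CNN features such as those a DenseNet201 backbone would produce.

```python
import numpy as np


def retrieve(query: np.ndarray, database: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k database vectors most similar to `query` (cosine similarity)."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)[:k]


# Hypothetical 4-dimensional features for five database images.
features = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.9, 0.1, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.7, 0.0, 0.7, 0.0]])
print(retrieve(np.array([1.0, 0.05, 0.0, 0.0]), features, k=3))   # [0 1 4]
```

Because the database features can be extracted once in advance, a query costs only one forward pass plus a matrix-vector product. This is consistent with the short query times reported in the findings.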
Findings
DenseNet201 achieved the best performance, with a top-10 mAP of 89% and a query time of 0.14 s.
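Top-10 mAP has several definitions in the literature. The sketch below uses one common definition, assumed here. For each query, the precision is averaged at every rank within the top 10 where a relevant image appears, dividing by the number of relevant images retrieved in the top 10. The result is then averaged over all queries. The relevance judgements are hypothetical.

```python
def average_precision_at_k(relevant: list[bool], k: int = 10) -> float:
    """AP@k for one query: mean of precision@i over ranks i where a relevant item appears."""
    hits, precisions = 0, []
    for i, is_relevant in enumerate(relevant[:k], start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision_at_k(results: list[list[bool]], k: int = 10) -> float:
    """Mean of AP@k over all queries."""
    return sum(average_precision_at_k(r, k) for r in results) / len(results)


# Hypothetical relevance judgements for the top-ranked results of two queries.
query_results = [
    [True, True, False, True],      # AP = (1/1 + 2/2 + 3/4) / 3
    [False, True, False, False],    # AP = (1/2) / 1
]
print(round(mean_average_precision_at_k(query_results), 4))   # 0.7083
```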
Practical implications
The CBIR homepage displayed image categories showing the content of the database and provided default query images. After retrieval, the results showed the metadata of the retrieved images and links back to the original pages.
Originality/value
With the interface and retrieval demonstration, a novel image-based reading mode can be established via the CBIR and links to the original images and contextual descriptions.