Search results

1 – 10 of 308

Article

Publication date: 19 January 2024

ID-SF-Fusion: a cooperative model of intent detection and slot filling for natural language understanding

Abstract

Purpose

Intent detection (ID) and slot filling (SF) are two important tasks in natural language understanding. ID identifies the main intent of a passage of text, while SF extracts the information that is important to that intent from the input sentence. However, most existing methods use sentence-level intent recognition, which carries a risk of error propagation, and do not explicitly model the relationship between intent recognition and SF. To address this problem, this paper proposes a collaborative model of ID and SF for intelligent spoken language understanding, called ID-SF-Fusion.

Design/methodology/approach

ID-SF-Fusion uses Bidirectional Encoder Representations from Transformers (BERT) and Bidirectional Long Short-Term Memory (BiLSTM) to extract effective word embeddings and context vectors containing whole-sentence information, respectively. A fusion layer provides intent–slot fusion information for the SF task, so that the relationship between the ID and SF tasks is explicitly modeled. This layer takes the ID result and the slot context vectors as input and produces fusion information that contains both the ID result and slot information. Meanwhile, to further reduce error propagation, the model uses word-level ID. Finally, the ID and SF tasks are trained by joint optimization.
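
The abstract does not say how word-level intent predictions are combined into a single intent for the utterance. As a minimal sketch, assuming a simple majority vote over per-token labels (the function name `sentence_intent` and the example labels are hypothetical, not taken from the paper), the idea could look like this:

```python
from collections import Counter


def sentence_intent(token_intents):
    """Aggregate per-token intent labels into one utterance-level intent.

    Assumption: plain majority voting. The paper may combine
    word-level predictions differently.
    """
    if not token_intents:
        raise ValueError("need at least one token-level prediction")
    counts = Counter(token_intents)
    # most_common(1) returns the label with the highest count;
    # ties resolve to the label that was seen first.
    return counts.most_common(1)[0][0]


# Hypothetical per-token predictions for "show flights from boston to denver"
token_intents = [
    "atis_flight", "atis_flight", "atis_flight",
    "atis_city", "atis_flight", "atis_flight",
]
print(sentence_intent(token_intents))
```

With voting, one mislabeled token does not flip the utterance-level intent, which is one way word-level prediction can limit error propagation.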

Findings

We conducted experiments on two public datasets, Airline Travel Information Systems (ATIS) and Snips. The results show that the Intent ACC score and Slot F1 score of ID-SF-Fusion are 98.0 per cent and 95.8 per cent, respectively, on ATIS, and 98.6 per cent and 96.7 per cent, respectively, on Snips. These results are superior to those of slot-gated, SF-ID Network, stack-Prop and other models. In addition, ablation experiments were performed to further analyze and discuss the proposed model.

Originality/value

This paper uses word-level intent recognition and introduces intent information into the SF process, which yields a significant improvement on both datasets.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9288

Article

Publication date: 12 September 2023

Understanding public opinions on Chinese short video platform by multimodal sentiment analysis using deep learning-based techniques

Wei Shi, Jing Zhang and Shaoyi He

Abstract

Purpose

With the rapid development of short videos in China, the public has become accustomed to using short videos to express their opinions. This paper aims to solve problems such as how to represent the features of different modalities and achieve effective cross-modal feature fusion when analyzing the multi-modal sentiment of Chinese short videos (CSVs).

Design/methodology/approach

This paper proposes a sentiment analysis model, MSCNN-CPL-CAFF, that uses a multi-scale convolutional neural network and a cross-attention fusion mechanism to analyze CSVs. The audio-visual and textual data of CSVs on the theme “COVID-19, catering industry” are first collected from the CSV platform Douyin, and the model is then compared with advanced baseline models.

Findings

Samples with weak negative and neutral sentiment are the most numerous, while samples with positive and weak positive sentiment are relatively few, accounting for only about 11% of the total. The MSCNN-CPL-CAFF model achieves Acc-2, Acc-3 and F1 scores of 85.01%, 74.16% and 84.84%, respectively, outperforming the best baseline methods in accuracy while achieving competitive computation speed.

Practical implications

This research offers some implications regarding the impact of COVID-19 on the catering industry in China by focusing on the multi-modal sentiment of CSVs. The methodology can be used to analyze the opinions of the general public on social media platforms and to categorize them accordingly.

Originality/value

This paper presents a novel deep-learning multimodal sentiment analysis model, which provides a new perspective for public opinion research on the short video platform.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0368-492X

Article

Publication date: 10 January 2024

On the differences between CNNs and vision transformers for COVID-19 diagnosis using CT and chest x-ray mono- and multimodality

Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán

Abstract

Purpose

COVID-19 continues to spread and cause an increasing number of deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest x-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed to assist in this respect, and vision transformers are currently the state-of-the-art methods, but most techniques focus on only one modality (CXR).

Design/methodology/approach

This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. This paper studies the differences between using convolutional MobileNetV2, ViT DeiT and Swin Transformer models when training from scratch and pretraining on the MedNIST medical dataset rather than the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, Wilcoxon statistical test and the Borda Count method. We also use the Grad-CAM algorithm to study the model's interpretability. Finally, the model's robustness is tested by evaluating it on Gaussian noised images.
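
The Borda Count method mentioned above combines several per-metric rankings into one overall ranking. The following is a minimal sketch of the standard Borda rule, not the authors' evaluation code; the model names and rankings are placeholders invented for illustration and do not reflect the paper's results.

```python
def borda_count(rankings):
    """Combine rankings (each ordered best to worst) with the Borda rule.

    In a ranking of n candidates, the candidate at position i
    (0-based) receives n - 1 - i points. Returns (candidate, score)
    pairs sorted by descending score, with ties broken by name.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (n - 1 - position)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder rankings, one per performance metric (illustrative only)
rankings = [
    ["model_a", "model_b", "model_c"],
    ["model_b", "model_a", "model_c"],
    ["model_a", "model_c", "model_b"],
]
print(borda_count(rankings))
# [('model_a', 5), ('model_b', 3), ('model_c', 1)]
```

Because each metric contributes only a rank, the Borda rule is insensitive to the scale of the individual metrics.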

Findings

Although the pretrained MobileNetV2 was the best model in terms of performance alone, the best model when performance, interpretability and robustness to noise are considered together is the Swin Transformer trained from scratch, using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.

Originality/value

Models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9288

Article

Publication date: 8 July 2022

Energy efficient multi-tasking for edge computing using federated learning

Mukesh Soni, Nihar Ranjan Nayak, Ashima Kalra, Sheshang Degadwala, Nikhil Kumar Singh and Shweta Singh

Abstract

Purpose

The purpose of this paper is to improve the existing paradigm of edge computing to maintain a balanced energy usage.

Design/methodology/approach

A new greedy algorithm is proposed to balance energy consumption in edge computing.
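
The abstract gives no details of the greedy algorithm. As a rough illustration of what a greedy balancing strategy can look like, the sketch below assigns each task to the device with the lowest accumulated energy so far, taking the most expensive tasks first. This is a generic least-loaded heuristic under assumed inputs (a list of per-task energy costs and a device count), not the authors' method; `greedy_balance` is a hypothetical name.

```python
import heapq


def greedy_balance(task_costs, num_devices):
    """Assign tasks to devices so that energy use stays balanced.

    Assumption: each task has a known energy cost and any device can
    run any task. Tasks are handled in descending cost order and each
    goes to the currently least-loaded device.
    """
    # Min-heap of (accumulated energy, device id)
    heap = [(0.0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    assignment = {d: [] for d in range(num_devices)}

    for task, cost in sorted(enumerate(task_costs), key=lambda tc: -tc[1]):
        load, device = heapq.heappop(heap)
        assignment[device].append(task)
        heapq.heappush(heap, (load + cost, device))

    loads = {d: sum(task_costs[t] for t in tasks)
             for d, tasks in assignment.items()}
    return assignment, loads


assignment, loads = greedy_balance([5, 3, 8, 2, 7, 4], num_devices=3)
print(loads)  # {0: 10, 1: 10, 2: 9}
```

A random assignment of the same six tasks can easily leave one device with far more work than another; the greedy rule keeps the gap between the busiest and idlest device small.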

Findings

The new greedy algorithm balances energy more efficiently than the random approach, by an average of 66.59 percent.

Originality/value

The results presented in this paper are better than those of existing algorithms.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1742-7371

Article

Publication date: 16 August 2022

IoT-based multimodal liveness detection using the fusion of ECG and fingerprint

Anil Kumar Gona and Subramoniam M.

Abstract

Purpose

Biometric scans using fingerprints are widely used for security purposes. However, fingerprint scans are not very reliable for authentication because they can be faked by obtaining a sample of the person's fingerprint. A few spoof detection techniques are available to reduce the incidence of spoofing of biometric systems. The most commonly used is binary classification, which distinguishes real from fake fingerprints based on the fingerprint samples provided during training. However, this technique fails when it is given samples produced by spoofing techniques that were not covered in the training samples. This paper aims to improve liveness detection accuracy by fusing electrocardiogram (ECG) and fingerprint data.

Design/methodology/approach

To avoid this limitation, an efficient liveness detection algorithm is developed using the fusion of fingerprint data and ECG signals captured from the fingertips in an Internet of Things (IoT) environment. The ECG signal helps distinguish real fingerprint samples from fake ones.

Findings

Single-modality fingerprint methods have some disadvantages, such as noisy data and dependence on the position of the fingerprint. To overcome these, ECG and fingerprint data are fused so that the combined data improve detection accuracy.

Originality/value

This approach improves both system security and the fingerprint recognition rate. An IoT-based approach is used to reduce the computational burden on data processing systems.

Details

International Journal of Pervasive Computing and Communications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1742-7371

Article

Publication date: 28 November 2023

Tourism demand forecasting: a deep learning model based on spatial-temporal transformer

Jiaying Chen, Cheng Li, Liyao Huang and Weimin Zheng

Abstract

Purpose

Incorporating dynamic spatial effects has considerable potential to improve the accuracy of tourism demand forecasting. This study proposes an innovative deep learning model for capturing dynamic spatial effects.

Design/methodology/approach

A novel deep learning model founded on the transformer architecture, called the spatiotemporal transformer network, is presented. This model has three components: the temporal transformer, spatial transformer and spatiotemporal fusion modules. The dynamic temporal dependencies of each attraction are extracted efficiently by the temporal transformer module. The dynamic spatial correlations between attractions are extracted efficiently by the spatial transformer module. The extracted dynamic temporal and spatial features are fused in a learnable manner in the spatiotemporal fusion module. Convolutional operations are implemented to generate the final forecasts.
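
The abstract states only that temporal and spatial features are fused "in a learnable manner". One common way to do this is a sigmoid gate that decides, per feature, how much to take from each source. The sketch below assumes that gated form; the parameter names `gate_weights` and `gate_bias` are hypothetical, and in a real model these values would be learned by gradient descent rather than set by hand.

```python
import math


def gated_fusion(temporal, spatial, gate_weights, gate_bias):
    """Fuse two feature vectors element-wise with a sigmoid gate.

    Assumption: for each feature i,
        z_i     = sigmoid(w_i[0] * t_i + w_i[1] * s_i + b_i)
        fused_i = z_i * t_i + (1 - z_i) * s_i
    This gate form is illustrative and is not taken from the paper.
    """
    fused = []
    for t, s, w, b in zip(temporal, spatial, gate_weights, gate_bias):
        z = 1.0 / (1.0 + math.exp(-(w[0] * t + w[1] * s + b)))
        fused.append(z * t + (1.0 - z) * s)
    return fused


# With all-zero gate parameters the gate is 0.5, so fusion is a plain average.
print(gated_fusion([1.0, 2.0], [3.0, 4.0], [(0.0, 0.0), (0.0, 0.0)], [0.0, 0.0]))
# [2.0, 3.0]
```

A large positive bias pushes the gate towards the temporal features, and a large negative bias towards the spatial ones, so training can shift the balance per feature.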

Findings

The results indicate that the proposed model performs better in forecasting accuracy than some popular benchmark models, demonstrating its significant forecasting performance. Incorporating dynamic spatiotemporal features is an effective strategy for improving forecasting. It can provide an important reference to related studies.

Practical implications

The proposed model leverages high-frequency data to achieve accurate predictions at the micro level by incorporating dynamic spatial effects. Destination managers should fully consider the dynamic spatial effects of attractions when planning and marketing to promote tourism resources.

Originality/value

This study incorporates dynamic spatial effects into tourism demand forecasting models by using a transformer neural network. It advances the development of methodologies in related fields.

Details

Tourism Review, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 1660-5373

Open Access

Article

Publication date: 6 December 2022

Detecting and staging diabetic retinopathy in retinal images using multi-branch CNN

Worapan Kusakunniran, Sarattha Karnjanapreechakorn, Pitipol Choopong, Thanongchai Siriapisith, Nattaporn Tesavibul, Nopasak Phasukkijwatana, Supalert Prakhunhungsit and Sutasinee Boonsopon

Abstract

Purpose

This paper proposes a solution for detecting and grading diabetic retinopathy (DR) in retinal images using a convolutional neural network (CNN)-based approach. It classifies input retinal images as normal or abnormal, and automatically divides the abnormal class further into four stages of abnormality.

Design/methodology/approach

The proposed solution is based on a newly proposed CNN architecture, namely, DeepRoot. It consists of one main branch, to which two side branches are connected. The main branch serves as the primary feature extractor for both high-level and low-level features of retinal images. The side branches then extract more complex and detailed features from the features output by the main branch. They are designed to capture details of small traces of DR in retinal images, using modified zoom-in/zoom-out and attention layers.

Findings

The proposed method is trained, validated and tested on the Kaggle dataset. The generalization of the trained model is evaluated on unseen data samples, which were self-collected in a real hospital setting. It achieves promising performance, with a sensitivity of 98.18% in the two-class scenario.
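
For reference, sensitivity in a two-class setting is the fraction of truly abnormal images that the model flags as abnormal, TP / (TP + FN). The short sketch below computes it from label lists; the labels are toy values for illustration and are unrelated to the paper's data.

```python
def sensitivity(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred)
             if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred)
             if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


# Toy labels: 1 = abnormal (DR present), 0 = normal
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(sensitivity(y_true, y_pred))  # 0.75
```

False positives (the last sample above) do not affect sensitivity; they are captured by specificity instead.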

Originality/value

The new CNN-based architecture (i.e. DeepRoot) is introduced with the concept of a multi-branch network. It could help address the problem of an unbalanced dataset, especially when there are common characteristics across different classes (i.e. the four stages of DR). Different classes can be output at different depths of the network.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 18 March 2022

Neural networks for anatomical therapeutic chemical (ATC) classification

Loris Nanni, Alessandra Lumini and Sheryl Brahnam

Abstract

Purpose

Automatic anatomical therapeutic chemical (ATC) classification is progressing at a rapid pace because of its potential in drug development. Predicting an unknown compound's therapeutic and chemical characteristics in terms of how it affects multiple organs and physiological systems makes automatic ATC classification a vital yet challenging multilabel problem. The aim of this paper is to experimentally derive an ensemble of different feature descriptors and classifiers for ATC classification that outperforms the state-of-the-art.

Design/methodology/approach

The proposed method is an ensemble generated by the fusion of neural networks (i.e. a tabular model and long short-term memory (LSTM) networks) and multilabel classifiers based on multiple linear regression (hMuLab). All classifiers are trained on three sets of descriptors. Features extracted from the trained LSTMs are also fed into hMuLab. The ensembles are evaluated and compared on a benchmark data set of 3883 ATC-coded pharmaceuticals taken from KEGG, a publicly available drug databank.

Findings

Experiments demonstrate the power of the authors’ best ensemble, EnsATC, which is shown to outperform the best methods reported in the literature, including the state-of-the-art developed by the fast.ai research group. The MATLAB source code of the authors’ system is freely available to the public at https://github.com/LorisNanni/Neural-networks-for-anatomical-therapeutic-chemical-ATC-classification.

Originality/value

This study demonstrates the power of extracting LSTM features and combining them with ATC descriptors in ensembles for ATC classification.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Article

Publication date: 31 October 2023

Intelligent inspection of appearance quality for precast concrete components based on improved YOLO model and multi-source data

Yangze Liang and Zhao Xu

Abstract

Purpose

Monitoring the quality of precast concrete (PC) components is crucial for the success of prefabricated construction projects. Currently, quality monitoring of PC components during the construction phase is predominantly done manually, resulting in low efficiency and hindering the progress of intelligent construction. This paper presents an intelligent inspection method for assessing the appearance quality of PC components, utilizing an enhanced You Only Look Once (YOLO) model and multi-source data. The aim of this research is to achieve automated management of the appearance quality of precast components in the prefabricated construction process through digital means.

Design/methodology/approach

The paper begins by establishing an improved YOLO model and an image dataset for evaluating appearance quality. Through object detection in the images, a preliminary and efficient assessment of the precast components' appearance quality is achieved. Moreover, the detection results are mapped onto the point cloud for high-precision quality inspection. In the case of precast components with quality defects, precise quality inspection is conducted by combining the three-dimensional model data obtained from forward design conversion with the captured point cloud data through registration. Additionally, the paper proposes a framework for an automated inspection platform dedicated to assessing appearance quality in prefabricated buildings, encompassing the platform's hardware network.

Findings

The improved YOLO model achieved a best mean average precision of 85.02% on the VOC2007 dataset, surpassing the performance of most similar models. After targeted training, the model exhibits excellent recognition capabilities for the four common appearance quality defects. When mapped onto the point cloud, the accuracy of quality inspection based on point cloud data and forward design is within 0.1 mm. The appearance quality inspection platform enables feedback and optimization of quality issues.

Originality/value

The proposed method in this study enables high-precision, visualized and automated detection of the appearance quality of PC components. It effectively meets the demand for quality inspection of precast components on construction sites of prefabricated buildings, providing technological support for the development of intelligent construction. The design of the appearance quality inspection platform's logic and framework facilitates the integration of the method, laying the foundation for efficient quality management in the future.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0969-9988

Article

Publication date: 12 October 2023

Fall-portent detection for construction sites based on computer vision and machine learning

Xiaoyu Liu, Feng Xu, Zhipeng Zhang and Kaiyu Sun

Abstract

Purpose

Fall accidents can cause casualties and economic losses in the construction industry. Fall portents, such as loss of balance (LOB) and sudden sways, can result in fatal, nonfatal or attempted fall accidents. All of these are worth studying so that measures can be taken to prevent future accidents. Detecting fall portents can proactively and comprehensively help managers assess the risk to workers and in the construction environment, and further prevent fall accidents.

Design/methodology/approach

This study focuses on the postures of workers and aims to directly detect fall portents using a computer vision (CV)-based noncontact approach. First, a joint coordinate matrix is generated from a three-dimensional pose estimation model; the matrix is then preprocessed by principal component analysis, K-means and pre-experiments. Finally, a modified fusion K-nearest neighbor-based machine learning model is built to fuse information from the x, y and z axes and classify the worker's pose status into three stages.
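
The abstract does not describe how the modified fusion K-nearest neighbor model combines the three axes. As an illustrative sketch only, the code below runs a plain KNN vote separately on x, y and z features and averages the resulting class probabilities. The averaging rule, the function names and the one-dimensional toy features are all assumptions, not the authors' design.

```python
from collections import Counter

LABELS = ("steady", "unsteady", "fallen")


def knn_probs(train, query, k):
    """Return label -> fraction of the k nearest neighbours with that label.

    train is a list of (feature_vector, label) pairs; distance is
    squared Euclidean.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: votes[label] / k for label in LABELS}


def fused_predict(train_by_axis, query_by_axis, k=3):
    """Average per-axis KNN probabilities and return (best label, probabilities)."""
    axes = ("x", "y", "z")
    fused = {label: 0.0 for label in LABELS}
    for axis in axes:
        probs = knn_probs(train_by_axis[axis], query_by_axis[axis], k)
        for label in LABELS:
            fused[label] += probs[label] / len(axes)
    best = max(fused, key=fused.get)
    return best, fused


# Toy one-dimensional features per axis (illustrative values only)
samples = [([0.0], "steady"), ([0.1], "steady"), ([0.2], "steady"),
           ([1.0], "unsteady"), ([1.1], "unsteady"), ([1.2], "unsteady"),
           ([2.0], "fallen"), ([2.1], "fallen"), ([2.2], "fallen")]
train_by_axis = {"x": samples, "y": samples, "z": samples}
query = {"x": [1.05], "y": [1.1], "z": [0.15]}

label, probs = fused_predict(train_by_axis, query, k=3)
print(label, probs)
```

In this toy query two axes vote "unsteady" and one votes "steady", so the fused output is "unsteady" with a confidence of about 0.67, showing how per-class confidence probabilities can fall out of the fusion step.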

Findings

The proposed model classifies the worker's pose status into three stages (steady–unsteady–fallen) and provides a corresponding confidence probability for each category. Experiments conducted to evaluate the approach show that the model accuracy reaches 85.02% with threshold-based postprocessing. The proposed noncontact fall-portent detection approach can extract the fall risk of workers in both the pre- and post-event phases.

Research limitations/implications

First, three-dimensional (3D) pose estimation needs sufficient information, which means it may not perform well when applied in complicated environments or when the shooting distance is extremely large. Second, solely focusing on fall-related factors may not be comprehensive enough. Future studies can incorporate the results of this research as an indicator into the risk assessment system to achieve a more comprehensive and accurate evaluation of worker and site risk.

Practical implications

The proposed machine learning model determines whether the worker is steady, unsteady or fallen using a CV-based approach. From the perspective of construction management, when detecting fall-related actions on construction sites, the noncontact CV-based approach has the irreplaceable advantages of low cost and no interruption to workers. It can make use of the surveillance cameras on construction sites to recognize both preceding events and accidents that have already happened. The detection of fall portents can support worker risk assessment and safety management.

Originality/value

Existing sensor-based approaches are costly and invasive for construction workers, while existing CV-based approaches either oversimplify the problem by applying binary classification to only part of the fall process or achieve fall-portent detection only indirectly. Instead, this study aims to detect fall portents directly from the worker's posture and to divide the entire fall process into three stages using a CV-based noncontact approach. It can help managers carry out more comprehensive risk assessment and develop preventive measures.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 0969-9988
