Search results
1 – 10 of 308
Meng Zhu and Xiaolong Xu
Abstract
Purpose
Intent detection (ID) and slot filling (SF) are two important tasks in natural language understanding. ID is to identify the main intent of a paragraph of text. The goal of SF is to extract the information that is important to the intent from the input sentence. However, most of the existing methods use sentence-level intention recognition, which has the risk of error propagation, and the relationship between intention recognition and SF is not explicitly modeled. Aiming at this problem, this paper proposes a collaborative model of ID and SF for intelligent spoken language understanding called ID-SF-Fusion.
Design/methodology/approach
ID-SF-Fusion uses Bidirectional Encoder Representations from Transformers (BERT) and Bidirectional Long Short-Term Memory (BiLSTM) to extract effective word embeddings and context vectors containing whole-sentence information, respectively. A fusion layer provides intent–slot fusion information for the SF task, so that the relationship between the ID and SF tasks is explicitly modeled. This layer takes the ID result and the slot context vectors as input to obtain fusion information that contains both the ID result and slot information. Meanwhile, to further reduce error propagation, we use word-level ID for the ID-SF-Fusion model. Finally, the ID and SF tasks are realized by joint optimization training.
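The abstract does not say how word-level intent predictions are turned into a single sentence-level intent. A minimal sketch, assuming a simple majority vote over per-token predictions (the function name, the intent labels and the tie-breaking rule are all hypothetical, not taken from the paper):

```python
from collections import Counter


def sentence_intent_from_tokens(token_intents):
    """Majority vote over per-token intent labels.

    Assumption: ties are broken in favor of the label that appears first
    in the sentence. The paper may aggregate differently.
    """
    if not token_intents:
        raise ValueError("need at least one token-level prediction")
    counts = Counter(token_intents)
    best = max(counts.values())
    # Walk the tokens in order so that tie-breaking is deterministic.
    for label in token_intents:
        if counts[label] == best:
            return label


# Hypothetical per-token predictions for a five-word utterance.
token_predictions = ["BookFlight", "BookFlight", "BookFlight", "GetWeather", "BookFlight"]
voted_intent = sentence_intent_from_tokens(token_predictions)
```

With a vote of this kind, a single mispredicted token does not flip the sentence-level intent, which is one way word-level ID could limit error propagation.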
Findings
We conducted experiments on two public datasets, Airline Travel Information Systems (ATIS) and Snips. The results show that the Intent ACC score and Slot F1 score of ID-SF-Fusion on ATIS are 98.0 per cent and 95.8 per cent, respectively, and the two indicators on the Snips dataset are 98.6 per cent and 96.7 per cent, respectively. These results are superior to those of slot-gated, SF-ID Network, stack-Prop and other models. In addition, ablation experiments were performed to further analyze and discuss the proposed model.
Originality/value
This paper uses word-level intent recognition and introduces intent information into the SF process, which yields a significant improvement on both data sets.
Wei Shi, Jing Zhang and Shaoyi He
Abstract
Purpose
With the rapid development of short videos in China, the public has become accustomed to using short videos to express their opinions. This paper aims to solve problems such as how to represent the features of different modalities and achieve effective cross-modal feature fusion when analyzing the multi-modal sentiment of Chinese short videos (CSVs).
Design/methodology/approach
This paper proposes a sentiment analysis model, MSCNN-CPL-CAFF, which uses a multi-scale convolutional neural network and a cross attention fusion mechanism to analyze CSVs. The audio-visual and textual data of CSVs themed on “COVID-19, catering industry” are first collected from the CSV platform Douyin, and then a comparative analysis is conducted with advanced baseline models.
Findings
The weak negative and neutral sentiment classes have the most samples, while the positive and weak positive sentiment classes are relatively small, accounting for only about 11% of the total samples. The MSCNN-CPL-CAFF model achieves Acc-2, Acc-3 and F1 scores of 85.01%, 74.16% and 84.84%, respectively, which exceeds the highest accuracy among the baseline methods at a competitive computation speed.
Practical implications
This research offers some implications regarding the impact of COVID-19 on catering industry in China by focusing on multi-modal sentiment of CSVs. The methodology can be utilized to analyze the opinions of the general public on social media platform and to categorize them accordingly.
Originality/value
This paper presents a novel deep-learning multimodal sentiment analysis model, which provides a new perspective for public opinion research on the short video platform.
Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán
Abstract
Purpose
COVID-19 continues to spread and to cause an increasing number of deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest X-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed to assist in this respect, and vision transformers are currently state-of-the-art methods, but most techniques currently focus on only one modality (CXR).
Design/methodology/approach
This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. This paper studies the differences between using convolutional MobileNetV2, ViT DeiT and Swin Transformer models when training from scratch and pretraining on the MedNIST medical dataset rather than the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, Wilcoxon statistical test and the Borda Count method. We also use the Grad-CAM algorithm to study the model's interpretability. Finally, the model's robustness is tested by evaluating it on Gaussian noised images.
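The Borda Count method mentioned above combines several per-metric rankings into one overall ranking. A minimal sketch of the standard scheme, assuming one best-first ranking per metric; the model names and rankings below are invented placeholders, not results from the paper:

```python
def borda_count(rankings):
    """Sum Borda points over several best-first rankings.

    With n candidates in a ranking, the first place earns n - 1 points,
    the second n - 2, and the last 0.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (n - 1 - position)
    return scores


# Hypothetical rankings, one per performance metric (illustrative only).
rankings = [
    ["model_a", "model_b", "model_c"],
    ["model_b", "model_a", "model_c"],
    ["model_a", "model_c", "model_b"],
]
scores = borda_count(rankings)
winner = max(scores, key=scores.get)
```

Here `model_a` collects 5 points, `model_b` 3 and `model_c` 1, so `model_a` is ranked first overall even though it does not win on every metric.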
Findings
Although pretrained MobileNetV2 was the best model in terms of performance alone, the best model in terms of performance, interpretability and robustness to noise combined is the Swin Transformer trained from scratch using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.
Originality/value
Models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.
Mukesh Soni, Nihar Ranjan Nayak, Ashima Kalra, Sheshang Degadwala, Nikhil Kumar Singh and Shweta Singh
Abstract
Purpose
The purpose of this paper is to improve the existing paradigm of edge computing to maintain a balanced energy usage.
Design/methodology/approach
The new greedy algorithm is proposed to balance the energy consumption in edge computing.
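The abstract does not describe the greedy rule itself. As an illustration only, the sketch below uses a generic greedy balancing heuristic: tasks are taken in order of decreasing energy cost and each is assigned to the node that has consumed the least energy so far. The function name, the task costs and the number of nodes are assumptions, and the paper's algorithm may differ.

```python
import heapq


def greedy_assign(task_costs, num_nodes):
    """Greedy balancing: largest task first, to the least-loaded node.

    This is a generic heuristic used for illustration, not the algorithm
    from the paper. Returns (task -> node mapping, energy used per node).
    """
    # Min-heap of (energy used so far, node id).
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {}
    ordered = sorted(enumerate(task_costs), key=lambda tc: -tc[1])
    for task, cost in ordered:
        used, node = heapq.heappop(heap)
        assignment[task] = node
        heapq.heappush(heap, (used + cost, node))
    loads = [0.0] * num_nodes
    for used, node in heap:
        loads[node] = used
    return assignment, loads


# Hypothetical energy costs for five tasks on two edge nodes.
assignment, loads = greedy_assign([5.0, 3.0, 2.0, 4.0, 1.0], num_nodes=2)
```

For these example costs the two nodes end up with 8.0 and 7.0 units of energy, whereas a random assignment could place most of the load on one node.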
Findings
The new greedy algorithm can balance energy more efficiently than the random approach by an average of 66.59 percent.
Originality/value
The results presented in this paper are better than those of existing algorithms.
Anil Kumar Gona and Subramoniam M.
Abstract
Purpose
Biometric scans using fingerprints are widely used for security purposes. For authentication purposes, however, fingerprint scans are not very reliable because they can be faked by obtaining a sample of the person's fingerprint. A few spoof detection techniques are available to reduce the incidence of spoofing of the biometric system. Among them, the most commonly used is the binary classification technique, which detects real or fake fingerprints based on the fingerprint samples provided during training. However, this technique fails when it is given samples produced with spoofing techniques that differ from those covered in the training samples. This paper aims to improve the liveness detection accuracy by fusing electrocardiogram (ECG) and fingerprint.
Design/methodology/approach
In this paper, to avoid this limitation, an efficient liveness detection algorithm is developed using the fusion of ECG signals captured from the fingertips and fingerprint data in Internet of Things (IoT) environment. The ECG signal will ensure the detection of real fingerprint samples from fake ones.
Findings
Fingerprint-only (single-modality) methods have some disadvantages, such as noisy data and sensitivity to the position of the fingerprint. To overcome this, ECG and fingerprint data are fused so that the combined data improves the detection accuracy.
Originality/value
System security is improved in this approach, and the fingerprint recognition rate is also improved. IoT-based approach is used in this work to reduce the computation burden of data processing systems.
Jiaying Chen, Cheng Li, Liyao Huang and Weimin Zheng
Abstract
Purpose
Incorporating dynamic spatial effects exhibits considerable potential in improving the accuracy of forecasting tourism demands. This study aims to propose an innovative deep learning model for capturing dynamic spatial effects.
Design/methodology/approach
A novel deep learning model founded on the transformer architecture, called the spatiotemporal transformer network, is presented. This model has three components: the temporal transformer, spatial transformer and spatiotemporal fusion modules. The dynamic temporal dependencies of each attraction are extracted efficiently by the temporal transformer module. The dynamic spatial correlations between attractions are extracted efficiently by the spatial transformer module. The extracted dynamic temporal and spatial features are fused in a learnable manner in the spatiotemporal fusion module. Convolutional operations are implemented to generate the final forecasts.
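The abstract says only that temporal and spatial features are fused "in a learnable manner". One common way to do this is a gated sum, in which a trainable logit per feature sets the mixing weight. The sketch below assumes that form and uses plain Python lists instead of tensors; the gate parameters would normally be learned by gradient descent, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(temporal, spatial, gate_logits):
    """Element-wise fusion g * temporal + (1 - g) * spatial, g = sigmoid(logit).

    Assumption: one trainable gate logit per feature. The paper's fusion
    module may be parameterized differently.
    """
    if not (len(temporal) == len(spatial) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for t, s, w in zip(temporal, spatial, gate_logits):
        g = sigmoid(w)
        fused.append(g * t + (1.0 - g) * s)
    return fused


# With zero logits the gate is 0.5, so the result is the plain average.
fused = gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
```

A large positive logit pushes a feature towards the temporal branch and a large negative one towards the spatial branch, so training can decide per feature which source to trust.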
Findings
The results indicate that the proposed model performs better in forecasting accuracy than some popular benchmark models, demonstrating its significant forecasting performance. Incorporating dynamic spatiotemporal features is an effective strategy for improving forecasting. It can provide an important reference to related studies.
Practical implications
The proposed model leverages high-frequency data to achieve accurate predictions at the micro level by incorporating dynamic spatial effects. Destination managers should fully consider the dynamic spatial effects of attractions when planning and marketing to promote tourism resources.
Originality/value
This study incorporates dynamic spatial effects into tourism demand forecasting models by using a transformer neural network. It advances the development of methodologies in related fields.
Worapan Kusakunniran, Sarattha Karnjanapreechakorn, Pitipol Choopong, Thanongchai Siriapisith, Nattaporn Tesavibul, Nopasak Phasukkijwatana, Supalert Prakhunhungsit and Sutasinee Boonsopon
Abstract
Purpose
This paper aims to propose a solution for detecting and grading diabetic retinopathy (DR) in retinal images using a convolutional neural network (CNN)-based approach. It could classify input retinal images into a normal class or an abnormal class, which would be further split into four stages of abnormalities automatically.
Design/methodology/approach
The proposed solution is developed based on a newly proposed CNN architecture, namely, DeepRoot. It consists of one main branch, which is connected by two side branches. The main branch is responsible for the primary feature extractor of both high-level and low-level features of retinal images. Then, the side branches further extract more complex and detailed features from the features outputted from the main branch. They are designed to capture details of small traces of DR in retinal images, using modified zoom-in/zoom-out and attention layers.
Findings
The proposed method is trained, validated and tested on the Kaggle dataset. The generalization of the trained model is evaluated using unseen data samples, which were self-collected in a real hospital setting. It achieves a promising performance, with a sensitivity of 98.18% under the two-class scenario.
Originality/value
The new CNN-based architecture (i.e. DeepRoot) is introduced with the concept of a multi-branch network. It could assist in solving a problem of an unbalanced dataset, especially when there are common characteristics across different classes (i.e. four stages of DR). Different classes could be outputted at different depths of the network.
Loris Nanni, Alessandra Lumini and Sheryl Brahnam
Abstract
Purpose
Automatic anatomical therapeutic chemical (ATC) classification is progressing at a rapid pace because of its potential in drug development. Predicting an unknown compound's therapeutic and chemical characteristics in terms of how it affects multiple organs and physiological systems makes automatic ATC classification a vital yet challenging multilabel problem. The aim of this paper is to experimentally derive an ensemble of different feature descriptors and classifiers for ATC classification that outperforms the state-of-the-art.
Design/methodology/approach
The proposed method is an ensemble generated by the fusion of neural networks (i.e. a tabular model and long short-term memory networks (LSTM)) and multilabel classifiers based on multiple linear regression (hMuLab). All classifiers are trained on three sets of descriptors. Features extracted from the trained LSTMs are also fed into hMuLab. Evaluations of ensembles are compared on a benchmark data set of 3883 ATC-coded pharmaceuticals taken from KEGG, a publicly available drug databank.
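The abstract does not state the fusion rule used to combine the classifiers. A minimal sketch, assuming the simplest option for multilabel scores: average the per-label scores of all classifiers and keep the labels whose mean reaches a threshold. The scores, the threshold and the function name are illustrative assumptions.

```python
def fuse_scores(score_lists, threshold=0.5):
    """Average per-label scores across classifiers, then threshold.

    score_lists: one list of per-label scores per classifier.
    Returns (averaged scores, indices of the predicted labels).
    """
    if not score_lists:
        raise ValueError("need scores from at least one classifier")
    num_labels = len(score_lists[0])
    if any(len(scores) != num_labels for scores in score_lists):
        raise ValueError("all classifiers must score the same labels")
    averaged = [sum(column) / len(score_lists) for column in zip(*score_lists)]
    predicted = [i for i, score in enumerate(averaged) if score >= threshold]
    return averaged, predicted


# Hypothetical scores from two classifiers over three labels.
averaged, predicted = fuse_scores([[0.9, 0.2, 0.6], [0.7, 0.4, 0.3]])
```

In this example only the first label is predicted: the third label is accepted by one classifier but its averaged score falls below the threshold.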
Findings
Experiments demonstrate the power of the authors’ best ensemble, EnsATC, which is shown to outperform the best methods reported in the literature, including the state-of-the-art developed by the fast.ai research group. The MATLAB source code of the authors’ system is freely available to the public at https://github.com/LorisNanni/Neural-networks-for-anatomical-therapeutic-chemical-ATC-classification.
Originality/value
This study demonstrates the power of extracting LSTM features and combining them with ATC descriptors in ensembles for ATC classification.
Yangze Liang and Zhao Xu
Abstract
Purpose
Monitoring of the quality of precast concrete (PC) components is crucial for the success of prefabricated construction projects. Currently, quality monitoring of PC components during the construction phase is predominantly done manually, resulting in low efficiency and hindering the progress of intelligent construction. This paper presents an intelligent inspection method for assessing the appearance quality of PC components, utilizing an enhanced You Only Look Once (YOLO) model and multi-source data. The aim of this research is to achieve automated management of the appearance quality of precast components in the prefabricated construction process through digital means.
Design/methodology/approach
The paper begins by establishing an improved YOLO model and an image dataset for evaluating appearance quality. Through object detection in the images, a preliminary and efficient assessment of the precast components' appearance quality is achieved. Moreover, the detection results are mapped onto the point cloud for high-precision quality inspection. In the case of precast components with quality defects, precise quality inspection is conducted by combining the three-dimensional model data obtained from forward design conversion with the captured point cloud data through registration. Additionally, the paper proposes a framework for an automated inspection platform dedicated to assessing appearance quality in prefabricated buildings, encompassing the platform's hardware network.
Findings
The improved YOLO model achieved a best mean average precision of 85.02% on the VOC2007 dataset, surpassing the performance of most similar models. After targeted training, the model exhibits excellent recognition capabilities for the four common appearance quality defects. When mapped onto the point cloud, the accuracy of quality inspection based on point cloud data and forward design is within 0.1 mm. The appearance quality inspection platform enables feedback and optimization of quality issues.
Originality/value
The proposed method in this study enables high-precision, visualized and automated detection of the appearance quality of PC components. It effectively meets the demand for quality inspection of precast components on construction sites of prefabricated buildings, providing technological support for the development of intelligent construction. The design of the appearance quality inspection platform's logic and framework facilitates the integration of the method, laying the foundation for efficient quality management in the future.
Xiaoyu Liu, Feng Xu, Zhipeng Zhang and Kaiyu Sun
Abstract
Purpose
Fall accidents can cause casualties and economic losses in the construction industry. Fall portents, such as loss of balance (LOB) and sudden sways, can result in fatal, nonfatal or attempted fall accidents. All of them are worthy of studying to take measures to prevent future accidents. Detecting fall portents can proactively and comprehensively help managers assess the risk to workers as well as in the construction environment and further prevent fall accidents.
Design/methodology/approach
This study focused on the postures of workers and aimed to directly detect fall portents using a computer vision (CV)-based noncontact approach. Firstly, a joint coordinate matrix generated from a three-dimensional pose estimation model is employed, and then the matrix is preprocessed by principal component analysis, K-means and pre-experiments. Finally, a modified fusion K-nearest neighbor-based machine learning model is built to fuse information from the x, y and z axes and output the worker's pose status into three stages.
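The abstract does not spell out how the modified fusion K-nearest neighbor model combines the three axes. As a rough sketch under stated assumptions, the code below runs an ordinary KNN on one scalar feature per axis and averages the per-axis label probabilities, returning the winning status with its confidence. The scalar features, the training data and the averaging rule are hypothetical simplifications.

```python
from collections import Counter


def knn_probabilities(train, query, k):
    """Label frequencies among the k training points nearest to query.

    train: list of (value, label) pairs for a single axis.
    """
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in counts.items()}


def fused_knn(train_by_axis, query_by_axis, k=3):
    """Average the per-axis KNN probabilities and return (label, confidence)."""
    totals = Counter()
    num_axes = len(train_by_axis)
    for axis, train in train_by_axis.items():
        for label, p in knn_probabilities(train, query_by_axis[axis], k).items():
            totals[label] += p / num_axes
    return max(totals.items(), key=lambda item: item[1])


# Hypothetical one-dimensional training data, reused for every axis.
train = [
    (0.0, "steady"), (0.1, "steady"), (0.2, "steady"),
    (1.0, "unsteady"), (1.1, "unsteady"), (1.2, "unsteady"),
    (2.0, "fallen"), (2.1, "fallen"), (2.2, "fallen"),
]
train_by_axis = {"x": train, "y": train, "z": train}
status, confidence = fused_knn(train_by_axis, {"x": 1.05, "y": 1.1, "z": 0.3})
```

In this example the x and y axes vote "unsteady" and the z axis votes "steady", so the fused output is "unsteady" with a confidence of about 0.67.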
Findings
The proposed model can output the worker's pose status into three stages (steady–unsteady–fallen) and provide corresponding confidence probabilities for each category. Experiments conducted to evaluate the approach show that the model accuracy reaches 85.02% with threshold-based postprocessing. The proposed fall-portent detection approach can extract the fall risk of workers in both the pre- and post-event phases using a noncontact approach.
Research limitations/implications
First, three-dimensional (3D) pose estimation needs sufficient information, which means it may not perform well when applied in complicated environments or when the shooting distance is extremely large. Second, solely focusing on fall-related factors may not be comprehensive enough. Future studies can incorporate the results of this research as an indicator into the risk assessment system to achieve a more comprehensive and accurate evaluation of worker and site risk.
Practical implications
The proposed machine learning model determines whether the worker is steady, unsteady or fallen using a CV-based approach. From the perspective of construction management, when detecting fall-related actions on construction sites, the noncontact approach based on CV has the irreplaceable advantages of low cost and no interruption to workers. It can make use of the surveillance cameras on construction sites to recognize both preceding events and accidents that have already happened. The detection of fall portents can help worker risk assessment and safety management.
Originality/value
Existing studies using sensor-based approaches are high-cost and invasive for construction workers, and others using CV-based approaches either oversimplify by binary classification of the non-entire fall process or achieve fall-portent detection only indirectly. Instead, this study aims to detect fall portents directly from the worker's posture and to divide the entire fall process into three stages using a CV-based noncontact approach. It can help managers carry out more comprehensive risk assessment and develop preventive measures.