Search results

1 – 10 of 392
Article
Publication date: 10 January 2024

Sara El-Ateif, Ali Idri and José Luis Fernández-Alemán

COVID-19 continues to spread, and cause increasing deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT…

Abstract

Purpose

COVID-19 continues to spread, and cause increasing deaths. Physicians diagnose COVID-19 using not only real-time polymerase chain reaction but also the computed tomography (CT) and chest x-ray (CXR) modalities, depending on the stage of infection. However, with so many patients and so few doctors, it has become difficult to keep abreast of the disease. Deep learning models have been developed in order to assist in this respect, and vision transformers are currently state-of-the-art methods, but most techniques currently focus only on one modality (CXR).

Design/methodology/approach

This work aims to leverage the benefits of both CT and CXR to improve COVID-19 diagnosis. This paper studies the differences between using convolutional MobileNetV2, ViT DeiT and Swin Transformer models when training from scratch and pretraining on the MedNIST medical dataset rather than the ImageNet dataset of natural images. The comparison is made by reporting six performance metrics, the Scott–Knott Effect Size Difference, Wilcoxon statistical test and the Borda Count method. We also use the Grad-CAM algorithm to study the model's interpretability. Finally, the model's robustness is tested by evaluating it on Gaussian noised images.

Findings

Although pretrained MobileNetV2 was the best model in terms of performance, the best model in terms of performance, interpretability, and robustness to noise is the trained from scratch Swin Transformer using the CXR (accuracy = 93.21 per cent) and CT (accuracy = 94.14 per cent) modalities.

Originality/value

Models compared are pretrained on MedNIST and leverage both the CT and CXR modalities.

Details

Data Technologies and Applications, vol. 58 no. 3
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 15 July 2021

Chanattra Ammatmanee and Lu Gan

Because of the fast-growing digital image collections on online platforms and the transfer learning ability of deep learning technology, image classification could be improved and…

Abstract

Purpose

Because of the fast-growing digital image collections on online platforms and the transfer learning ability of deep learning technology, image classification could be improved and implemented for the hostel domain, which has complex clusters of image contents. This paper aims to test the potential of 11 pretrained convolutional neural network (CNN) with transfer learning for hostel image classification on the first hostel image database to advance the knowledge and fill the gap academically, as well as to suggest an alternative solution in optimal image classification with less labour cost and human errors to those who manage hostel image collections.

Design/methodology/approach

The hostel image database is first created with data pre-processing steps, data selection and data augmentation. Then, the systematic and comprehensive investigation is divided into seven experiments to test 11 pretrained CNNs which transfer learning was applied and parameters were fine-tuned to match this newly created hostel image dataset. All experiments were conducted in Google Colaboratory environment using PyTorch.

Findings

The 7,350 hostel image database is created and labelled into seven classes. Furthermore, its experiment results highlight that DenseNet 121 and DenseNet 201 have the greatest potential for hostel image classification as they outperform other CNNs in terms of accuracy and training time.

Originality/value

The fact that there is no existing academic work dedicating to test pretrained CNNs with transfer learning for hostel image classification and no existing hostel image-only database have made this paper a novel contribution.

Details

Data Technologies and Applications, vol. 56 no. 1
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 6 November 2020

Wenjuan Shen and Xiaoling Li

recent years, facial expression recognition has been widely used in human machine interaction, clinical medicine and safe driving. However, there is a limitation that conventional…

Abstract

Purpose

recent years, facial expression recognition has been widely used in human machine interaction, clinical medicine and safe driving. However, there is a limitation that conventional recurrent neural networks can only learn the time-series characteristics of expressions based on one-way propagation information.

Design/methodology/approach

To solve such limitation, this paper proposes a novel model based on bidirectional gated recurrent unit networks (Bi-GRUs) with two-way propagations, and the theory of identity mapping residuals is adopted to effectively prevent the problem of gradient disappearance caused by the depth of the introduced network. Since the Inception-V3 network model for spatial feature extraction has too many parameters, it is prone to overfitting during training. This paper proposes a novel facial expression recognition model to add two reduction modules to reduce parameters, so as to obtain an Inception-W network with better generalization.

Findings

Finally, the proposed model is pretrained to determine the best settings and selections. Then, the pretrained model is experimented on two facial expression data sets of CK+ and Oulu- CASIA, and the recognition performance and efficiency are compared with the existing methods. The highest recognition rate is 99.6%, which shows that the method has good recognition accuracy in a certain range.

Originality/value

By using the proposed model for the applications of facial expression, the high recognition accuracy and robust recognition results with lower time consumption will help to build more sophisticated applications in real world.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 4
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 12 April 2024

Youwei Li and Jian Qu

The purpose of this research is to achieve multi-task autonomous driving by adjusting the network architecture of the model. Meanwhile, after achieving multi-task autonomous…

Abstract

Purpose

The purpose of this research is to achieve multi-task autonomous driving by adjusting the network architecture of the model. Meanwhile, after achieving multi-task autonomous driving, the authors found that the trained neural network model performs poorly in untrained scenarios. Therefore, the authors proposed to improve the transfer efficiency of the model for new scenarios through transfer learning.

Design/methodology/approach

First, the authors achieved multi-task autonomous driving by training a model combining convolutional neural network and different structured long short-term memory (LSTM) layers. Second, the authors achieved fast transfer of neural network models in new scenarios by cross-model transfer learning. Finally, the authors combined data collection and data labeling to improve the efficiency of deep learning. Furthermore, the authors verified that the model has good robustness through light and shadow test.

Findings

This research achieved road tracking, real-time acceleration–deceleration, obstacle avoidance and left/right sign recognition. The model proposed by the authors (UniBiCLSTM) outperforms the existing models tested with model cars in terms of autonomous driving performance. Furthermore, the CMTL-UniBiCL-RL model trained by the authors through cross-model transfer learning improves the efficiency of model adaptation to new scenarios. Meanwhile, this research proposed an automatic data annotation method, which can save 1/4 of the time for deep learning.

Originality/value

This research provided novel solutions in the achievement of multi-task autonomous driving and neural network model scenario for transfer learning. The experiment was achieved on a single camera with an embedded chip and a scale model car, which is expected to simplify the hardware for autonomous driving.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 28 March 2023

Antonijo Marijić and Marina Bagić Babac

Genre classification of songs based on lyrics is a challenging task even for humans, however, state-of-the-art natural language processing has recently offered advanced solutions…

Abstract

Purpose

Genre classification of songs based on lyrics is a challenging task even for humans, however, state-of-the-art natural language processing has recently offered advanced solutions to this task. The purpose of this study is to advance the understanding and application of natural language processing and deep learning in the domain of music genre classification, while also contributing to the broader themes of global knowledge and communication, and sustainable preservation of cultural heritage.

Design/methodology/approach

The main contribution of this study is the development and evaluation of various machine and deep learning models for song genre classification. Additionally, we investigated the effect of different word embeddings, including Global Vectors for Word Representation (GloVe) and Word2Vec, on the classification performance. The tested models range from benchmarks such as logistic regression, support vector machine and random forest, to more complex neural network architectures and transformer-based models, such as recurrent neural network, long short-term memory, bidirectional long short-term memory and bidirectional encoder representations from transformers (BERT).

Findings

The authors conducted experiments on both English and multilingual data sets for genre classification. The results show that the BERT model achieved the best accuracy on the English data set, whereas cross-lingual language model pretraining based on RoBERTa (XLM-RoBERTa) performed the best on the multilingual data set. This study found that songs in the metal genre were the most accurately labeled, as their text style and topics were the most distinct from other genres. On the contrary, songs from the pop and rock genres were more challenging to differentiate. This study also compared the impact of different word embeddings on the classification task and found that models with GloVe word embeddings outperformed Word2Vec and the learning embedding layer.

Originality/value

This study presents the implementation, testing and comparison of various machine and deep learning models for genre classification. The results demonstrate that transformer models, including BERT, robustly optimized BERT pretraining approach, distilled bidirectional encoder representations from transformers, bidirectional and auto-regressive transformers and XLM-RoBERTa, outperformed other models.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9342

Keywords

Book part
Publication date: 13 March 2023

Jochen Hartmann and Oded Netzer

The increasing importance and proliferation of text data provide a unique opportunity and novel lens to study human communication across a myriad of business and marketing…

Abstract

The increasing importance and proliferation of text data provide a unique opportunity and novel lens to study human communication across a myriad of business and marketing applications. For example, consumers compare and review products online, individuals interact with their voice assistants to search, shop, and express their needs, investors seek to extract signals from firms' press releases to improve their investment decisions, and firms analyze sales call transcripts to increase customer satisfaction and conversions. However, extracting meaningful information from unstructured text data is a nontrivial task. In this chapter, we review established natural language processing (NLP) methods for traditional tasks (e.g., LDA for topic modeling and lexicons for sentiment analysis and writing style extraction) and provide an outlook into the future of NLP in marketing, covering recent embedding-based approaches, pretrained language models, and transfer learning for novel tasks such as automated text generation and multi-modal representation learning. These emerging approaches allow the field to improve its ability to perform certain tasks that we have been using for more than a decade (e.g., text classification). But more importantly, they unlock entirely new types of tasks that bring about novel research opportunities (e.g., text summarization, and generative question answering). We conclude with a roadmap and research agenda for promising NLP applications in marketing and provide supplementary code examples to help interested scholars to explore opportunities related to NLP in marketing.

Article
Publication date: 25 January 2024

Siming Cao, Hongfeng Wang, Yingjie Guo, Weidong Zhu and Yinglin Ke

In a dual-robot system, the relative position error is a superposition of errors from each mono-robot, resulting in deteriorated coordination accuracy. This study aims to enhance…

Abstract

Purpose

In a dual-robot system, the relative position error is a superposition of errors from each mono-robot, resulting in deteriorated coordination accuracy. This study aims to enhance relative accuracy of the dual-robot system through direct compensation of relative errors. To achieve this, a novel calibration-driven transfer learning method is proposed for relative error prediction in dual-robot systems.

Design/methodology/approach

A novel local product of exponential (POE) model with minimal parameters is proposed for error modeling. And a two-step method is presented to identify both geometric and nongeometric parameters for the mono-robots. Using the identified parameters, two calibrated models are established and combined as one dual-robot model, generating error data between the nominal and calibrated models’ outputs. Subsequently, the calibration-driven transfer, involving pretraining a neural network with sufficient generated error data and fine-tuning with a small measured data set, is introduced, enabling knowledge transfer and thereby obtaining a high-precision relative error predictor.

Findings

Experimental validation is conducted, and the results demonstrate that the proposed method has reduced the maximum and average relative errors by 45.1% and 30.6% compared with the calibrated model, yielding the values of 0.594 mm and 0.255 mm, respectively.

Originality/value

First, the proposed calibration-driven transfer method innovatively adopts the calibrated model as a data generator to address the issue of real data scarcity. It achieves high-accuracy relative error prediction with only a small measured data set, significantly enhancing error compensation efficiency. Second, the proposed local POE model achieves model minimality without the need for complex redundant parameter partitioning operations, ensuring stability and robustness in parameter identification.

Details

Industrial Robot: the international journal of robotics research and application, vol. 51 no. 2
Type: Research Article
ISSN: 0143-991X

Keywords

Article
Publication date: 8 September 2023

Oussama Ayoub, Christophe Rodrigues and Nicolas Travers

This paper aims to manage the word gap in information retrieval (IR) especially for long documents belonging to specific domains. In fact, with the continuous growth of text data…

Abstract

Purpose

This paper aims to manage the word gap in information retrieval (IR) especially for long documents belonging to specific domains. In fact, with the continuous growth of text data that modern IR systems have to manage, existing solutions are needed to efficiently find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Despite meaning closeness, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.

Design/methodology/approach

To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. Used DL models are pretrained on massive text data and carry common or specific domain knowledge to propose a better document representation.

Findings

The authors evaluate the approach of this study on specific IR domains with long documents to show the genericity of the proposed model and achieve encouraging results.

Originality/value

In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.

Details

International Journal of Web Information Systems, vol. 19 no. 5/6
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 5 May 2023

Ying Yu and Jing Ma

The tender documents, an essential data source for internet-based logistics tendering platforms, incorporate massive fine-grained data, ranging from information on tenderee…

Abstract

Purpose

The tender documents, an essential data source for internet-based logistics tendering platforms, incorporate massive fine-grained data, ranging from information on tenderee, shipping location and shipping items. Automated information extraction in this area is, however, under-researched, making the extraction process a time- and effort-consuming one. For Chinese logistics tender entities, in particular, existing named entity recognition (NER) solutions are mostly unsuitable as they involve domain-specific terminologies and possess different semantic features.

Design/methodology/approach

To tackle this problem, a novel lattice long short-term memory (LSTM) model, combining a variant contextual feature representation and a conditional random field (CRF) layer, is proposed in this paper for identifying valuable entities from logistic tender documents. Instead of traditional word embedding, the proposed model uses the pretrained Bidirectional Encoder Representations from Transformers (BERT) model as input to augment the contextual feature representation. Subsequently, with the Lattice-LSTM model, the information of characters and words is effectively utilized to avoid error segmentation.

Findings

The proposed model is then verified by the Chinese logistic tender named entity corpus. Moreover, the results suggest that the proposed model excels in the logistics tender corpus over other mainstream NER models. The proposed model underpins the automatic extraction of logistics tender information, enabling logistic companies to perceive the ever-changing market trends and make far-sighted logistic decisions.

Originality/value

(1) A practical model for logistic tender NER is proposed in the manuscript. By employing and fine-tuning BERT into the downstream task with a small amount of data, the experiment results show that the model has a better performance than other existing models. This is the first study, to the best of the authors' knowledge, to extract named entities from Chinese logistic tender documents. (2) A real logistic tender corpus for practical use is constructed and a program of the model for online-processing real logistic tender documents is developed in this work. The authors believe that the model will facilitate logistic companies in converting unstructured documents to structured data and further perceive the ever-changing market trends to make far-sighted logistic decisions.

Details

Data Technologies and Applications, vol. 58 no. 1
Type: Research Article
ISSN: 2514-9288

Keywords

Article
Publication date: 16 December 2022

Kinjal Bhargavkumar Mistree, Devendra Thakor and Brijesh Bhatt

According to the Indian Sign Language Research and Training Centre (ISLRTC), India has approximately 300 certified human interpreters to help people with hearing loss. This paper…

Abstract

Purpose

According to the Indian Sign Language Research and Training Centre (ISLRTC), India has approximately 300 certified human interpreters to help people with hearing loss. This paper aims to address the issue of Indian Sign Language (ISL) sentence recognition and translation into semantically equivalent English text in a signer-independent mode.

Design/methodology/approach

This study presents an approach that translates ISL sentences into English text using the MobileNetV2 model and Neural Machine Translation (NMT). The authors have created an ISL corpus from the Brown corpus using ISL grammar rules to perform machine translation. The authors’ approach converts ISL videos of the newly created dataset into ISL gloss sequences using the MobileNetV2 model and the recognized ISL gloss sequence is then fed to a machine translation module that generates an English sentence for each ISL sentence.

Findings

As per the experimental results, pretrained MobileNetV2 model was proven the best-suited model for the recognition of ISL sentences and NMT provided better results than Statistical Machine Translation (SMT) to convert ISL text into English text. The automatic and human evaluation of the proposed approach yielded accuracies of 83.3 and 86.1%, respectively.

Research limitations/implications

It can be seen that the neural machine translation systems produced translations with repetitions of other translated words, strange translations when the total number of words per sentence is increased and one or more unexpected terms that had no relation to the source text on occasion. The most common type of error is the mistranslation of places, numbers and dates. Although this has little effect on the overall structure of the translated sentence, it indicates that the embedding learned for these few words could be improved.

Originality/value

Sign language recognition and translation is a crucial step toward improving communication between the deaf and the rest of society. Because of the shortage of human interpreters, an alternative approach is desired to help people achieve smooth communication with the Deaf. To motivate research in this field, the authors generated an ISL corpus of 13,720 sentences and a video dataset of 47,880 ISL videos. As there is no public dataset available for ISl videos incorporating signs released by ISLRTC, the authors created a new video dataset and ISL corpus.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3
Type: Research Article
ISSN: 1756-378X

Keywords

1 – 10 of 392