Search results

1 – 10 of 33
Open Access
Article
Publication date: 18 March 2022

Loris Nanni, Alessandra Lumini and Sheryl Brahnam

Abstract

Purpose

Automatic anatomical therapeutic chemical (ATC) classification is progressing at a rapid pace because of its potential in drug development. Predicting an unknown compound's therapeutic and chemical characteristics in terms of how it affects multiple organs and physiological systems makes automatic ATC classification a vital yet challenging multilabel problem. The aim of this paper is to experimentally derive an ensemble of different feature descriptors and classifiers for ATC classification that outperforms the state-of-the-art.

Design/methodology/approach

The proposed method is an ensemble generated by the fusion of neural networks (i.e. a tabular model and long short-term memory (LSTM) networks) and multilabel classifiers based on multiple linear regression (hMuLab). All classifiers are trained on three sets of descriptors. Features extracted from the trained LSTMs are also fed into hMuLab. The ensembles are evaluated and compared on a benchmark data set of 3883 ATC-coded pharmaceuticals taken from KEGG, a publicly available drug databank.
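As a rough illustration of score-level fusion in a multilabel setting, the sketch below averages the per-label scores of several classifiers and thresholds the result. The average rule, the 0.5 threshold, the function names and the toy scores are all assumptions made for illustration; the abstract does not state which fusion rule EnsATC uses, and the authors' released code is in MATLAB, not Python.

```python
from typing import List

Scores = List[List[float]]  # one row per sample, one column per label


def fuse_scores(score_sets: List[Scores]) -> Scores:
    """Average per-label scores across classifiers (average rule, assumed)."""
    n_models = len(score_sets)
    n_samples = len(score_sets[0])
    n_labels = len(score_sets[0][0])
    return [
        [sum(s[i][j] for s in score_sets) / n_models for j in range(n_labels)]
        for i in range(n_samples)
    ]


def predict_labels(fused: Scores, threshold: float = 0.5) -> List[List[int]]:
    """Assign every label whose fused score reaches the threshold."""
    return [[1 if score >= threshold else 0 for score in row] for row in fused]


# Toy scores for one compound and three hypothetical labels.
lstm_scores = [[0.9, 0.2, 0.6]]
tabular_scores = [[0.7, 0.4, 0.2]]
hmulab_scores = [[0.8, 0.3, 0.1]]

fused = fuse_scores([lstm_scores, tabular_scores, hmulab_scores])
labels = predict_labels(fused)
```

Averaging is only one of several possible fusion rules; weighted sums or per-label thresholds would slot into the same two functions.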

Findings

Experiments demonstrate the power of the authors’ best ensemble, EnsATC, which is shown to outperform the best methods reported in the literature, including the state-of-the-art developed by the fast.ai research group. The MATLAB source code of the authors’ system is freely available to the public at https://github.com/LorisNanni/Neural-networks-for-anatomical-therapeutic-chemical-ATC-classification.

Originality/value

This study demonstrates the power of extracting LSTM features and combining them with ATC descriptors in ensembles for ATC classification.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Details

Big Data Analytics for the Prediction of Tourist Preferences Worldwide
Type: Book
ISBN: 978-1-83549-339-7

Article
Publication date: 22 November 2019

Shuo Xu and Xin An

Abstract

Purpose

Image classification is becoming a supporting technology in several image-processing tasks. Because images contain rich semantic information, it is very common for an image to have several labels or tags. This paper aims to develop a novel multi-label classification approach with superior performance.

Design/methodology/approach

Many multi-label classification problems share two main characteristics: label correlations and label imbalance. However, most current methods either model label relationships or handle the imbalance problem with traditional single-label methods. In this paper, the multi-label classification problem is regarded as an unbalanced multi-task learning problem. The multi-task least-squares support vector machine (MTLS-SVM) is generalized for this problem and renamed multi-label LS-SVM (ML2S-SVM).

Findings

Experimental results on the emotions, scene, yeast and bibtex data sets indicate that ML2S-SVM is competitive with state-of-the-art methods in terms of Hamming loss and instance-based F1 score. The parameter values strongly influence the performance of ML2S-SVM, so users need to identify proper parameters in advance.
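For readers unfamiliar with the two metrics named above, the sketch below computes Hamming loss and instance-based F1 on binary indicator matrices. The function names and toy data are illustrative, and scoring an instance with no true and no predicted labels as 1.0 is an assumed convention (others exist); the paper's exact evaluation code is not given in the abstract.

```python
from typing import List

Matrix = List[List[int]]  # rows are instances, columns are labels (0 or 1)


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of instance-label pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def instance_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """F1 computed per instance over its label set, then averaged."""
    scores = []
    for true_row, pred_row in zip(y_true, y_pred):
        tp = sum(1 for t, p in zip(true_row, pred_row) if t == 1 and p == 1)
        denom = sum(true_row) + sum(pred_row)
        # Assumed convention: an empty truth and empty prediction count as 1.0.
        scores.append(2 * tp / denom if denom else 1.0)
    return sum(scores) / len(scores)


y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]

loss = hamming_loss(y_true, y_pred)  # 2 wrong cells out of 6
f1 = instance_f1(y_true, y_pred)     # each instance scores 2/3
```

Lower is better for Hamming loss, while higher is better for instance-based F1, which is why the two are usually reported together.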

Originality/value

On the basis of MTLS-SVM, a novel multi-label classification approach, ML2S-SVM, is put forward. This method not only overcomes the imbalance problem but also explicitly models arbitrary-order correlations among labels by allowing multiple labels to share a subspace. In addition, the approach has a wide range of applications; it is not limited to the field of image classification.

Details

The Electronic Library, vol. 37 no. 6
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 February 2022

Yaotan Xie and Fei Xiang

Abstract

Purpose

This study aimed to adapt existing text-mining techniques and propose a novel topic recognition approach for textual patient reviews.

Design/methodology/approach

The authors first transformed the multilabel samples into a form suitable for model training. They then proposed an improved method based on dynamic mixed sampling and transfer learning to mitigate the learning problem caused by imbalanced samples. Specifically, the model was trained within a convolutional neural network framework, using Word2Vector embeddings self-trained on large-scale corpora.

Findings

Compared with the SVM and other CNN-based models, the CNN + DMS + TL model proposed in this study achieved a significant improvement in F1 score.

Originality/value

The improved methods based on dynamic mixed sampling and transfer learning can adequately manage the learning problem caused by the skewed distribution of samples and achieve the effective and automatic topic recognition of textual patient reviews.

Peer review

The peer-review history for this article is available at: https://publons.com/publon/10.1108/OIR-01-2021-0059.

Details

Online Information Review, vol. 46 no. 6
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 21 December 2021

Shanling Han, Shoudong Zhang, Yong Li and Long Chen

Abstract

Purpose

Intelligent diagnosis of equipment faults can effectively avoid the shutdown caused by equipment faults and improve the safety of the equipment. At present, various kinds of bearing fault information, such as the occurrence, location and degree of a fault, can be diagnosed by machine learning and deep learning using multiclass classification. However, the multiclass approach struggles to distinguish similar fault categories and to represent fault information visually. To address these shortcomings, an end-to-end multilabel fault classification model is proposed for bearing fault diagnosis.

Design/methodology/approach

In this model, the labels of each bearing are binarized using the binary relevance method. Then, an integrated convolutional neural network and gated recurrent unit (CNN-GRU) network is employed to classify faults. Unlike general CNNs, the CNN-GRU network adds multiple GRU layers after the convolutional and pooling layers.
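The binary relevance step described above can be sketched as follows: each sample's label set becomes a 0/1 indicator vector, and each column of those vectors defines one independent binary classification target. The label names and sample data are hypothetical and do not reflect the actual annotation scheme of the Paderborn University dataset; the CNN-GRU classifier itself is not shown.

```python
from typing import Dict, List, Set


def binarize(label_sets: List[Set[str]], all_labels: List[str]) -> List[List[int]]:
    """Turn each sample's label set into a 0/1 indicator vector."""
    return [[1 if lab in labels else 0 for lab in all_labels] for labels in label_sets]


def per_label_targets(rows: List[List[int]], all_labels: List[str]) -> Dict[str, List[int]]:
    """Binary relevance: one binary target column per label."""
    return {lab: [row[j] for row in rows] for j, lab in enumerate(all_labels)}


# Hypothetical fault labels, for illustration only.
ALL_LABELS = ["faulty", "inner_ring", "outer_ring"]
samples = [
    {"faulty", "inner_ring"},  # damaged inner ring
    set(),                     # healthy bearing: no label is active
    {"faulty", "outer_ring"},  # damaged outer ring
]

indicator_rows = binarize(samples, ALL_LABELS)
targets = per_label_targets(indicator_rows, ALL_LABELS)
```

In a neural network, the same idea usually appears as one sigmoid output per label, so each output can be read as the predicted probability of that fault label.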

Findings

The Paderborn University bearing dataset is used to demonstrate the practicability of the model. The experimental results show that the average accuracy on the test set is 99.7%, that the proposed network outperforms the multilayer perceptron and CNN in bearing fault diagnosis and that the multilabel classification method is superior to the multiclass method. Consequently, the model can intuitively classify faults with higher accuracy.

Originality/value

Each bearing is labeled according to whether it has failed, the fault location, the damage mode and the damage degree, and these labels are then binarized. The multilabel problem is transformed into a binary classification problem for each fault label by the binary relevance method, and the predicted probability of each fault label is output directly in the output layer, which visually distinguishes different fault conditions.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 15 no. 3
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 12 October 2023

Erk Hacıhasanoğlu, Ömer Faruk Ünlüsoy and Fatma Selen Madenoğlu

Abstract

Purpose

The sustainable development goals (SDGs) were introduced to guide the achievement of sustainability goals and to tackle global problems. United Nations members may perform activities to achieve the predetermined goals and report on their SDG activities. The comprehension and commitment of several stakeholders are essential for the effective implementation of the SDGs. Countries encourage their stakeholders to perform and report their activities to meet the SDGs. The purpose of this study is to investigate the extent to which corporations’ annual reports address the SDGs, in order to assess and understand their level of commitment to, prioritisation of and integration of the SDGs within their reporting structure. By examining the extent to which corporations address the SDGs, this research makes it easier to evaluate their sustainability performance and contributions to global sustainability goals.

Design/methodology/approach

The study uses a multilabel text classification approach to reveal the extent to which the reports address the SDGs. The SDG classification is carried out by examining each report with a text analysis tool based on an enhanced version of gradient boosting. The machine learning-based model determines which SDGs are associated with a company’s operations without requiring the report’s authors to do so. Therefore, instead of reading the texts to look for explicit “SDG” mentions, as is typical in the literature, evidence of the SDGs was sought in the relevant texts.

Findings

To show the feasibility of the study, the annual reports of the leading companies in Turkey are examined, and the results are interpreted. The study produced results including insights into the sustainable practices of businesses, priority SDG selection, benchmarking and business comparison, gaps and improvement opportunities identification and representation of the SDGs’ importance.

Originality/value

The findings of the analysis of annual reports indicate which SDGs the companies are concerned about. The study addresses a gap in the literature on the analysis of annual reports of companies that fall under a particular framework. In addition, it has sparked the idea of conducting research on a global scale and in a time series. With the aid of this research, decision-making procedures can be guided, and advancements toward the SDGs can be achieved.

Details

Corporate Governance: The International Journal of Business in Society, vol. 24 no. 3
Type: Research Article
ISSN: 1472-0701

Article
Publication date: 3 April 2017

Hei-Chia Wang, Che-Tsung Yang and Yi-Hao Yen

Abstract

Purpose

Community question answering (CQA) websites provide an open and free way to share knowledge about general topics on the internet. However, inquirers may not obtain useful answers and those who are qualified to provide answers may also miss opportunities to share their expertise without any notice. To address this problem, the purpose of this paper is to provide the means for inquirers to access archived answers and to identify effective subject matter experts for target questions.

Design/methodology/approach

This paper presents a question answering promoter, called QAP, for the CQA services. The proposed QAP facilitates the use of filtered archived answers regarded as explicit knowledge and recommended experts regarded as sources of implicit knowledge for the given target questions.

Findings

The experimental results indicate that QAP can leverage knowledge sharing by refining archived answers based on credibility and distributing raised questions to qualified potential experts.

Research limitations/implications

This proposed method is designed for the traditional Chinese corpus.

Originality/value

This paper proposes an integrated framework for answer selection and expert finding that uses the bottom-up multipath evaluation algorithm, an underlying voting model, the agglomerative hierarchical clustering technique and feature approaches for measuring answer trustworthiness, identifying satisfied learners and assessing the credibility of repliers. Experiments using a corpus crawled from Yahoo! Knowledge Plus are conducted under designed scenarios, and the results are reported in detail.

Article
Publication date: 2 February 2022

Deepak Suresh Asudani, Naresh Kumar Nagwani and Pradeep Singh

Abstract

Purpose

Classifying emails as ham or spam based on their content is essential. Determining the semantic and syntactic meaning of words and putting them in a high-dimensional feature vector form for processing is the most difficult challenge in email categorization. The purpose of this paper is to examine the effectiveness of the pre-trained embedding model for the classification of emails using deep learning classifiers such as the long short-term memory (LSTM) model and convolutional neural network (CNN) model.

Design/methodology/approach

In this paper, global vectors (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) pre-trained word embeddings are used to identify relationships between words, which helps to classify emails into their relevant categories using machine learning and deep learning models. Two benchmark datasets, SpamAssassin and Enron, are used in the experimentation.

Findings

In the first set of experiments, the support vector machine (SVM) model performs better than the other machine learning classifiers. The second set of experiments compares deep learning model performance without embedding, with GloVe embedding and with BERT embedding. The experiments show that GloVe embedding can be helpful for faster execution with better performance on large datasets.

Originality/value

The experiment reveals that the CNN model with GloVe embedding gives slightly better accuracy than the model with BERT embedding and traditional machine learning algorithms when classifying an email as ham or spam. It is concluded that word embedding models improve the accuracy of email classifiers.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 24 July 2020

Angelica Lo Duca and Andrea Marchetti

Abstract

Purpose

Ship route prediction (SRP) is a fairly complex task that determines the next position of a ship after a given period of time, given its current position. This paper describes a study that compares five families of multiclass classification algorithms for SRP.

Design/methodology/approach

Tested algorithm families include: Naive Bayes (NB), nearest neighbors, decision trees, linear algorithms and extension from binary. A common structure for all the algorithm families was implemented and adapted to the specific case, according to the test to be done. The tests were done on one month of real data extracted from automatic identification system messages, collected around the island of Malta.
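To make the classification framing concrete, the sketch below shows k-nearest neighbours, one of the families listed above, predicting a ship's next position as a class label by majority vote. The two features (latitude and longitude), the grid-cell class names, the value of k and the toy data are all assumptions for illustration; the abstract does not specify the features or the class encoding used in the study.

```python
import math
from collections import Counter
from typing import List, Tuple

Sample = Tuple[Tuple[float, float], str]  # ((lat, lon), next-cell label)


def knn_predict(train: List[Sample], query: Tuple[float, float], k: int = 3) -> str:
    """Return the majority class among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: current position and the cell reached next.
train = [
    ((35.90, 14.50), "cell_A"),
    ((35.91, 14.51), "cell_A"),
    ((35.89, 14.52), "cell_A"),
    ((35.95, 14.40), "cell_B"),
    ((36.00, 14.30), "cell_B"),
]

predicted = knn_predict(train, (35.90, 14.51), k=3)
```

Plain Euclidean distance on raw coordinates is a simplification; a real system would likely scale the features and add inputs such as speed or heading.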

Findings

Experiments show that K-nearest neighbors and decision tree algorithms outperform all the other algorithms. Experiments also demonstrate that linear algorithms and NB perform very poorly.

Research limitations/implications

This study is limited to the area surrounding Malta. Thus, findings cannot be generalized to every context. However, the methodology presented is general and can help other researchers in this area to choose appropriate methods for their problems.

Practical implications

The results of this study can be exploited by maritime surveillance applications to build decision support systems that monitor and predict ship routes in a given area. For example, SRP techniques could be used to protect areas at risk, such as marine protected areas, from illegal fishing.

Originality/value

The paper proposes a solid methodology to perform tests on SRP, based on a series of important machine learning algorithms for the prediction.

Details

Journal of Systems and Information Technology, vol. 22 no. 3
Type: Research Article
ISSN: 1328-7265

Article
Publication date: 15 July 2020

Wenpei Xu and Ting-Kwei Wang

Abstract

Purpose

This study provides a safety prewarning mechanism, which includes a comprehensive risk assessment model and a safety prewarning system. The comprehensive risk assessment model is capable of assessing nine safety indicators, which can be categorised into workers’ behaviour, environment and machine-related safety indicators, and the model is embedded in the safety prewarning system. The safety prewarning system can automatically extract safety information from surveillance cameras based on computer vision, assess risks based on the embedded comprehensive risk assessment model, categorise risks into five levels and provide timely suggestions.

Design/methodology/approach

Firstly, the comprehensive risk assessment model is constructed using the grey multihierarchical analysis method, which combines the Analytic Hierarchy Process (AHP) with the grey clustering evaluation from grey theory. Expert knowledge, obtained through a questionnaire, is used to set the weights of the risk indicators and to evaluate risks. Secondly, a safety prewarning system is developed, comprising a data acquisition layer, a data processing layer and a prewarning layer. Computer vision is applied in the system to automatically extract real-time safety information from the surveillance cameras. The safety information is then processed through the comprehensive risk assessment model and categorized into five risk levels. A case study is presented to verify the proposed mechanism.
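The AHP weighting step mentioned above can be illustrated with a small sketch that derives indicator weights from a pairwise comparison matrix. It uses the row geometric mean method, a common approximation of the principal eigenvector, which is an assumption here and not necessarily the procedure used in the study. The comparison values are hypothetical, and the grey clustering evaluation is not covered.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical expert judgements on a 1-9 scale for three indicator groups:
# workers' behaviour, environment, machine. Entry [i][j] states how much more
# important group i is than group j; the matrix is reciprocal by construction.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]

weights = ahp_weights(pairwise)  # normalised so the weights sum to 1
```

In practice, an AHP study would also check the consistency of the expert judgements before using the weights.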

Findings

A case study shows that the proposed mechanism is capable of analyzing integrated human-machine-environment risk, categorising risks into five risk levels in a timely manner and providing potential suggestions.

Originality/value

The comprehensive risk assessment model is capable of assessing nine risk indicators, identifying three types of entities on the construction site (workers, environment and machine) and presenting the integrated risk based on the nine indicators. The proposed mechanism, which adopts expert knowledge through Building Information Modeling (BIM) safety simulation and extracts safety information based on computer vision, can perform a dynamic real-time risk analysis, categorize risks into five risk levels and provide potential suggestions to corresponding risk owners. The proposed mechanism can allow the project manager to take timely actions.

Details

Engineering, Construction and Architectural Management, vol. 27 no. 8
Type: Research Article
ISSN: 0969-9988
