Search results

1 – 10 of over 4000
Open Access
Article
Publication date: 22 November 2022

Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv

Abstract

Purpose

The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.

Design/methodology/approach

Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.

Findings

Based on the ideas of statistics, system theory, machine learning and data mining, the present research focuses on “data quality diagnosis” and “index classification and stratification”, and clarifies the classification standards and data quality characteristics of index data. A data-quality diagnosis system of “data review – data cleaning – data conversion – data inspection” is established. Using a decision tree, explanatory structural model, cluster analysis, K-means clustering and other methods, a method system for the classification and stratification of indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. Finally, the scientific and standardized classification and hierarchical design of the index system can be realized.
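
The abstract names K-means clustering as one way to group indicators and cut redundancy. The following is a minimal sketch of that idea, not the authors' procedure: each indicator is treated as a vector of standardized observations, and indicators that land in the same cluster are candidates for merging. The indicator names, the values, the choice of k = 2 and the seeding rule (first k points) are all hypothetical.

```python
import math

def standardize(values):
    """Rescale one indicator's observations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=20):
    """Lloyd's algorithm; seeding with the first k points is an assumption."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical indicators observed over five periods (illustrative values only).
indicators = {
    "gdp": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pollution": [5.0, 3.0, 4.0, 1.0, 2.0],
    "output_value": [2.1, 3.9, 6.2, 8.0, 9.9],
    "emissions": [10.0, 6.5, 8.0, 2.0, 4.2],
}
names = list(indicators)
labels = k_means([standardize(indicators[n]) for n in names], k=2)

groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)
```

Standardizing first keeps indicators on different scales comparable, so the clusters reflect how indicators move together rather than how large their values are.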

Originality/value

The innovative contributions and research value of the paper are reflected in three aspects. First, a method system for index data quality diagnosis is designed, and multi-source data fusion technology is adopted to ensure the quality of multi-source, heterogeneous and mixed-frequency data of the index system. Second, a systematic quality-inspection process for missing data is designed, based on the systematic thinking of the whole and the individual. Aiming at the accuracy, reliability, and feasibility of the patched data, a quality-inspection method of patched data based on inversion thought and a unified representation method of data fusion based on a tensor model are proposed. Third, the modern method of unsupervised learning is used to classify and stratify the index system, which reduces the subjectivity and randomness of the design of the index system and enhances its objectivity and rationality.

Details

Marine Economics and Management, vol. 5 no. 2
Type: Research Article
ISSN: 2516-158X

Open Access
Article
Publication date: 8 December 2020

Matjaž Kragelj and Mirjana Kljajić Borštnar

Abstract

Purpose

The purpose of this study is to develop a model for automated classification of old digitised texts to the Universal Decimal Classification (UDC), using machine-learning methods.

Design/methodology/approach

The general research approach is inherent to design science research, in which the problem of UDC assignment of the old, digitised texts is addressed by developing a machine-learning classification model. A corpus of 70,000 scholarly texts, fully bibliographically processed by librarians, was used to train and test the model, which was used for classification of old texts on a corpus of 200,000 items. Human experts evaluated the performance of the model.

Findings

Results suggest that machine-learning models can correctly assign the UDC at some level for almost any scholarly text. Furthermore, the model can be recommended for the UDC assignment of older texts. Ten librarians corroborated this on 150 randomly selected texts.

Research limitations/implications

The main limitations of this study were unavailability of labelled older texts and the limited availability of librarians.

Practical implications

The classification model can provide a recommendation to the librarians during their classification work; furthermore, it can be implemented as an add-on to full-text search in the library databases.

Social implications

The proposed methodology supports librarians by recommending UDC classifiers, thus saving time in their daily work. By automatically classifying older texts, digital libraries can provide a better user experience by enabling structured searches. These contribute to making knowledge more widely available and useable.

Originality/value

These findings contribute to the field of automated classification of bibliographical information with the usage of full texts, especially in cases in which the texts are old, unstructured and in which archaic language and vocabulary are used.

Details

Journal of Documentation, vol. 77 no. 3
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 30 July 2020

Alaa Tharwat

Abstract

Classification techniques have been applied to many applications in various fields of science. There are several ways of evaluating classification algorithms. Such metrics and their significance must be interpreted correctly when evaluating different learning algorithms. Most of these measures are scalar metrics and some of them are graphical methods. This paper presents a detailed overview of classification assessment measures, with the aim of explaining the basics of these measures and how they work, to serve as a comprehensive source for researchers who are interested in this field. This overview starts by highlighting the definition of the confusion matrix in binary and multi-class classification problems. Many classification measures are also explained in detail, and the influence of balanced and imbalanced data on each metric is presented. An illustrative example is introduced to show (1) how to calculate these measures in binary and multi-class classification problems, and (2) the robustness of some measures against balanced and imbalanced data. Moreover, some graphical measures such as receiver operating characteristic (ROC), Precision-Recall (PR), and detection error trade-off (DET) curves are presented in detail. Additionally, in a step-by-step approach, different numerical examples are demonstrated to explain the preprocessing steps of plotting ROC, PR, and DET curves.
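
To make the confusion-matrix measures concrete, here is a small sketch that is not taken from the paper. It counts the four cells of a binary confusion matrix and derives accuracy, precision, recall and F1 from them. The labels and predictions are invented to mimic an imbalanced data set (5 positives, 95 negatives), and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def binary_metrics(tp, fp, fn, tn):
    """Derive scalar metrics; undefined ratios fall back to 0.0 (a convention)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Invented imbalanced example: the classifier finds only one of five positives.
y_true = [1] * 5 + [0] * 95
y_pred = [1] + [0] * 4 + [0] * 95

counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(*counts)
print(counts)
print(metrics)
```

In this example accuracy is 0.96 even though recall is only 0.2, which shows how class imbalance can make a single scalar metric misleading.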

Details

Applied Computing and Informatics, vol. 17 no. 1
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 7 July 2021

Habib Shah

Abstract

Purpose

Breast cancer is an important medical disorder, which is not a single disease but a cluster of more than 200 different serious medical complications.

Design/methodology/approach

The new artificial bee colony (ABC) implementation has been applied to a probabilistic neural network (PNN) for training and testing purposes to classify the breast cancer data set.

Findings

The new ABC algorithm, along with PNN, has been successfully applied to the breast cancer data set for prediction, requiring a minimal number of iterations.

Originality/value

The new implementation of ABC along with PNN can be easily applied to time series problems for accurate prediction or classification.

Details

Frontiers in Engineering and Built Environment, vol. 1 no. 2
Type: Research Article
ISSN: 2634-2499

Open Access
Article
Publication date: 19 August 2022

Marlon Santiago Viñán-Ludeña and Luis M. de Campos

Abstract

Purpose

The main purpose of this paper is to analyze a tourist destination using sentiment analysis techniques with data from Twitter and Instagram to find the most representative entities (or places) and perceptions (or aspects) of the users.

Design/methodology/approach

The authors used 90,725 Instagram posts and 235,755 Twitter tweets to analyze tourism in Granada (Spain) to identify the important places and perceptions mentioned by travelers on both social media sites. The authors used several approaches for sentiment classification for English and Spanish texts, including deep learning models.

Findings

The best results in a test set were obtained using a bidirectional encoder representations from transformers (BERT) model for Spanish texts and Tweeteval for English texts, and these were subsequently used to analyze the data sets. It was then possible to identify the most important entities and aspects, and this, in turn, provided interesting insights for researchers, practitioners, travelers and tourism managers so that services could be improved and better marketing strategies formulated.

Research limitations/implications

The authors propose a Spanish-Tourism-BERT model for performing sentiment classification together with a process to find places through hashtags and to reveal the important negative aspects of each place.

Practical implications

The study enables managers and practitioners to implement the Spanish-BERT model with the Spanish Tourism data set that the authors released for adoption in applications to find both positive and negative perceptions.

Originality/value

This study presents a novel approach to applying sentiment analysis in the tourism domain. First, a way to evaluate the different existing models and tools is presented; second, a model is trained using BERT (a deep learning model); third, an approach for identifying the acceptance of the places of a destination through hashtags is presented; and, finally, the reasons why users express positivity (negativity) are evaluated through the identification of entities and aspects.

Details

Journal of Hospitality and Tourism Technology, vol. 13 no. 5
Type: Research Article
ISSN: 1757-9880

Open Access
Article
Publication date: 3 July 2017

Rahila Umer, Teo Susnjak, Anuradha Mathrani and Suriadi Suriadi

Abstract

Purpose

The purpose of this paper is to propose a process mining approach to help in making early predictions to improve students’ learning experience in massive open online courses (MOOCs). It investigates the impact of various machine learning techniques in combination with process mining features to measure effectiveness of these techniques.

Design/methodology/approach

Students’ data (e.g. assessment grades, demographic information) and weekly interaction data based on event logs (e.g. video lecture interaction, solution submission time, time spent weekly) have guided this design. This study evaluates four machine learning classification techniques used in the literature (logistic regression (LR), Naïve Bayes (NB), random forest (RF) and K-nearest neighbor) to monitor weekly progression of students’ performance and to predict their overall performance outcome. Two data sets have been used: one with traditional features and a second with features obtained from process conformance testing.

Findings

The results show that the techniques used in the study are able to make predictions on the performance of students. Overall accuracy (F1-score, area under curve) of machine learning techniques can be improved by integrating process mining features with standard features. Specifically, the LR and NB classifiers outperform the other techniques in a statistically significant way.
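
For readers unfamiliar with the area-under-curve measure mentioned above, the sketch below computes ROC AUC through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. It is an illustration only; the labels and scores are invented, and the study's own evaluation code is not shown in the abstract.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented example: 1 = student at risk, scores = predicted risk.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a small illustration; a sort-based rank computation would be the usual choice for large data sets.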

Practical implications

Although MOOCs provide a platform for learning in a highly scalable and flexible manner, they are prone to early dropout and low completion rates. This study outlines a data-driven approach to improve students’ learning experience and decrease the dropout rate.

Social implications

Early predictions based on an individual’s participation can help educators provide support to students who are struggling in the course.

Originality/value

This study outlines the innovative use of process mining techniques in education data mining to help educators gather data-driven insight on student performances in the enrolled courses.

Details

Journal of Research in Innovative Teaching & Learning, vol. 10 no. 2
Type: Research Article
ISSN: 2397-7604

Open Access
Article
Publication date: 29 March 2021

Hamad Al Jassmi, Mahmoud Al Ahmad and Soha Ahmed

Abstract

Purpose

The first step toward developing an automated construction workers’ performance monitoring system is to establish a complete and competent activity recognition solution, which is still lacking. This study aims to propose a novel approach of using labor physiological data collected through wearable sensors as a means of remote and automatic activity recognition.

Design/methodology/approach

A pilot study is conducted with three pre-fabrication stone construction workers throughout three full working shifts to test the ability to automatically recognize the type of activities they perform on site through their physiological signals measured live (i.e. blood volume pulse, respiration rate, heart rate, galvanic skin response and skin temperature). The physiological data are broadcast from wearable sensors to a tablet application developed for this particular purpose, and are then used to train and assess the performance of various machine-learning classifiers.

Findings

A promising result of up to 88% accuracy level for activity recognition was achieved by using an artificial neural network classifier. Nonetheless, special care needs to be taken for some activities that evoke similar physiological patterns. It is expected that blending this method with other currently developed camera-based or kinetic-based methods would yield higher activity recognition accuracy levels.

Originality/value

The proposed method complements previously proposed labor tracking methods that focused on monitoring labor trajectories and postures, by using an additional rich source of information, workers’ physiology, for real-time and remote activity recognition. Ultimately, this paves the way for an automated and comprehensive solution with which construction managers could monitor, control and collect rich real-time data about workers’ performance remotely.

Details

Construction Innovation, vol. 21 no. 4
Type: Research Article
ISSN: 1471-4175

Open Access
Article
Publication date: 10 April 2023

Simon Andersson

Abstract

Purpose

This study aims to identify problems connected to information classification in theory and to put those problems into the context of experiences from practice.

Design/methodology/approach

Five themes describing problems are discussed in an empirical study, having informants represented from both a public and a private sector organization.

Findings

The reasons why problems occur in information classification are exemplified by the informants’ experiences. The study concludes with directions for future research.

Originality/value

Information classification sustains the basics of security measures. The human–organizational challenges are evident in the activities but have received little attention in research.

Details

Information & Computer Security, vol. 31 no. 4
Type: Research Article
ISSN: 2056-4961

Open Access
Article
Publication date: 28 July 2020

Noura AlNuaimi, Mohammad Mehedy Masud, Mohamed Adel Serhani and Nazar Zaki

Abstract

Organizations in many domains generate a considerable amount of heterogeneous data every day. Such data can be processed to enhance these organizations’ decisions in real time. However, storing and processing large and varied datasets (known as big data) is challenging to do in real time. In machine learning, streaming feature selection has always been considered a superior technique for selecting the relevant subset of features from high-dimensional data and thus reducing learning complexity. In the relevant literature, streaming feature selection refers to settings in which features arrive consecutively over time; although the exact number of features is unknown, the number of instances is well established. Many scholars in the field have proposed streaming-feature-selection algorithms in attempts to find the proper solution to this problem. This paper presents an exhaustive and methodological introduction to these techniques. This study provides a review of the traditional feature-selection algorithms and then scrutinizes the current algorithms that use streaming feature selection to determine their strengths and weaknesses. The survey also sheds light on the ongoing challenges in big-data research.
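
The streaming setting described above (a fixed set of instances, with features arriving one at a time) can be sketched with a simple relevance-and-redundancy filter. This is a generic illustration and not one of the algorithms reviewed in the survey: a newly arrived feature is kept only if its Pearson correlation with the target is strong enough and it is not nearly collinear with a feature already kept. The thresholds, feature names and values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def select_streaming(feature_stream, target, relevance=0.5, redundancy=0.9):
    """Decide on each feature as it arrives, without revisiting earlier ones."""
    kept = {}
    for name, column in feature_stream:
        # Relevance check: drop features weakly related to the target.
        if abs(pearson(column, target)) < relevance:
            continue
        # Redundancy check: drop features nearly collinear with a kept one.
        if any(abs(pearson(column, other)) >= redundancy
               for other in kept.values()):
            continue
        kept[name] = column
    return list(kept)

# Six instances with a binary target; features arrive in this order.
target = [0, 0, 1, 1, 1, 0]
stream = [
    ("usage", [1, 2, 8, 9, 7, 2]),
    ("noise", [5, 1, 4, 2, 3, 6]),
    ("usage_scaled", [2, 4, 16, 18, 14, 4]),  # exact multiple of "usage"
    ("tenure", [3, 1, 2, 6, 5, 2]),
]

selected = select_streaming(iter(stream), target)
print(selected)
```

Because each decision uses only the features kept so far, the filter never needs the full feature set in memory, which is the property that distinguishes the streaming setting from batch feature selection.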

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 15 June 2021

Leila Ismail and Huned Materwala

Abstract

Purpose

Machine learning is an intelligent methodology used for prediction and has shown promising results in predictive classification. One of the critical areas in which machine learning can save lives is diabetes prediction. Diabetes is a chronic disease and one of the top 10 causes of death worldwide. It is expected that the total number of people with diabetes will be 700 million in 2045, a 51.18% increase compared to 2019. These are alarming figures, and therefore it is urgent to provide accurate diabetes prediction.

Design/methodology/approach

Health professionals and stakeholders are striving for classification models to support prognosis of diabetes and formulate strategies for prevention. The authors conduct a literature review of machine learning models and propose an intelligent framework for diabetes prediction.

Findings

The authors provide a critical analysis of machine learning models, and propose and evaluate an intelligent machine learning-based architecture for diabetes prediction. Using this framework, the authors implement and evaluate the decision tree (DT)-based random forest (RF) and support vector machine (SVM) learning models for diabetes prediction, as these are the most commonly used approaches in the literature.

Originality/value

This paper provides a novel intelligent diabetes mellitus prediction framework (IDMPF) using machine learning. The framework is the result of a critical examination of prediction models in the literature and their application to diabetes. The authors identify the training methodologies, model evaluation strategies and challenges in diabetes prediction, and propose solutions within the framework. The research results can be used by health professionals, stakeholders, students and researchers working in the diabetes prediction area.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964
