Search results

1 – 10 of 553
Open Access
Article
Publication date: 11 October 2023

Bachriah Fatwa Dhini, Abba Suganda Girsang, Unggul Utan Sufandi and Heny Kurniawati

Abstract

Purpose

The authors constructed an automatic essay scoring (AES) model for a discussion forum and compared its results with scores given by human evaluators. This research proposes essay scoring based on two parameters, semantic similarity and keyword similarity, using the pre-trained SentenceTransformers models that produce the best vector embeddings. The two parameters are combined to optimize the model and increase its accuracy.

Design/methodology/approach

The development of the model in the study is divided into seven stages: (1) data collection, (2) data pre-processing, (3) selection of a pre-trained SentenceTransformers model, (4) semantic similarity (sentence pair), (5) keyword similarity, (6) final score calculation and (7) model evaluation.
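Stages (3)–(6) can be sketched in miniature. The toy vectors below stand in for SentenceTransformers embeddings, and the equal weighting of the two parameters is an assumption for illustration, not the paper's tuned configuration:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two embedding vectors (semantic similarity).
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_similarity(response_keywords, rubric_keywords):
    # Fraction of rubric keywords found among the response's extracted keywords.
    rubric = set(rubric_keywords)
    return len(rubric & set(response_keywords)) / len(rubric) if rubric else 0.0

def final_score(sem_sim, kw_sim, w_semantic=0.5, w_keyword=0.5):
    # Weighted combination of the two parameters, scaled to 0-100.
    return 100 * (w_semantic * sem_sim + w_keyword * kw_sim)

# Toy embeddings stand in for real SentenceTransformers output.
student_vec = [0.2, 0.8, 0.1]
reference_vec = [0.25, 0.75, 0.05]
sem = cosine(student_vec, reference_vec)
kw = keyword_similarity(["mitosis", "cell"], ["mitosis", "cell", "nucleus"])
print(round(final_score(sem, kw), 1))
```

In practice the embeddings would come from one of the two multilingual models named in the Findings, and the weights would be chosen empirically.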

Findings

The paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models achieved the highest scores in a comparison of 11 pre-trained multilingual SentenceTransformers models on Indonesian data (Dhini and Girsang, 2023). Both multilingual models were adopted in this study. The combination of the two parameters is obtained by comparing the keywords extracted from the response with the rubric keywords. Based on the experimental results, the proposed combination can increase the evaluation results by 0.2.

Originality/value

This study uses discussion forum data from the general biology course in online learning at the open university for the 2020.2 and 2021.2 semesters. Discussion forum grading is still manual. In this study, the authors created a model that automatically scores discussion forum posts, which are essays, based on the lecturer's answers and rubrics.

Details

Asian Association of Open Universities Journal, vol. 18 no. 3
Type: Research Article
ISSN: 1858-3431

Open Access
Article
Publication date: 21 June 2023

Sudhaman Parthasarathy and S.T. Padmapriya

Abstract

Purpose

Algorithmic bias refers to repetitive computer program errors that give some users more weight than others. The aim of this article is to provide deeper insight into algorithmic bias in AI-enabled ERP software customization. Although algorithmic bias in machine learning models has uneven, unfair and unjust impacts, research on it is mostly anecdotal and scattered.

Design/methodology/approach

Guided by previous research (Akter et al., 2022), this study presents the possible design biases (model, data and method) one may experience with an enterprise resource planning (ERP) software customization algorithm. The study then presents an artificial intelligence (AI) version of the ERP customization algorithm using the k-nearest neighbours algorithm.
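A minimal k-nearest neighbours classifier of the kind applied to customization decisions might look like the following; the features (requirement complexity, module coupling) and labels are hypothetical, not taken from the paper:

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label); return the majority label
    # among the k nearest training points.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical features: (requirement complexity, module coupling).
train = [((1.0, 0.5), "reject"), ((0.9, 0.7), "reject"),
         ((3.2, 2.8), "customize"), ((3.0, 3.1), "customize"),
         ((2.9, 2.5), "customize")]
print(knn_predict(train, (3.0, 2.7), k=3))  # nearest neighbours vote "customize"
```

Note how the bias discussion applies here: the prediction depends entirely on which historical requirements populate `train`, so a skewed training sample skews every decision.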

Findings

This study illustrates the possible bias when the prioritized requirements customization estimation (PRCE) algorithm available in the ERP literature is executed without any AI. The authors then present their newly developed AI version of the PRCE algorithm, which uses machine learning techniques, and discuss its adjoining algorithmic bias with an illustration. Further, the authors draw a roadmap for managing algorithmic bias during ERP customization in practice.

Originality/value

To the best of the authors’ knowledge, no prior research has attempted to understand the algorithmic bias that occurs during the execution of the ERP customization algorithm (with or without AI).

Details

Journal of Ethics in Entrepreneurship and Technology, vol. 3 no. 2
Type: Research Article
ISSN: 2633-7436

Open Access
Article
Publication date: 30 July 2020

Alaa Tharwat

Abstract

Classification techniques have been applied to many applications in various fields of science. There are several ways of evaluating classification algorithms, and such metrics and their significance must be interpreted correctly when evaluating different learning algorithms. Most of these measures are scalar metrics, and some are graphical methods. This paper introduces a detailed overview of classification assessment measures, with the aim of providing the basics of these measures and showing how they work, to serve as a comprehensive source for researchers interested in this field. The overview starts by defining the confusion matrix in binary and multi-class classification problems. Many classification measures are then explained in detail, and the influence of balanced and imbalanced data on each metric is presented. An illustrative example shows (1) how to calculate these measures in binary and multi-class classification problems and (2) the robustness of some measures against balanced and imbalanced data. Moreover, graphical measures such as receiver operating characteristic (ROC), precision-recall (PR) and detection error trade-off (DET) curves are presented in detail. Additionally, in a step-by-step approach, different numerical examples demonstrate the preprocessing steps for plotting ROC, PR and DET curves.
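The scalar metrics derived from a binary confusion matrix can be computed in a few lines; this sketch uses the standard definitions rather than any notation specific to the paper:

```python
def binary_metrics(y_true, y_pred):
    # Confusion-matrix counts for the positive class (1) and negative class (0).
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    # Scalar metrics derived from the four counts.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. sensitivity / TPR
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

m = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(m)
```

Accuracy's sensitivity to imbalance is visible directly in the formula: the `tn` term lets a trivial all-negative classifier score highly on imbalanced data, which precision and recall do not.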

Details

Applied Computing and Informatics, vol. 17 no. 1
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 15 February 2022

Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal

Abstract

Purpose

Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemas, shared attributes, vocabulary, structure and semantics. Existing dataset catalogs provide basic search functionality relying on keyword search over the brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there exist many ways of improving dataset discovery, such as content-based retrieval, machine learning tools, third-party (external) knowledge bases, and countless feature extraction methods and description models.

Design/methodology/approach

In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.
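A toy version of such a component pipeline, with a hypothetical title-token representation and Jaccard similarity standing in for the framework's real components, could look like:

```python
# Each component is a plain callable; a pipeline is just their composition.
def title_extractor(dataset):
    # Representation component: tokenize the dataset title.
    return dataset["title"].lower().split()

def jaccard(a, b):
    # Similarity component: Jaccard index over token sets.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def discover(query, catalog, represent, similarity, top_k=2):
    # Rank catalog datasets by similarity of their representations to the query.
    q = represent(query)
    ranked = sorted(catalog, key=lambda d: similarity(q, represent(d)), reverse=True)
    return [d["title"] for d in ranked[:top_k]]

catalog = [{"title": "Air quality measurements Prague"},
           {"title": "City budget 2021"},
           {"title": "Air pollution sensors"}]
print(discover({"title": "air quality"}, catalog, title_extractor, jaccard))
```

Swapping `title_extractor` or `jaccard` for another callable changes the pipeline without touching `discover`, which is the modularity the framework is built around.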

Findings

The study proposes several proof-of-concept pipelines including experimental evaluation, which showcase the usage of the framework.

Originality/value

To the best of the authors' knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework aims to establish a platform for reproducible and comparable research in the area of dataset discovery. A prototype implementation of the framework is available on GitHub.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 4 August 2020

Kanak Meena, Devendra K. Tayal, Oscar Castillo and Amita Jain

Abstract

The scalability of similarity joins is threatened by an unexpected data characteristic: data skewness. This is a pervasive problem in scientific data. Skewness produces an uneven distribution of attributes, which can cause a severe load-imbalance problem, and when database join operations are applied to such datasets, the skewness grows exponentially. All the algorithms developed to date for implementing database joins are highly skew sensitive. This paper presents a new approach for handling data skewness in a character-based string similarity join using the MapReduce framework. No prior work handles data skewness in character-based string similarity joins, although work on set-based string similarity joins exists. The proposed work is divided into three stages, and every stage is further divided into mapper and reducer phases dedicated to specific tasks. The first stage finds the lengths of the strings in a dataset. For valid candidate-pair generation, the MR-Pass Join framework is adopted in the second stage. In the third stage, which is further divided into four MapReduce phases, MRFA concepts are incorporated for the string similarity join, named "MRFA-SSJ" (MapReduce Frequency Adaptive – String Similarity Join). Hence, MRFA-SSJ is proposed to handle skewness in string similarity joins. The experiments were implemented on three different datasets, namely DBLP, a query log and a real dataset of IP addresses and cookies, by deploying the Hadoop framework. The proposed algorithm was compared with three known algorithms; all of those algorithms fail when data are highly skewed, whereas the proposed method handles highly skewed data without any problem. A 15-node cluster was used in the experiments, and the Zipf distribution law was followed for the analysis of the skewness factor.
A comparison among existing and proposed techniques is also shown. Existing techniques survived up to a Zipf factor of 0.5, whereas the proposed algorithm survives up to a Zipf factor of 1. Hence, the proposed algorithm is skew insensitive and ensures scalability with reasonable query processing time for string similarity database joins. It also ensures an even distribution of attributes.
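The first stage (grouping strings by length via a mapper and reducer) can be illustrated with a single-process stand-in for MapReduce; the Hadoop shuffle is simulated with a dictionary, so this is a sketch of the phase structure, not of the paper's distributed implementation:

```python
from collections import defaultdict

def mapper(record):
    # Stage-1 mapper: emit (string length, string) pairs.
    yield (len(record), record)

def reducer(key, values):
    # Stage-1 reducer: collect all strings sharing one length.
    return key, sorted(values)

def run_mapreduce(records):
    # Single-process stand-in for the shuffle between mapper and reducer phases.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())

print(run_mapreduce(["abc", "de", "fgh", "ij"]))
```

The skew problem is visible even here: if most strings share one length, one reducer's `values` list dwarfs the others, which is the load imbalance the later MRFA-SSJ stages are designed to mitigate.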

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 29 June 2018

Jonathan Simões Freitas, Jéssica Castilho Andrade Ferreira, André Azevedo Rennó Campos, Júlio Cézar Fonseca de Melo, Lin Chih Cheng and Carlos Alberto Gonçalves

Abstract

Purpose

This paper aims to map the creation and evolution of centering resonance analysis (CRA). This method was an innovative approach developed to conduct textual content analysis in a semi-automatic, theory-informed and analytically rigorous way. Nevertheless, despite its robust procedures to analyze documents and interviews, CRA is still broadly unknown and scarcely used in management research.

Design/methodology/approach

To track CRA’s development, the roadmapping approach was properly adapted. The traditional time-based multi-layered map format was customized to depict, graphically, the results obtained from a systematic literature review of the main CRA publications.

Findings

In total, 19 papers were reviewed, from the method's introduction in 2002 to its last tracked methodological development. In all, 26 types of CRA analysis were identified and grouped into five categories. The most innovative procedures in each group were discussed and exemplified. Finally, a CRA methodological roadmap was presented, including a layered typology of the publications in terms of their focus and innovativeness; the number of analyses conducted in each publication; references for further CRA development; a segmentation and description of the main publication periods; main turning points; citation-based relationships; and four possible future scenarios for CRA as a method.

Originality/value

This paper offers a unique and comprehensive review of CRA’s development, favoring its broader use in management research. In addition, it develops an adapted version of the roadmapping approach, customized for mapping methodological innovations over time.

Details

RAUSP Management Journal, vol. 53 no. 3
Type: Research Article
ISSN: 2531-0488

Open Access
Article
Publication date: 20 September 2022

Joo Hun Yoo, Hyejun Jeong, Jaehyeok Lee and Tai-Myoung Chung

Abstract

Purpose

This study aims to summarize the critical issues in medical federated learning and applicable solutions. It also presents detailed explanations of how federated learning techniques can be applied to the medical field. About 80 reference studies in the field were reviewed, and the federated learning framework currently being developed by the research team is presented. This paper will help researchers build an actual medical federated learning environment.

Design/methodology/approach

Since machine learning techniques emerged, more efficient analysis of large amounts of data has become possible. However, data regulations have been tightened worldwide, and the use of centralized machine learning methods has become almost infeasible. Federated learning techniques have been introduced as a solution. Even with its powerful structural advantages, unsolved challenges remain for federated learning in real medical data environments. This paper summarizes those challenges by category and presents possible solutions.
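The core federated pattern (local updates on private data, server-side weighted averaging, as in the well-known FedAvg scheme) can be sketched as follows; the gradients and client sizes are illustrative, not from any medical dataset:

```python
def local_update(weights, gradient, lr=0.1):
    # One client's training step on its private data;
    # raw patient data never leaves the site, only updated weights do.
    return [w - lr * g for w, g in zip(weights, gradient)]

def federated_average(client_weights, client_sizes):
    # Server aggregates client models weighted by each site's sample count.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

global_model = [0.0, 0.0]
updates = [local_update(global_model, g) for g in ([1.0, -2.0], [3.0, 0.0])]
print(federated_average(updates, client_sizes=[100, 300]))
```

The heterogeneity issues the paper categorizes show up precisely here: when clients' data distributions differ, their local updates pull in conflicting directions and the plain weighted average degrades.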

Findings

This paper provides four critical categorized issues to be aware of when applying the federated learning technique to the actual medical data environment, then provides general guidelines for building a federated learning environment as a solution.

Originality/value

Existing studies have dealt with issues such as heterogeneity problems in the federated learning environment itself, but they lacked discussion of how these issues cause problems in actual working tasks. This paper therefore helps researchers understand federated learning issues through examples from actual medical machine learning environments.

Details

International Journal of Web Information Systems, vol. 18 no. 2/3
Type: Research Article
ISSN: 1744-0084

Open Access
Article
Publication date: 21 April 2022

Warot Moungsouy, Thanawat Tawanbunjerd, Nutcha Liamsomboon and Worapan Kusakunniran

Abstract

Purpose

This paper proposes a solution for recognizing human faces under mask-wearing. The lower part of the human face is occluded and cannot be used in the learning process of face recognition. The proposed solution is therefore developed to recognize human faces from whichever facial components are available, which vary depending on whether a mask is worn.

Design/methodology/approach

The proposed solution is developed based on the FaceNet framework, aiming to modify the existing facial recognition model to improve performance in both the mask-wearing and non-mask-wearing scenarios. Simulated masked-face images are then computed on top of the original face images for use in the learning process of face recognition. In addition, feature heatmaps are drawn to visualize the parts of facial images that matter most when recognizing faces under mask-wearing.
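The mask-simulation augmentation step can be caricatured as occluding the lower half of an image; the paper overlays rendered mask images on detected faces, so treating "mask" as a constant-valued lower half is purely an illustrative assumption:

```python
def simulate_mask(image, mask_value=0):
    # image: 2D list of pixel intensities. Keep the upper half (eyes, nose)
    # and occlude the lower half as a crude stand-in for a rendered mask
    # covering the mouth and chin.
    h = len(image)
    return [row[:] if i < h // 2 else [mask_value] * len(row)
            for i, row in enumerate(image)]

face = [[10, 20], [30, 40], [50, 60], [70, 80]]
masked = simulate_mask(face)
print(masked)  # upper half kept, lower half occluded
```

Training on pairs like `(face, masked)` pushes the model to rely on the non-occluded components, which is exactly what the feature heatmaps in the Findings confirm.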

Findings

The proposed method is validated in several experimental scenarios. The results show an outstanding accuracy of 99.2% on mask-wearing faces. The feature heatmaps also show that non-occluded components, including the eyes and nose, become more significant for recognizing human faces, compared with the lower part of the face, which may be occluded under a mask.

Originality/value

The convolutional neural network-based solution is tuned for recognizing human faces under mask-wearing. Simulated masks on original face images are used as augmentation for training the face recognition model. Heatmaps are then computed to verify that features from the top half of face images are correctly chosen for face recognition.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 21 September 2022

Chowdhury Noushin Novera, Zobayer Ahmed, Rafsanjany Kushol, Peter Wanke and Md. Abul Kalam Azad

Abstract

Purpose

Although there has been a significant amount of research on smart tourism, the articles have not yet been combined into a thorough literature review that examines research streams and the scope of future research. The purpose of this study is to examine the literature on the impact of deploying the Internet of Things (IoT) in tourism sector development to attract more visitors, using a text mining technique and citation-based bibliometric analysis for the first time.

Design/methodology/approach

This study uses R programming to conduct a full-text analysis of 36 publications on IoT in tourism, and the Visualization of Similarities (VOS) viewer software to conduct a bibliometric citation analysis of 469 papers from the Scopus database. In addition, the documents were subjected to a longitudinal study using Excel and to a word-frequency and trending-topic analysis using the R tool.
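The word-frequency pass described above can be mirrored in a few lines (the paper uses R; Python is used here for consistency, and the sample documents are invented):

```python
from collections import Counter
import re

def term_frequencies(documents, top_n=3):
    # Tokenize, lowercase, and count terms across the corpus,
    # mirroring a word-frequency pass over the full texts.
    tokens = []
    for doc in documents:
        tokens.extend(re.findall(r"[a-z]+", doc.lower()))
    return Counter(tokens).most_common(top_n)

docs = ["IoT sensors guide tourists", "Smart tourism uses IoT",
        "IoT improves tourism"]
print(term_frequencies(docs))
```

Tracking such counts per publication year, rather than over the whole corpus at once, yields the trending-topic view the study reports.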

Findings

Results from the bibliometric study revealed the networks that exist in the tourism management literature. Using log-likelihood, the text mining identified nine topic models on the basis of relevance, which are presented alongside an overview of the existing papers and a list of the primary authors, with posterior probabilities computed using latent Dirichlet allocation.

Originality/value

This study examines tourism literature in which IoT plays a significant role. To the best of the authors' knowledge, this study is the first to combine text mining with a bibliometric review. It analyzes and discusses the impact of technology on tourism sector development in attracting tourists, while presenting the most important and most frequently discussed topics and research in these writings. These findings provide researchers, tourism managers and technology professionals with a complete understanding of e-tourism and of the use of smart devices to attract tourists.

Open Access
Article
Publication date: 6 July 2020

Basma Makhlouf Shabou, Julien Tièche, Julien Knafou and Arnaud Gaudinat

Abstract

Purpose

This paper aims to describe an interdisciplinary and innovative research project conducted in Switzerland at the Geneva School of Business Administration HES-SO and supported by the State Archives of Neuchâtel (Office des archives de l'État de Neuchâtel, OAEN). The problem addressed is one of the most classical ones: how to extract and discriminate relevant data in a huge amount of diversified and complex data record formats and contents. The goal of this study is to provide a framework and a proof of concept for software that helps in taking defensible decisions on the retention and disposal of records and data proposed to the OAEN. For this purpose, the authors designed two axes: the archival axis, to propose archival metrics for the appraisal of structured and unstructured data, and the data mining axis, to propose algorithmic methods as complementary and/or additional metrics for the appraisal process.

Design/methodology/approach

Based on the two axes, this exploratory study designs and tests the feasibility of archival metrics paired with data mining metrics, to advance the digital appraisal process, as much as possible, in a systematic or even automatic way. Under Axis 1, the authors carried out three steps. First, they designed a conceptual framework for records and data appraisal with a detailed three-dimensional approach (trustworthiness, exploitability, representativeness), and defined the main principles and postulates to guide the operationalization of the conceptual dimensions. Second, the operationalization proposed metrics expressed in terms of variables, supported by a quantitative method for their measurement and scoring. Third, the authors shared this conceptual framework, with its dimensions and operationalized variables (metrics), with experienced professionals to validate them. The experts' feedback gave the authors an indication of the relevance and feasibility of these metrics; these two aspects may demonstrate the acceptability of such a method in real-life archival practice. In parallel, Axis 2 proposes functionalities covering not only macro analysis of data but also the algorithmic methods that enable the computation of digital archival and data mining metrics. On this basis, three use cases were proposed as plausible and illustrative scenarios for the application of such a solution.

Findings

The main results demonstrate the feasibility of measuring the value of data and records with a reproducible method. More specifically, for Axis 1, the authors applied the metrics in a flexible and modular way. The authors also defined the main principles needed to enable a computational scoring method. The results obtained through the experts' consultation on the relevance of 42 metrics indicate an acceptance rate above 80%. In addition, the results show that 60% of all metrics can be automated. Regarding Axis 2, 33 functionalities were developed and proposed under six main types: macro analysis, micro analysis, statistics, retrieval, administration and, finally, decision modeling and machine learning. The relevance of the metrics and functionalities rests on the theoretical validity and computational character of the method. These results are largely satisfactory and promising.
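A computational scoring method over the three Axis 1 dimensions could be sketched as a weighted aggregate; the dimension weights and scores below are illustrative assumptions, not the study's validated metrics:

```python
def appraisal_score(metric_scores, weights):
    # Weighted aggregate over the conceptual dimensions; returns a value
    # in [0, 1] when the individual scores are in [0, 1].
    total_w = sum(weights.values())
    return sum(metric_scores[d] * w for d, w in weights.items()) / total_w

# Hypothetical scores on the three dimensions of the conceptual framework.
scores = {"trustworthiness": 0.9, "exploitability": 0.6, "representativeness": 0.7}
# Hypothetical weights: trustworthiness counted double here, for illustration.
weights = {"trustworthiness": 2, "exploitability": 1, "representativeness": 1}
print(round(appraisal_score(scores, weights), 3))
```

A retention decision could then compare the aggregate against a threshold, with the individual dimension scores retained to keep the decision defensible.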

Originality/value

This study offers a valuable aid to improve the validity and performance of archival appraisal processes and decision-making. Transferability and applicability of these archival and data mining metrics could be considered for other types of data. An adaptation of this method and its metrics could be tested on research data, medical data or banking data.

Details

Records Management Journal, vol. 30 no. 2
Type: Research Article
ISSN: 0956-5698
