Search results

1 – 10 of 46
Open Access
Article
Publication date: 11 October 2023

Bachriah Fatwa Dhini, Abba Suganda Girsang, Unggul Utan Sufandi and Heny Kurniawati

Abstract

Purpose

The authors constructed an automatic essay scoring (AES) model for a discussion forum and compared its results with scores given by human evaluators. The proposed scoring uses two parameters, semantic similarity and keyword similarity, with pre-trained SentenceTransformers models chosen for producing the best-performing vector embeddings. The two parameters are combined to improve the accuracy of the model.

Design/methodology/approach

The development of the model is divided into seven stages: (1) collecting data, (2) pre-processing the data, (3) selecting a pre-trained SentenceTransformers model, (4) computing semantic similarity (sentence pairs), (5) computing keyword similarity, (6) calculating the final score and (7) evaluating the model.
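As a rough illustration of stages (4) to (6), the sketch below combines a cosine-based semantic similarity with a keyword-overlap score. It is not the authors' implementation: the toy vectors stand in for SentenceTransformers embeddings, and the function names and the equal weighting in `final_score` are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyword_similarity(response_keywords, rubric_keywords):
    """Fraction of rubric keywords that also appear among the response keywords."""
    rubric = {k.lower() for k in rubric_keywords}
    if not rubric:
        return 0.0
    found = {k.lower() for k in response_keywords}
    return len(rubric & found) / len(rubric)


def final_score(semantic, keyword, semantic_weight=0.5):
    """Hypothetical weighted combination; the 0.5 weight is an assumption."""
    return semantic_weight * semantic + (1 - semantic_weight) * keyword


# Toy embeddings standing in for the output of a pre-trained sentence encoder.
student_vec = [1.0, 0.0, 1.0]
reference_vec = [1.0, 0.0, 1.0]

sem = cosine_similarity(student_vec, reference_vec)
kw = keyword_similarity(["cell", "membrane"],
                        ["cell", "membrane", "nucleus", "osmosis"])
score = final_score(sem, kw)
print(round(score, 2))
```

In practice the weight would be tuned against the human-assigned scores rather than fixed in advance.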

Findings

The paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models achieved the highest scores in a comparison of 11 pre-trained multilingual SentenceTransformers models on Indonesian data (Dhini and Girsang, 2023). Both multilingual models were adopted in this study. The two parameters are combined, with keyword similarity obtained by comparing the keywords extracted from the responses with the rubric keywords. Based on the experimental results, the proposed combination can increase the evaluation results by 0.2.

Originality/value

This study uses discussion forum data from the general biology course in online learning at the open university for the 2020.2 and 2021.2 semesters. Forum discussions are still graded manually. In this study, the authors created a model that automatically scores discussion forum posts, which are essays, based on the lecturer's answers and rubrics.

Details

Asian Association of Open Universities Journal, vol. 18 no. 3
Type: Research Article
ISSN: 1858-3431

Open Access
Article
Publication date: 4 September 2017

Yuqin Wang, Bing Liang, Wen Ji, Shiwei Wang and Yiqiang Chen

Abstract

Purpose

In the past few years, millions of people have started to acquire knowledge from Massive Open Online Courses (MOOCs). MOOCs contain massive numbers of video courses produced by instructors, and learners all over the world can access these courses via the internet. However, faced with so many courses, learners often waste much time finding courses they like. This paper aims to explore how to make accurate personalized recommendations for MOOC users.

Design/methodology/approach

This paper proposes a multi-attribute weight algorithm based on collaborative filtering (CF) to select a recommendation set of courses for target MOOC users.

Findings

The recall of the proposed algorithm is higher than that of both traditional CF and a CF-based algorithm, the uncertain neighbors’ collaborative filtering recommendation algorithm. The higher the recall is, the more accurate the recommendation result is.
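For readers unfamiliar with the metric, recall for a recommendation list can be computed as in the short sketch below. The function name, the cut-off `k` and the course identifiers are hypothetical; the abstract does not state how the authors computed recall.

```python
def recall_at_k(recommended, relevant, k):
    """Share of the relevant courses that appear in the top-k recommendations."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = set(recommended[:k]) & relevant_set
    return len(hits) / len(relevant_set)


# Hypothetical course IDs: two of the three relevant courses are in the top 3.
recommended_courses = ["c1", "c4", "c2", "c9"]
liked_courses = ["c1", "c2", "c3"]
recall = recall_at_k(recommended_courses, liked_courses, k=3)
print(round(recall, 3))
```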

Originality/value

This paper reflects the target users’ preferences for the first time by calculating separately the weight of the attributes and the weight of attribute values of the courses.

Details

International Journal of Crowd Science, vol. 1 no. 3
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 14 August 2018

Xuemei Li, Ya Zhang and Kedong Yin

Abstract

Purpose

The traditional grey relational models directly describe the behavioural characteristics of the systems based on the sample point connections. Few grey relational models can measure the dynamic periodic fluctuation rules of the objects, and most of these models lack the affinity property, so the relational results become unstable under sequence translation. The paper aims to discuss these issues.

Design/methodology/approach

Fourier transform functions are used to fit the system behaviour curves, redefine the area difference between the curves and construct a grey relational model based on discrete Fourier transform (DFTGRA).

Findings

To verify its validity, feasibility and superiority, DFTGRA is applied to research on the correlation between macroeconomic growth and marine economic growth in China's coastal areas. It is proved that DFTGRA has the superior properties of affinity, symmetry, uniqueness, etc., and wide applicability.

Originality/value

DFTGRA can not only be applied to equidistant and equal time sequences but also be adopted for non-equidistant and unequal time sequences. DFTGRA can measure both the global relational degree and the dynamic correlation of the variable cyclical fluctuation between sequences.

Details

Marine Economics and Management, vol. 1 no. 1
Type: Research Article
ISSN: 2516-158X

Open Access
Article
Publication date: 21 June 2021

Bufei Xing, Haonan Yin, Zhijun Yan and Jiachen Wang

Abstract

Purpose

The purpose of this paper is to propose a new approach to retrieve similar questions in online health communities to improve the efficiency of health information retrieval and sharing.

Design/methodology/approach

This paper proposes a hybrid approach that combines domain knowledge similarity and topic similarity to retrieve similar questions in online health communities. The domain knowledge similarity evaluates the domain distance between different questions, and the topic similarity measures the relationship between questions based on the extracted latent topics.

Findings

The experiment results show that the proposed method outperforms the baseline methods.

Originality/value

This method overcomes the problem of word mismatch and considers the named entities included in questions, which most existing studies did not.

Details

International Journal of Crowd Science, vol. 5 no. 2
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 22 November 2022

Kedong Yin, Yun Cao, Shiwei Zhou and Xinman Lv

Abstract

Purpose

The purposes of this research are to study the theory and method of multi-attribute index system design and establish a set of systematic, standardized, scientific index systems for the design optimization and inspection process. The research may form the basis for a rational, comprehensive evaluation and provide the most effective way of improving the quality of management decision-making. It is of practical significance to improve the rationality and reliability of the index system and provide standardized, scientific reference standards and theoretical guidance for the design and construction of the index system.

Design/methodology/approach

Using modern methods such as complex networks and machine learning, a system for the quality diagnosis of index data and the classification and stratification of index systems is designed. This guarantees the quality of the index data, realizes the scientific classification and stratification of the index system, reduces the subjectivity and randomness of the design of the index system, enhances its objectivity and rationality and lays a solid foundation for the optimal design of the index system.

Findings

Based on the ideas of statistics, system theory, machine learning and data mining, the present research focuses on “data quality diagnosis” and “index classification and stratification” and on clarifying the classification standards and data quality characteristics of index data; a data-quality diagnosis system of “data review – data cleaning – data conversion – data inspection” is established. Using a decision tree, explanatory structural model, cluster analysis, K-means clustering and other methods, a classification and hierarchical method system for indicators is designed to reduce the redundancy of indicator data and improve the quality of the data used. Finally, the scientific and standardized classification and hierarchical design of the index system can be realized.

Originality/value

The innovative contributions and research value of the paper are reflected in three aspects. First, a method system for index data quality diagnosis is designed, and multi-source data fusion technology is adopted to ensure the quality of multi-source, heterogeneous and mixed-frequency data of the index system. The second is to design a systematic quality-inspection process for missing data based on the systematic thinking of the whole and the individual. Aiming at the accuracy, reliability, and feasibility of the patched data, a quality-inspection method of patched data based on inversion thought and a unified representation method of data fusion based on a tensor model are proposed. The third is to use the modern method of unsupervised learning to classify and stratify the index system, which reduces the subjectivity and randomness of the design of the index system and enhances its objectivity and rationality.

Details

Marine Economics and Management, vol. 5 no. 2
Type: Research Article
ISSN: 2516-158X

Open Access
Article
Publication date: 4 August 2020

Kanak Meena, Devendra K. Tayal, Oscar Castillo and Amita Jain

Abstract

The scalability of similarity joins is threatened by data skewness, an unexpected but pervasive characteristic of scientific data. Skewness leads to an uneven distribution of attributes, which can cause a severe load imbalance problem. When database join operations are applied to these datasets, the skewness grows exponentially. All the algorithms developed to date for the implementation of database joins are highly skew sensitive. This paper presents a new approach for handling data skewness in a character-based string similarity join using the MapReduce framework. In the literature, no such work exists to handle data skewness in character-based string similarity joins, although work on set-based string similarity joins exists. The proposed work is divided into three stages, and every stage is further divided into mapper and reducer phases, each dedicated to a specific task. The first stage finds the lengths of the strings in a dataset. In the second stage, the MR-Pass Join framework is suggested for valid candidate pair generation. In the third stage, which is further divided into four MapReduce phases, MRFA concepts are incorporated for the string similarity join, named “MRFA-SSJ” (MapReduce Frequency Adaptive – String Similarity Join). Hence, MRFA-SSJ is proposed to handle skewness in the string similarity join. The experiments were run on three datasets, namely DBLP, Query log and a real dataset of IP addresses and cookies, using the Hadoop framework. The proposed algorithm was compared with three known algorithms; all of them fail when data is highly skewed, whereas the proposed method handles highly skewed data without any problem. A 15-node cluster was used in the experiments, and the Zipf distribution law was followed for the analysis of the skewness factor. A comparison among existing and proposed techniques is also shown. Existing techniques survive only up to Zipf factor 0.5, whereas the proposed algorithm survives up to Zipf factor 1. Hence the proposed algorithm is skew insensitive and ensures scalability with a reasonable query processing time for string similarity database joins. It also ensures the even distribution of attributes.
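To make the role of the Zipf factor concrete, the sketch below shows how key frequencies drawn from a Zipf law translate into uneven reducer load. It is only an illustration under stated assumptions: the round-robin key placement, the number of keys and the function names are hypothetical and do not reproduce MRFA-SSJ or Hadoop's actual partitioner; only the 15 reducers echo the 15-node cluster mentioned above.

```python
def zipf_probabilities(num_keys, s):
    """Probability of each key rank (1..num_keys) under a Zipf law with exponent s."""
    weights = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def max_reducer_share(num_keys, s, num_reducers):
    """Largest fraction of records landing on one reducer.

    Keys are assigned to reducers round-robin by rank, which is an
    assumption made only to keep the example small.
    """
    loads = [0.0] * num_reducers
    for index, p in enumerate(zipf_probabilities(num_keys, s)):
        loads[index % num_reducers] += p
    return max(loads)


# Compare a uniform key distribution (s = 0) with increasing skew.
for s in (0.0, 0.5, 1.0):
    share = max_reducer_share(num_keys=150, s=s, num_reducers=15)
    print(f"Zipf factor {s}: busiest reducer holds {share:.1%} of the records")
```

With no skew every reducer holds an equal share; as the Zipf factor grows, the reducer that receives the most frequent key takes a disproportionate part of the work, which is the load imbalance a skew-insensitive join has to avoid.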

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 24 January 2020

Xia Yao, Hongbo Sun and Baode Fan

Abstract

Purpose

This paper addresses social public decision-making problems. It studies the relationship between different combinations of voting rules and the final results, and discusses the quantitative relationships between group intelligence (final votes) and individual intelligence (everyone), with the aim of defending democracy as network technology develops rapidly and crowd intelligence becomes more complicated and universal.

Design/methodology/approach

After summarizing the crowd co-decisions of related studies, the standards, frameworks, techniques, methods and tools have been discussed according to the characteristics of large-scale simulations.

Findings

The contributions of this paper will be useful for both academics and practitioners for formulating VV&A in large-scale simulations.

Originality/value

This paper will help researchers solve the social public decision-making problems in large-scale simulations.

Details

International Journal of Crowd Science, vol. 4 no. 1
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 5 August 2021

Rui Qiu and Wen Ji

Abstract

Purpose

Many recommender systems are generally unable to provide accurate recommendations to users with limited interaction history, which is known as the cold-start problem. This issue can be addressed by trivial approaches that recommend random items or the most popular ones to new users. However, these methods perform poorly in many cases. This paper aims to explore how to make accurate recommendations for new users in cold-start scenarios.

Design/methodology/approach

In this paper, the authors propose an embedded-bandit method, inspired by the Word2Vec technique and the contextual bandit algorithm. The authors describe user contextual information with item embedding features constructed by Word2Vec. In addition, based on the intelligence measurement model in Crowd Science, the authors propose a new evaluation method to measure the utility of recommendations.
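One simple way to picture a user context built from item embeddings is to average the vectors of the items a user has clicked and rank candidates against that average. The sketch below does only this; it is an assumption-laden illustration, not the embedded-bandit method itself. The item names and two-dimensional vectors are made up (they stand in for Word2Vec output), and the exploration step of a contextual bandit is omitted.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_items(user_context, item_embeddings):
    """Item names sorted from most to least similar to the user context."""
    return sorted(
        item_embeddings,
        key=lambda name: cosine(user_context, item_embeddings[name]),
        reverse=True,
    )


# Hypothetical item embeddings; real ones would come from a trained model.
item_embeddings = {
    "intro_python": [1.0, 0.0],
    "art_history": [0.0, 1.0],
    "data_science": [1.0, 1.0],
}
clicked = ["intro_python", "data_science"]

user_context = mean_vector([item_embeddings[name] for name in clicked])
ranking = rank_items(user_context, item_embeddings)
print(ranking)
```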

Findings

The authors introduce the Word2Vec technique for constructing user contextual features, which improved the accuracy of recommendations compared to the traditional multi-armed bandit approach. In addition, under this study’s intelligence measurement model, the proposed method also achieves better utility.

Practical implications

Improving the accuracy of recommendations during the cold-start phase can greatly raise user stickiness and increase user favorability, which in turn contributes to the commercialization of the app.

Originality/value

The algorithm proposed in this paper shows that user contextual features can be represented by the embedding vectors of clicked items.

Details

International Journal of Crowd Science, vol. 5 no. 3
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 15 February 2022

Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal

Abstract

Purpose

Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemes, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality relying on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there exist many ways of improving the dataset discovery by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, countless feature extraction methods and description models and so forth.

Design/methodology/approach

In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.

Findings

The study proposes several proof-of-concept pipelines including experimental evaluation, which showcase the usage of the framework.

Originality/value

To the best of the authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework has the ambition to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 20 September 2022

Joo Hun Yoo, Hyejun Jeong, Jaehyeok Lee and Tai-Myoung Chung

Abstract

Purpose

This study aims to summarize the critical issues in medical federated learning and applicable solutions. Also, detailed explanations of how federated learning techniques can be applied to the medical field are presented. About 80 reference studies described in the field were reviewed, and the federated learning framework currently being developed by the research team is provided. This paper will help researchers to build an actual medical federated learning environment.

Design/methodology/approach

Since machine learning techniques emerged, more efficient analysis was possible with a large amount of data. However, data regulations have been tightened worldwide, and the usage of centralized machine learning methods has become almost infeasible. Federated learning techniques have been introduced as a solution. Even with its powerful structural advantages, there still exist unsolved challenges in federated learning in a real medical data environment. This paper aims to summarize those by category and presents possible solutions.
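For orientation, the core idea of federated learning is that clients train locally and share only model parameters, which a server then aggregates. The sketch below shows one common aggregation rule, a sample-size-weighted average in the style of federated averaging. The abstract does not say which aggregation the authors' framework uses, so the rule, the function name and the toy numbers are all assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by each client's sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical hospitals with locally trained parameters; raw data never leaves them.
hospital_a = [0.0, 2.0]   # trained on 100 local samples
hospital_b = [4.0, 6.0]   # trained on 300 local samples

global_weights = federated_average([hospital_a, hospital_b], [100, 300])
print(global_weights)
```

Weighting by sample count lets larger sites influence the global model more, which is one of the places where heterogeneity between medical institutions becomes a practical concern.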

Findings

This paper provides four critical categorized issues to be aware of when applying the federated learning technique to the actual medical data environment, then provides general guidelines for building a federated learning environment as a solution.

Originality/value

Existing studies have dealt with issues such as heterogeneity in the federated learning environment itself, but they have not addressed how these issues cause problems in actual working tasks. Therefore, this paper helps researchers understand the federated learning issues through examples of actual medical machine learning environments.

Details

International Journal of Web Information Systems, vol. 18 no. 2/3
Type: Research Article
ISSN: 1744-0084
