Search results

1 – 10 of over 2000
Open Access
Article
Publication date: 21 June 2021

Bufei Xing, Haonan Yin, Zhijun Yan and Jiachen Wang


Abstract

Purpose

The purpose of this paper is to propose a new approach to retrieve similar questions in online health communities to improve the efficiency of health information retrieval and sharing.

Design/methodology/approach

This paper proposes a hybrid approach that combines domain knowledge similarity and topic similarity to retrieve similar questions in online health communities. Domain knowledge similarity evaluates the domain distance between questions, while topic similarity measures the relationship between questions based on their extracted latent topics.
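For a concrete feel, here is a minimal Python sketch of such a hybrid score; the paper's actual formulas are not reproduced here, so the Jaccard entity overlap, the cosine of latent-topic distributions and the weight alpha below are all illustrative assumptions.

```python
import numpy as np

def topic_similarity(theta_q1, theta_q2):
    # Cosine similarity between latent-topic distributions
    # (e.g. inferred by an LDA-style topic model).
    t1, t2 = np.asarray(theta_q1, float), np.asarray(theta_q2, float)
    return float(t1 @ t2 / (np.linalg.norm(t1) * np.linalg.norm(t2)))

def domain_similarity(entities_q1, entities_q2):
    # Jaccard overlap of medical named entities, standing in for the
    # paper's domain-knowledge distance (hypothetical choice).
    e1, e2 = set(entities_q1), set(entities_q2)
    return len(e1 & e2) / len(e1 | e2) if e1 | e2 else 0.0

def hybrid_similarity(q1, q2, alpha=0.5):
    # Linear combination of the two signals; the weighting scheme is
    # an assumption, not the paper's prescription.
    return (alpha * domain_similarity(q1["entities"], q2["entities"])
            + (1 - alpha) * topic_similarity(q1["topics"], q2["topics"]))
```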

Findings

The experimental results show that the proposed method outperforms the baseline methods.

Originality/value

This method overcomes the problem of word mismatch and takes into account the named entities contained in questions, which most existing studies do not.

Details

International Journal of Crowd Science, vol. 5 no. 2
Type: Research Article
ISSN: 2398-7294


Open Access
Article
Publication date: 15 February 2022

Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal


Abstract

Purpose

Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemas, shared attributes, vocabulary, structure and semantics. Existing dataset catalogs provide basic search functionality relying on keyword search in brief, incomplete or misleading textual metadata attached to the datasets, so the search results are often insufficient. However, there are many ways of improving dataset discovery by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, countless feature extraction methods, description models and so forth.

Design/methodology/approach

In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.
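As a rough illustration of what such a component catalog enables, the following Python sketch composes a custom pipeline from interchangeable steps; the component names and the Jaccard scorer are hypothetical, not the framework's actual API.

```python
from typing import Callable

# A pipeline is a chain of components, each enriching a dataset record.
Component = Callable[[dict], dict]

def make_pipeline(*components: Component) -> Component:
    def run(record: dict) -> dict:
        for step in components:
            record = step(record)
        return record
    return run

def extract_title_tokens(record: dict) -> dict:
    # Representation step: tokenize the dataset title.
    record["tokens"] = record.get("title", "").lower().split()
    return record

def jaccard_scorer(query_tokens: list) -> Component:
    # Discovery step: score similarity of a record to a query.
    def score(record: dict) -> dict:
        a, b = set(query_tokens), set(record["tokens"])
        record["score"] = len(a & b) / len(a | b) if a | b else 0.0
        return record
    return score

pipeline = make_pipeline(extract_title_tokens,
                         jaccard_scorer(["air", "quality"]))
print(pipeline({"title": "Air quality measurements 2019"})["score"])
```

Swapping one step (say, an embedding-based scorer for the Jaccard one) yields a new discovery pipeline without touching the rest, which is the kind of rapid experimentation the framework targets.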

Findings

The study presents several proof-of-concept pipelines, including experimental evaluation, that showcase the usage of the framework.

Originality/value

To the best of the authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework has the ambition to establish a platform for reproducible and comparable research in the area of dataset discovery. A prototype implementation of the framework is available on GitHub.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288


Open Access
Article
Publication date: 4 August 2020

Kanak Meena, Devendra K. Tayal, Oscar Castillo and Amita Jain


Abstract

The scalability of similarity joins is threatened by the unexpected data characteristic of data skewness, a pervasive problem in scientific data. Skewness produces an uneven distribution of attributes, which can cause a severe load imbalance problem, and when database join operations are applied to such datasets, the skew grows exponentially. All join algorithms developed to date are highly skew sensitive. This paper presents a new approach for handling data skewness in a character-based string similarity join using the MapReduce framework; no prior work handles data skewness in character-based string similarity joins, although work on set-based string similarity joins exists. The proposed work is divided into three stages, and every stage is further divided into mapper and reducer phases dedicated to a specific task. The first stage finds the length of the strings in a dataset. For valid candidate-pair generation, the MR-Pass Join framework is applied in the second stage. In the third stage, which is further divided into four MapReduce phases, MRFA concepts are incorporated into the string similarity join, named “MRFA-SSJ” (MapReduce Frequency Adaptive – String Similarity Join); MRFA-SSJ is thus proposed to handle skewness in the string similarity join. The experiments were run on three datasets, namely DBLP, a query log and a real dataset of IP addresses and cookies, deployed on the Hadoop framework. The proposed algorithm was compared with three known algorithms, all of which fail when data is highly skewed, whereas the proposed method handles highly skewed data without any problem. A 15-node cluster was used, and the Zipf distribution law was followed for the analysis of the skewness factor. In the comparison of existing and proposed techniques, the existing techniques survived only up to a Zipf factor of 0.5, whereas the proposed algorithm survives up to a Zipf factor of 1. The proposed algorithm is therefore skew insensitive and ensures scalability with a reasonable query processing time for string similarity database joins, as well as an even distribution of attributes.
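To make the first stage concrete, here is a toy single-process Python sketch of the length-based mapper/reducer idea (an edit-distance length filter); the real pipeline runs on Hadoop, and the actual MR-Pass Join and MRFA-SSJ stages are considerably more elaborate, so treat the threshold tau and the pairing logic as illustrative.

```python
from collections import defaultdict

def mapper(strings):
    # Stage 1 mapper: emit (length, string) pairs.
    for s in strings:
        yield len(s), s

def reducer(pairs, tau=2):
    # Stage 1 reducer: group strings by length and generate candidate
    # pairs whose lengths differ by at most tau, since a larger length
    # gap already rules out edit distance <= tau.
    by_len = defaultdict(list)
    for length, s in pairs:
        by_len[length].append(s)
    for length in sorted(by_len):
        group = by_len[length]
        for i, s in enumerate(group):          # same-length candidates
            for t in group[i + 1:]:
                yield s, t
        for other in range(length + 1, length + tau + 1):
            for s in group:                    # cross-length candidates
                for t in by_len.get(other, []):
                    yield s, t

print(list(reducer(mapper(["data", "date", "dates", "mapreduce"]))))
```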

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964


Open Access
Article
Publication date: 11 October 2023

Bachriah Fatwa Dhini, Abba Suganda Girsang, Unggul Utan Sufandi and Heny Kurniawati


Abstract

Purpose

The authors constructed an automatic essay scoring (AES) model for a discussion forum and compared its results with scores given by human evaluators. This research proposes essay scoring conducted through two parameters, semantic similarity and keyword similarity, using pre-trained SentenceTransformers models that construct the best-performing vector embeddings; the models are combined to optimize the accuracy of the final score.

Design/methodology/approach

The development of the model in this study is divided into seven stages: (1) data collection, (2) data pre-processing, (3) selection of a pre-trained SentenceTransformers model, (4) semantic similarity (sentence pairs), (5) keyword similarity, (6) calculation of the final score and (7) model evaluation.
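A minimal sketch of stages (4)-(6) using the sentence-transformers library is shown below; the equal weighting alpha and the substring-based keyword match are placeholders, since the paper's exact scoring formula is not reproduced here.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def final_score(student_answer, reference_answer, rubric_keywords,
                alpha=0.5):
    # (4) Semantic similarity of the sentence pair via embeddings.
    emb = model.encode([student_answer, reference_answer])
    semantic = float(util.cos_sim(emb[0], emb[1]))
    # (5) Keyword similarity: fraction of rubric keywords found.
    hits = sum(kw.lower() in student_answer.lower()
               for kw in rubric_keywords)
    keyword = hits / len(rubric_keywords) if rubric_keywords else 0.0
    # (6) Final score as a weighted combination of both parameters.
    return alpha * semantic + (1 - alpha) * keyword
```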

Findings

The paraphrase-multilingual-MiniLM-L12-v2 and distilbert-base-multilingual-cased-v1 models obtained the highest scores in a comparison of 11 pre-trained multilingual SentenceTransformers models on Indonesian data (Dhini and Girsang, 2023), and both were adopted in this study. The combination of the two parameters is obtained by comparing the keyword-extraction responses with the rubric keywords. Based on the experimental results, the proposed combination can increase the evaluation results by 0.2.

Originality/value

This study uses discussion forum data from a general biology course taught online at the Open University during the 2020.2 and 2021.2 semesters, where forum discussions are still rated manually. The authors created a model that automatically scores the discussion forum essays based on the lecturer's answers and rubrics.

Details

Asian Association of Open Universities Journal, vol. 18 no. 3
Type: Research Article
ISSN: 1858-3431


Open Access
Article
Publication date: 9 May 2024

Yanhao Sun, Tao Zhang, Shuxin Ding, Zhiming Yuan and Shengliang Yang


Abstract

Purpose

To solve the problems of inaccurate calculation of index weights and of subjectivity and uncertainty in index assessment during risk assessment, this study aims to propose a scientific and reasonable risk assessment method for centralized traffic control (CTC) systems.

Design/methodology/approach

First, system-theoretic process analysis (STPA) is used to conduct a risk analysis of the CTC system, and risk assessment indexes are constructed on the basis of this analysis. Then, to enhance the accuracy of weight calculation, the fuzzy analytic hierarchy process (FAHP), the fuzzy decision-making trial and evaluation laboratory (FDEMATEL) and the entropy weight method are employed to calculate the subjective, relative and objective weight of each index, and these three weights are combined using game theory to obtain the combined weight of each index. To reduce subjectivity and uncertainty in the assessment process, the backward cloud generator is used to obtain the numerical characteristics (NCs) of the cloud model for each index; the NCs of the indexes are then weighted to derive the comprehensive cloud for risk assessment of the CTC system. The cloud model's similarity measurement method gauges the likeness between this comprehensive risk assessment cloud and the risk standard cloud, and this process yields the risk assessment results for the CTC system.
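The game-theoretic combination of the three weight vectors admits a compact linearized form; the Python sketch below solves the usual system for the combination coefficients and illustrates the general technique rather than the paper's exact computation.

```python
import numpy as np

def combine_weights(w_fahp, w_fdematel, w_entropy):
    # Stack the subjective (FAHP), relative (FDEMATEL) and objective
    # (entropy) weight vectors, then find coefficients a minimizing the
    # deviation of the combination from each vector: solve A a = diag(A)
    # with A = W W^T, and normalize a.
    W = np.vstack([w_fahp, w_fdematel, w_entropy]).astype(float)
    A = W @ W.T
    a = np.linalg.solve(A, np.diag(A))
    a = np.abs(a) / np.abs(a).sum()
    return a @ W        # combined weight for each risk index

print(combine_weights([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]))
```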

Findings

The cloud model can handle the subjectivity and fuzziness in the risk assessment process well. The cloud model-based risk assessment method was applied to the CTC system risk assessment of a railway group and achieved good results.

Originality/value

This study provides a cloud model-based method for risk assessment of CTC systems, which accurately calculates the weight of risk indexes and uses cloud models to reduce uncertainty and subjectivity in the assessment, achieving effective risk assessment of CTC systems. It can provide a reference and theoretical basis for risk management of the CTC system.

Details

Railway Sciences, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2755-0907


Open Access
Article
Publication date: 20 July 2020

Abdelghani Bakhtouchi


Abstract

With the progress of new information and communication technologies, there are more and more producers of data, and the web forms a huge repository for all these kinds of data. Unfortunately, existing data is often unusable as-is because the same information appears in different sources and because of erroneous and incomplete data. The aim of data integration systems is to offer the user a single interface for querying a number of sources; a key challenge of such systems is dealing with conflicting information from the same source or from different sources. In this paper, we present conflict resolution at the instance level in two stages: reference reconciliation and data fusion. Reference reconciliation methods seek to decide whether two data descriptions refer to the same real-world entity. We define the principles of reconciliation methods and then distinguish reference reconciliation methods, first by how they use the descriptions of references and then by the way they acquire knowledge, finishing with a discussion of some open reference reconciliation issues that are the subject of current research. Data fusion, in turn, has the objective of merging duplicates into a single representation while resolving conflicts between the data. We first define the classification of conflicts, the strategies for dealing with them and the implementation of conflict-management strategies; we then present the relational operators and data fusion techniques, and likewise finish by discussing some open data fusion issues that are the subject of current research.
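As a minimal illustration of the two stages (not drawn from the paper itself), a reconciliation decision and a majority-vote fusion might look like this in Python; the agreement threshold and the voting strategy are just two of the many options the survey covers.

```python
from collections import Counter

def reconcile(rec_a, rec_b, threshold=0.8):
    # Reference reconciliation: decide whether two descriptions refer
    # to the same real-world entity by comparing shared attributes.
    shared = set(rec_a) & set(rec_b)
    if not shared:
        return False
    agree = sum(rec_a[k] == rec_b[k] for k in shared)
    return agree / len(shared) >= threshold

def fuse(records):
    # Data fusion: merge duplicates into one representation, resolving
    # each attribute conflict here by majority vote.
    merged = {}
    for key in {k for r in records for k in r}:
        values = [r[key] for r in records if key in r]
        merged[key] = Counter(values).most_common(1)[0][0]
    return merged
```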

Details

Applied Computing and Informatics, vol. 18 no. 3/4
Type: Research Article
ISSN: 2634-1964


Open Access
Article
Publication date: 5 September 2023

Ali Akbar Izadi and Hamed Rasam


Abstract

Purpose

Efficient thermal management of central processing unit (CPU) cooling systems is vital in the context of advancing information technology and the demand for enhanced data processing speeds. This study aims to explore the thermal performance of a CPU cooling setup using a cylindrical porous metal foam heat sink.

Design/methodology/approach

Nanofluid flow through the metal foam is simulated using the Darcy–Brinkman–Forchheimer equation, accounting for magnetic field effects. The temperature distribution is modeled through the local thermal equilibrium equation, considering viscous dissipation. The governing partial differential equations are solved using the similarity method. The CPU's hot surface serves as a solid wall, with nanofluid entering the heat sink as an impinging jet. The numerical results are verified against existing research, demonstrating strong agreement across numerical, analytical and experimental findings. Ansys Fluent® software is used to assess temperature, velocity and streamlines, yielding satisfactory results from an engineering standpoint.
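For orientation, a common form of the Darcy–Brinkman–Forchheimer momentum balance with a magnetic (Lorentz) body force is reproduced below; the paper's exact formulation, boundary conditions and notation may differ.

```latex
\frac{\rho_{nf}}{\varepsilon^{2}}\,(\mathbf{u}\cdot\nabla)\mathbf{u}
  = -\nabla p
  + \frac{\mu_{nf}}{\varepsilon}\,\nabla^{2}\mathbf{u}
  - \frac{\mu_{nf}}{K}\,\mathbf{u}
  - \frac{\rho_{nf}\,C_{F}}{\sqrt{K}}\,\lvert\mathbf{u}\rvert\,\mathbf{u}
  - \sigma_{nf}\,B_{0}^{2}\,\mathbf{u}
```

Here ε is the porosity, K the permeability, C_F the Forchheimer inertial coefficient and B_0 the applied magnetic field strength.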

Findings

Investigating critical parameters such as the Darcy number (10^-4 ≤ Da_D ≤ 10^-2), aspect ratio (0.5 ≤ H/D ≤ 1.5), Reynolds number (5 ≤ Re_{D,bf} ≤ 3500), Eckert number (0 ≤ Ec_{bf} ≤ 0.1), porosity (0.85 ≤ ε ≤ 0.95), Hartmann number (0 ≤ Ha_{D,bf} ≤ 300) and nanofluid volume fraction (0 ≤ φ ≤ 0.1) reveals their impact on fluid flow and heat sink performance. Notably, at a Reynolds number of 600, the Nusselt number decreases by 45% as porosity rises from 0.85 to 0.95, rises by 19.2% as the Darcy number increases from 10^-4 to 10^-2, decreases by 14.1% as the Eckert number increases from 0 to 0.1 and decreases by 0.15% as the Hartmann number increases from 0 to 300.

Originality/value

Despite notable progress in studying thermal management in CPU cooling systems using porous media and nanofluids, there are still significant gaps in the existing literature. First, few studies have considered the Darcy–Brinkman–Forchheimer equation, which accounts for non-Darcy effects and the flow and geometric interactions between coolant and porous medium. The influence of viscous dissipation on heat transfer in this specific geometry has also been largely overlooked. Additionally, while nanofluids and impinging jets have demonstrated potential in enhancing thermal performance, their utilization within porous media remains underexplored. Furthermore, the unique thermal and structural characteristics of porous media, along with the incorporation of a magnetic field, have not been fully investigated in this particular configuration. Consequently, this study aims to address these literature gaps and introduce novel advancements in analytical modeling, non-Darcy flow, viscous dissipation, nanofluid utilization, impinging jets, porous media characteristics and the impact of a magnetic field. These contributions hold promising prospects for improving CPU cooling system thermal management and have broader implications across various applications in the field.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 34 no. 1
Type: Research Article
ISSN: 0961-5539


Open Access
Article
Publication date: 26 November 2020

Bernadette Bouchon-Meunier and Giulianella Coletti


Abstract

Purpose

The paper is dedicated to the analysis of fuzzy similarity measures in uncertainty analysis in general, and in economic decision-making in particular. The purpose of this paper is to explain how a similarity measure can be chosen to quantify a qualitative description of similarities provided by experts of a given domain, in the case where the objects to compare are described through imprecise or linguistic attribute values represented by fuzzy sets. The case of qualitative dissimilarities is also addressed and the particular case of their representation by distances is presented.

Design/methodology/approach

The approach is based on measurement theory, following Tversky’s well-known paradigm.

Findings

A list of axioms that may or may not be satisfied by a qualitative comparative similarity between fuzzy objects is proposed, as extensions of axioms satisfied by similarities between crisp objects. These axioms make it possible to express necessary and sufficient conditions for a numerical similarity measure to represent a comparative similarity between fuzzy objects. The representation of comparative dissimilarities is also addressed by means of specific functions depending on the distance between attribute values.
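As an example of the operator choices involved, the following Python sketch computes a Tversky-style ratio similarity between fuzzy sets represented by membership vectors, using min for intersection and bounded difference for set difference; these operators are one admissible choice among those analyzed, not the paper's unique prescription.

```python
import numpy as np

def tversky_similarity(mu_a, mu_b, alpha=1.0, beta=1.0):
    # mu_a, mu_b: membership degrees over the same universe.
    a, b = np.asarray(mu_a, float), np.asarray(mu_b, float)
    common = np.minimum(a, b).sum()           # f(A ∩ B), via min
    a_only = np.clip(a - b, 0, None).sum()    # f(A − B), bounded diff
    b_only = np.clip(b - a, 0, None).sum()    # f(B − A)
    denom = common + alpha * a_only + beta * b_only
    return common / denom if denom else 1.0

print(tversky_similarity([0.9, 0.4, 0.0], [0.8, 0.5, 0.2]))
```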

Originality/value

Examples of functions satisfying certain axioms to represent comparative similarities are given. They are based on the choice of operators to compute the intersection, union and difference of fuzzy sets. A simple application of this methodology to economics is given, showing how a measure of similarity can be chosen to represent intuitive similarities expressed by an economist by means of an easily calculable quantitative measure. More detailed and formal results are given in Coletti and Bouchon-Meunier (2020) for similarities and in Coletti et al. (2020) for dissimilarities.

Details

Asian Journal of Economics and Banking, vol. 4 no. 3
Type: Research Article
ISSN: 2615-9821


Open Access
Article
Publication date: 9 January 2024

Kazuyuki Motohashi and Chen Zhu


Abstract

Purpose

This study aims to assess the technological capability of Chinese internet platforms (BAT: Baidu, Alibaba, Tencent) compared to US ones (GAFA: Google, Amazon, Facebook, Apple). More specifically, this study explores Baidu’s technological catching-up process with Google by analyzing their patent textual information.

Design/methodology/approach

The authors retrieved 26,383 Google patents and 6,695 Baidu patents from the PATSTAT 2019 Spring version. The collected patent documents were first vectorized using the Word2Vec model, and K-means clustering was then applied to visualize the technological space of the two firms. Finally, novel indicators were proposed to capture the technological catching-up process between Baidu and Google.
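A compressed Python sketch of that vectorize-then-cluster step is given below; the corpus, hyperparameters and cluster count are illustrative placeholders, not the study's settings.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

patent_abstracts = [                      # placeholder corpus
    "search engine indexing method", "map navigation route planning",
    "speech recognition neural network", "image retrieval ranking",
]
docs = [text.lower().split() for text in patent_abstracts]
w2v = Word2Vec(docs, vector_size=50, window=5, min_count=1)

def doc_vector(tokens):
    # Average the word vectors into one document vector.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

X = np.vstack([doc_vector(d) for d in docs])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print(labels)                             # cluster id per patent
```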

Findings

The results show that Baidu follows the trend of US rather than Chinese technology, which suggests that Baidu is aggressively seeking to catch up with US players in the course of its technological development. At the same time, the impact index of Baidu patents increases over time, reflecting the upgrading of its technological competitiveness.

Originality/value

This study proposes a new method for analyzing technology mapping and evolution based on patent text information. As both the US and China are crucial players in the internet industry, it is vital for policymakers in third countries to understand the technological capacity and competitiveness of both countries in order to develop strategic partnerships effectively.

Details

Asia Pacific Journal of Innovation and Entrepreneurship, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2071-1395


Open Access
Article
Publication date: 22 October 2019

Li Xuemei, Yun Cao, Junjie Wang, Yaoguo Dang and Yin Kedong


Abstract

Purpose

Research on grey systems is becoming more sophisticated, and grey relational and prediction analyses are receiving close review worldwide. Particularly, the application of grey systems in marine economics is gaining importance. The purpose of this paper is to summarize and review literature on grey models, providing new directions in their application in the marine economy.
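For readers new to the field, the basic grey prediction model GM(1,1), one of the model families the review covers, can be sketched in a few lines of Python; the sample series is purely illustrative.

```python
import numpy as np

def gm11_forecast(x0, steps=3):
    # GM(1,1): fit dx1/dt + a*x1 = b on the accumulated series x1,
    # then restore forecasts of the original series by differencing.
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                         # accumulated series
    z1 = 0.5 * (x1[1:] + x1[:-1])              # background values
    B = np.column_stack([-z1, np.ones_like(z1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]
    k = np.arange(len(x0) + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.empty_like(x1_hat)
    x0_hat[0], x0_hat[1:] = x0[0], np.diff(x1_hat)
    return x0_hat[len(x0):]                    # forecasted values

print(gm11_forecast([100, 104, 110, 118, 127]))
```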

Design/methodology/approach

This paper organizes seminal studies on grey systems published in the Chinese core journal database CNKI, in Web of Science and in Elsevier from 1982 to 2018. After searching these databases for that period, the authors used the CiteSpace visualization tool to analyze the results.

Findings

The authors sorted the studies according to their countries/regions, institutions, keywords and categories using the CiteSpace tool; analyzed current research characteristics on grey models; and discussed their possible applications in marine businesses, economy, scientific research and education, marine environment and disasters. Finally, the authors pointed out the development trend of grey models.

Originality/value

Although researchers are combining grey theory with fractals, neural networks, fuzzy theory and other methods, the scope of applications has still not met the demand. With increasingly in-depth research in marine economics and management, international marine economic research has entered a new period of development. Grey theory will certainly attract scholars' attention, and its role in the marine economy and management will gain considerable significance.

Details

Marine Economics and Management, vol. 2 no. 2
Type: Research Article
ISSN: 2516-158X

