Search results
1 – 9 of 9
Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal
Abstract
Purpose
Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemes, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality that relies on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there are many ways to improve dataset discovery, including content-based retrieval, machine learning tools, third-party (external) knowledge bases, and a wide range of feature extraction methods and description models.
Design/methodology/approach
In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.
Findings
The study proposes several proof-of-concept pipelines including experimental evaluation, which showcase the usage of the framework.
Originality/value
To the best of the authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework aims to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.
Abstract
Classification techniques have been applied in many fields of science. There are several ways of evaluating classification algorithms, and the resulting metrics and their significance must be interpreted correctly when comparing different learning algorithms. Most of these measures are scalar metrics, and some are graphical methods. This paper gives a detailed overview of classification assessment measures, explaining the basics of these measures and how they work, to serve as a comprehensive source for researchers interested in this field. The overview starts by defining the confusion matrix in binary and multi-class classification problems. Many classification measures are also explained in detail, and the influence of balanced and imbalanced data on each metric is presented. An illustrative example shows (1) how to calculate these measures in binary and multi-class classification problems, and (2) the robustness of some measures against balanced and imbalanced data. Moreover, graphical measures such as receiver operating characteristic (ROC), Precision-Recall (PR), and detection error trade-off (DET) curves are presented in detail. Additionally, numerical examples demonstrate, step by step, the preprocessing needed to plot ROC, PR, and DET curves.
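As a rough illustration of how scalar measures are derived from a binary confusion matrix, and of why imbalanced data can make accuracy misleading, consider the minimal sketch below. It is not taken from the paper: the class name `BinaryConfusion`, the helper `confusion_from_labels` and the toy labels are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class BinaryConfusion:
    """Counts of a binary confusion matrix (hypothetical helper class)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    def recall(self) -> float:
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_from_labels(y_true, y_pred, positive=1) -> BinaryConfusion:
    """Count the four confusion-matrix cells from paired labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return BinaryConfusion(tp, fp, fn, tn)


# Invented, heavily imbalanced data: 990 negatives and 10 positives.
y_true = [0] * 990 + [1] * 10
always_negative = [0] * 1000  # a classifier that never predicts the positive class
cm = confusion_from_labels(y_true, always_negative)
print(f"accuracy={cm.accuracy():.2f} recall={cm.recall():.2f} f1={cm.f1():.2f}")
# prints: accuracy=0.99 recall=0.00 f1=0.00
```

In this toy case the trivial classifier reaches 99% accuracy while finding none of the positive samples, which is the kind of sensitivity to class imbalance that the overview discusses.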
Abstract
Using text mining methodologies, this paper examines the implications of US and Chinese policies for bilateral trade. Official speeches on trade by political leaders of the US and China were collected and analyzed for US-China gaps in major foreign policies, such as bilateral trade and the Belt and Road Initiative. A term frequency-inverse document frequency (TF-IDF) word cloud, a network similarity index, machine learning-based latent Dirichlet allocation (LDA), and structural equivalence are applied to examine the meanings of the speeches. The main arguments of this paper are as follows. First, document similarity analysis indicates that the speeches of Chinese and US leaders are completely different from each other. Moreover, while the documents from Chinese leaders are considerably similar to one another, the documents from US leaders differ widely. Second, LDA topic analysis indicates that China concentrates more on international and collaborative relationships, while the US focuses more on domestic and economic interests. Third, a word hierarchy analysis shows that the basic words used by American and Chinese leaders are also completely different. Agriculture, farmers, automobiles, and negotiations are the basic words for American leaders, whereas planning, markets, and education are the basic words for Chinese leaders.
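To make the notion of document similarity concrete, the sketch below weights terms by term frequency-inverse document frequency (TF-IDF) and compares documents by cosine similarity. It is only an illustration under stated assumptions: the abstract does not say how similarity was computed, the smoothed IDF formula is one common variant, the function names are hypothetical, and the three short texts are invented rather than drawn from the speeches.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed IDF (an assumed variant).
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented toy texts, not taken from the analyzed speeches.
docs = [
    "tariffs on farmers and agriculture trade",
    "farmers agriculture tariffs negotiations",
    "planning markets education cooperation",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # prints: True
```

The first two toy texts share vocabulary and therefore score higher than the first and third, which share no terms and have a cosine similarity of zero.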
Taro Aso, Toshiyuki Amagasa and Hiroyuki Kitagawa
Abstract
Purpose
The purpose of this paper is to propose a scheme that allows users to interactively explore relations between entities in knowledge bases (KBs). KBs store a wide range of knowledge about real-world entities in a structured form as (subject, predicate, object). Although it is possible to query entities and the relations among them by writing appropriate SPARQL expressions or keyword queries, the structure and the vocabulary are complicated, and it is hard for non-expert users to get the desired information. For this reason, many researchers have proposed faceted search interfaces for KBs. Nevertheless, existing ones are designed for finding entities and are insufficient for finding relations.
Design/methodology/approach
To address this problem, the authors propose a novel “relation facet” for finding relations between entities. To generate it, they cluster predicates, grouping those that are connected to common objects, and then generate a facet from the resulting clusters. Specifically, they propose two clustering approaches: agglomerative hierarchical clustering (AHC) and CANDECOMP/PARAFAC (CP) tensor decomposition.
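The idea of grouping predicates that are connected to common objects can be sketched with a small agglomerative clustering routine. This is an illustration only, not the authors' implementation: the Jaccard distance over object sets, the average linkage, the stopping threshold, the function names and the toy triples are all assumptions chosen for this example.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 minus the overlap ratio of two sets (0.0 means identical)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_predicates(triples, threshold=0.7):
    """Agglomerative clustering of predicates by the objects they share."""
    # Collect the set of objects each predicate points to.
    objects_by_pred = {}
    for _subject, predicate, obj in triples:
        objects_by_pred.setdefault(predicate, set()).add(obj)

    # Start with one singleton cluster per predicate.
    clusters = [[p] for p in sorted(objects_by_pred)]

    def linkage(c1, c2):
        # Average linkage: mean pairwise distance between cluster members.
        dists = [jaccard_distance(objects_by_pred[p], objects_by_pred[q])
                 for p in c1 for q in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Invented toy triples in (subject, predicate, object) form.
triples = [
    ("Tokyo", "capitalOf", "Japan"), ("Paris", "capitalOf", "France"),
    ("Kyoto", "locatedIn", "Japan"), ("Lyon", "locatedIn", "France"),
    ("Shakespeare", "author", "Hamlet"),
    ("Olivier", "director", "Hamlet"), ("Kurosawa", "director", "Ran"),
]
print(cluster_predicates(triples))
# prints: [['capitalOf', 'locatedIn'], ['author', 'director']]
```

Each resulting cluster could then back one entry of a relation facet. The threshold controls how coarse the grouping is and would need tuning on real data.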
Findings
The authors experimentally tested the performance of the clustering methods and found that AHC performs better than tensor decomposition. In addition, the authors conducted a user study showing that the proposed scheme performs better than existing ones in the task of searching for relations.
Originality/value
The authors propose a relation-oriented faceted search method for KBs that allows users to explore relations between entities. As far as the authors know, this is the first method to focus on the exploration of relations between entities.
Abstract
Purpose
This paper aims to introduce a crowd-based method for theorizing. The purpose is not to achieve a scientific theory. On the contrary, the purpose is to achieve a model that may challenge current scientific theories or lead research in new phenomena.
Design/methodology/approach
This paper describes a case study of theorizing by using a crowd-based method. The first section of the paper introduces what the authors know about crowdsourcing, crowd science and the aggregation of non-expert views. The second section details the case study. The third section analyses the aggregation. Finally, the fourth section elaborates the conclusions, limitations and future research.
Findings
This paper addresses the extent to which the crowd-based method produces results similar to theories tested and published by experts.
Research limitations/implications
From a theoretical perspective, this study provides evidence to support the research agenda associated with crowd science. The main limitation of this study is that the crowd-based research models and the expert research models are compared in terms of the graph. Nevertheless, some academics may argue that theory building is about an academic heritage.
Practical implications
This paper exemplifies how to obtain an expert-level research model by aggregating the views of non-experts.
Social implications
This study is particularly important for institutions with limited access to costly databases, labs and researchers.
Originality/value
Previous research suggested that a collective of individuals may help to conduct all the stages of a research endeavour. Nevertheless, a formal method for theorizing based on the aggregation of non-expert views does not exist. This paper provides the method and evidence of its practical implications.
Kai Hänninen, Jouni Juntunen and Harri Haapasalo
Abstract
Purpose
The purpose of this study is to describe latent classes explaining the innovation logic in Finnish construction companies. Innovativeness is a driver of competitive performance and vital to the long-term success of any organisation and company.
Design/methodology/approach
Using finite mixture structural equation modelling (FMSEM), the authors have classified innovation logic into latent classes. The method analyses and recognises classes for companies that have similar logic in innovation activities based on the collected data.
Findings
Through FMSEM analysis, the authors have identified three latent classes that explain the innovation logic in Finnish construction companies – LC1: the internal innovators; LC2: the non-innovation-oriented introverts; and LC3: the innovation-oriented extroverts. These three latent classes clearly capture the perceptions within the industry as well as the different characteristics and variables.
Research limitations/implications
The presented latent classes explain innovation logic, but the analysis is limited to Finnish companies. Also, the research is quantitative by nature and does not capture more specific aspects in the way that qualitative research might.
Practical implications
This paper presents starting points for construction industry companies to intensify innovation activities. It may also indicate more fundamental changes for the structure of construction industry organisations, especially by enabling an innovation-friendly culture.
Originality/value
This study describes innovation logic in Finnish construction companies through three models (LC1–LC3) by using quantitative data analysed with the FMSEM method. The fundamental innovation challenges in Finnish construction companies are clarified via the identified latent classes.
Donata Tania Vergura, Cristina Zerbini, Beatrice Luceri and Rosa Palladino
Abstract
Purpose
The research carried out a bibliometric analysis of the literature on environmental sustainability from a demand perspective by analyzing the scientific contributions published in the last twenty years.
Design/methodology/approach
A bibliometric analysis was carried out to outline the development of scientific studies, identifying the most discussed topics and those that would require future research. In total, 274 articles published between 1999 and 2021 were collected through the Web of Science database and analyzed with the SciMAT software.
Findings
By systematizing the literature results, the study revealed a steady growth in the number of publications and in the research areas, highlighting a substantial evolution of the research topic.
Research limitations/implications
The study contributes to the conceptual, methodological and thematic development of the topic, systematizing the results of existing studies and providing useful indications for the promotion of sustainable consumer habits.
Originality/value
The study attempts to bridge the gap in current literature by offering a holistic view on the role of consumer behavior in pursuing sustainability goals, identifying both the most treated areas and the emerging ones that can represent opportunities for future research.
Bufei Xing, Haonan Yin, Zhijun Yan and Jiachen Wang
Abstract
Purpose
The purpose of this paper is to propose a new approach to retrieve similar questions in online health communities to improve the efficiency of health information retrieval and sharing.
Design/methodology/approach
This paper proposes a hybrid approach that combines domain knowledge similarity and topic similarity to retrieve similar questions in online health communities. The domain knowledge similarity evaluates the domain distance between different questions, and the topic similarity measures the relationship between questions based on the extracted latent topics.
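One simple way to combine two similarity signals of this kind is a weighted sum, sketched below. This is a sketch under stated assumptions rather than the authors' method: the abstract does not specify how the two similarities are computed or combined, so the Jaccard overlap of named entities, the cosine similarity of topic distributions, the weight `alpha`, the function names and the toy questions are all hypothetical.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Overlap ratio of two entity sets (stand-in for domain knowledge similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v) -> float:
    """Cosine similarity of two topic distributions of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def hybrid_similarity(q1, q2, alpha=0.5) -> float:
    """Weighted sum of domain and topic similarity (alpha is an assumed weight)."""
    domain = jaccard(q1["entities"], q2["entities"])
    topic = cosine(q1["topics"], q2["topics"])
    return alpha * domain + (1 - alpha) * topic


def retrieve_similar(query, candidates, top_k=3, alpha=0.5):
    """Return the ids of the top_k candidates, most similar first."""
    ranked = sorted(candidates,
                    key=lambda c: hybrid_similarity(query, c, alpha),
                    reverse=True)
    return [c["id"] for c in ranked[:top_k]]


# Invented toy questions: named entities plus a 3-topic distribution each.
query = {"id": "q0", "entities": {"diabetes", "insulin"}, "topics": [0.8, 0.1, 0.1]}
candidates = [
    {"id": "q_migraine", "entities": {"migraine"}, "topics": [0.1, 0.1, 0.8]},
    {"id": "q_diet", "entities": {"diabetes", "diet"}, "topics": [0.6, 0.3, 0.1]},
    {"id": "q_dose", "entities": {"insulin", "diabetes"}, "topics": [0.7, 0.2, 0.1]},
]
print(retrieve_similar(query, candidates))
# prints: ['q_dose', 'q_diet', 'q_migraine']
```

Setting `alpha` closer to 1 favours shared entities, while values closer to 0 favour topical overlap. A real system would tune this weight on labelled question pairs.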
Findings
The experiment results show that the proposed method outperforms the baseline methods.
Originality/value
This method overcomes the problem of word mismatch and considers the named entities included in questions, which most existing studies did not.
Daniel Camuñas-García, María Pilar Cáceres-Reche and María de la Encarnación Cambil-Hernández
Abstract
Purpose
The purpose of this study was to analyze the state of mobile game-based learning in the field of cultural heritage education.
Design/methodology/approach
A bibliometric methodology based on scientific mapping and an analysis of co-words was used. The scientific production on this field of study indexed in Scopus was analyzed. The analysis included a total of 725 publications.
Findings
The results show that the National Research Council of Italy is the institution with the highest volume of production. Among the journals, the Journal on Computing and Cultural Heritage stands out. In addition, in the analysis of the structural and thematic development of co-words, a low percentage of keyword matching was observed. The research is currently mainly oriented to pedagogical methods, especially game-based learning, gamification and the use of serious games, although these are not the only trends in this field. Research is also focusing on virtual reality, augmented reality, and mixed reality.
Originality/value
This work is an exploratory and novel study that analyzes the publications to date on mobile game-based learning in cultural heritage education. In theoretical terms, this can serve as support so that other researchers interested in this field can access the information highlighted in this work. From a practical perspective, this work will contribute to the promotion of new innovative actions in cultural heritage education to satisfy the demands of a learning group increasingly familiar with games technology.