Search results

1 – 10 of over 2000
Open Access
Article
Publication date: 15 February 2022

Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal

1210

Abstract

Purpose

Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemas, shared attributes, vocabulary, structure and semantics. Existing dataset catalogs provide basic search functionality that relies on keyword search over brief, incomplete or misleading textual metadata attached to the datasets, so the search results are often insufficient. However, there are many ways to improve dataset discovery, for example by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, and the countless available feature extraction methods and description models.

Design/methodology/approach

In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.
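As a rough illustration of what such a pipeline could look like, the sketch below composes a representation component with a similarity measure and ranks catalog entries against a query. It is a minimal sketch under stated assumptions, not the authors' prototype. The names `DiscoveryPipeline`, `bag_of_words` and `cosine` are hypothetical, and so are the choice of a bag-of-words representation over metadata text and the use of cosine similarity.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical representation type: a sparse feature vector keyed by term.
Representation = Dict[str, float]


def bag_of_words(text: str) -> Representation:
    """Hypothetical feature-extraction component: term counts of metadata text."""
    return dict(Counter(text.lower().split()))


def cosine(a: Representation, b: Representation) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class DiscoveryPipeline:
    """Hypothetical pipeline: one representation component plus one similarity measure."""
    represent: Callable[[str], Representation]
    similarity: Callable[[Representation, Representation], float]

    def rank(self, query: str, catalog: Dict[str, str]) -> List[Tuple[str, float]]:
        """Return (dataset id, score) pairs, most similar first."""
        q = self.represent(query)
        scores = [(ds_id, self.similarity(q, self.represent(text)))
                  for ds_id, text in catalog.items()]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Usage with a toy, made-up catalog of dataset descriptions.
pipeline = DiscoveryPipeline(represent=bag_of_words, similarity=cosine)
catalog = {
    "air-quality": "air quality measurements prague",
    "bus-timetable": "public transport bus timetable",
}
result = pipeline.rank("air quality", catalog)
```

Because components are plain callables, swapping `bag_of_words` for another extractor (for example, one that reads dataset content rather than metadata) yields a new pipeline without changing the ranking logic.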

Findings

The study presents several proof-of-concept pipelines, together with an experimental evaluation, that showcase the use of the framework.

Originality/value

To the best of the authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework aims to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 14 August 2017

Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang

2049

Abstract

Purpose

This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.

Design/methodology/approach

In particular, this study extracts new predicates from four types of data sources, namely Web texts, Document Object Model (DOM) trees, existing KBs and query streams, to augment the ontology of the existing KB (i.e. Freebase). In addition, a graph-based approach is proposed to improve truth discovery for multi-valued predicates.
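The abstract does not detail the graph-based method, so the following is only a stand-in to make the multi-valued setting concrete. It is a minimal Python sketch of iterative, source-weighted voting that, unlike single-truth approaches, may accept several values for the same predicate. The function name `multi_truth_discovery`, the uniform initial trust of 0.8, the acceptance threshold of 0.5 and the fixed iteration count are illustrative assumptions. The scheme is a plain voting heuristic, not the graph-based approach proposed in the study.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A claim states that a source asserts a value for an object (entity + predicate).
Claim = Tuple[str, str, str]  # (source, object, value)


def multi_truth_discovery(claims: List[Claim], threshold: float = 0.5,
                          iterations: int = 10) -> Dict[str, Set[str]]:
    """Hypothetical iterative weighted voting that allows multiple true values."""
    trust = {s: 0.8 for s, _, _ in claims}  # assumed uniform prior trust
    covering = defaultdict(set)    # object -> sources making any claim on it
    supporters = defaultdict(set)  # (object, value) -> sources asserting it
    for s, o, v in claims:
        covering[o].add(s)
        supporters[(o, v)].add(s)

    truths: Dict[str, Set[str]] = {}
    for _ in range(iterations):
        # Value confidence: trust of its supporters relative to all sources covering the object.
        conf = {}
        for (o, v), srcs in supporters.items():
            total = sum(trust[s] for s in covering[o])
            conf[(o, v)] = sum(trust[s] for s in srcs) / total if total else 0.0
        # Every value above the threshold is accepted, so an object may keep several values.
        truths = defaultdict(set)
        for (o, v), c in conf.items():
            if c >= threshold:
                truths[o].add(v)
        # Source trust: average confidence of the values the source asserts.
        per_source = defaultdict(list)
        for s, o, v in claims:
            per_source[s].append(conf[(o, v)])
        trust = {s: sum(cs) / len(cs) for s, cs in per_source.items()}
    return dict(truths)


# Usage with made-up claims about a multi-valued predicate.
claims = [
    ("A", "BandX.members", "Alice"), ("A", "BandX.members", "Bob"),
    ("B", "BandX.members", "Alice"), ("B", "BandX.members", "Bob"),
    ("C", "BandX.members", "Alice"), ("C", "BandX.members", "Bob"),
    ("D", "BandX.members", "Alice"),
    ("E", "BandX.members", "Carol"),
]
result = multi_truth_discovery(claims)
```

Here both "Alice" and "Bob" are kept, even though source D lists only one of them. A single-truth method would instead retain only the highest-confidence value per object.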

Findings

Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. Future research directions regarding GrandBase construction and extension are also discussed.

Originality/value

With the aim of revolutionizing modern society through the wisdom of Big Data, numerous KBs have been constructed to supply massive knowledge-driven applications with Resource Description Framework (RDF) triples. The key challenges in KB construction include extracting information from large-scale, possibly conflicting and heterogeneously structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in those sources (i.e. the truth discovery problem). Considerable research effort has been devoted to both problems. However, existing KBs are far from comprehensive and accurate: first, existing knowledge extraction systems retrieve data from a limited range of Web source types; second, existing truth discovery approaches commonly assume that each predicate has only one true value. This paper focuses on the problem of generating actionable knowledge from Big Data. It proposes a two-phase system, consisting of knowledge extraction and truth discovery, to construct a broader KB called GrandBase.

Details

PSU Research Review, vol. 1 no. 2
Type: Research Article
ISSN: 2399-1747

Content available
Article
Publication date: 1 August 2004

C.J.H. Mann

308

Details

Kybernetes, vol. 33 no. 7
Type: Research Article
ISSN: 0368-492X

Content available
Article
Publication date: 11 December 2019

Lara Agostini, Anna Nosella, Riikka M. Sarala, J.C. Spender and Douglas Wegner

617

Details

Journal of Knowledge Management, vol. 23 no. 10
Type: Research Article
ISSN: 1367-3270

Content available
Article
Publication date: 2 November 2021

Oksana Zavalina, Xiaoguang Wang and Qikai Cheng

312

Details

The Electronic Library, vol. 39 no. 3
Type: Research Article
ISSN: 0264-0473

Content available
Article
Publication date: 1 September 2001

Robert Raeside and John Walker

172

Details

Measuring Business Excellence, vol. 5 no. 3
Type: Research Article
ISSN: 1368-3047

Content available
Article
Publication date: 2 November 2015

Heidi Hanson and Zoe Stewart-Marshall

141

Details

Library Hi Tech News, vol. 32 no. 9
Type: Research Article
ISSN: 0741-9058

Content available
Article
Publication date: 3 August 2015

Heidi Hanson and Zoe Stewart-Marshall

132

Details

Library Hi Tech News, vol. 32 no. 6
Type: Research Article
ISSN: 0741-9058

Content available
Article
Publication date: 7 September 2015

Heidi Hanson and Zoe Stewart-Marshall

96

Details

Library Hi Tech News, vol. 32 no. 7
Type: Research Article
ISSN: 0741-9058

Content available
Article
Publication date: 5 October 2015

Heidi Hanson and Zoe Stewart-Marshall

120

Details

Library Hi Tech News, vol. 32 no. 8
Type: Research Article
ISSN: 0741-9058