Search results

1 – 10 of 73
Open Access
Article
Publication date: 5 April 2024

Miquel Centelles and Núria Ferran-Ferrer

Abstract

Purpose

This study aims to develop a comprehensive framework for assessing knowledge organization systems (KOSs), including the taxonomy of Wikipedia and the ontologies of Wikidata, with a specific focus on enhancing management and retrieval from a gender nonbinary perspective.

Design/methodology/approach

This study employs heuristic and inspection methods to assess Wikipedia’s KOS, checking its compliance with international standards. It evaluates the efficiency of retrieving non-masculine gender-related articles using the Catalan Wikipedia category scheme and identifies its limitations. Additionally, a novel assessment of Wikidata ontologies examines their structure and coverage of gender-related properties, comparing them with Wikipedia’s taxonomy to identify advantages and possible enhancements.

Findings

This study evaluates Wikipedia’s taxonomy and Wikidata’s ontologies, establishing evaluation criteria for gender-based categorization and exploring their structural effectiveness. The evaluation process suggests that Wikidata ontologies may offer a viable solution to address Wikipedia’s categorization challenges.

Originality/value

The assessment of Wikipedia categories (taxonomy) based on KOS standards leads to the conclusion that there is ample room for improvement, not only in matters concerning gender identity but also in the overall KOS to enhance search and retrieval for users. These findings bear relevance for the design of tools to support information retrieval on knowledge-rich websites, as they assist users in exploring topics and concepts.

Article
Publication date: 19 November 2018

Moritz Schubotz, Philipp Scharpf, Kaushal Dudhat, Yash Nagar, Felix Hamborg and Bela Gipp

Abstract

Purpose

This paper aims to present an open source math-aware Question Answering System based on Ask Platypus.

Design/methodology/approach

The system returns a single mathematical formula for a natural language question in English or Hindi. These formulae originate from the knowledge base Wikidata. The authors translate these formulae to computable data by integrating the calculation engine sympy into the system. This way, users can enter numeric values for the variables occurring in the formula. Moreover, the system loads numeric values for constants occurring in the formula from Wikidata.
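
The substitution step described above can be illustrated with a minimal sketch. The system itself relies on sympy; to stay dependency-free, this sketch swaps in a small evaluator built on Python's standard `ast` module, so it is not the authors' implementation. The formula string, the `CONSTANTS` table (standing in for values loaded from Wikidata) and the function name `evaluate_formula` are all assumptions made for illustration.

```python
import ast
import operator

# Hypothetical stand-in for numeric constants that the described system
# loads from Wikidata.
CONSTANTS = {"c": 299_792_458.0}  # speed of light in m/s

# Arithmetic operators this sketch allows inside a formula.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def evaluate_formula(rhs: str, variables: dict) -> float:
    """Evaluate the right-hand side of a formula with user-supplied values."""
    values = {**CONSTANTS, **variables}

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in values:
                raise KeyError(f"no value supplied for variable {node.id!r}")
            return values[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported syntax in formula")

    return walk(ast.parse(rhs, mode="eval"))


# E = m * c**2: the user enters m, the constant c comes from the table.
energy = evaluate_formula("m * c**2", {"m": 2.0})
```

Walking the syntax tree instead of calling `eval` keeps the sketch limited to plain arithmetic; a symbolic engine such as sympy would additionally allow the formula to be rearranged and solved for any of its variables.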

Findings

In a user study, this system outperformed a commercial computational mathematical knowledge engine by 13 per cent. However, the performance of this system heavily depends on the size and quality of the formula data available in Wikidata. As only a few items in Wikidata contained formulae when the project started, the authors facilitated the import process by suggesting formula edits to Wikidata editors. With the simple heuristic that the first formula is significant for the paper, 80 per cent of the suggestions were correct.

Originality/value

This research was presented at the JCDL17 KDD workshop.

Details

Information Discovery and Delivery, vol. 46 no. 4
Type: Research Article
ISSN: 2398-6247

Article
Publication date: 20 August 2019

Marçal Mora-Cantallops, Salvador Sánchez-Alonso and Elena García-Barriocanal

Abstract

Purpose

The purpose of this paper is to review the current status of research on Wikidata and, in particular, of articles that either describe applications of Wikidata or provide empirical evidence, in order to uncover the topics of interest, the fields that are benefiting from its applications and which researchers and institutions are leading the work.

Design/methodology/approach

A systematic literature review is conducted to identify and review how Wikidata is being dealt with in academic research articles and the applications that are proposed. A rigorous and systematic process is implemented, aiming not only to summarize existing studies and research on the topic, but also to include an element of analytical criticism and a perspective on gaps and future research.

Findings

Despite Wikidata’s potential and the notable rise in research activity, the field is still in the early stages of study. Most research is published in conferences, highlighting such immaturity, and provides little empirical evidence of real use cases. Only a few disciplines currently benefit from Wikidata’s applications and do so with a significant gap between research and practice. Studies are dominated by European researchers, mirroring Wikidata’s content distribution and limiting its worldwide applications.

Originality/value

The results collect and summarize existing Wikidata research articles published in the major international journals and conferences, delivering a meticulous summary of all the available empirical research on the topic which is representative of the state of the art at this time, complemented by a discussion of identified gaps and future work.

Details

Data Technologies and Applications, vol. 53 no. 3
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 3 January 2020

Timothy Kanke

Abstract

Purpose

The purpose of this paper is to investigate how editors participate in Wikidata and how they organize their work.

Design/methodology/approach

This qualitative study used content analysis of discussions involving data curation and negotiation in Wikidata. Activity theory was used as a conceptual framework for data collection and analysis.

Findings

The analysis identified six activities: conceptualizing the curation process, appraising objects, ingesting objects from external sources, creating collaborative infrastructure, re-organizing collaborative infrastructure and welcoming newcomers. Many of the norms and rules that were identified help regulate the activities in Wikidata.

Research limitations/implications

This study mapped Wikidata activities to curation and ontology frameworks. Results from this study provided implications for academic studies on online peer-curation work.

Practical implications

An understanding of the activities in Wikidata will help inform communities wishing to contribute data to or reuse data from Wikidata, as well as inform the design of other similar online peer-curation communities, scientific research institutional repositories, digital archives and libraries.

Originality/value

Wikidata is one of the largest knowledge curation projects on the web. The data from this project are used by other Wikimedia projects such as Wikipedia, as well as major search engines. This study explores an aspect of Wikidata WikiProject editors that, to the author’s knowledge, has yet to be researched.

Details

Library Hi Tech, vol. 39 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 July 2022

Maayan Zhitomirsky-Geffet, Inna Kizhner and Sara Minster

Abstract

Purpose

Large cultural heritage datasets from museum collections tend to be biased and demonstrate omissions that result from a series of decisions at various stages of the collection construction. The purpose of this study is to apply a set of ethical criteria to compare the level of bias of six online databases produced by two major art museums, identifying the most biased and the least biased databases.

Design/methodology/approach

At the first stage, the relevant data have been automatically extracted from all six databases and mapped to a unified ontological scheme based on Wikidata. Then, the authors applied ethical criteria to the results of the geographical distribution of records provided by two major art museums as online databases accessed via museums' websites, API datasets and datasets submitted to Wikidata.

Findings

The authors show that the museums use different artworks in each of their online databases and that each database has different types of bias reflected by the study variables, such as artworks' country of origin or the creator's nationality. For most variables, the database behind the online search system on the museum's website is more balanced and ethical than the API dataset and Wikidata databases of the two museums.

Originality/value

By applying ethical criteria to the analysis of cultural bias in various museum databases aimed at different audiences including end users, researchers and commercial institutions, this paper shows the importance of explicating bias and maintaining integrity in cultural heritage representation through different channels that potentially have high impact on how culture is perceived, disseminated, contextualized and transformed.

Details

Journal of Documentation, vol. 79 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 15 March 2019

Benedikt Simon Hitz-Gamper, Oliver Neumann and Matthias Stürmer

Abstract

Purpose

Linked data is a technical standard to structure complex information and relate independent sets of data. Recently, governments have started to use this technology for bridging separated data (“silos”) by launching linked open government data (LOGD) portals. The purpose of this paper is to explore the role of LOGD as a smart technology and strategy to create public value. This is achieved by enhancing the usability and visibility of open data provided by public organizations.

Design/methodology/approach

In this study, three different LOGD governance modes are deduced: public agencies could release linked data via a dedicated triple store, via a shared triple store or via an open knowledge base. Each of these modes has different effects on usability and visibility of open data. Selected case studies illustrate the actual use of these three governance modes.

Findings

According to this study, LOGD governance modes present a trade-off between retaining control over governmental data and potentially gaining public value by the increased use of open data by citizens.

Originality/value

This study provides recommendations for public sector organizations for the development of their data publishing strategy to balance control, usability and visibility considering also the growing popularity of open knowledge bases such as Wikidata.

Details

International Journal of Public Sector Management, vol. 32 no. 5
Type: Research Article
ISSN: 0951-3558

Article
Publication date: 3 October 2023

Haklae Kim

Abstract

Purpose

Despite ongoing research into archival metadata standards, digital archives are unable to effectively represent records in their appropriate contexts. This study aims to propose a knowledge graph that depicts the diverse relationships between heterogeneous digital archive entities.

Design/methodology/approach

This study introduces and describes a method for applying knowledge graphs to digital archives in a step-by-step manner. It examines archival metadata standards, such as Records in Context Ontology (RiC-O), for characterising digital records; explains the process of data refinement, enrichment and reconciliation with examples; and demonstrates the use of knowledge graphs constructed using semantic queries.
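
The idea of querying relationships between archive entities can be sketched with a toy in-memory triple store. This is only an illustration under stated assumptions: every identifier below (`ex:record1`, `ex:hasCreator`, `ext:Q0` and so on) is a hypothetical placeholder, not an actual RiC-O, Wikidata or Schema.org term, and the `match` helper stands in for a real semantic query language.

```python
# Triples are (subject, predicate, object). All identifiers are hypothetical
# placeholders, not actual RiC-O, Wikidata or Schema.org terms.
TRIPLES = [
    ("ex:record1", "ex:hasCreator", "ex:agencyA"),
    ("ex:record2", "ex:hasCreator", "ex:agencyA"),
    ("ex:record2", "ex:about", "ex:event1"),
    ("ex:agencyA", "ex:sameAs", "ext:Q0"),  # link to an external entity
]


def match(pattern, triples=TRIPLES):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        triple for triple in triples
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]


def records_by_creator(creator):
    """Relationship query: which records name this agent as creator?"""
    return sorted(s for s, _, _ in match((None, "ex:hasCreator", creator)))


created = records_by_creator("ex:agencyA")
```

A keyword search would only find records whose text mentions the agent; the pattern query instead follows the explicit creator relationship, which is the kind of supplement to keyword search the abstract refers to.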

Findings

This study introduced the 97imf.kr archive as a knowledge graph, enabling meaningful exploration of relationships within the archive’s records. This approach facilitated comprehensive record descriptions about different record entities. Applying archival ontologies with general-purpose vocabularies to digital records was advised to enhance metadata coherence and semantic search.

Originality/value

Most digital archives serviced in Korea are limited in the proper use of archival metadata standards. The contribution of this study is to propose a practical application of knowledge graph technology for linking and exploring digital records. This study details the process of collecting raw data on archives, data preprocessing and data enrichment, and demonstrates how to build a knowledge graph connected to external data. In particular, the knowledge graph of RiC-O vocabulary, Wikidata and Schema.org vocabulary and the semantic query using it can be applied to supplement keyword search in conventional digital archives.

Details

The Electronic Library, vol. 42 no. 1
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 26 October 2020

Trilce Navarrete and Elena Villaespesa

Abstract

Purpose

This study aimed at understanding the use of paintings outside of an art-related context, in the English version of Wikipedia.

Design/methodology/approach

For this investigation, the authors identified 8,104 paintings used in 10,008 articles of the English Wikipedia edition. The authors manually coded the topic of the article in question, documented the number of monthly average views and identified the originating museum. They analysed the use of images based on frequency of use, frequency of view, associated topics and location. Early in the analysis three distinct perspectives emerged: the readers of the online encyclopaedia, the editors of the articles and the museum organisations providing the painting images (directly or indirectly).

Findings

Wikipedia is a widely used online information resource where images of paintings serve as a visual reference to illustrate articles, notably also beyond an art-related topic and where no alternative image is available – as in the case of historic portraits. Editors used paintings as illustrations of the work itself or an art-related movement, but also as illustrations of past events, as an alternative to photographs, as well as to represent a concept or technique. Images have been used to illustrate up to 76 articles, evidencing the polysemic nature of paintings. The authors conclude that images of paintings are highly valuable information sources, also beyond an art-related context. They also find that Wikipedia is an important dissemination channel for museum collections. While art-related articles contain a greater number of paintings, they receive fewer views than non-art-related articles containing fewer paintings. Readers of all topics, predominantly history, science and geographic articles, viewed art pieces outside of an art context. Painting images in Wikipedia receive a much larger online audience than the physical painting does when compared to the number of museum onsite visitors. The authors’ results confirm the presence of a strong long-tail pattern in the frequency of image use (only 3% of painting images are used in a Wikipedia article), image view and museums represented, characteristic of network dynamics of the Internet.

Research limitations/implications

While this is the first analysis of the complete collection of paintings in the English Wikipedia, the authors’ results are conservative as many paintings are not identified as such in Wikidata, which was used for automatic harvesting. Tools to analyse image views specifically are not yet available and user privacy is highly protected, limiting the disaggregation of user data. This study serves to document a lack of diversity in image availability for global online consumption, favouring well-known Western objects. At the same time, the study evidences the need to diversify the use of images to reflect a more global perspective, particularly where paintings are used to represent concepts or techniques.

Practical implications

Museums wanting to increase visibility can target the reuse of their collections in non-art-related articles, which received 88% of all views in the authors’ sample. The small number of museums collaborating with the Wikimedia Foundation, together with the apparent inefficiency of leaving the use of paintings as illustrations to the crowd (only 3% of painting images are used), suggests that further collaborative efforts to reposition museum content may be beneficial.

Social implications

This paper highlights the reach of Wikipedia as an information source, where museum content can be positioned to reach a greater user group beyond the usual museum visitor, in turn increasing visual and digital literacy.

Originality/value

This is the first study that documents the frequency of use and views, the topical use and the originating institution of “all the paintings” in the English Wikipedia edition.

Details

Journal of Documentation, vol. 77 no. 2
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 15 February 2022

Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal

Abstract

Purpose

Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemes, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality relying on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there exist many ways of improving the dataset discovery by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, countless feature extraction methods and description models and so forth.

Design/methodology/approach

In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.
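
A pipeline assembled from interchangeable components might look roughly like the sketch below. It is not the authors' framework: the catalog contents, the word-set representation, the Jaccard similarity and every name (`CATALOG`, `tokens`, `jaccard`, `build_pipeline`) are assumptions chosen to keep the example small.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical dataset catalog: identifier -> textual metadata.
CATALOG: Dict[str, str] = {
    "d1": "air quality measurements city",
    "d2": "city public transport timetable",
    "d3": "air pollution measurements stations",
}


def tokens(text: str) -> Set[str]:
    """Representation component: a dataset becomes a set of lowercase words."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity component: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_pipeline(
    represent: Callable[[str], Set[str]],
    similarity: Callable[[Set[str], Set[str]], float],
) -> Callable[[str], List[Tuple[str, float]]]:
    """Combine interchangeable components into one discovery function."""

    def discover(query_id: str) -> List[Tuple[str, float]]:
        query = represent(CATALOG[query_id])
        scores = [
            (other, similarity(query, represent(text)))
            for other, text in CATALOG.items()
            if other != query_id
        ]
        # Most similar datasets first.
        return sorted(scores, key=lambda pair: pair[1], reverse=True)

    return discover


discover = build_pipeline(tokens, jaccard)
ranking = discover("d1")
```

Because `build_pipeline` only receives functions, a different representation (for example, one enriched from an external knowledge base) or a different similarity measure can be swapped in without touching the ranking logic, which is the property that makes such a design suited to rapid experimentation.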

Findings

The study proposes several proof-of-concept pipelines including experimental evaluation, which showcase the usage of the framework.

Originality/value

To the best of the authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework has the ambition to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 14 May 2018

Anne Chardonnens, Ettore Rizza, Mathias Coeckelbergs and Seth van Hooland

Abstract

Purpose

Advanced usage of web analytics tools allows the content of user queries to be captured. Despite the relevance of these queries, the manual analysis of large volumes of them is problematic. The purpose of this paper is to address the problem of named entity recognition in digital library user queries.

Design/methodology/approach

The paper presents a large-scale case study conducted at the Royal Library of Belgium on its online historical newspapers platform, BelgicaPress. The object of the study is a data set of 83,854 queries resulting from 29,812 visits over a 12-month period. By making use of information extraction methods, knowledge bases (KBs) and various authority files, this paper presents the possibilities and limits of identifying what percentage of end users are looking for person and place names.
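
Matching queries against KBs and authority files can be illustrated, in a much simplified form, as a lookup against name lists. The sketch below is an assumption-laden stand-in for the study's method: the two small sets replace real KBs and authority files, the sample queries are invented, and exact matching after normalization replaces the information extraction methods actually used.

```python
# Illustrative miniature name lists standing in for KBs and authority files.
PERSONS = {"victor horta", "emile verhaeren"}
PLACES = {"brussels", "ghent", "liege"}


def classify_query(query: str) -> str:
    """Label a query as 'person', 'place', 'ambiguous' or 'other'."""
    # Lowercase and collapse repeated whitespace before the lookup.
    normalized = " ".join(query.lower().split())
    is_person = normalized in PERSONS
    is_place = normalized in PLACES
    if is_person and is_place:
        return "ambiguous"
    if is_person:
        return "person"
    if is_place:
        return "place"
    return "other"


def label_shares(queries):
    """Share of queries per label, as fractions of all queries."""
    counts = {}
    for query in queries:
        label = classify_query(query)
        counts[label] = counts.get(label, 0) + 1
    return {label: n / len(queries) for label, n in counts.items()}


# Invented sample queries, used only to exercise the functions.
shares = label_shares(["Brussels", "Victor  Horta", "strike 1902", "ghent"])
```

The explicit `ambiguous` label reflects the situation where a string appears in more than one list, which a plain lookup cannot resolve without further context.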

Findings

Based on a quantitative assessment, the method can successfully identify the majority of person and place names from user queries. Due to the specific character of user queries and the nature of the KBs used, a limited number of queries remained too ambiguous to be treated in an automated manner.

Originality/value

This paper demonstrates in an empirical manner how user queries can be extracted from a web analytics tool and how named entities can then be mapped with KBs and authority files, in order to facilitate automated analysis of their content. Methods and tools used are generalisable and can be reused by other collection holders.

Details

Journal of Documentation, vol. 74 no. 5
Type: Research Article
ISSN: 0022-0418
