Search results

1 – 10 of 259
Article
Publication date: 18 August 2022

Henrik Dibowski

The curation of ontologies and knowledge graphs (KGs) is an essential task for industrial knowledge-based applications, as they rely on the contained knowledge to be correct and…

Abstract

Purpose

The curation of ontologies and knowledge graphs (KGs) is an essential task for industrial knowledge-based applications, as they rely on the contained knowledge to be correct and error-free. Often, a significant amount of a KG is curated by humans. Established validation methods, such as Shapes Constraint Language, Shape Expressions or Web Ontology Language, can detect wrong statements only after their materialization, which can be too late. Instead, an approach that avoids errors and adequately supports users is required.

Design/methodology/approach

For solving that problem, Property Assertion Constraints (PACs) have been developed. PACs extend the range definition of a property with additional logic expressed with SPARQL. For the context of a given instance and property, a tailored PAC query is dynamically built and triggered on the KG. It can determine all values that will result in valid property value assertions.
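
A minimal sketch of how such a tailored query might be assembled dynamically, assuming a hypothetical PAC vocabulary; the IRIs, the template and the example constraint below are invented for illustration and are not the paper's actual syntax:

```python
# Build a PAC-style SPARQL query for one instance/property context
# (hypothetical structure; the paper's actual PAC mechanism may differ).
from string import Template

PAC_TEMPLATE = Template("""
SELECT DISTINCT ?value WHERE {
    BIND(<$instance> AS ?instance)
    ?value a <$range_class> .  # basic range check
    $constraint                # additional PAC logic
}
""")

def build_pac_query(instance_iri: str, range_class_iri: str, constraint: str) -> str:
    """Assemble the query that returns every valid value for the property."""
    return PAC_TEMPLATE.substitute(
        instance=instance_iri,
        range_class=range_class_iri,
        constraint=constraint,
    )

# Invented example: only sensors located in the same building as the
# device being edited are valid values for the property.
print(build_pac_query(
    "http://example.org/device42",
    "http://example.org/Sensor",
    "?instance <http://example.org/locatedIn> ?b . "
    "?value <http://example.org/locatedIn> ?b .",
))
# The resulting query would be sent to the KG's SPARQL endpoint (e.g. via
# rdflib or SPARQLWrapper) and its results offered to the user as choices.
```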

Findings

PACs can effectively avoid the expansion of KGs with invalid property value assertions, as their contained expertise narrows down the valid options a user can choose from. This simplifies knowledge curation and, most notably, relieves users or machines of the need to know and apply this expertise, enabling a computer to take care of it instead.

Originality/value

PACs are fundamentally different from existing approaches. Instead of detecting erroneous materialized facts, they can determine all semantically correct assertions before materializing them. This avoids invalid property value assertions and provides users with informed, purposeful assistance. To the author's knowledge, PACs are the only such approach.

Details

Data Technologies and Applications, vol. 57 no. 2
Type: Research Article
ISSN: 2514-9288


Article
Publication date: 25 October 2022

Samir Sellami and Nacer Eddine Zarour

Massive amounts of data, manifesting in various forms, are being produced on the Web every minute and becoming the new standard. Exploring these information sources distributed in…

Abstract

Purpose

Massive amounts of data, manifesting in various forms, are being produced on the Web every minute and becoming the new standard. Exploring these information sources, distributed across different Web segments, in a unified way is becoming a core task for a variety of user and company scenarios. However, knowledge creation and exploration from distributed Web data sources is a challenging task: several data integration conflicts need to be resolved, and the knowledge needs to be visualized in an intuitive manner. The purpose of this paper is to extend the authors' previous integration work to address semantic knowledge exploration of enterprise data combined with heterogeneous social and linked Web data sources.

Design/methodology/approach

The authors synthesize information in the form of a knowledge graph to resolve interoperability conflicts at integration time. They begin by describing KGMap, a mapping model for leveraging knowledge graphs to bridge heterogeneous relational, social and linked web data sources. The mapping model relies on semantic similarity measures to connect the knowledge graph schema with the sources' metadata elements. Then, based on KGMap, this paper proposes KeyFSI, a keyword-based semantic search engine. KeyFSI provides a responsive faceted navigating Web user interface designed to facilitate the exploration and visualization of embedded data behind the knowledge graph. The authors implemented their approach for a business enterprise data exploration scenario where inputs are retrieved on the fly from a local customer relationship management database combined with the DBpedia endpoint and the Facebook Web application programming interface (API).
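
A rough sketch of the schema-matching idea behind KGMap, with invented schema terms and metadata fields; a plain string ratio stands in here for the semantic similarity measures the authors actually evaluate:

```python
# Similarity-based matching of KG schema terms to source metadata fields
# (invented schema and metadata; not the authors' actual KGMap model).
from difflib import SequenceMatcher

kg_schema = ["customerName", "organization", "location"]
source_metadata = {
    "crm_db":   ["cust_name", "company", "city"],
    "facebook": ["name", "hometown", "profile_url"],
    "dbpedia":  ["rdfs:label", "dbo:location", "dbo:organisation"],
}

def similarity(a: str, b: str) -> float:
    # Placeholder for the semantic similarity measures compared in the paper.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def build_mapping(threshold: float = 0.5) -> dict:
    """Link each KG schema term to the best-matching field of every source."""
    mapping = {}
    for term in kg_schema:
        for source, fields in source_metadata.items():
            best = max(fields, key=lambda f: similarity(term, f))
            if similarity(term, best) >= threshold:
                mapping.setdefault(term, []).append((source, best))
    return mapping

print(build_mapping())  # mapping from schema terms to (source, field) pairs
```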

Findings

The authors conducted an empirical study to test the effectiveness of their approach using different similarity measures. The observed results showed better efficiency when using a semantic similarity measure. In addition, a usability evaluation was conducted to compare KeyFSI features with recent knowledge exploration systems. The obtained results demonstrate the added value and usability of the contributed approach.

Originality/value

Most state-of-the-art interfaces allow users to browse one Web segment at a time. The originality of this paper lies in proposing a cost-effective virtual on-demand knowledge creation approach, a method that enables organizations to explore valuable knowledge across multiple Web segments simultaneously. In addition, the responsive components implemented in KeyFSI allow the interface to adequately handle the uncertainty imposed by the nature of Web information, thereby providing a better user experience.

Details

International Journal of Web Information Systems, vol. 18 no. 5/6
Type: Research Article
ISSN: 1744-0084


Book part
Publication date: 17 May 2018

Richard Marciano, Victoria Lemieux, Mark Hedges, Maria Esteva, William Underwood, Michael Kurtz and Mark Conrad

Purpose – For decades, archivists have been appraising, preserving, and providing access to digital records by using archival theories and methods developed for paper records…

Abstract

Purpose – For decades, archivists have been appraising, preserving, and providing access to digital records by using archival theories and methods developed for paper records. However, production and consumption of digital records are informed by social and industrial trends and by computer and data methods that show little or no connection to archival methods. The purpose of this chapter is to reexamine the theories and methods that dominate records practices. The authors believe that this situation calls for a formal articulation of a new transdiscipline, which they call computational archival science (CAS).

Design/Methodology/Approach – After making a case for CAS, the authors present motivating case studies: (1) evolutionary prototyping and computational linguistics; (2) graph analytics, digital humanities, and archival representation; (3) computational finding aids; (4) digital curation; (5) public engagement with (archival) content; (6) authenticity; (7) confluences between archival theory and computational methods: cyberinfrastructure and the records continuum; and (8) spatial and temporal analytics.

Findings – Each case study includes suggestions for incorporating CAS into Master of Library Science (MLS) education in order to better address the needs of today’s MLS graduates looking to employ “traditional” archival principles in conjunction with computational methods. A CAS agenda will require transdisciplinary iSchools and extensive hands-on experience working with cyberinfrastructure to implement archival functions.

Originality/Value – The authors expect that archival practice will benefit from the development of new tools and techniques that support records and archives professionals in managing and preserving records at scale and that, conversely, computational science will benefit from the consideration and application of archival principles.

Details

Re-envisioning the MLS: Perspectives on the Future of Library and Information Science Education
Type: Book
ISBN: 978-1-78754-884-8


Article
Publication date: 20 March 2018

Sihua Hu, Kaitlin T. Torphy, Amanda Opperman, Kimberly Jansen and Yun-Jia Lo

The purpose of this paper is to examine early career teachers’ Socialized Knowledge Communities (SKCs) as they relate to the pursuit of mathematics knowledge and teaching. The…

Abstract

Purpose

The purpose of this paper is to examine early career teachers’ Socialized Knowledge Communities (SKCs) as they relate to the pursuit of mathematics knowledge and teaching. The authors investigate Pinterest, a living data archive, as an opportunity to view teachers’ sense-making and construction of instructional resources. Through this lens, the authors examine how teachers form and share mathematical meaning individually and collectively through professional collaboration.

Design/methodology/approach

This work characterizes teachers' curation of mathematical resources, both in the kinds of mathematics teachers choose and in the quality of those resources. The authors then use epistemic network analysis, a statistical approach, to examine how teachers make sense of and organize their mathematics curation by typology and cognitive process demand.
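
As a toy illustration of the network idea (not the authors' actual ENA pipeline), one can count how often a resource-type code co-occurs with a cognitive-demand code across a teacher's curated pins; all codes and data below are invented:

```python
# Toy epistemic-network construction: edge weights count how often a
# resource-type code co-occurs with a cognitive-demand code on one pin.
from collections import Counter

# Invented coded data: (type code, demand code) per curated pin.
pins_by_teacher = {
    "teacher_a": [("procedure", "low"), ("visualization", "low")],
    "teacher_b": [("visualization", "high"), ("procedure", "low")],
}

def epistemic_network(pins):
    """Return co-occurrence counts between code pairs for one teacher."""
    edges = Counter()
    for type_code, demand_code in pins:
        edges[(type_code, demand_code)] += 1
    return edges

for teacher, pins in pins_by_teacher.items():
    print(teacher, dict(epistemic_network(pins)))
```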

Findings

Results show that the sampled teachers predominantly curate instructional resources within Pinterest that require students to perform standard algorithms and to represent mathematical relationships visually. Additionally, the authors find that the resources curated by teachers have lower cognitive demand. Finally, epistemic networks show that teachers make connections among instructional resources of particular types and of different levels of cognitive demand as they make sense of their curated curriculum. In particular, differences in teachers' internal consideration of the quality of tasks are associated with their years of experience.

Originality/value

Twenty-first century classrooms and teachers engage frequently in curation of instructional resources online. The work contributes to an emergent understanding of teachers’ professional engagement in virtual spaces by characterizing the instructional resources being accessed, shared, and diffused. Understanding the nature of the content permeating teachers’ SKCs is essential to increase teachers’ professional capital in the digital age.

Details

Journal of Professional Capital and Community, vol. 3 no. 2
Type: Research Article
ISSN: 2056-9548


Article
Publication date: 3 January 2020

Timothy Kanke

The purpose of this paper is to investigate how editors participate in Wikidata and how they organize their work.

Abstract

Purpose

The purpose of this paper is to investigate how editors participate in Wikidata and how they organize their work.

Design/methodology/approach

This qualitative study used content analysis of discussions involving data curation and negotiation in Wikidata. Activity theory was used as a conceptual framework for data collection and analysis.

Findings

The analysis identified six activities: conceptualizing the curation process, appraising objects, ingesting objects from external sources, creating collaborative infrastructure, re-organizing collaborative infrastructure and welcoming newcomers. Many of the norms and rules that were identified help regulate the activities in Wikidata.

Research limitations/implications

This study mapped Wikidata activities to curation and ontology frameworks. Results from this study provided implications for academic studies on online peer-curation work.

Practical implications

An understanding of the activities in Wikidata will help inform communities wishing to contribute data to or reuse data from Wikidata, as well as inform the design of other similar online peer-curation communities, scientific research institutional repositories, digital archives and libraries.

Originality/value

Wikidata is one of the largest knowledge curation projects on the web. The data from this project are used by other Wikimedia projects such as Wikipedia, as well as by major search engines. This study explores an aspect of Wikidata WikiProject editors that, to the author's knowledge, has yet to be researched.

Details

Library Hi Tech, vol. 39 no. 1
Type: Research Article
ISSN: 0737-8831


Article
Publication date: 10 February 2012

Jake Carlson

As libraries become more involved in curating research data, reference librarians will need to be trained in conducting data interviews with researchers to better understand their…


Abstract

Purpose

As libraries become more involved in curating research data, reference librarians will need to be trained in conducting data interviews with researchers to better understand their data and associated needs. This article seeks to identify and provide definitions for the basic terms and concepts of data curation for librarians to properly frame and carry out a data interview using the Data Curation Profiles (DCP) Toolkit.

Design/methodology/approach

The DCP Toolkit is a semi‐structured interview designed to assist librarians in identifying the data curation needs of researchers. The components of the DCP Toolkit were analyzed to determine the base level of knowledge needed for librarians to conduct effective data interviews. Specific concepts, definitions, and examples were sought through a review of articles, case studies, practitioner resources and from the experiences of the Purdue University Libraries.

Findings

Data curation concepts and terminology are not yet well‐defined and often vary across, or even within fields of study. This research informed the development of a workshop to train librarians in using the DCP Toolkit. The definitions and concepts addressed in the workshop include: data, data set, data lifecycle, data curation, data sharing, and roles for reference librarians.

Practical implications

Conducting a data interview can be a daunting task given the complexity of data curation and the lack of shared definitions. Practical tools and training are needed to help librarians develop capacity in data curation.

Originality/value

This article provides practical information for public service librarians to help them conceptualize and conduct a data interview with researchers.

Details

Reference Services Review, vol. 40 no. 1
Type: Research Article
ISSN: 0090-7324


Article
Publication date: 16 November 2012

Rebecca L. Harris‐Pierce and Yan Quan Liu

This study aims to present the results of a survey of library and information science (LIS) schools' websites used to determine if the number of data curation courses offered is…


Abstract

Purpose

This study aims to present the results of a survey of library and information science (LIS) schools' websites used to determine if the number of data curation courses offered is adequate to address the needs of the so‐called “data deluge”. Many authors have identified a gap in the education of LIS students for data curation.

Design/methodology/approach

This study surveyed the websites of LIS schools in North America to identify data curation courses. It reviewed and analyzed course descriptions, objectives and syllabi (when available) as well as compared course objectives, requirements, topics, assignments, and projects of the identified courses.

Findings

Of the 52 North American LIS schools whose websites were examined in this study, 16 offered courses on data curation. The increase in the number of schools offering courses in data curation showed that LIS schools are responding to the demand for data curation professionals. More LIS schools need to add data curation to their curricula, and LIS schools currently offering data curation courses should continue to work together to determine the optimal course objectives and learning outcomes.

Originality/value

Although there are several papers focused on particular data curation programs at a few universities, there are no papers that provide an overall view of the status of data curation education in higher education institutions today. This research will be of value and interest to LIS educators and professionals to determine if there is adequate education in place and to identify and evaluate the current state of data curation education.

Article
Publication date: 20 November 2023

Laksmi Laksmi, Muhammad Fadly Suhendra, Shamila Mohamed Shuhidan and Umanto Umanto

This study aims to identify the readiness of institutional repositories in Indonesia to implement digital humanities (DH) data curation. Data curation is a method of managing…

Abstract

Purpose

This study aims to identify the readiness of institutional repositories in Indonesia to implement digital humanities (DH) data curation. Data curation is a method of managing research data that maintains the data’s accuracy and makes it available for reuse. It requires controlled data management.

Design/methodology/approach

The study uses a qualitative approach. Data collection was carried out through a focus group discussion held in September–October 2022, as well as through interviews and document analysis. The informants came from four institutions in Indonesia.

Findings

The findings reveal that the national research repository has implemented data curation, albeit not optimally. Within the case study, one of the university repositories diligently curates its humanities data and has established networks extending to various ASEAN countries. Both the national archive repository and the other university repository have implemented rudimentary data curation practices but have not prioritized them. In conclusion, the readiness of the national research repository and the university repository stands at the high-capacity stage, while the national archive repository and the other university repository are at the established and early stages of data curation, respectively.

Research limitations/implications

This study examined only four repositories due to time constraints. Nonetheless, the four institutions were able to provide a comprehensive picture of their readiness for DH data curation management.

Practical implications

This study provides insight into strategies for developing DH data curation activities in institutional repositories. It also highlights the need for professional development for curators so they can devise and implement stronger ownership and data privacy policies to support a data-driven research agenda.

Originality/value

This study describes the preparations that must be considered by institutional repositories in the development of DH data curation activities.

Article
Publication date: 16 November 2015

Tobias Blanke, Michael Bryant and Reto Speck

In 2010 the European Holocaust Research Infrastructure (EHRI) was funded to support research into the Holocaust. The project follows on from significant efforts in the past to…

Abstract

Purpose

In 2010 the European Holocaust Research Infrastructure (EHRI) was funded to support research into the Holocaust. The project follows on from significant efforts in the past to develop and record the collections of the Holocaust in several national initiatives. The purpose of this paper is to introduce the efforts by EHRI to create a flexible research environment using graph databases. The authors concentrate on the added features and design decisions to enable efficient processing of collection information as a graph.

Design/methodology/approach

The paper concentrates on the specific customisations EHRI had to develop, as the graph database approach is new, and the authors could not rely on existing solutions. The authors describe the serialisations of collections in the graph to provide for efficient processing. Because the EHRI infrastructure is highly distributed, the authors also had to invest a lot of effort into reliable distributed access control mechanisms. Finally, the authors analyse the user-facing work on a portal and a virtual research environment (VRE) in order to discover, share and analyse Holocaust material.
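
The abstract does not spell out EHRI's data model, but the core idea of keeping hierarchical collection descriptions and per-node access rules in one graph can be sketched as follows (invented node names, levels and user groups; the real system is far richer):

```python
# Sketch: hierarchical archival descriptions plus per-node access rules
# held in one directed graph (invented schema; EHRI's model is richer).
import networkx as nx

g = nx.DiGraph()
g.add_node("fonds1", level="fonds", visible_to={"public"})
g.add_node("series1", level="series", visible_to={"public"})
g.add_node("file1", level="file", visible_to={"ehri_researchers"})
g.add_edge("series1", "fonds1", rel="childOf")   # series is part of fonds
g.add_edge("file1", "series1", rel="childOf")    # file is part of series

def accessible_units(graph: nx.DiGraph, groups: set) -> list:
    """Return the descriptions a user may see, given group memberships."""
    return [n for n, data in graph.nodes(data=True)
            if data["visible_to"] & groups]

print(accessible_units(g, {"public"}))            # ['fonds1', 'series1']
print(accessible_units(g, {"ehri_researchers"}))  # ['file1']
```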

Findings

Using the novel graph database approach, the authors first present how collection information can be modelled as graphs and why this is effective. Second, they show how collection information is made persistent and describe the complex access management system they have developed. Third, they outline how user interaction with the data is integrated through a VRE.

Originality/value

Scholars require specialised access to information. The authors present the results of their work to develop integrated research with Holocaust collections for researchers, along with proposals for a socio-technical ecosystem based on graph database technologies. The use of graph databases is new, and the authors needed to develop several innovative customisations to make them work in the domain.

Details

Library Hi Tech, vol. 33 no. 4
Type: Research Article
ISSN: 0737-8831


Open Access
Article
Publication date: 29 June 2020

Paolo Manghi, Claudio Atzori, Michele De Bonis and Alessia Bardi

Several online services offer functionalities to access information from “big research graphs” (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate…


Abstract

Purpose

Several online services offer functionalities to access information from "big research graphs" (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects, funders, etc. Depending on the target users, access can vary from searching and browsing content to consuming statistics for monitoring and feedback provision. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although the deduplication of graphs is a known and current problem, existing solutions are dedicated to specific scenarios, operate on flat collections, address local topology-driven challenges and therefore cannot be re-used in other contexts.

Design/methodology/approach

This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrary large information graphs. The paper presents its high-level architecture, its implementation as a service used within the OpenAIRE infrastructure system and reports numbers of real-case experiments.

Findings

GDup provides the functionalities required to deliver a fully-fledged entity deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators and algorithms for identifying and merging duplicates, to obtain an output disambiguated graph.
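
A toy sketch of such a workflow's matching-and-merging core, with invented records and a simple string similarity; GDup's actual ground truth management, curator feedback and big-data scalability are not modelled here:

```python
# Toy entity-deduplication pass: identify candidate pairs, match by
# similarity, merge matches transitively with union-find.
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

records = {
    "r1": "OpenAIRE Consortium",
    "r2": "openaire consortium",
    "r3": "Microsoft Academic Graph",
}

parent = {r: r for r in records}  # union-find forest over record ids

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Matching: any pair above the threshold is treated as a duplicate.
for a, b in combinations(records, 2):
    if SequenceMatcher(None, records[a].lower(), records[b].lower()).ratio() > 0.9:
        union(a, b)

# Merging: group records by their root to obtain the disambiguated entities.
clusters = defaultdict(list)
for r in records:
    clusters[find(r)].append(r)
print(list(clusters.values()))  # [['r1', 'r2'], ['r3']]
```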

Originality/value

To our knowledge, GDup is the only system in the literature that offers an integrated and general-purpose solution for the deduplication of graphs while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, national funders and institutions.

Details

Data Technologies and Applications, vol. 54 no. 4
Type: Research Article
ISSN: 2514-9288

