Search results

1 – 5 of 5
Content available
Article
Publication date: 18 September 2017

Adèle Paul-Hus, Nadine Desrochers, Sarah de Rijcke and Alexander D. Rushforth

Details

Aslib Journal of Information Management, vol. 69 no. 5
Type: Research Article
ISSN: 2050-3806

Content available
Book part
Publication date: 19 August 2019

Details

The New Metrics: Practical Assessment of Research Impact
Type: Book
ISBN: 978-1-78973-269-6

Content available
Article
Publication date: 1 June 2010

Mike McGrath

Details

Interlending & Document Supply, vol. 38 no. 2
Type: Research Article
ISSN: 0264-1615

Content available
Book part
Publication date: 8 October 2018

Details

Challenging the “Jacks of All Trades but Masters of None” Librarian Syndrome
Type: Book
ISBN: 978-1-78756-903-4

Open Access
Article
Publication date: 29 June 2020

Paolo Manghi, Claudio Atzori, Michele De Bonis and Alessia Bardi

Abstract

Purpose

Several online services offer functionalities to access information from “big research graphs” (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects and funders. Depending on the target users, access can range from searching and browsing content to consuming statistics for monitoring and providing feedback. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although graph deduplication is a known and current problem, existing solutions are dedicated to specific scenarios, operate on flat collections or address local topology-driven challenges, and therefore cannot be reused in other contexts.

Design/methodology/approach

This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrarily large information graphs. The paper presents its high-level architecture and its implementation as a service used within the OpenAIRE infrastructure system, and reports figures from real-case experiments.

Findings

GDup provides the functionalities required to deliver a fully-fledged entity deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators, and algorithms for identifying and merging duplicates, producing a disambiguated output graph.

Originality/value

To our knowledge, GDup is the only system in the literature that offers an integrated and general-purpose solution for the deduplication of graphs while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, national funders and institutions.

Details

Data Technologies and Applications, vol. 54 no. 4
Type: Research Article
ISSN: 2514-9288
