Search results

1 – 10 of 60
Open Access
Article
Publication date: 29 June 2020

Paolo Manghi, Claudio Atzori, Michele De Bonis and Alessia Bardi

Several online services offer functionalities to access information from “big research graphs” (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate…

Abstract

Purpose

Several online services offer functionalities to access information from “big research graphs” (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects, funders, etc. Depending on the target users, access can vary from searching and browsing content to the consumption of statistics for monitoring and the provision of feedback. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although deduplication of graphs is a well-known and current problem, existing solutions are dedicated to specific scenarios, operate on flat collections, address local topology-driven challenges and therefore cannot be re-used in other contexts.

Design/methodology/approach

This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrarily large information graphs. The paper presents its high-level architecture, its implementation as a service used within the OpenAIRE infrastructure system, and reports figures from real-case experiments.

Findings

GDup provides the functionalities required to deliver a fully-fledged entity deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators, and algorithms for identifying and merging duplicates, so as to obtain a disambiguated output graph.
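
As a rough illustration of the kind of workflow described above, the sketch below deduplicates a small set of publication records using a blocking key, pairwise title similarity, curator-asserted matches as a stand-in for Ground Truth, and a union-find merge; all names, fields and thresholds are hypothetical and do not reflect GDup's actual API.

```python
# Minimal sketch of a generic entity-deduplication workflow (hypothetical names,
# not GDup's API): block candidates, compare pairs, honour curator feedback,
# and merge matching records into groups representing single real-world entities.
from collections import defaultdict
from difflib import SequenceMatcher

def blocking_key(record):
    # Cheap grouping step so that expensive pairwise comparison stays tractable.
    return record["title"].lower().split()[0] if record["title"] else ""

def similarity(a, b):
    return SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()

def deduplicate(records, ground_truth_pairs=(), threshold=0.9):
    """Group records judged to denote the same real-world entity."""
    parent = {r["id"]: r["id"] for r in records}       # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for a_id, b_id in ground_truth_pairs:               # curator-asserted matches first
        union(a_id, b_id)

    blocks = defaultdict(list)
    for r in records:
        blocks[blocking_key(r)].append(r)
    for block in blocks.values():                       # automatic pairwise matching
        for i, a in enumerate(block):
            for b in block[i + 1:]:
                if similarity(a, b) >= threshold:
                    union(a["id"], b["id"])

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r)
    return list(groups.values())

records = [
    {"id": "p1", "title": "GDup: de-duplication of scholarly graphs"},
    {"id": "p2", "title": "GDup: De-Duplication of Scholarly Graphs"},
    {"id": "p3", "title": "A survey of open access monitoring"},
]
print([len(g) for g in deduplicate(records)])           # prints [2, 1]
```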

Originality/value

To our knowledge, GDup is the only system in the literature that offers an integrated and general-purpose solution for the deduplication of graphs while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, national funders and institutions.

Details

Data Technologies and Applications, vol. 54 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 28 June 2023

Gema Bueno de la Fuente, Carmen Agustín-Lacruz, Mariângela Spotti Lopes Fujita and Ana Lúcia Terra

The purpose of this study is to analyse the recommendations on knowledge organisation from guidelines, policies and procedure manuals of a sample of institutional repositories and…

Abstract

Purpose

The purpose of this study is to analyse the recommendations on knowledge organisation from guidelines, policies and procedure manuals of a sample of institutional repositories and networks within the Latin American area and observe the level of follow-up of international guidelines.

Design/methodology/approach

An exploratory and descriptive study of repositories’ professional documents is presented. The study comprised four steps: definition of the convenience sample; development of the data codebook; coding of the data; and analysis of the data and drawing of conclusions. The convenience sample includes representative sources at three levels: local institutional repositories, national aggregators, and international networks and aggregators. The codebook gathers information about the sampled repositories, such as openly available institutional rules and procedure manuals, or recommendations on the use of controlled vocabularies.

Findings

The results indicate that at the local repository level the use of controlled vocabularies is not regulated, leaving the choice of terms to the authors’ discretion. This results in a set of unstructured keywords rather than standardised terms, mixing subject terms with other authorities for persons, institutions or places. National aggregators do not regulate these issues either and limit themselves to pointing to international guidelines and policies, which simply recommend the use of controlled vocabularies and of URIs to facilitate interoperability.
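
For illustration, the contrast the findings point to could be sketched as follows: author-supplied free keywords are plain strings, whereas controlled vocabulary terms carry URIs that give each concept a stable, machine-resolvable identity. The labels and URIs below are hypothetical examples of the pattern, not data from the sampled repositories.

```python
# Hypothetical example of the difference between unstructured author keywords
# and controlled vocabulary terms with URIs (example.org URIs are placeholders).
free_keywords = ["repositories", "open access", "Brazil", "my university"]  # mixed, unstructured

controlled_subjects = [
    {"label": "Institutional repositories",
     "uri": "https://example.org/vocab/institutional-repositories"},
    {"label": "Open access publishing",
     "uri": "https://example.org/vocab/open-access-publishing"},
]

# Two repositories that index the same URI can be linked unambiguously,
# whereas free keywords can only be compared as strings.
for term in controlled_subjects:
    print(f'{term["label"]} -> {term["uri"]}')
```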

Originality/value

The originality of this study lies in identifying how the principles of knowledge organisation are effectively applied by institutional repositories, at local, national and international levels.

Article
Publication date: 3 April 2017

Adrian Burton, Hylke Koers, Paolo Manghi, Sandro La Bruzzo, Amir Aryani, Michael Diepenbroek and Uwe Schindler

Research data publishing is today widely regarded as crucial for reproducibility, proper assessment of scientific results, and as a way for researchers to get proper credit for…

Abstract

Purpose

Research data publishing is today widely regarded as crucial for reproducibility, proper assessment of scientific results, and as a way for researchers to get proper credit for sharing their data. However, several challenges need to be solved to fully realize its potential, one of them being the development of a global standard for links between research data and literature. Current linking solutions are mostly based on bilateral, ad hoc agreements between publishers and data centers. These operate in silos so that content cannot be readily combined to deliver a network graph connecting research data and literature in a comprehensive and reliable way. The Research Data Alliance (RDA) Publishing Data Services Working Group (PDS-WG) aims to address this issue of fragmentation by bringing together different stakeholders to agree on a common infrastructure for sharing links between datasets and literature. The paper aims to discuss these issues.

Design/methodology/approach

This paper presents the synergic effort of the RDA PDS-WG and the OpenAIRE infrastructure toward enabling a common infrastructure for exchanging data-literature links by realizing and operating the Data-Literature Interlinking (DLI) Service. The DLI Service populates and provides access to a graph of data set-literature links (at the time of writing close to five million, and growing) collected from a variety of major data centers, publishers, and research organizations.

Findings

To achieve its objectives, the Service proposes an interoperable exchange data model and format, based on which it collects and publishes links, thereby offering the opportunity to validate such a common approach in real-case scenarios, with real providers and consumers. Feedback from these actors will drive continuous refinement of both the data model and the exchange format, supporting the further development of the Service to become an essential part of a universal, open, cross-platform, cross-discipline solution for collecting and sharing data set-literature links.
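
As a rough illustration of such an exchange model, a single data set-literature link record might be represented along the following lines; the field names and identifiers are simplified placeholders and are not quoted from the DLI or Scholix specifications.

```python
# Hypothetical, simplified data set-literature link record of the kind an
# exchange format for such a service could convey (placeholder DOIs and names).
link_record = {
    "link_provider": "Example Data Center",            # who asserted the link
    "relationship": "IsReferencedBy",
    "source": {                                        # the data set
        "identifier": {"id": "10.1234/dataset.5678", "scheme": "doi"},
        "type": "dataset",
    },
    "target": {                                        # the literature item
        "identifier": {"id": "10.1234/article.9012", "scheme": "doi"},
        "type": "literature",
    },
}

# An aggregator can merge such records from many providers into one graph,
# keyed on (scheme, id) pairs regardless of which silo contributed them.
edge = (link_record["source"]["identifier"]["id"],
        link_record["relationship"],
        link_record["target"]["identifier"]["id"])
print(edge)
```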

Originality/value

This realization of the DLI Service is the first technical, cross-community, collaborative effort towards establishing a common infrastructure for facilitating the exchange of data set-literature links. As a result of its operation and the underlying community effort, a new activity, named Scholix, has been initiated, involving technology-level stakeholders such as DataCite and CrossRef.

Details

Program, vol. 51 no. 1
Type: Research Article
ISSN: 0033-0337

Open Access
Article
Publication date: 13 June 2023

Mikael Laakso

Science policy and practice for open access (OA) books is a rapidly evolving area in the scholarly domain. However, there is much that remains unknown, including how many OA books…

Abstract

Purpose

Science policy and practice for open access (OA) books is a rapidly evolving area in the scholarly domain. However, there is much that remains unknown, including how many OA books there are and to what degree they are included in preservation coverage. The purpose of this study is to contribute towards filling this knowledge gap in order to advance both research and practice in the domain of OA books.

Design/methodology/approach

This study utilized open bibliometric data sources to aggregate a harmonized dataset of metadata records for OA books (data sources: the Directory of Open Access Books, OpenAIRE, OpenAlex, Scielo Books, The Lens, and WorldCat). This dataset was then cross-matched based on unique identifiers and book titles to openly available content listings of trusted preservation services (data sources: Cariniana Network, CLOCKSS, Global LOCKSS Network, and Portico). The web domains of the OA books were determined by querying the web addresses or digital object identifiers provided in the metadata of the bibliometric database entries.
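
As a rough illustration of the cross-matching step described above, records can be matched on shared identifiers first and on normalised titles as a fallback, with the hosting domain read from each record's URL; the field names and sample data below are hypothetical.

```python
# Hypothetical sketch of cross-matching OA book records against a preservation
# listing by identifier and normalised title, and extracting the hosting domain.
from urllib.parse import urlparse

def normalise_title(title):
    return " ".join(title.lower().split())

def hosting_domain(url):
    return urlparse(url).netloc.lower()

oa_books = [
    {"isbn": "9780000000001", "title": "Open Monograph One",
     "url": "https://books.example-press.org/omo"},
    {"isbn": None, "title": "Open Monograph Two",
     "url": "https://cloudstorage.example.com/omt.pdf"},
]
preserved = [
    {"isbn": "9780000000001", "title": "Open monograph one"},
    {"isbn": "9780000000099", "title": "Some other book"},
]

preserved_isbns = {p["isbn"] for p in preserved if p["isbn"]}
preserved_titles = {normalise_title(p["title"]) for p in preserved}

for book in oa_books:
    matched = (book["isbn"] in preserved_isbns
               or normalise_title(book["title"]) in preserved_titles)
    print(book["title"], "| preserved:", matched,
          "| domain:", hosting_domain(book["url"]))
```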

Findings

In total, 396,995 unique records were identified from the OA book bibliometric sources, of which 19% were found to be included in at least one of the preservation services. The results suggest cause for concern for the long tail of OA books distributed across thousands of different web domains, as these include volatile cloud storage locations or sometimes no longer contain the files at all.

Research limitations/implications

Data quality issues, varying definitions of OA across services and inconsistent implementation of unique identifiers were discovered as key challenges. The study includes recommendations for publishers, libraries, data providers and preservation services for improving monitoring and practices for OA book preservation.

Originality/value

This study provides methodological and empirical findings for advancing the practices of OA book publishing, preservation and research.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Content available
Article
Publication date: 3 August 2012

Details

Library Hi Tech News, vol. 29 no. 6
Type: Research Article
ISSN: 0741-9058

Article
Publication date: 27 August 2014

Paolo Manghi, Michele Artini, Claudio Atzori, Alessia Bardi, Andrea Mannocci, Sandro La Bruzzo, Leonardo Candela, Donatella Castelli and Pasquale Pagano

The purpose of this paper is to present the architectural principles and the services of the D-NET software toolkit. D-NET is a framework where designers and developers find the…

Abstract

Purpose

The purpose of this paper is to present the architectural principles and the services of the D-NET software toolkit. D-NET is a framework where designers and developers find the tools for constructing and operating aggregative infrastructures (systems for aggregating data sources with heterogeneous data models and technologies) in a cost-effective way. Designers and developers can select from a variety of D-NET data management services, can configure them to handle data according to given data models, and can construct autonomic workflows to obtain personalized aggregative infrastructures.
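
As a rough illustration of this idea, an aggregation workflow can be thought of as independently configured data-management steps composed into a pipeline, sketched below; the step names and configuration keys are hypothetical and do not reflect D-NET's actual service API.

```python
# Hypothetical sketch of an aggregative workflow: configurable steps composed
# into a pipeline (harvest -> transform -> index); not D-NET's actual API.
def harvest(_records, config):
    # A real service would harvest records from an OAI-PMH endpoint or file dump.
    return [{"id": i, "title": f"record {i}"} for i in range(config["max_records"])]

def transform(records, config):
    # Map records from the source data model to the target data model.
    return [{"id": r["id"], config["title_field"]: r["title"]} for r in records]

def index(records, config):
    # A real service would push the records into a search index.
    print(f'indexed {len(records)} records into "{config["collection"]}"')
    return records

workflow = [
    (harvest,   {"max_records": 3}),
    (transform, {"title_field": "dc:title"}),
    (index,     {"collection": "aggregated-content"}),
]

records = []
for step, config in workflow:            # each step carries its own configuration
    records = step(records, config)
```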

Design/methodology/approach

The paper provides a definition of aggregative infrastructures, sketching their architecture and components as inspired by real-case examples. It then describes the limits of current solutions, whose shortcomings lie in the realization and maintenance costs of such complex software. It then proposes D-NET as an optimal solution for designers and developers wishing to realize aggregative infrastructures. The D-NET architecture and services are presented, drawing a parallel with those of aggregative infrastructures. Finally, real cases of D-NET adoption are presented to substantiate these claims.

Findings

The D-NET software toolkit is a general-purpose service-oriented framework with which designers can construct customized, robust, scalable, autonomic aggregative infrastructures in a cost-effective way. D-NET is today adopted by several EC projects, national consortia and communities to create customized infrastructures in diverse application domains, and other organizations are enquiring about or experimenting with its adoption. Its customizability and extendibility make D-NET a suitable candidate for creating aggregative infrastructures mediating between different scientific domains and therefore supporting multi-disciplinary research.

Originality/value

D-NET is the first general-purpose framework of this kind. Other solutions are available in the literature but focus on specific use cases and therefore suffer from limited re-use in other contexts. Due to its maturity, D-NET can also be used by third-party organizations not necessarily involved in its design and maintenance.

Details

Program, vol. 48 no. 4
Type: Research Article
ISSN: 0033-0337

Content available
Article
Publication date: 2 March 2012

Details

Library Hi Tech News, vol. 29 no. 1
Type: Research Article
ISSN: 0741-9058

Book part
Publication date: 24 November 2010

Debbie Rabina and Scott Johnston

This chapter discusses recent information policy activities and initiatives in the European Union (EU). EU information policy refers to the legislation and strategies pertaining…

Abstract

This chapter discusses recent information policy activities and initiatives in the European Union (EU). EU information policy refers to the legislation and strategies pertaining to the creation of the European information society. It is concerned with economic and industrial competitiveness, with an emphasis on the role that information and communication technologies play in revolutionizing everyday life. This discussion focuses on the information policy areas of greatest interest to information professionals. It addresses the EU's struggles with the concept of transparency with regard to the Anti-Counterfeiting Trade Agreement, the application of privacy measures to the Internet of Things, and open access to EU-funded research.

Details

Advances in Librarianship
Type: Book
ISBN: 978-1-84950-979-4

Article
Publication date: 5 July 2013

Ondřej Fabián

The purpose of this paper is to give a complex description and evaluation of open access adoption in the environment of the Czech Republic, from both the green road and golden…

Abstract

Purpose

The purpose of this paper is to give a comprehensive description and evaluation of open access adoption in the Czech Republic, from both the green road and the golden road points of view.

Design/methodology/approach

Data and conclusions in this paper are numerically supported by quantitative analyses from several relevant databases (e.g. JCR, Scopus, DOAJ or ROAR).

Findings

The issue of open access has not been given appropriate attention in the Czech Republic. Therefore, most of the important activities have only recently been implemented, or are still underway. Open access is still completely ignored at the level of Czech state offices and funding agencies, which leaves scientific institutions to learn about this phenomenon individually. Compared to other Central European countries, the Czech Republic can be classified as average in certain respects, but it cannot compete with developed Western European and North American countries in terms of awareness, infrastructure and open access adoption.

Originality/value

This is the very first article that comprehensively sums up all aspects of the issue of open access in the Czech Republic.

Details

Library Review, vol. 62 no. 4/5
Type: Research Article
ISSN: 0024-2535

Article
Publication date: 9 September 2014

Teja Koler-Povh, Matjaž Mikoš and Goran Turk

The purpose of this paper is to present the institutional repository (IR) named DRUGG (Digital Repository of the University of Ljubljana, Faculty of Civil and Geodetic…

Abstract

Purpose

The purpose of this paper is to present DRUGG (Digital Repository of the University of Ljubljana, Faculty of Civil and Geodetic Engineering), the institutional repository (IR) of the University of Ljubljana, Faculty of Civil and Geodetic Engineering (UL FGG), from its beginnings in 2011, and to use visit statistics to demonstrate its merits for the higher visibility of scholarly publications on the web. The role of all stakeholders involved in the construction of this IR is highlighted.

Design/methodology/approach

A historical overview of UL FGG researchers’ awareness of worldwide scientific communication through websites is given, starting from the 1990s. Using Google Analytics, the statistics of visits and downloads after a year of operation are shown, as well as the statistics of access from different networks around the world.

Findings

The DRUGG repository mainly archives theses, which are usually not published elsewhere and are of great interest to professional engineers working in practice. The statistics showed that 89 per cent of all visits come from public domains, while only 11 per cent are from the home domain of the University of Ljubljana (UL).

Research limitations/implications

This paper is a case study limited to the DRUGG IR. It describes the steps taken in implementing the IR, considering the technological infrastructure, human resources and the collaboration of the library staff with other professional and administrative faculty units.

Practical implications

The repository is to a large extent used by the professional public, and its use is not limited to the home institution (UL).

Originality/value

This paper helps in planning the building of an IR. It also presents an overview of worldwide research and analyses of the influence of IRs on citations of scholarly publications, to convince sceptical research policy makers.
