Linked Data for Libraries, Archives and Museums. How to Clean, Link and Publish your Metadata

Records Management Journal

ISSN: 0956-5698

Article publication date: 16 March 2015



Alison Kay (2015), "Linked Data for Libraries, Archives and Museums. How to Clean, Link and Publish your Metadata", Records Management Journal, Vol. 25 No. 1, pp. 135-136.



Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited

Linked data theoretically holds the promise of creating a “global database”. To memory institutions specifically, it holds the promise of meaningful links between objects of disparate collections. All that carefully structured data held in libraries, archives and museums could be unleashed, made accessible via the web and thereby transparently opened to re-use. However, the implementation is complex, to say the least.

In their very timely handbook, van Hooland and Verborgh acknowledge that linked data remains very much a “moving target” (p. 3), but argue that it is one that can be captured as a set of best practices for the publication of structured data on the web. The outcome of the authors’ discussions with practitioners, Linked Data for Libraries, Archives and Museums therefore focuses on the practicalities and problems in mediating the relationship between imperfect collection metadata and linked data logistics.

Linked data presents tremendous challenges for cultural heritage institutions in regard to the quality of their metadata. Indeed, even the very term “linked data” does not yet represent one well-defined technology or standard. This leads to both enthusiasm and frustration. This handbook aims to pin linked data down a bit more and provides common principles, best practices and hands-on case studies. Its aims, say van Hooland and Verborgh, are to lower the technical barrier to understanding linked data; to propose a critical view of linked data (by not disregarding the challenges); and to provide a conceptual introduction. The accompanying website allows readers to download the metadata used in the case studies and repeat the case studies and exercises at their own pace. Three of the five case studies involve the use of OpenRefine, software that is freely accessible.

The handbook can be read as a course, or the chapters can be read independently. The best practice propositions are grouped into five practical steps: Modelling, Cleaning, Reconciling, Enriching and Publishing. Each of these provides useful contextual material. For example, the chapter on cleaning begins with a useful discussion on data quality, and the chapter on modelling reviews the different data models used over recent decades to manage structured metadata (pp. 14-30), meta-markup languages (pp. 31-43) and the development of the semantic web vision and a machine-readable web (pp. 44-49). Hence, Linked Data for Libraries, Archives and Museums would be useful for teaching purposes and not just for practitioners.

In rounding off their handbook, van Hooland and Verborgh raise some significant flags about the potential negative impact of linked data, for example the funnelling effect and the preferencing of the most popular statements. This is a significant disadvantage for the humanities, with their interest in “the long tail of minority values” (p. 245). The authors also lift the veil on the significant dangers regarding economic forces and market shifts in knowledge bases (e.g. DBpedia/Freebase) and metadata schemes (e.g. Protocol), both largely opaque not only to the general public, but also to researchers (p. 247).

Linked Data for Libraries, Archives and Museums is accessible, targeted and avoids too light a touch. It is a meaningful “how to” guide. Practitioners and scholars will benefit from familiarizing themselves with this evolving technology, thereby ensuring that access to the “long tail” of values is safeguarded. This handbook provides a very useful starting point for doing just that.
