
From tree to network: reordering an archival catalogue

Mark Bell (The National Archives, Kew, UK)

Records Management Journal

ISSN: 0956-5698

Article publication date: 1 July 2020

Issue publication date: 4 December 2020


Abstract

Purpose

This paper presents the results of a number of experiments performed at the National Archives, all related to the theme of linking collections of records. It aims to present a methodology for translating a hierarchy into a network structure, using a number of methods for deriving statistical distributions from records metadata or content and then aggregating them. Simple similarity metrics are then used to compare and link collections of records with similar characteristics.

Design/methodology/approach

The approach taken is to consider a record at any level of the catalogue hierarchy as a summary of its children. A distribution for each child record is created (e.g. word counts and date distribution) and averaged or summed with those of the other children. This process is repeated up the hierarchy to find a representative distribution of the whole series. By doing this, the authors can compare record series with one another and create a similarity network.
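The aggregation described above can be illustrated with a minimal sketch. It is not the author's implementation: the record layout (a dictionary with hypothetical "text" and "children" keys), the use of raw word counts as the distribution, cosine similarity as the metric and the threshold value are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def aggregate(record):
    """Sum word counts for a record and all of its descendants.

    Assumed layout: a dict with optional "text" and "children" keys.
    """
    counts = Counter(record.get("text", "").lower().split())
    for child in record.get("children", []):
        counts += aggregate(child)  # roll child distributions up the tree
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count distributions."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_network(series, threshold=0.5):
    """Return (name, name, score) edges for series pairs at or above threshold."""
    profiles = {name: aggregate(rec) for name, rec in series.items()}
    names = sorted(profiles)
    edges = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = cosine(profiles[x], profiles[y])
            if score >= threshold:
                edges.append((x, y, score))
    return edges


# Toy catalogue: three series, each summarised from its child records.
series = {
    "A": {"children": [{"text": "ship crew log"}, {"text": "ship cargo log"}]},
    "B": {"children": [{"text": "ship log"}, {"text": "crew list"}]},
    "C": {"children": [{"text": "tax land rent"}]},
}
edges = similarity_network(series)
```

Because cosine similarity ignores overall scale, summing and averaging the child counts give the same score in this sketch; the choice would matter for metrics that are sensitive to magnitude.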

Findings

The summarising method was found to be applicable not only to a hierarchical catalogue but also to web archive data, which is by nature stored in a hierarchical folder structure. The case studies raised many questions worthy of further exploration, such as how to present distributions and uncertainty to users and how to compare methods that produce similarity scores on different scales.

Originality/value

Although the techniques used to create distributions, such as topic modelling and word frequency counts, are not new and have been used to compare documents, to the best of the author's knowledge, applying the averaging approach to the archival catalogue is new. This provides an interesting method for zooming in and out of a collection, creating networks at different levels of granularity according to user needs.

Keywords

Acknowledgements

The bulk of the work was performed without funding. The Traces through Time project, which is referenced in the text, was funded by the AHRC (project reference AH/L010186/1).

Citation

Bell, M. (2020), "From tree to network: reordering an archival catalogue", Records Management Journal, Vol. 30 No. 3, pp. 379-394. https://doi.org/10.1108/RMJ-09-2019-0051

Publisher

Emerald Publishing Limited

Copyright © 2020, Mark Bell.
