From tree to network: reordering an archival catalogue
ISSN: 0956-5698
Article publication date: 1 July 2020
Issue publication date: 4 December 2020
Abstract
Purpose
This paper presents the results of a number of experiments performed at The National Archives, all related to the theme of linking collections of records. It aims to present a methodology for translating a hierarchy into a network structure, using several methods to derive statistical distributions from record metadata or content and then aggregate them. Simple similarity metrics are then used to compare and link collections of records with similar characteristics.
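The abstract does not name the similarity metric used. As an illustration only, the sketch below assumes cosine similarity over word-count distributions and links two series whenever their score reaches a chosen threshold. The series references and the threshold value are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors (assumed metric)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty distribution is treated as unrelated to everything
    return dot / (norm_a * norm_b)


def similarity_edges(series: dict[str, Counter], threshold: float) -> list[tuple[str, str, float]]:
    """Return an edge (series_a, series_b, score) for every pair at or above threshold."""
    edges = []
    for (name_a, dist_a), (name_b, dist_b) in combinations(series.items(), 2):
        score = cosine_similarity(dist_a, dist_b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges
```

The resulting edge list is a weighted similarity network. Raising the threshold gives a sparser network containing only the strongest links.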
Design/methodology/approach
The approach taken is to treat a record at any level of the catalogue hierarchy as a summary of its children. A distribution (e.g. of word counts or dates) is created for each child record and averaged or summed with those of the other children. This process is repeated up the hierarchy to obtain a representative distribution for the whole series. This allows the authors to compare record series with one another and to create a similarity network.
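The bottom-up summarising step can be sketched as a recursive function. This is a minimal sketch, not the author's implementation: it assumes a hypothetical `CatalogueNode` structure where only leaf records carry raw word counts, and it takes the averaging option, giving each child equal weight.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CatalogueNode:
    """Hypothetical catalogue entry (series, piece, item, ...)."""
    ref: str
    counts: Counter = field(default_factory=Counter)  # raw word counts, used only at leaves
    children: list["CatalogueNode"] = field(default_factory=list)


def normalise(counts: Counter) -> dict[str, float]:
    """Turn raw counts into proportions that sum to 1 (empty input gives {})."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def summarise(node: CatalogueNode) -> dict[str, float]:
    """A leaf returns its own normalised counts; any other node returns the
    unweighted mean of its children's summaries, computed recursively."""
    if not node.children:
        return normalise(node.counts)
    child_dists = [summarise(child) for child in node.children]
    mean: dict[str, float] = {}
    for dist in child_dists:
        for word, p in dist.items():
            mean[word] = mean.get(word, 0.0) + p / len(child_dists)
    return mean
```

Averaging normalised distributions stops a single large piece from dominating its series. Summing raw counts instead, the other option the approach mentions, would weight each child by its size.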
Findings
The summarising method was found to apply not only to a hierarchical catalogue but also to web archive data, which is by nature stored in a hierarchical folder structure. The case studies raised many questions worthy of further exploration, such as how to present distributions and uncertainty to users and how to compare methods that produce similarity scores on different scales.
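To apply the same summarising method to web archive data, the folder structure implicit in URLs can be turned into a tree. The sketch below assumes pages are grouped first by host and then by path segment. The function name `build_tree` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def build_tree(urls: list[str]) -> dict:
    """Nest URLs as {host: {segment: {...}}}; each page ends as an empty dict (a leaf)."""
    tree: dict = {}
    for url in urls:
        parts = urlsplit(url)
        # Host first, then each non-empty path segment ("/news/2019/a.html" -> news, 2019, a.html)
        segments = [parts.netloc] + [s for s in parts.path.split("/") if s]
        node = tree
        for seg in segments:
            node = node.setdefault(seg, {})
    return tree
```

Once each leaf page is given a distribution (for example, word counts of its text), the folders can be summarised bottom-up in the same way as catalogue levels.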
Originality/value
Although the techniques used to create distributions, such as topic modelling and word frequency counts, are not new and have been used to compare documents, to the best of the author's knowledge, applying the averaging approach to the archival catalogue is new. This provides a method for zooming in and out of a collection, creating networks at different levels of granularity according to user needs.
Acknowledgements
Most of this work was performed without funding. The Traces through Time project, referenced in the text, was funded by the AHRC (project reference AH/L010186/1).
Citation
Bell, M. (2020), "From tree to network: reordering an archival catalogue", Records Management Journal, Vol. 30 No. 3, pp. 379-394. https://doi.org/10.1108/RMJ-09-2019-0051
Publisher
Emerald Publishing Limited
Copyright © 2020, Mark Bell.