Unsilencing colonial archives via automated entity recognition
ISSN: 0022-0418
Article publication date: 31 January 2023
Issue publication date: 3 September 2024
Abstract
Purpose
This paper aims to expand the scope and mitigate the biases of extant archival indexes.
Design/methodology/approach
The authors use automatic entity recognition on the archives of the Dutch East India Company to extract mentions of underrepresented people.
Findings
The authors release an annotated corpus and baselines for a shared task and show that the proposed goal is feasible.
Originality/value
Colonial archives are increasingly a focus of attention for historians and the public, and broadening access to them is a pressing need for archives.
Keywords
Acknowledgements
The authors thank Saskia Virgina Noot, Thijs Vorstenburg and Clare Shutt for conducting the pilot [7] for this project. This work was made possible by the digitization efforts of the Dutch Nationaal Archief, and the authors thank Milo van de Pol, Liesbeth Keijser and Diederick Kortlang for providing context on the testaments. Nadia F. Dwiandari from the Indonesian Arsip Nasional was helpful in providing background information on the VOC-notary archives in Indonesia. The authors are grateful for the feedback of the participants of the workshop at The Critical Visitor [8] Field Lab, which was crucial in helping the authors think about the politics of categories and led them to revise the typology. The dataset was made possible by the annotations created by the researchers Roos Bijleveld, Silja de Vilder Coombs, Emma Louise van der Hage, Jonas Guigonnat, Yolien Mulder and Bert van Splunter. Sincere thanks to Leon van Wissen for setting up the annotation software infrastructure and for his insightful feedback at numerous points during this project. The authors thank the reviewers for their feedback. Finally, the authors thank the digital humanities research group CREATE [9] at the University of Amsterdam and the Dutch Research Council (NWO, project number NWA.1228.192.108) for providing financial support.
Citation
Luthra, M., Todorov, K., Jeurgens, C. and Colavizza, G. (2024), "Unsilencing colonial archives via automated entity recognition", Journal of Documentation, Vol. 80 No. 5, pp. 1080-1105. https://doi.org/10.1108/JD-02-2022-0038
Publisher
Emerald Publishing Limited
Copyright © 2022, Emerald Publishing Limited