
Unsilencing colonial archives via automated entity recognition

Mrinalini Luthra (University of Amsterdam, Amsterdam, The Netherlands)
Konstantin Todorov (University of Amsterdam, Amsterdam, The Netherlands)
Charles Jeurgens (University of Amsterdam, Amsterdam, The Netherlands)
Giovanni Colavizza (University of Amsterdam, Amsterdam, The Netherlands)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 31 January 2023




This paper aims to expand the scope and mitigate the biases of extant archival indexes.


The authors use automatic entity recognition on the archives of the Dutch East India Company to extract mentions of underrepresented people.


The authors release an annotated corpus and baselines for a shared task and show that the proposed goal is feasible.


Colonial archives are increasingly a focus of attention for historians and the public; broadening access to them is therefore a pressing need for archives.



The authors thank Saskia Virgina Noot, Thijs Vorstenburg and Clare Shutt for conducting the pilot [7] for this project. This work was made possible by the digitization efforts of the Dutch Nationaal Archief, and the authors thank Milo van de Pol, Liesbeth Keijser and Diederick Kortlang for providing the authors with context on the testaments. Nadia F. Dwiandari from the Indonesian Arsip Nasional helpfully provided some background information on the VOC-notary archives in Indonesia. The authors express their gratitude for the feedback of the participants of the workshop at The Critical Visitor [8] Field Lab, which was crucial in helping the authors think about the politics of categories and led them to revise the typology. The dataset was made possible by the annotations created by the researchers Roos Bijleveld, Silja de Vilder Coombs, Emma Louise van der Hage, Jonas Guigonnat, Yolien Mulder and Bert van Splunter. Sincere thanks go to Leon van Wissen for setting up the annotation software infrastructure and for his insightful feedback at numerous points during this project. The authors thank the reviewers for their feedback. Finally, the authors thank the digital humanities research group CREATE [9] at the University of Amsterdam and the Dutch Research Council (NWO, project number NWA.1228.192.108) for providing financial support.


Luthra, M., Todorov, K., Jeurgens, C. and Colavizza, G. (2023), "Unsilencing colonial archives via automated entity recognition", Journal of Documentation, Vol. ahead-of-print No. ahead-of-print.



Emerald Publishing Limited

Copyright © 2022, Emerald Publishing Limited
