
Demystifying oral history with natural language processing and data analytics: a case study of the Densho digital collection

Haihua Chen (Department of Information Science, University of North Texas, Denton, Texas, USA)
Jeonghyun (Annie) Kim (Department of Information Science, University of North Texas, Denton, Texas, USA)
Jiangping Chen (Department of Information Science, University of North Texas, Denton, Texas, USA)
Aisa Sakata (Department of Information Science, University of North Texas, Denton, Texas, USA)

The Electronic Library

ISSN: 0264-0473

Article publication date: 28 June 2024

Issue publication date: 26 July 2024


Abstract

Purpose

This study aims to explore the applications of natural language processing (NLP) and data analytics in understanding large-scale digital collections in oral history archives.

Design/methodology/approach

NLP and data analytics were used to analyse the oral interview transcripts of 904 survivors of the Japanese American incarceration camps, collected from the Densho Digital Repository. The analysis relied specifically on descriptive analysis, keyword extraction, topic modelling and sentiment analysis (SA).
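To make the keyword-extraction step more concrete, the sketch below scores terms with TF-IDF, one common technique for this task. The abstract does not say which keyword-extraction method the authors used, so this should not be read as their pipeline. The stopword list, the function names `tokenize` and `tfidf_keywords`, and the three sample sentences are all hypothetical and are not taken from the Densho transcripts.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "was", "were",
             "my", "our", "at", "they", "us", "had", "it"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic tokens and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_keywords(docs: list[str], top_k: int = 3) -> list[list[str]]:
    """Return the top_k TF-IDF terms for each document (illustrative sketch)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Term frequency times inverse document frequency; terms present in
        # every document get idf = log(1) = 0 and sink to the bottom.
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    # Hypothetical sample sentences, not real interview excerpts.
    sample = [
        "We were sent to the camp and the barracks were cold.",
        "My family lost the farm when we were removed to the camp.",
        "The redress movement gave us an apology and reparations.",
    ]
    for keywords in tfidf_keywords(sample):
        print(keywords)
```

Because "camp" appears in two of the three sample sentences, its IDF is lower and it ranks below terms unique to a single sentence. On a corpus of 904 transcripts, the same effect would down-weight words common to nearly every interview.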

Findings

The researchers identified multiple geographic areas with large residential communities of ethnic Japanese people, as well as the place names of the concentration camps. The extracted keywords and topics reflect the deplorable conditions and militaristic nature of the camps and the forced labour of the internees. When recalling this history, the narrators focus mainly on the redress and reparation movement to secure the restitution of their civil rights. SA further showed that the forcible removal and incarceration of Japanese Americans during the Second World War had a negative impact on the narrators and left them with deep trauma.

Originality/value

This case study demonstrated how NLP and data analytics could be applied to analyse oral history archives and to open new avenues for discovery. Archival researchers and the general public may benefit from this type of analysis by making connections among temporal, spatial and emotional elements, which contributes to a holistic understanding of individuals and communities in terms of their collective memory.


Acknowledgements

The authors would like to thank Yingying Han, a PhD candidate in Library and Information Science at the University of Illinois Urbana-Champaign, for providing excellent feedback on the research design. The authors appreciate Roopesh Maganti, a master's student in data science at the University of North Texas, who collected the data and helped with the visualizations. The authors would also like to thank Marie Bloechle at the University of North Texas for editing the language of the paper, and are grateful to all the anonymous reviewers for their valuable comments and suggestions.

Citation

Chen, H., Kim, J.(A)., Chen, J. and Sakata, A. (2024), "Demystifying oral history with natural language processing and data analytics: a case study of the Densho digital collection", The Electronic Library, Vol. 42 No. 4, pp. 643-663. https://doi.org/10.1108/EL-12-2023-0303

Publisher

Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited
