Guest editorial: Artificial intelligence for cultural heritage materials

Glen Layne-Worthey (School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA)
J. Stephen Downie (School of Information Sciences, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 3 September 2024

Issue publication date: 3 September 2024


Citation

Layne-Worthey, G. and Downie, J.S. (2024), "Guest editorial: Artificial intelligence for cultural heritage materials", Journal of Documentation, Vol. 80 No. 5, pp. 1025-1030. https://doi.org/10.1108/JD-09-2024-275

Publisher: Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited


Theoretical introduction

This special issue of the Journal of Documentation focuses on the uses of artificial intelligence (AI) in the provisioning and analysis of digital cultural heritage materials. Such uses may be occasioned by restricted or difficult access (e.g. due to privacy concerns, copyright restrictions or the sheer and ever-increasing volume of cultural heritage data). Equally crucial are those uses of AI that researchers have adopted not out of necessity but by choice, relying on the affordances of new computational methods and new algorithms to understand old cultural heritage materials in altogether new ways.

The authors of all these articles responded to a call with three principal questions: How can we use AI to make digital cultural heritage collections more accessible? How might we analyze these collections using AI research methods? And can we identify synergies and collaborative avenues among cultural organizations around the world that are engaged in AI-enhanced research and access methods? We invited scholars, curators and other cultural heritage workers in any aligned fields – in the humanities; in the information, data and computer sciences; and in the libraries, archives and museums sector – to submit their work for publication in this special issue.

Context

Before we get to those articles, let us put this special issue in context: it is one of the key research outputs of the Artificial Intelligence for Cultural Organisations (AEOLIAN) Network, an effort funded jointly by the National Endowment for the Humanities (NEH) in the United States of America and by the Arts and Humanities Research Council (AHRC) in the United Kingdom. Lead institutions of this project were the University of Illinois Urbana–Champaign and Loughborough University; project partners included Durham University, Glasgow University, Dublin City University, Auburn University (Alabama, USA), the Frick Collection and Stanford University.

During its three years of activity, the AEOLIAN project also produced a robust and well-attended series of six international workshops (recordings of which are available on the project website, https://www.aeolian-network.net/category/workshops/); a set of five case studies from cultural heritage organizations in the USA and the UK (which form the basis of an edited volume currently under contract with University College London Press); a number of blog posts (e.g. Gooding, 2021; Worthey, 2021); substantive interviews published in a variety of cultural heritage venues (e.g. Smith, 2021; Dressler, 2021); and original studies such as Jaillant and Aske (2023a, b).

We also commend to your attention another AEOLIAN-sponsored publication: the “Special Issue on Applying Innovative Technologies to Digitized and Born-Digital Archives,” in the Journal on Computing and Cultural Heritage (Vol. 16, No. 4), edited by AEOLIAN’s UK Principal Investigator Dr Lise Jaillant.

The AEOLIAN project is named, of course, for the Aeolian harp, a musical instrument of ancient origins played solely by wind and often producing ethereal, haunting, inhuman melodies that are a source of both inspiration and scientific study. It is our hope that this project’s investigations into the issues involved in the application of AI in the cultural heritage sector will both inspire and haunt. Current public discourse about artificial intelligence – which has increased in both sound and fury over the course of our project – is full of questions and answers, promises, fears and disappointments that all swirl about us in contradictory, unsettled, but still fascinating fashion. The AEOLIAN project has sought to avoid both the utopian and the cataclysmic modes of much current debate around AI (as in the highly reductive and speculative question, “Will AI save us or destroy us?”). Instead, we have sought to engage critically and practically with questions of current and potential applications of AI and machine learning in the service of cultural heritage.

As you read and consider the outstanding articles in this special issue of the Journal of Documentation dedicated to AI in the cultural heritage realm, we commend the many other AEOLIAN outcomes as well, all available at the project website, https://www.aeolian-network.net/. Taken together, they offer rich evidence of the complex landscape of AI activity, praxis, thought and critique as they are understood in the very human space of cultural heritage.

Paper descriptions

Even taken on their own and in their relatively modest number, the articles published in this special issue demonstrate an impressive array of fields and methodologies that make use of an equally wide set of artificial intelligence approaches, and in this they reflect well the broad diversity of the AEOLIAN project. Topics covered in these articles range from metadata creation to archival practice to archaeological field work, and even within this broad range they include collection types as divergent as audiovisual, colonial and community archives, as well as national archival research infrastructures. On the metadata front, they include automatic classification for library catalogs, impressively fine-grained enhancements for historical newspaper collections and the recovery of hidden figures in those colonial archives. Equally diverse are the subject areas supported by the work documented in these articles, which run the gamut from archeology to the martial arts and much between and beyond. Finally, seen from the methodological point of view, they include techniques as diverse as computer vision, natural language processing, digitization and crowdsourcing.

In their article “Computer vision and machine learning approaches for metadata enrichment to improve searchability of historical newspaper collections,” Ali et al. (2024) describe a major project at the Royal Library of Belgium that deploys computer vision to automatically enhance the metadata, and thus the accessibility, of article-level content in digitized historical newspaper collections. “Metadata enhancement” may seem like an old-fashioned concern, but this team’s use of AI-enhanced methods goes many steps further, determining from among the mass of digitized newspaper content the boundaries of individual articles, identifying their sub-genres (for example, feuilletons, news stories, etc.), extracting named entities from within those articles and creating a richer and more richly searchable historical collection.

Golub et al.’s (2024) article, “Automated Dewey Decimal Classification of Swedish library metadata using Annif software,” presents the results of a robust evaluation of Annif, a promising open-source package developed at the National Library of Finland, for subject classification. Golub’s team deployed this package to evaluate the performance of five different machine-learning algorithms (lexical algorithm, support vector classifier, fastText, Omikuji Bonsai and an ensemble of the four) against the results of five expert human catalogers. Their results – that no single algorithm can do the job as well as a combination of algorithms – would seem to reaffirm not only the deep complexity and subtlety of a very common and important library function (subject classification) but also the promise and potential of such an automated system to extend (though apparently not replace) limited-supply human labor in order to provide better, more efficient access to library materials.

Luthra et al. (2024) describe a major project that seeks to remediate some crucial omission biases in the historical record. Their article, “Unsilencing colonial archives via automated entity recognition,” describes the use of automatic named entity recognition on the archives of the Dutch East India Company to identify and recover mentions of previously unrecognized and underrepresented people documented in those archival records. In doing so, they focus not only on the rigorously applied and verified statistical and machine learning methods of their project, but more importantly on its ethical dimensions and its ability to rectify the injustices of 17th and 18th century hierarchies of religion, race, gender, class and colonial power.

Naumann and Neuburger (2024) discuss the changing realities of traditional archival organizations in the digital age in their article “User perspectives through cross-connections. The role of archives as part of the German digital research data infrastructure.” They address user perspectives and challenges in a richly interconnected environment where “individual” or “standalone” archives no longer really exist, taking up operational questions such as the role of portal infrastructures that link together different archival institutions. The paper focuses on emerging approaches to inter-institutional connections in Germany and its national data infrastructure, but its lessons may be applicable generally: although institutional cross-connections may not be a new phenomenon, these connections appear significantly different in a digital context. The authors reflect especially on the enhanced need for archivists to remain vigilant about the quality of their data and metadata and to seek out institutional systems that prioritize support for cross-connection and interlinking of data.

As the previous articles all demonstrate, AI and machine learning are proving to be valuable tools – though certainly not silver bullets – for the enhancement of description and access for traditional library and archival collections of books, historical newspapers and paper archives. But what of cultural heritage in newer (and thus less familiar), less tangible (and sometimes ephemeral) forms such as multimedia, born-digital and other novel types of cultural heritage? The next set of articles deals precisely with these newer (or more newly appreciated), less tangible types of collections.

Although recorded audiovisual cultural heritage has been with us for well over a century, its preservation and access continue to present vexing difficulties. In the digital age, a new set of challenges and opportunities has arisen. Yang’s (2024) article “Datafication of Audiovisual Archives: from Practice Mapping to a Thinking Model” describes the still-new practice of datafication of these materials, addressing in particular the questions of what sorts of data should be extracted from AV materials and for what purpose. Constructing a model along three broad dimensions of audiovisual content (archival; affective and esthetic; and social and historical), Yang proposes mapping the data derivable from such archival materials to the specific purposes to which these data can be put. This exercise leads both to theoretically more justifiable metadata standards and to an enhanced understanding of the multimedia content itself.

Much newer than audiovisual materials (and arguably more ephemeral and problematic) are the born-digital materials that now constitute a vast and growing ocean of human cultural production. Hannaford et al. (2024) describe a complex set of solutions to the vexing problem of preserving and making accessible a critically endangered type of cultural heritage material: community-generated digital content, especially that of marginalized communities. In their article, “Our Heritage, Our Stories: Developing AI Tools to Link and Support Community-Generated Digital Cultural Heritage,” they describe one of the primary challenges in dealing with this unique type of content as follows: the best existing attempts to collect, integrate and preserve community-created content require “bespoke interventionist activities” that are expensive, time-consuming and unsustainable at scale; at the same time, the unsophisticated use of computational methods, meant to deal with the problem of scale, tends to erase the meaning and purpose of both the content and its creators – effectively silencing already marginalized communities. The authors instead rely on a combination of multidisciplinary methods, AI tools and, crucially, a co-design process that includes the community creators themselves.

However, it is not only the new, digital forms of culture that require new thinking in the information age: much older forms of cultural heritage are also demanding much newer forms of attention. Hou et al. (2024) take us from the contemporary and disembodied world of digital content to the ancient and profoundly embodied realm of martial arts. Their article, “Unlocking a multimodal archive of Southern Chinese martial arts through embodied cues,” proposes a novel approach to the martial arts as an “authentic carrier of cultural practice.” They combine methods in “movement computing” with domain-specific modeling to enable the search and retrieval of the “embodied cues” inherent in Southern Chinese martial arts. This work allows for the archiving of human movement, the creative expression that it embodies and the cultural contexts in which it is embedded. They use machine learning methods to enhance the archival expressions of such intangible cultural heritage.

Although all of our authors express subtle combinations of hope and healthy skepticism about the promise of AI in its application to cultural heritage, one of the clearest such cautionary tales comes in work related to the oldest cultural expressions treated in our special issue. Sobotkova et al. (2024) describe their thoughtful application of machine-learning methods to a well-defined set of problems in archeology in their article, “Validating Predictions of Burial Mounds with Field Data: the Promise and Reality of Machine Learning.” Here they describe a case study related to burial mounds in the Kazanlak Valley, Bulgaria, documented with high-resolution satellite imagery. Comparing the observations of carefully trained neural networks to those of even a novice human observer, they discover that even the most sophisticated models struggle when faced with the sorts of inconsistencies that occur in real-world landscapes. Importantly, though, the authors do not altogether reject machine learning but rather offer cautions for those many who will want to continue to experiment with AI and suggestions for its improvement.

Conclusion

Most readers will no doubt realize that a lot has happened – many would say an entire revolution – in the world of AI during the three years since we began the AEOLIAN project in 2021 and since the research reported here was done. Fair enough: large language models especially, and most obviously (and most brashly) those that power “generative AI” like the ubiquitous ChatGPT and image-creation programs, have indeed taken the world by storm in the popular press. Much of this press has been hyperbolic: does this or that particular mode of artificial intelligence represent the bright future of humanity or spell its doom? Or are we simply observing a high point in the predictable technology hype cycle, a bubble that will (or perhaps already has, by the time you read these words) burst?

Regardless of the answers to any of those questions, we believe that the solid work presented here, grounded in both curiosity and a respect for human cultural production, will stand the test of time. Although the timeline of scholarship and scholarly publishing in the humanities is painfully long compared with that of fickle public attention, the timeline of human culture is even longer, and it is our hope that the pieces presented here will still offer thoughtful readers something to think about, something to work on, for a long time to come.

References

Ali, D., Milleville, K., Verstockt, S., Van de Weghe, N., Chambers, S. and Birkholz, J.M. (2024), “Computer vision and machine learning approaches for metadata enrichment to improve searchability of historical newspaper collections”, Journal of Documentation, Vol. 80 No. 5, pp. 1031-1056, doi: 10.1108/JD-01-2022-0029.

Dressler, V. (2021), “Artificial intelligence for cultural organizations (AEOLIAN) NEH grant, interview with Glen Worthey”, Choose Privacy Every Day, published by the American Library Association, available at: https://chooseprivacyeveryday.org/artificial-intelligence-for-cultural-organizations-aeolian-neh-grant-interview-with-dr-glen-worthey/ (accessed 1 December 2023).

Golub, K., Suominen, O., Mohammed, A.T., Aagaard, H. and Osterman, O. (2024), “Automated Dewey decimal classification of Swedish library metadata using Annif software”, Journal of Documentation, Vol. 80 No. 5, pp. 1057-1079, doi: 10.1108/jd-01-2022-0026.

Gooding, P. (2021), “‘In the AI tonight’: introducing the AEOLIAN project”, available at: https://www.dpconline.org/blog/aeolian-paul-gooding (accessed 1 December 2023).

Hannaford, E.D., Schlegel, V., Lewis, R., Ramsden, S., Bunn, J., Moore, J., Alexander, M., Barker, H., Batista-Navarro, R., Hughes, L. and Nenadic, G. (2024), “Our Heritage, Our Stories: developing AI tools to link and support community-generated digital cultural heritage”, Journal of Documentation, Vol. 80 No. 5, pp. 1133-1147, doi: 10.1108/jd-03-2024-0057.

Hou, Y., Seydou, F.M. and Kenderdine, S. (2024), “Unlocking a multimodal archive of Southern Chinese martial arts through embodied cues”, Journal of Documentation, Vol. 80 No. 5, pp. 1148-1166, doi: 10.1108/JD-01-2022-0027.

Jaillant, L. and Aske, K. (Eds) (2023a), “Journal on Computing and Cultural Heritage: special issue on applying innovative technologies to digitized and born-digital archives”, Vol. 16 No. 4, available at: https://dl.acm.org/toc/jocch/2023/16/4#sec11

Jaillant, L. and Aske, K. (2023b), “Are users of digital archives ready for the AI era? Obstacles to the application of computational research methods and new opportunities”, Journal on Computing and Cultural Heritage, Vol. 16 No. 4, pp. 1-16, doi: 10.1145/3631125.

Luthra, M., Todorov, K., Jeurgens, C. and Colavizza, G. (2024), “Unsilencing colonial archives via automated entity recognition”, Journal of Documentation, Vol. 80 No. 5, pp. 1080-1105, doi: 10.1108/JD-02-2022-0038.

Naumann, K. and Neuburger, A. (2024), “User perspectives through cross-connections. The role of archives as part of the German digital research data infrastructure”, Journal of Documentation, Vol. 80 No. 5, pp. 1106-1118, doi: 10.1108/JD-04-2022-0081.

Smith, A. (2021), Artificial Intelligence for Cultural Organisations: A Conversation with Dr Lise Jaillant, NewsEye, available at: https://www.newseye.eu/blog/news/an-interview-with-lisa-jaillant-uk-principal-investigator-for-the-aeolian-network/ (accessed 1 December 2023).

Sobotkova, A., Kristensen-McLachlan, R.D., Mallon, O. and Ross, S.A. (2024), “Validating predictions of burial mounds with field data: the promise and reality of machine learning”, Journal of Documentation, Vol. 80 No. 5, pp. 1167-1189, doi: 10.1108/jd-05-2022-0096.

Worthey, G. (2021), “The AEOLIAN network: ‘blowin’ in the wind’”, Artificial Intelligence for Libraries, Archives, and Museums (AI4LAM), available at: https://sites.google.com/view/ai4lam/news/20210609worthey (accessed 1 December 2023).

Yang, Y. (2024), “Datafication of audiovisual archives: from practice mapping to a thinking model”, Journal of Documentation, Vol. 80 No. 5, pp. 1119-1132, doi: 10.1108/JD-04-2022-0093.

Further reading

The AEOLIAN Network: Artificial Intelligence for Cultural Organisations (2021-2023), available at: https://www.aeolian-network.net/ (accessed 1 December 2023).

Acknowledgements

Funding: Funding for AEOLIAN (“Artificial Intelligence for Cultural Organisations”) came jointly from the Arts and Humanities Research Council in the UK (Reference: AH/V009443/1) and the National Endowment for the Humanities in the US (Reference: HC-278124-21). The views, findings, conclusions and recommendations expressed in this article and those that follow do not necessarily represent those of the National Endowment for the Humanities or the Arts and Humanities Research Council.
