“So how do we balance all of these needs?”: how the concept of AI technology impacts digital archival expertise

Amber L. Cushing (School of Information and Communication Studies, University College Dublin, Dublin, Ireland)
Giulia Osti (School of Information and Communication Studies, University College Dublin, Dublin, Ireland)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 21 October 2022

Issue publication date: 18 December 2023

Abstract

Purpose

This study aims to explore the implementation of artificial intelligence (AI) in archival practice by presenting the thoughts and opinions of working archival practitioners. It contributes to the extant literature with a fresh perspective, expanding the discussion on AI adoption by investigating how it influences the perceptions of digital archival expertise.

Design/methodology/approach

In this study, a two-phase data collection consisting of four online focus groups was held to gather the opinions of international archives and digital preservation professionals (n = 16), who participated on a volunteer basis. The qualitative analysis of the transcripts was performed using template analysis, a style of thematic analysis.

Findings

Four main themes were identified: fitting AI into day to day practice; the responsible use of (AI) technology; managing expectations (about AI adoption) and bias associated with the use of AI. The analysis suggests that AI adoption combined with hindsight about digitisation as a disruptive technology might provide archival practitioners with a framework for re-defining, advocating and outlining digital archival expertise.

Research limitations/implications

The volunteer basis of this study meant that the sample was not representative or generalisable.

Originality/value

Although the results of this research are not generalisable, they shed light on the challenges posed by the implementation of AI in archives and faced by the digital curation professionals dealing with this change. The evolution of the characterisation of digital archival expertise is a topic reserved for future research.

Citation

Cushing, A.L. and Osti, G. (2023), "“So how do we balance all of these needs?”: how the concept of AI technology impacts digital archival expertise", Journal of Documentation, Vol. 79 No. 7, pp. 12-29. https://doi.org/10.1108/JD-08-2022-0170

Publisher: Emerald Publishing Limited

Copyright © 2022, Amber L. Cushing and Giulia Osti

License

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode


Introduction

Explorations of the use of artificial intelligence (AI) tools have appeared in archival studies in the past few years. However, many of these articles are mostly limited to testing implementations or opinion pieces from academia. Considering this landscape, we wish to expand the discussion of AI technology in archives by empirically exploring the thoughts and opinions of archival and digital preservation practitioners.

This article attempts to fill the gap by reporting on focus groups with those working in the archives sector as practitioners about their thoughts and opinions related to adopting AI tools in archival work. Our goal is to situate the current discussion about using AI in archival practice via the perspective of working archivists.

In doing so, we hope to learn more about the challenges that may exist in a potential widespread implementation of AI technology in the archives field.

We aim to empirically explore potential social issues associated with the use of AI tools in archival work as perceived by these practitioners, rather than focus on the outcome of a specific application of the technology. We hope that this focus will add to the conversation about AI in archives at the current time.

Literature review

Defining AI

Prevailing discussions about “AI and archives” are mediated by the definition of “AI” currently being used in the discussions. The existing discussions tend to favour social and cultural definitions of AI over technical definitions that may be used in other fields. The highly cited Crawford (2021) explains that the definition of AI shifts over time. While “AI” is frequently used in funding applications, the term “machine learning” (ML) is more frequently used in technical literature. She explains that ML can be understood as a model that can learn from data it has been given. This model can utilise ML and/or computer vision (CV). While ML focuses on numerical, categorical, textual and temporal (time series) data, CV utilises visual data. Crawford (2021) utilises the term ML to refer to technical approaches such as broad scale data mining, classification of data and CV. The author uses the metaphor of an Atlas to describe AI due to the technology's far-reaching social and infrastructural implications.

Explicit definitions of AI in the context of archives have been offered by a few pieces situated in archival studies research. In their survey of archival literature, Colavizza et al. (2022) explain that they utilise the term AI as a proxy for ML, but also use AI “to encompass the professional, cultural and social consequences of automated systems for recordkeeping processes and for archivists” (p. 4). Also situated in archives and recordkeeping, Rolan et al. (2019) reference text from Bellman (1978) in their definition: “we understand AI as involving digital systems that automate or assist in activities that we associate with human thinking, activities such as decision-making, problem-solving, learning [and] creating” (p. 181). In this study, we adopt Rolan et al.’s definition of AI.

Access and use

Viewing digital humanists as users of archival collections can also yield insight into existing thoughts and opinions about archives and AI from the digital humanities (DH) perspective. This perspective can be considered parallel to archival studies discussions about AI, as it is written from the perspective of access and use and generally does not consider the work practices of archival practitioners. Jaillant and Caputo (2022) express frustration at the inability of archival repositories to make large digital datasets available in a timely manner. Additionally, Jaillant (2022) noted that the lack of accessibility may impact end users – DH practitioners included.

DH, an umbrella term referring to humanities scholarship concerned with the use of computers as an integrated and essential part of research (Brügger, 2016), has produced far more publications on the use of ML and CV than archival studies. For “genetic” reasons, most applications of AI in the field of DH are related to computational linguistics and are mostly grounded in natural language processing (NLP) methods, although there is a growing interest in working with other media types, including 3D objects. Neural networks and deep learning techniques are among the most current approaches, enabling DH researchers to tackle demanding NLP and CV tasks. Examples range from more traditional use cases such as text analysis from historic and contemporary corpora (Clanuwat et al., 2019; Kestemont et al., 2017; Tanasescu et al., 2018), image and object classification (Bermeitinger et al., 2016; Wevers and Smits, 2020), to more particular applications like Egyptian hieroglyphs recognition, classification and translation (Barucci et al., 2021) or the development of semantic analysis and comparative query of art-historic collections (Garcia and Vogiatzis, 2019; Jain et al., 2021; Springstein et al., 2021). Gefen et al. (2021) caution against the intrinsic disruptiveness of AI, which might deeply impact the way we understand, approach and produce cultural knowledge (p. 196).

Digital humanists' increased use of AI has urged them to reflect on the nature of their relationship with archivists – which remained latent until recent times. Sabharwal (2017) described the concurring interests between these communities, both aiming at the life-cycle extension of the humanistic data and knowledge within the digital landscape (p. 239). Similarly, Poole and Garwood (2018) analysed the outputs from the Digging into Data 3 global challenge, with the goal of defining the roles held by librarians and archivists in highly structured DH projects. The authors stress that, although the work of archivists (including librarians) in the examined projects has low visibility, the digital curation and data lifecycle constitute a solid opportunity for collaboration between digital humanists and archival practitioners.

Within archival studies, AI holds promise for managing the archival backlog. Many ideas have been proposed over the years to tame the backlog and make collections available to the public more quickly. Most notable is Greene and Meissner's (2005) more product, less process (MPLP) method of arrangement and description, in which records are not described at the item level in order to alleviate processing backlogs. Crowdsourcing with user generated tags has also been explored as a method of alleviating processing backlogs (Benoit, 2017, 2018), as have participatory archives initiatives (Eveleigh, 2014; Roeschley and Benoit, 2019), which harness the crowd and also attempt to involve users in archival work.

According to Jaillant and Caputo (2022), AI offers potential to sort through backlogs more efficiently, specifically by more efficiently screening data for sensitivity. The authors state that “archival collections often close entire collections due to data protection concerns” and that “closing entire collections for an indeterminate period of time is not ethical, since archives in publicly funded organisations are meant to be open to the public” (p. 5). The authors also suggest that archivists should make it easier for (DH) researchers to utilise large datasets for their work. Writing from the archives perspective, Lee (2018) also supports the use of ML to review collections for sensitivity, as well as using ML in the appraisal and selection process. Both authors reference the ePADD (email: Process, Appraise, Discover, Deliver) email project in their work (https://library.stanford.edu/projects/epadd).

In their thoughts on the future of the archival field, Moss et al. (2018) also write of the challenges archivists face when trying to provide user access to large digital datasets. They acknowledge that large datasets have changed research methods in the humanities, using history as an example. Whereas close reading was previously the dominant research method, “distant reading” using AI technology is becoming more common. The authors suggest that archivists may re-envision their collections as “data to be mined”, echoing Jaillant and Caputo's interest in obtaining access to large digital datasets held by archives quickly. Moss et al. (2018) suggest that as a result, the focus of archival work on arrangement and description of historical collections may need to evolve, especially if users might not find classification as useful for distant reading. However, they note that the largest cohort of archival users is family historians, not academic researchers. Proctor and Marciano (2021) suggest that these family historians would be served by using CV to extract names and dates from images. The opinion that arrangement and description may not be as useful as it was previously is not shared by Randby and Marciano (2020), who state that digital curation, including description, is a necessary preparation step for application of ML algorithms.

Development of new skills in the context of archives and AI

Developed by a team of academics, computational archival science (CAS) attempts to combine skills and knowledge from archival science, information science and computer science to create a new interdisciplinary field. CAS is defined as:

an interdisciplinary field concerned with the application of computational methods and resources to large-scale records/archives processing, analysis, storage, long-term preservation, and access, with the aim of improving efficiency, productivity and precision in support of appraisal, arrangement and description, preservation and access decisions, and engaging and undertaking research with archival material (Payne, 2018, p. 2743).

CAS considers ML a computational tool, in addition to blockchain technology. Grounded in archival science, CAS focuses on the nature of the record and application of computational tools to that record, rather than focussing on data processing which is typical of computer science (Marciano et al., 2018). Marciano et al. (2018) provide several case studies where collaboration between researchers from the different fields has enriched project findings. Proctor and Marciano (2021) provide an example of how CV can be used to process a collection using a CAS framework. CAS appears to implement what Moss et al. (2018) suggest: that archives may need to be re-envisioned as “data to be mined”. Computational skills may aid in this endeavour. At the least, CAS skills and knowledge can help archivists understand how their users wish to access data: as large datasets, a perspective shared by the project Always Already Computational, which resulted in the drafting of the Santa Barbara Statement on collections as data (Padilla et al., 2019b). Running between 2016 and 2018, the initiative was centred on understanding and mapping “the current and potential approaches to developing cultural heritage collections that support computationally-driven research and teaching” (Padilla et al., 2019a).

Another currently running initiative is InterPARES TRUST AI (https://interparestrustai.org/), a large project that aims to train students and professionals in these computational skills. The project goals are to explore AI technologies in the context of records and archives via case studies at locations around the world.

Social and ethical concerns

In the past few years, archival scholars have provided initial thoughts as to how AI may impact the field. Moss et al. (2018) do not specifically discuss AI, but their suggestion that archives be understood as data collections to be “mined” is a nod to the technology that allows for the mining to occur. Theimer (2018) agrees with this sentiment, suggesting that archivists need to become data scientists. CAS provides one perspective, most notably that the archival science focus on the nature of the record remains central moving forward (Marciano et al., 2018; Payne, 2018). Additional opinions take a wider view beyond application to archival work and ponder how AI technology may change the ways in which archives are perceived and why this may be a cause for concern.

Rolan et al. (2019) explored how AI technology can automate different aspects of archives and recordkeeping work. After briefly describing how ML works, the authors describe how ML might work with an electronic digital records management system (EDRMS). They note that some proprietary commercial products have begun to offer the technology via an “AI as a service” model. One such example is Preservica, which claims to integrate Microsoft Azure ML into their software products (https://preservica.com/partners/active-digital-preservation-on-microsoft-azure). Rolan et al. (2019) then discuss the need for case studies to further explore how AI technology might be applied to archival work. Since 2019, some of these case studies have come to fruition, such as Randby and Marciano (2020) and Proctor and Marciano (2021). The authors finish by summarising some on-going projects in Australia that are trialling ML. They make one final important note: many “AI as a service” models offered by large tech companies such as Microsoft, Amazon and Google typically rely on cloud storage. This may conflict with recordkeeping law, which prohibits transfer of electronic records outside of a specific jurisdiction. This serves as one example of the contextual issues that institutions managing government archives and records may face: while “off the shelf” products are affordable and may be customised via an easy to learn interface, these products were initially set up for private business use and may not automatically obey local recordkeeping laws. The design of AI to meet the needs of a capitalist society is one of the issues that Crawford (2021) describes as a problem when AI attempts to be used in “AI for good” type projects. The structure of the technology is not necessarily set up to meet the needs of organisations that lack a profit driven mission. Depending on the type of archive, this presents a problem that has yet to be completely addressed.

Bunn (2020) explored Explainable AI (XAI), which is described as a focus on shedding light on ML models, which are often understood as a “black box” lacking transparency. Bunn (2020) links XAI with “accountability, fairness, social justice, and trust” (p. 144). Similarly, social justice has been of interest in archival studies, which calls into question the colonial past of much of traditional archival practices and positions archives as tools to support work toward an equitable society (Duff et al., 2013; Punzalan and Caswell, 2016). Bunn (2020) links recordkeeping and XAI via the goal of working toward explainability.

Colavizza et al. (2022) explored thoughts and opinions of AI and archives via the approach of an environmental scan of literature. The authors used 53 articles published on the topic of archives and AI between 2015 and 2020 as their corpus. They found four broad themes dominating the literature which they discuss using the framework of the records continuum model: theoretical and professional considerations, automating recordkeeping processes, organising and accessing archives, and novel forms of digital archives. Overall, the authors find that there is a trend in AI being used to probe the traditional definition and concepts of what an archive is, which would put traditional archival principles such as provenance and original order “under pressure.” Other scholars have also highlighted the colonialist overtones of provenance and original order, explaining that they can conflict with diversity and inclusion goals (Punzalan and Caswell, 2016; Steinmeier, 2020).

Chabin (2020) uses a case study to highlight a point also found by Bunn (2020) and Colavizza et al. (2022) – that other archival and recordkeeping principles may be of use to the AI field, specifically authenticity. Chabin (2020) uses the case of the data processing associated with the great debate in France to explain how diplomatic analysis can be used to enrich a dataset that is used in ML or CV. When compared with discussions over original order and provenance, it is clear that Colavizza et al.'s (2022) analysis that AI technology is being used to trigger conversations about the application of traditional archival principles is accurate. The lingering question from this review is how the archival field may progress this debate about the archives field in the context of novel technology such as AI.

Method

A series of focus groups held via Zoom were used to gather data about how working archival and digital preservation practitioners think and feel about using AI technology in archival practice. All focus groups received ethics clearance from University College Dublin. In phase one of data collection, focus group one was held to explore reactions to the use of AI CV for assisting with item level metadata creation for historical photograph collections as a way to understand the larger AI issues related to digital preservation work. This focus group was largely exploratory in nature and was used to develop a baseline of understanding of relevant issues that would be used to develop questions for the next round of data collection.

In phase two of data collection, focus groups two to four of professionals working at different institutions in the archival and digital preservation field were asked for their thoughts and opinions about using AI technology to complete their work. Participants were recruited via word of mouth and posts made to professional listservs and Twitter.

Participants

Please see Table 1 below for a description of participant demographics.

The volunteer basis of this study meant that the sample was not representative or generalisable. Focus groups are not designed to provide statistical validity since they mostly privilege thickness and richness of information (Morgan, 1998). Instead of trying to achieve saturation (sensu Glaser and Strauss, 2017) as a criterion of representativity, we sought a good trade-off between the quality of the insights and the number of focus groups to run – as suggested by Carlsen and Glenton (2011).

Procedure

In phase one, four staff members of a national library responded to prompts that consisted of Microsoft Azure produced tags and titles for four photographs from one of the library's digitised photograph collections. These tags were created as part of a class project for the Digital Curation module at University College Dublin (UCD). The purpose of the focus group was not to “test” the use of Microsoft Azure Cognitive services for CV on the collection, but to gather participant response to some of the tags and descriptions as a way to prompt further discussion about the use of AI technology in archival and digital preservation work. All staff members were provided an information sheet about the study and consented to participation and audio recording of the focus group.

In phase two, three different focus groups were held with archival and digital preservation professionals to gather their thoughts and opinions on using AI technology in their work. Some of these participants had familiarity with using AI in an archival setting, some did not. Once participants responded to the call for participation, they were provided with an information sheet outlining their participation in the study, and provided verbal consent to participate and be audio-recorded before the focus groups began. As an ice breaker, participants were first asked to provide their “idealised or dystopian” implementation of AI in their field of work and were then provided a working definition of AI technology that would be used in the focus group. Participants were then asked if they wanted to share any positive or negative experiences that they had had with AI technology. Next, participants were presented with three prompts about the use of AI in archival and digital preservation work.

Prompt one detailed a scenario in which the participant was asked to provide users access to a partially digitised photograph collection, which has not been described in any detail. AI technology is suggested as a tool to speed up description of the collection. Participants were asked to respond with their thoughts about the idea and what they might do in response to the suggested use of the technology.

Prompt two adds further detail to prompt one, in which the participant has noted that several of their colleagues do not fully trust an implementation of the AI technology due to concerns about job losses, lack of training available to learn the new system, and concerns over reliability of the system and whether the technology may become obsolete quickly. The participant was asked if they relate with any of these concerns and if they had other concerns and why.

In the final prompt (three), participants were told that in a continuation of the scenario, their work using AI technology to describe the collection was featured in an online newspaper article in their community. The author of the newspaper article mentions general concerns about privacy, transparency and ethical use of data about AI in the article not specific to their team's use of the technology. The participants were asked to respond to the article.

To conclude, participants were asked to summarise their position on the use of AI to complete their work and if there was anything that was missed, that they would like to add to the discussion.

Analysis

Template analysis, a style of thematic analysis, was used to analyse the data (Braun and Clarke, 2021; King, 2012; King and Brooks, 2017). We used a deductive process to identify themes: a scan of literature about AI and archival studies was completed a priori, and tentative themes from the literature were identified.

The focus group one transcript was read and re-read by the Primary Investigator (PI) to gather information to develop the prompts in the second phase of focus group data collection. After focus groups two to four were held, all focus group transcripts were read and re-read to develop tentative themes for an initial template. Both researchers worked together to refine the final template, which was complete after four rounds of revision. All focus groups were then coded using NVivo 12 to identify evidence for patterns. After this process, a final tentative template with four themes was developed and accounts were written describing the details of each theme.

Findings

Fitting AI into the day to day practice

Most of the concern about “fitting AI in” to existing work centred around additional duties that use of AI technology may require, such as checking outputs before making them available to users. The fact that focus groups two to four were asked to respond to hypothetical prompts could account for the lack of specific information about the practicalities of using AI technology in digital archives and preservation settings.

Similarly, the participants from focus groups two to four did not discuss specific benefits that could result from the implementation of AI in the day to day practice, but still considered the potential for AI technology to be valuable. However, they described the technology as having the potential to create more tasks to be performed by humans. Commenting on the use of AI technology to generate descriptive metadata (prompt one, option one), PD204 proposed an alternative scenario, tackling the need for manual validation. While suggesting that AI could support the generation of administrative metadata to enable another option presented in prompt one, the participant clearly defined when additional human performed tasks might be necessary and why:

I would also like to throw in a bit of crowdsourcing in (option) number one because the AI can describe what it sees but it can't tell us anything about the context or the event leading up to what's happening on the picture so we need some again some human intelligence involved.

In relation to the same prompt, PD402 agreed with PD204 on the use of AI to generate descriptive metadata (in this case to enhance accessibility), while also expressing a concern about their job security:

I guess if you can automate, you know, describing, you know, basic description for each (photograph) that I mean, I guess that would help with, you know, making the collection somewhat accessible for people to actually view, you know, what's, what's in the collection, em it might put me out of a job, but [laughs], I dunno, I think I’d go for (option) number one.

PD404 added to the reflection, cautioning against the lack of supervision of machine-generated outputs:

I think that all the results that come out of these (AI) systems need to be reviewed, and in theory they're going to do things that, at scale um, that they can do; and so there's all the things that they can't do, that we now have the time to do, because those things are doing that. I think there is always that risk, depending on how we frame the use of these tools up the chain, um, as to what they're really capable of doing.

From these participant quotes it is clear that there is optimism that AI technology can be of use, but there is also concern about how to fit it into existing workloads. This concern was largely framed as a need to “check” the “work” or output of the AI technology.

In light of these expectations, many participants expected the use of AI to be central to archival work in the future. As PD203 highlights, “it's gonna be such a huge part of what every librarian has, or information professional and general archivist librarian, whatever role you're involved in”. Participants expressed major concerns about who would be responsible for upkeep associated with the AI systems and what the upkeep would require. At times, these anxieties were expressed via binary choices: outsourcing of the AI system development/maintenance versus the use/upkeep of the same by the in-house staff. Both the reliability of the software and the effective role(s) of the person(s) expected to secure the functionality of the system were questioned by PD201: “I'd be much more concerned with how reliable the software is and also, if this is a tool, what practices are in place eh, that, to keep it going [ …]”. PD404 agreed:

And so that is really where my concerns come in around whenever this comes up, they’re like, we're going to just bring in somebody and drop this on your lap. I, I who is taking care of it, who is maintaining it, who is growing it and who is dealing with its outputs? So that's, that's always me, and again that reliability of the software, what is it really doing? Is it doing what I think it's doing? And what happens if something goes wrong?

Responsible use of technology as expertise

Within the focus groups, “responsible use of technology” was discussed as a human-machine partnership – often found in literature as human-robot collaboration. However, who should be responsible for the AI system and with which modalities, was not clear. Interestingly, the awareness about the pitfalls commonly associated with AI emerged across the focus groups from both of the phases; the role of human agency emerged while participants discussed the tasks that could be “safely” performed by the machine.

As recognised by PD204, AI is “not just simply a tool, it creates issues around sensitivity, privacy; those two things are different”. PD205, commenting on the application of AI to manage legal records including recent crimes (having the relatives of the victims eventually exposed to the AI system outputs), tackled the need for a critical design to prevent ethical issues:

I think AI can help us, but it's, it's how you think through each of those steps and, and structure something around it, so that it's, it's helping rather hindering kind of human interaction.

Participants were sceptical about the adoption of AI to address sensitive matters (such as records that should not be made public) or decision-making processes: PD201 argued that “automated systems are not really sophisticated enough to do a human's job.” She elaborated further on the theme of a human-machine partnership:

So, I think, in general, the consensus here is that AI technologies are useful for making jobs more efficient and for helping the humans, the experts in the field that to (parse?) large bodies of data. But a lot of these ethical questions, em, and special interest questions need to be dealt with by humans.

PD404 asserted “I'm not trusting AI to make curatorial decisions […]. I'd want to have a better sense of the collection, to see, you know, how much information is there, really, to get from, from these pieces to, to actually transcribe and acquire and whether that would be the most useful thing”. PD204 proposed a different point of view, commenting on quality delivery in a discussion of the lack of transparency (or explainability) of the AI systems in use, which required extra caution in proposing un-reviewed AI outputs to their customers (researchers):

I'm running a project where we're looking at the use of two AI, AI systems in literature reviews and we are incredibly sceptical regarding the quality we are getting back. We can't understand, using our librarian ninja skills, that we can't understand how, ehm, the string that's, ehh, how relevance is considered, ehm, in these two systems we’re using and the quality of the sources, the systems are searching in as well is also quite, ehm, it's almost a trade secret. We can’t get a full list of where these, eh, systems are searching, ehm, so we are very cautious before we start, eh, introducing these two systems to our researchers. We are testing, testing, testing to make doubly sure we as a library can still deliver quality products.

Participants experienced a lack of trust toward using AI technology, characterised by concern over the ability of the AI technology to redact personal/sensitive information, or complete a task to the same standard as an archival and/or digital preservation professional. As a result, participants were more comfortable with the potential concept of a human-machine partnership, in which the human had the ability to check the AI technology's output for potential issues before releasing the collections to users.

Managing expectations

Another subject that was frequently discussed in all focus groups was the belief that the use of AI technology (including ML) would require practitioners to “manage expectations” of “higher ups” (their line managers, and those managing their organisations). In focus groups two through four, prompt three, which asked the participants to imagine that a newspaper article was being written about their organisation's use of AI, was the prompt most likely to result in discussion of managing expectations.

Participants were most concerned with managing the expectations of those who work above them because these “higher ups” controlled resources and funding of specific departments in the organisation. The use of AI technology was frequently compared to the early days of the digitisation “era” in the mid-1990s, when heritage institutions experienced a radical transformation in the way they delivered access to collections. It may be significant that AI technology is being considered by Galleries, Libraries, Archives and Museums (GLAMs) at a period of time nearly 30 years after the transition to digitisation and mass availability of digital resources. Those who began their careers in the mid-1990s and experienced the difficulties that came with this disruptive technology first hand may still be on the job and more aware of how important managing expectations of those who allocate resources can be. According to PD205:

Yeah it's linked to, to training, but also awareness, especially at the top of the organisation, where people could often see the, the digital solution to be the answer to all the problems, em, and, and that creates quite a different shift in the way an organisation can put it through resources.

PD202 also compared the potential adoption of AI in the sector to previous adoption of digitisation:

Just thinking of management em, and the higher ups of the organisation, when digitisation started being used in archives, one of the issues was that the term digitisation was being bandied about a lot. So, it was almost seen as replacing the physical archives and replacing the work of cataloguing and arranging and, you know, as as mentioned earlier, it isn't just about the specific document you're looking at, you need the context um, it's a huge, hugely important part of our job is to provide that context.

The concern that AI could divert resources away from other day to day work practices is understandable, considering that putting effort in at the start will be necessary. However, PD202's concern that AI technology may leech resources from “context” building tasks is worth further investigation. In the following statement, PD202 explained context as expertise, describing it as “making all of those kind of connections”:

If we're going to use AI at all in the archive, it should be assisting and never overriding the expertise of the, of, of the professionals, and as PD203 alluded to there, talking about librarians would actually be the archivist involved in something like this, you know our whole training is all about appraising collections and deciding what's kept and what's not kept and why, em, and arranging collections and making all of those kind of connections, so AI should be able to assist us with that, but absolutely never override that.

These comments also express the concern that management may fixate on a new technology as a way to improve efficiency, but that this efficiency would come at the cost of other essential duties that require digital archival expertise, similar to “when digital happened”. PD404 also echoed this sentiment:

I am concerned, again, cultural heritage continues to get more and more and more breadth in all the things that we're able to do, and we saw it when digital happened to begin with, you know, everyone who does the acquisitions work and the collection development work and the descriptive work and our physical preservation team, and now we had to support these digitisation folks and the digital preservation folks. None of it is well funded and now we're looking at AI. So how do we balance all of these needs?

Similarly, PD404's description of acquisitions, collection development and descriptive work is also central to what the archivists do and how they demonstrate their expertise.

Bias

Across all focus groups, there was discussion about concerns over bias. Others have discussed how algorithms can become embedded with the bias of the programmers who train them and how algorithms can exacerbate bias against marginalised communities (Crawford, 2021). In the context of archives and digital preservation practitioners, concerns about bias were linked with the ways in which AI technology could allow collections to be used in ways that would further marginalise under-represented communities, misrepresent collections and conflict with institutional diversity aims and objectives and the archival and digital preservation field in general.

Throughout the focus groups, participants were asked to reference what they knew of AI technology. In broaching the issues of AI and bias, participants in two different focus groups referenced the story of how Timnit Gebru, a prominent AI ethics researcher, was forced out of Google as a culmination of her criticism of the company's policies and practices associated with AI technology in relation to diversity (Hao, 2020). While participants did not mention Gebru by name, the “Google case” was discussed: “a woman who was researching the ethical implications and biases of Google's algorithms was fired by Google and she wrote some op eds and stuff. Anyway, what they did to her was not good” (PD201). In addition, PD101:

So, it's worth looking into the Google researcher that was recently fired, asked to leave over the paper that she wrote. And I think that was a huge petition to kind of support her and things like that. And it was around questions of diversity and how these models can uphold existing structures of inequality and all that and also climate issues. And I think it was rejected by Google and the kind of a dodgy way they didn't want to publish it. And then she had to leave the company. And she's a very well respected researcher, who also happened to be a person of colour as well. So it was, it was really like, really troubling.

There was also a concern that users might extend their existing understanding of bias and AI to concern about use of AI in the archive for any purpose:

Yeah, you know race relations being such a forefront in academia, right now, like AI is just very untouchable for a lot of American academia at the moment. Especially like in the college where I'm working at, if I phrased it as AI it would not be good. If I phrased it as stylometry they might be interested. So, you know machine learning - maybe, but AI would definitely be a no go. (PD301)

PD301 was concerned about use of AI on the existing collections because users (academics and students in her case) would be concerned about AI and racial bias. This prior knowledge of AI technology and bias was applied to practitioners' own context in working with digital collections. For example, several participants acknowledged the colonialist past of some aspects of archival theory and practice (Punzalan and Caswell, 2016). Given what they knew about AI and bias, there was an acute awareness that applying AI to collections could further exacerbate marginalisation. According to PD404:

I would just say the bias question is a huge one, depending on what the data set is and what you're looking at and what it's being trained on, which continues to be my, my deepest fear about AI, even in sort of innocuous areas, like is this a bird or is this a plant, which happens more in my work. When we're talking about what is really going on in this image and if, since we're working in what is still a predominantly white field, and is still a field that is, is full of legacy data that we haven't been able to clean up before training the AI on records that were created seventy years ago that we haven't touched since I think it's really, really disturbing, especially when we're talking about descriptive, descriptive information if if we're going beyond transcription translation, that kind of work and we’re actually saying what do you see in this image AI and what do you think this is, that always makes me very squeamish, and I always want to do that, that thorough review before it goes up, um, and so it's it's definitely one of my big concerns.

For PD404, there was a concern that digitised collections in her care might need to be “cleaned up”: a subtle reference to the concept that what was acceptable in the past is no longer acceptable now and if collections were not “cleaned up” harmful stereotypes could be perpetuated. This also demonstrates some of the additional tasks a practitioner might need to take on when using AI technology. PD305 had similar concerns, which were expressed as “very strange views of the world”:

… There’s instances of this happening, where the records have come through the system, and the, there is data in, in which there’s personal nature, and then they have to be taken back out again from the public domain and that's with humans being involved in the process. So almost, it's a question of can the AI recognize the type of collection which is likely to have the things in it which might need a close look from a human, because clearly the examples you put up there are things which might not be obviously interpretable by a machine, so you could have like saying physical representations of people, or the language that was used to create those records, or which is in those records, say, colonial papers from the 1930s, are going to have some very strange views of the world. Em, which we wouldn't share today, some of us wouldn't anyway.

Viewed in the context of digital archival expertise, comments from PD404 and PD305 also suggest concern that an AI system may not have enough relevant expertise to understand stereotypes that were common in the past and that these stereotypes are no longer acceptable to reference in description of a collection today. The relevant expertise in this example is understanding of harmful stereotypes that could perpetuate marginalisation, as well as expertise about current values held in modern archival and digital preservation studies, such as a commitment to social justice.

PD401 was also concerned with the way in which AI technology could preference the voices of elites in a collection. This is exemplified in discussion of a hypothetical situation in which AI technology was used to search a large, digitised collection to meet a user request for information:

So, 85% of the population are missing from the archive. So, these are all balances that we have to come to, because somebody asks a question of the archive, what is the answer that they're getting back, are they hearing the voices of the people in charge, or are they hearing the voices of the people? And how to use and interpret that answer into something that you think is either fair and accurate? So yeah these are things that we do have to, we have thought a lot about, and we are careful as to what we will include on the bias of what we're including, and have lots of disclaimers, and this is what you're looking at kind of commentary throughout it. (PD401)

PD101 held a similar concern, and contextualised the concern in reference to his institution's diversity policy:

I've been hearing more about lately is the things like all of the bias that's involved when it comes to image recognition it is severely biased towards white men in particular, people of colour and women tend to be significantly underrepresented. And that's something that's quite interesting, I suppose, in a general sense in our fields, but I suppose specifically in [institution] where we are quite serious about our diversity policies and stuff like that. Which … that would be pretty much it, I guess, my understanding would be that, you know, the machine learning and the computer vision is really only as good as whatever like model you have like the actual like, what what data are you actually basing all this on and stuff like that, but what corpus … is it are you working with?

Discussion

One of the common threads that run throughout the different themes identified in the focus group data is the nature of digital archival expertise and how AI challenges and/or supports the ways in which participants conceive of digital archival expertise.

For example, in discussing “fitting AI into day to day practice”, there was concern that using AI technology might result in additional work tasks for practitioners because the AI system would lack certain expertise and the outputs would need to be “checked” by a human. The unclear division of responsibilities for the upkeep of an AI system to ensure its reliability over time, and the lack of specifics about the expertise of the person(s) in charge of the task, emerged among the major anxieties.

Next, in discussing responsible use of AI technology in the context of archives, participants raised the concept of “checking” outputs; this led to suggestions of a human-machine collaboration in which archives and digital preservation practitioners would provide the expertise needed to make sure sensitive/personal records were not accidentally “missed” and/or released to users when they should not be.

When participants discussed the need to manage the expectations of organisation management and “higher ups,” this “managing of expectations” was discussed as an “art” – the ability to explain the limits of what the AI technology was capable of and how use of the AI technology would require a human to “check” the work, while simultaneously signposting the importance of traditional archival activities such as arrangement and description of collections. This need to “not forget” about traditional archival activities was framed as a lesson learned from the adoption of digitisation in archives in the mid-1990s, which radically changed the way users engage with archival collections and their expectations for access and use.

Finally, in discussing bias, this expertise was framed as knowledge of harmful stereotypes and context surrounding information in digital collections which might act as datasets for application of AI, the ability to steer how AI technology should be applied to digital archival collections, as well as digitisation priorities in the context of the archives social justice movement. Punzalan and Caswell (2016) include “inclusion of underrepresented and marginalised sectors of society” as one of their five areas of social justice in archival studies. Participants expressed concern about this specific issue: how AI technology may inappropriately be applied by users in the context of marginalised and under-represented groups represented in the collections for which they care.

Much of the existing archives literature about AI technology is focused on how AI may impact the archives field: this includes how AI may change traditional archival work such as arrangement, description and appraisal via automation (Lee, 2018); how traditional archival theory such as original order and provenance may change (Colavizza et al., 2022); and how AI technology will require archivists to learn new computational skills (Marciano et al., 2018; Payne, 2018). Moss et al. (2018) and Theimer (2018) specifically address the concept of archival expertise and how it may change in the context of AI, suggesting that expertise would grow to closely resemble data science.

Susskind and Susskind (2015) in The future of the professions: How technology will transform the work of human experts predict the decline of professions that require expertise like law and medicine because of the rise of advanced technology, including AI. Theimer (2018) applies this perspective to the archival profession by predicting that, facing the growing use of AI technology, archivists will attempt to challenge the rise of automation by highlighting the work that a machine can't do which requires “creative thought” (p. 11) and making the argument that “a machine can't do all the parts of my job” (p. 11). Archival tasks that are difficult to automate, that require “a human touch” (p. 12) will be highlighted.

The thought that the “machine can't do all the parts of my job” was present throughout the focus group participants' discussion. For example, in all themes, digital archival expertise was framed as expertise needed to “check” outputs of the AI, the ability to provide context for collections, and to steer the use of algorithms to tasks that would not conflict with social justice values held by the archival profession, including not compounding marginalisation of under-represented groups.

Theimer (2018) counters that while archivists may try to make the “a machine can't do all the parts of my job” argument, Susskind and Susskind (2015) argue that eventually, all tasks will become routinised in one way or another, such as tasks being completed in an entirely different manner in which the end result is similar enough to the “non-routinised task” (the parts the machine can't do). She also predicts that what will matter to users is “delivering an acceptable level of service as freely and broadly as possible” (p. 12) which contradicts the argument that users will always value the “human touch” work that archivists do (and machines can't currently do) – the digital archival expertise.

In the focus groups, there was little concern that AI technology would completely replace the work of a professional archivist – this was underpinned by a focus on digital archival expertise: the ability to manage expectations, the contextual expertise an archivist can provide to collections, and the dedication to upholding social justice in archives principles, namely, to not continue to marginalise under-represented groups and perpetuate bias. However, in light of the Theimer piece, this discussion of “digital archival expertise” can be explained as an offensive mounted against the rising use of intelligent systems such as AI technology.

As such, when one asks what the thoughts and opinions of archival practitioners about AI are, as a way to understand how AI adoption may impact archival work, the focus group data suggests that AI, as a disruptive technology, impacts the ways in which archivists characterise and communicate digital archival expertise. AI may cause a reaction to revise the ways in which digital archival expertise is highlighted and presented to “the higher ups,” users and the public. In this sense, AI technology in archives acts as a counter to balance digital archival expertise against. This is the potential “impact” of AI on digital archives practice – it will cause a change in the characteristics of digital archival expertise and the elements of the expertise that are advocated and communicated to interested parties. Our findings suggest that the way that digital archival expertise is characterised will slowly evolve to represent “what the machines can't do.” For our participants, digital archival expertise was described as arrangement, description and appraisal, managing expectations, understanding and contextualising user needs and a dedication to social justice initiatives. We predict this will change, and the speed with which it will change will depend on how quickly those tasks can become routinised.

The focus of this article was to explore thoughts and opinions of AI technology in archival practice, as a way to learn more about how AI technology may impact archival work. Results suggest that the impact may rest in how digital archival expertise is characterised, highlighted and communicated to interested parties. The specific characteristics of the digital archival expertise and how it may evolve in the future is beyond the scope of the current work and could be explored in future research projects. The impact of AI on archival and digital preservation practice in this study may be summarised as a force that triggers an evolution in how digital archival expertise is characterised, discussed and highlighted. However, it is not acting alone: several of the focus group participants referenced the time period when digitisation “happened” in the mid-1990s in discussing potential AI adoption. As such, AI technology, combined with understanding of how to adopt a disruptive technology “better” by applying lessons learned from the digitisation era of the mid-1990s may be prompting the desire to re-evaluate and change the ways in which digital archival expertise is discussed.

AI technology, along with the desire to apply lessons learned from the implementation of mass digitisation, is acting as a trigger for archival practitioners to re-evaluate the way they discuss their contributions and their practice, which may have continued knock-on effects for activities such as advocacy and outreach. Of course, AI technology will impact workflows and archival activities, but that was not found to be of most concern to our participants when we spoke to them.

Lastly, the results can be situated in the context of a social construction of technology (SCOT) perspective to produce greater insight. According to Baym (2015), in brief, the SCOT perspective “focuses on how technologies arise from social processes” (p. 44). Applied to our example, a SCOT perspective would have investigated the social contexts of archivists to understand their use and adoption of AI. We would not say that we explicitly used a SCOT theoretical framework to organise this study – we have focused on opinions and perceptions which are only part of a greater social context. That being said, we can use SCOT as one lens through which to view our results.

In contrast to SCOT, technological determinism aligns with the belief that technology changes us. We would not go as far as to qualify participant concerns in this study as techno-deterministic – AI technology is not causing a change in expertise. Rather, AI technology combined with hindsight about mass digitisation as a disruptive technology is affording archival practitioners the opportunity to reinterpret their concepts of digital archival expertise. This perspective is more aligned with a SCOT perspective (Baym, 2015). Future research could explore this overlap using an explicit SCOT theoretical framework. In addition, we would argue that any AI adoption in the archives sector will need to address the issue of evolving concepts of digital archival expertise to move forward with large scale adoption of any new technology.

Conclusion

We began this project posing the following questions: what are the thoughts and opinions of archival and digital preservation practitioners concerning the use of AI technology in their work? In what ways can these thoughts and opinions help us understand the impact of AI technology on archives and digital preservation work? We conducted four focus groups across two phases of data collection. Because participants were a small group of volunteers, the findings are not generalisable. Using template analysis, we were able to identify four themes from the data, which suggest that AI technology combined with hindsight about digitisation as a disruptive technology may prompt archival and digital preservation practitioners to change the way they characterise and communicate digital archival expertise. The specific content of this expertise is beyond the scope of this article but could be addressed in future work. Investigating further the adoption of AI in specific archival contexts would benefit the understanding we have of the evolution of digital archival expertise, going beyond sentiment and perception mapping.

Demographics of the focus group participants

Participant number | Job title and sector | Gender identity | Location
Phase 1 focus group: All staff from the same national institution
FG1
PD101 | Digital Preservation Manager | Male | Ireland
PD102 | Digitisation Programme Manager | Male | Ireland
PD103 | Assistant Keeper, Digital Collecting | Female | Ireland
PD104 | Assistant Keeper, Digital Collecting | Female | Ireland
Phase 2 focus groups: Mixed participants from different organisations
FG2
PD201 | Researcher, Archival Project | Female | Ireland
PD202 | Archivist, University Special Collections | Female | Ireland
PD203 | Digital Curator, Private Business | Female | Ireland
PD204 | Digital Curator, Government Library | Female | Denmark
PD205 | Archives Manager, Government Archive | Male | England
FG3
PD301 | Digital Humanities Librarian, University | Female | USA
PD302 | User Manager Assistant, Library Vendor | Female | China
PD303 | User Services Assistant, Library Vendor | Female | China
FG4
PD401 | Project Manager, Archives/University Collaborative Project | Male | Ireland
PD402 | Digital curator, Museum | Female | England
PD403 | Digital curator, Moving Image Archive | Male | Ireland
PD404 | Library and Digitisation Manager, Museum | Female | USA

References

Barucci, A., Cucci, C., Franci, M., Loschiavo, M. and Argenti, F. (2021), “A deep learning approach to ancient Egyptian hieroglyphs classification”, IEEE Access, Vol. 9, pp. 123438-123447, doi: 10.1109/ACCESS.2021.3110082.

Baym, N.K. (2015), Personal Connections in the Digital Age, John Wiley & Sons, Cambridge.

Bellman, R. (1978), An Introduction to Artificial Intelligence: Can Computers Think?, Boyd & Fraser Publishing Company, San Francisco.

Benoit, E. III. (2017), “#MPLP Part 1: comparing domain expert and novice social tags in a minimally processed digital archives”, The American Archivist, Vol. 80 No. 2, pp. 407-438, doi: 10.17723/0360-9081-80.2.407.

Benoit, E. III. (2018), “#MPLP Part 2: replacing item-level metadata with user-generated social tags”, The American Archivist, Vol. 81 No. 1, pp. 38-64, doi: 10.17723/0360-9081-81.1.38.

Bermeitinger, B., Freitas, A., Donig, S. and Handschuh, S. (2016), “Object classification in images of neoclassical furniture using deep learning”, in Bozic, B., Mendel-Gleason, G., Debruyne, C. and O'Sullivan, D. (Eds), Computational History and Data-Driven Humanities, Springer Publishing, Cham, pp. 109-112, doi: 10.1007/978-3-319-46224-0_10.

Braun, V. and Clarke, V. (2021), Thematic Analysis: A Practical Guide, SAGE, Thousand Oaks.

Brügger, N. (2016), “Digital humanities”, in Pooley, J.D. and Rothenbuhler, E.W. (Eds), The International Encyclopedia of Communication Theory and Philosophy, John Wiley & Sons.

Bunn, J. (2020), “Working in contexts for which transparency is important: a recordkeeping view of explainable artificial intelligence (XAI)”, Records Management Journal, Vol. 30 No. 2, pp. 143-153, doi: 10.1108/RMJ-08-2019-0038.

Carlsen, B. and Glenton, C. (2011), “What about N? A methodological study of sample-size reporting in focus group studies”, BMC Medical Research Methodology, Vol. 11 No. 26, pp. 1-10.

Chabin, M.A. (2020), “The potential for collaboration between AI and archival science in processing data from the French great national debate”, Records Management Journal, Vol. 30 No. 2, pp. 241-252, doi: 10.1108/RMJ-08-2019-0042.

Clanuwat, T., Lamb, A. and Kitamoto, A. (2019), “KuroNet: pre-modern Japanese Kuzushiji character recognition with deep learning”, 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 607-614, doi: 10.1109/ICDAR.2019.00103.

Colavizza, G., Blanke, T., Jeurgens, C. and Noordegraaf, J. (2022), “Archives and AI: an overview of current debates and future perspectives”, Journal on Computing and Cultural Heritage, Vol. 15 No. 1, pp. 1-15, doi: 10.1145/3479010.

Crawford, K. (2021), The Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence, Yale University Press, New Haven.

Duff, W.M., Flinn, A., Suurtamm, K.E. and Wallace, D.A. (2013), “Social justice impact of archives: a preliminary investigation”, Archival Science, Springer, Vol. 13 No. 4, pp. 317-348.

Eveleigh, A. (2014), “Crowding out the archivist? Locating crowdsourcing within the broader landscape of participatory archives”, in Ridge, M. (Ed.), Crowdsourcing Our Cultural Heritage, Ashgate Publishing, Farnham, pp. 211-212.

Garcia, N. and Vogiatzis, G. (2019), “How to read paintings: semantic art understanding with multi-modal retrieval”, in Leal-Taixé, L. and Roth, S. (Eds), Computer Vision – ECCV 2018 Workshops, Springer, Cham, pp. 676-691.

Gefen, A., Saint-Raymond, L. and Venturini, T. (2021), “AI for digital humanities and computational social sciences”, in Braunschweig, B. and Ghallab, M. (Eds), Reflections on Artificial Intelligence for Humanity, Springer, Cham, pp. 191-202.

Glaser, B.G. and Strauss, A.L. (2017), The Discovery of Grounded Theory: Strategies for Qualitative Research, Routledge, New York, doi: 10.4324/9780203793206.

Greene, M. and Meissner, D. (2005), “More product, less process: revamping traditional archival processing”, The American Archivist, Vol. 68 No. 2, pp. 208-263.

Hao, K. (2020), “We read the paper that forced Timnit Gebru out of Google. Here's what it says”, MIT Technology Review, 4 December, available at: https://www.technologyreview.com/2020/12/04/1013294/google-ai-ethics-research-paper-forced-out-timnit-gebru/ (accessed 2 August 2022).

Jaillant, L. (2022), “How can we make born-digital and digitised archives more accessible? Identifying obstacles and solutions”, Archival Science, Vol. 22, pp. 417-436.

Jaillant, L. and Caputo, A. (2022), “Unlocking digital archives: cross-disciplinary perspectives on AI and born-digital data”, AI and Society, Vol. 37 No. 3, pp. 823-835, doi: 10.1007/s00146-021-01367-x.

Jain, N., Bartz, C., Bredow, T., Metzenthin, E., Otholt, J. and Krestel, R. (2021), “Semantic analysis of cultural heritage data: aligning paintings and descriptions in art-historic collections”, in Del Bimbo, A., Cucchiara, R., Sclaroff, S., Farinella, G.M., Mei, T., Bertini, M., Escalante, H.J. and Vezzani, R. (Eds), Pattern Recognition. ICPR International Workshops and Challenges, Springer Publishing, Cham, Vol. 12663, pp. 517-530.

Kestemont, M., de Pauw, G., van Nie, R. and Daelemans, W. (2017), “Lemmatization for variation-rich languages using deep learning”, Digital Scholarship in the Humanities, Vol. 32 No. 4, pp. 797-815, doi: 10.1093/llc/fqw034.

King, N. (2012), “Doing template analysis”, in Qualitative Organizational Research: Core Methods and Current Challenges, SAGE, London, pp. 426-450.

King, N. and Brooks, J.M. (2017), Template Analysis for Business and Management Students, SAGE, London.

Lee, C.A. (2018), “Computer-assisted appraisal and selection of archival materials”, 2018 IEEE International Conference on Big Data (Big Data), pp. 2721-2724.

Marciano, R., Lemieux, V., Hedges, M., Esteva, M., Underwood, W., Kurtz, M. and Conrad, M. (2018), “Archival records and training in the age of big data”, in Percell, J., Sarin, L.C., Jaeger, P.T. and Carlo Bertot, J. (Eds), Re-Envisioning the MLS: Perspectives on the Future of Library and Information Science Education, Emerald Publishing, Vol. 44B, pp. 179-199.

Morgan, D. (1998), “What do you get from focus groups?”, in The Focus Group Guidebook, SAGE Publications, Thousand Oaks, pp. 55-64.

Moss, M., Thomas, D. and Gollins, T. (2018), “The reconfiguration of the archive as data to be mined”, Archivaria, Association of Canadian Archivists, Vol. 86 No. 86, pp. 118-151.

Padilla, T., Allen, L., Frost, H., Potvin, S., Russey Roke, E. and Varner, S. (2019a), “Final report - always already computational: collections as data”, Zenodo, doi: 10.5281/zenodo.3152935.

Padilla, T., Allen, L., Frost, H., Potvin, S., Russey Roke, E. and Varner, S. (2019b), “Santa Barbara statement on collections as data - always already computational: collections as data”, Zenodo, doi: 10.5281/zenodo.3066209.

Payne, N. (2018), “Stirring the cauldron: redefining computational archival science (CAS) for the big data domain”, 2018 IEEE International Conference on Big Data (Big Data), IEEE, Seattle, pp. 2743-2752.

Poole, A.H. and Garwood, D.A. (2018), “‘Natural allies’: librarians, archivists, and big data in international digital humanities project work”, Journal of Documentation, Emerald Publishing, Vol. 74 No. 4, pp. 804-826, doi: 10.1108/JD-10-2017-0137.

Proctor, J. and Marciano, R. (2021), “An AI-assisted framework for rapid conversion of descriptive photo metadata into linked data”, 2021 IEEE International Conference on Big Data (Big Data), IEEE, Orlando, USA, pp. 2255-2261.

Punzalan, R.L. and Caswell, M. (2016), “Critical directions for archival approaches to social justice”, The Library Quarterly, The University of Chicago Press, Vol. 86 No. 1, pp. 25-42, doi: 10.1086/684145.

Randby, T. and Marciano, R. (2020), “Digital curation and machine learning experimentation in archives”, 2020 IEEE International Conference on Big Data (Big Data), IEEE, Atlanta, pp. 1904-1913.

Roeschley, A. and Benoit, E.I. (2019), “Chapter 14. Degrees of mediation: a review of the intersectionality between community and participatory archives”, Participatory Archives, 1st ed., Facet Publishing.

Rolan, G., Humphries, G., Jeffrey, L., Samaras, E., Antsoupova, T. and Stuart, K.J. (2019), “More human than human? Artificial intelligence in the archive”, Archives and Manuscripts, Vol. 47 No. 2, pp. 179-203, doi: 10.1080/01576895.2018.1502088.

Sabharwal, A. (2017), “Digital humanities and the emerging framework for digital curation”, College and Undergraduate Libraries, Routledge, Vol. 24 Nos 2-4, pp. 238-256, doi: 10.1080/10691316.2017.1336953.

Springstein, M., Schneider, S., Rahnama, J., Hüllermeier, E., Kohle, H. and Ewerth, R. (2021), “iART: a search engine for art-historical images to support research in the humanities”, Proceedings of the 29th ACM International Conference on Multimedia, Association for Computing Machinery, NY, pp. 2801-2803.

Steinmeier, D. (2020), “Diversity, inclusion, and digital preservation”, Patterns, Vol. 1 No. 9, pp. 1-4, doi: 10.1016/j.patter.2020.100152.

Susskind, R.E. and Susskind, D. (2015), The Future of the Professions: How Technology Will Transform the Work of Human Experts, Oxford University Press, Oxford.

Tanasescu, C., Kesarwani, V. and Inkpen, D. (2018), “Metaphor detection by deep learning and the place of poetic metaphor in digital humanities”, The Thirty-First International Florida Artificial Intelligence Research Society Conference (FLAIRS-31), Association for the Advancement of Artificial Intelligence, pp. 122-127.

Theimer, K. (2018), “It's the end of the archival profession as we know it, and I feel fine”, in Archival Futures, Facet Publishing, London, pp. 1-17.

Wevers, M. and Smits, T. (2020), “The visual digital turn: using neural networks to study historical images”, Digital Scholarship in the Humanities, Vol. 35 No. 1, pp. 194-207, doi: 10.1093/llc/fqy085.

Acknowledgements

This work was conducted with the financial support of the Science Foundation Ireland Centre for Research Training in Digitally-Enhanced Reality (d-real) under Grant No. 18/CRT/6224.

The authors would also like to thank Stephen Howell of Microsoft Ireland for his support with using Microsoft Azure to request tags and descriptions for the phase one focus group prompts.

In addition, the authors would like to thank the following students who assisted with data collection in the study: Rachael Agnew, MacKenzie Barry, Nancy Bruseker, Sinead Carey, Emma Carroll, Lauren Caravati, Na Chen, Caroline Crowther, Aoife Cummins Georghiou, Marc Dagohoy, Desree Efamaui, Haichuan Feng, Laura Finucane, Nathan Fitzmaurice, Conor Greene, Yazhou He, Yuhan Jiang, Joang Zhou, Grainne Kavanagh, Kate Keane, Mark Keleghan, Miao Li, Danyang Liu, Xijia Liu, Siqi Liu, Hannah Lynch, Conor Murphy, Niamh Elizabeth Murphy, Rebecca Murphy, Kyanna Murray, Kayse Nation, Blaithin NiChathain, Roisin O'Brien, Niall O'Flynn, Abigail Raebig, Bernadette Ryan, Emma Rothwell, John Francis Sharpe, Lin Shuhua, Zhongqian Wang, Robin Wharton, Zhillin Wei, India Wood, Bingye Wu, Deyan Zhang, Zhongwen Zheng and Zheyuan Zhang.

Corresponding author

Amber L. Cushing is the corresponding author and can be contacted at: amber.cushing@ucd.ie

About the authors

Amber L. Cushing is a Lecturer/Assistant Professor at the School of Information and Communication Studies at University College Dublin. Her main area of interest is in exploring the context of maintaining digital information over time at the institutional and personal level. She holds a PhD in Library and Information Science from the University of North Carolina at Chapel Hill, an MLIS from Simmons Graduate School of Library and Information Science and a BA in History from Mount Holyoke College.

Giulia Osti is a doctoral student at the School of Information and Communication Studies at University College Dublin; her work is funded by the SFI Centre for Research Training in Digitally-Enhanced Reality (d-real). She is researching the interaction between digital curation practices and artificial intelligence, under the supervision of Dr Amber Cushing (University College Dublin) and Assoc. Prof. Suzanne Little (Dublin City University).
