Search results

1 – 10 of 936
Article
Publication date: 24 August 2010

Elizabeth A. Novara

Abstract

Purpose

The purpose of this paper is to address the challenges that special collections repositories face when creating digital surrogates driven by researcher demand, to link these digitization issues with archival practice, and to provide recommendations for improvement.

Design/methodology/approach

Presents an overview of the development of the University of Maryland Libraries' digital imaging workflows and a critique of current practices.

Findings

A viable digital repository can be built from surrogates created in response to researcher demand, but there are limitations to this approach, with opportunity for improvement.

Research limitations/implications

As a case study, this paper is limited to one institution's perspective.

Practical implications

Provides insight into constructing and managing digitization programs at special collections repositories.

Originality/value

This paper offers a case study approach for an institutional digital repository influenced heavily by researcher demand, in contrast to a digital repository constructed with a more structured plan.

Details

OCLC Systems & Services: International digital library perspectives, vol. 26 no. 3
Type: Research Article
ISSN: 1065-075X

Article
Publication date: 20 May 2020

Tim Hutchinson

Abstract

Purpose

This study aims to provide an overview of recent efforts relating to natural language processing (NLP) and machine learning applied to archival processing, particularly appraisal and sensitivity reviews, and propose functional requirements and workflow considerations for transitioning from experimental to operational use of these tools.

Design/methodology/approach

The paper has four main sections: 1) a short overview of the NLP and machine learning concepts referenced in the paper; 2) a review of the literature reporting on NLP and machine learning applied to archival processes; 3) an overview and commentary on key existing and developing tools that use NLP or machine learning techniques for archives; and 4) a discussion, informed by this review and analysis, of functional requirements and workflow considerations for NLP and machine learning tools for archival processing.

Findings

Applications for processing e-mail have received the most attention so far, although most initiatives have been experimental or project based. It now seems feasible to branch out to develop more generalized tools for born-digital, unstructured records. Effective NLP and machine learning tools for archival processing should be usable, interoperable, flexible, iterative and configurable.

Originality/value

Most implementations of NLP for archives have been experimental or project based. The main exception that has moved into production is ePADD, which includes robust NLP features through its named entity recognition module. This paper takes a broader view, assessing the prospects and possible directions for integrating NLP tools and techniques into archival workflows.

Article
Publication date: 14 August 2017

Lesley L. Parilla, Rebecca Morgan and Christina Fidler

Abstract

Purpose

The purpose of this paper is to discuss three projects from three institutions that are addressing challenges with natural sciences field documentation. Each is working to create the collection, item and data-level description required so that researchers can fully use the data to study how biodiversity has changed over time and space. Libraries, archives and museums recognize the need to make content searchable across material type. To create online catalogs that would make this possible, ideally, all records would describe one item. Museums and libraries describe their materials at the item level; however, archives must balance the need to describe the collection as a whole with the needs of collection materials that may require more description to reconnect with library and museum items. There is a growing determination within archives to increase this flow of data, particularly for the natural sciences, by creating workflows that provide additional description to make these data discoverable. This process is a bit like drilling into the earth: each level must be described before the next can be dealt with.

Design/methodology/approach

The piece describes challenges, approaches and workflows of three institutions developing deeper levels of description for archival materials that will be made available online to a specialized audience. It also describes the methods developed so that the material’s data can eventually be accessed at a more granular level and linked to related resources.

Findings

Current systems, schema and standards are adapted as necessary, and the natural sciences archival community is still working to develop best practices. However, the community is getting much closer through collaboration made possible by grants in recent years.

Originality/value

The work described in this paper is ongoing, and best practices resulting from the work are still under development.

Details

Digital Library Perspectives, vol. 33 no. 3
Type: Research Article
ISSN: 2059-5816

Open Access
Article
Publication date: 31 July 2023

Sara Lafia, David A. Bleckley and J. Trent Alexander

Abstract

Purpose

Many libraries and archives maintain collections of research documents, such as administrative records, with paper-based formats that limit the documents' access to in-person use. Digitization transforms paper-based collections into more accessible and analyzable formats. As collections are digitized, there is an opportunity to incorporate deep learning techniques, such as Document Image Analysis (DIA), into workflows to increase the usability of information extracted from archival documents. This paper describes the authors' approach using digital scanning, optical character recognition (OCR) and deep learning to create a digital archive of administrative records related to the mortgage guarantee program of the Servicemen's Readjustment Act of 1944, also known as the G.I. Bill.

Design/methodology/approach

The authors used a collection of 25,744 semi-structured paper-based records from the administration of G.I. Bill Mortgages from 1946 to 1954 to develop a digitization and processing workflow. These records include the name and city of the mortgagor, the amount of the mortgage, the location of the Reconstruction Finance Corporation agent, one or more identification numbers and the name and location of the bank handling the loan. The authors extracted structured information from these scanned historical records in order to create a tabular data file and link them to other authoritative individual-level data sources.

Findings

The authors compared the flexible character accuracy of five OCR methods. The authors then compared the character error rate (CER) of three text extraction approaches (regular expressions, DIA and named entity recognition (NER)). The authors were able to obtain the highest quality structured text output using DIA with the Layout Parser toolkit by post-processing with regular expressions. Through this project, the authors demonstrate how DIA can improve the digitization of administrative records to automatically produce a structured data resource for researchers and the public.
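The character error rate mentioned above is commonly defined as the edit distance between an extracted string and its ground-truth transcription, divided by the length of the ground truth. The following is a minimal sketch of that common definition, not the authors' implementation; the abstract does not state their exact formula or tooling, and the function names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string on the inner loop; the distance is symmetric.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # drop char_a
                    current[j - 1] + 1,                   # add char_b
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical strings, not taken from the G.I. Bill records.
    print(character_error_rate("abcd", "abed"))
```

A lower value means the extracted text is closer to the ground truth; under this definition the value can exceed 1.0 when the extracted text is much longer than the reference.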

Originality/value

The authors' workflow is readily transferable to other archival digitization projects. Through the use of digital scanning, OCR and DIA processes, the authors created the first digital microdata file of administrative records related to the G.I. Bill mortgage guarantee program available to researchers and the general public. These records offer research insights into the lives of veterans who benefited from loans, the impacts on the communities built by the loans and the institutions that implemented them.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 31 January 2020

Dennis Della Corte, Wolfgang Colsman, Ben Welker and Brian Rennick

Abstract

Purpose

The purpose of this technical paper is to evaluate the emerging standard “Allotrope Data Format (ADF)” in the context of digital preservation at a major US academic library hosted at Brigham Young University. In combination with the new information management system ZONTAL Space (ZS), archiving with the ADF is compared with currently used systems CONTENTdm and ROSETTA.

Design/methodology/approach

The approach is a workflow-based comparison in terms of usability, functionality and reliability of the systems. Current workflows are replaced by optimized target processes, which limit the number of involved parties and process steps. The connectors or manual solutions between the current workflow steps are replaced with automatic functions inside of ZS. Reporting functionalities inside of ZS are used to track system and file lifecycle to ensure stability and data preservation.

Findings

The authors find that the target processes leveraging ZS drastically reduce complexity compared to current workflows. Archiving with the ADF is found to decrease integration complexity and provide a more robust data migration path for the future. The possibility to enrich data automatically with metadata and to store this information alongside the content in the same information package increases reusability of the data.

Research limitations/implications

The practical implications of this work suggest the arrival of a new information management system that can potentially revolutionize the archiving landscape within libraries. Beyond the scope of the initial proof of concept, the potential for the system can be seen to replace existing data management tools and provide access to new data analytics applications, like smart recommender systems.

Originality/value

The value of this study is a systematic introduction of ZS and the ADF, two emerging solutions from the pharmaceutical industry, to the broader audience of digital preservation experts within US libraries. The authors consider the exchange of best practices and solutions between industries to be of high value to the communities.

Details

Digital Library Perspectives, vol. 36 no. 1
Type: Research Article
ISSN: 2059-5816

Article
Publication date: 27 May 2020

Zack Lischer-Katz

Abstract

Purpose

This paper aims to explore the opportunities and challenges that immersive virtual reality (VR) technologies pose for archival theory and practice.

Design/methodology/approach

This conceptual paper reviews research on VR adoption in information institutions and the preservation challenges of VR to identify ways in which VR has the potential to disrupt existing archival theory and practice.

Findings

Existing archival approaches are found to be disrupted by the multi-layered structural characteristics of VR, the part–whole relationships between the technological elements of VR environments and the three-dimensional content they contain and the immersive, experiential nature of VR experiences. This paper argues that drawing on perspectives from phenomenology and digital materiality is helpful for addressing the preservation challenges of VR.

Research limitations/implications

The findings extend conceptualizations of preservation by identifying gaps in existing preservation approaches to VR and stressing the importance of “experience” as a central element of archival practice and by emphasizing the embodied dimensions of interpreting archival records and the multiple scales of materiality that archival researchers and practitioners should consider to preserve VR.

Practical implications

These findings provide guidance for digital curators and preservationists by outlining the current thinking on VR preservation and the impact of VR on digital preservation strategies.

Originality/value

This paper gives new insight into VR as an emerging area of concern to digital curation and preservation and expands archival thinking with new conceptualizations that disrupt existing paradigms.

Details

Records Management Journal, vol. 30 no. 2
Type: Research Article
ISSN: 0956-5698

Article
Publication date: 13 June 2022

Merrion Dale

Abstract

Purpose

Language archiving is achieved through the detailed assessment and assemblage of various types of language data. The Computational Resource for South Asian Languages (CoRSAL) is an emerging language archive that prioritizes the accommodation of depositors who have a variety of needs with respect to both research and infrastructure. As such, the CoRSAL team uses a workflow approach that caters to this diversity. The purpose of this paper is to detail the mediated workflow for collection ingest and promotion, citing two specific examples from recently published language collections, as well as discuss specific feedback the team has received from individual depositors and language community members on the effectiveness and usefulness of these efforts thus far.

Design/methodology/approach

This paper provides an exploration of the author’s approaches to a mediated archiving workflow.

Findings

The author discusses the encouraging and constructive feedback the team has received so far and includes instances of specific communication with individuals who have recently deposited and published collections with them.

Originality/value

This is the first research paper published by anyone on our team describing our workflow. It is an expansion of a shorter conference paper presented at LangArc 2021: 1st International Workshop on Digital Language Archives on September 30, 2021 and published in the conference proceedings.

Abstract

Purpose

This paper gives an overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. The paper aims to discuss these issues.

Design/methodology/approach

This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material.

Findings

Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified.

Research limitations/implications

The paper presents results from projects: further user studies could be undertaken involving interviews, surveys, etc.

Practical implications

Only HTR provided via Transkribus is covered; however, this is the only publicly available platform for HTR on individual collections of historical documents at the time of writing, and it represents the current state of the art in this field.

Social implications

The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals.

Originality/value

This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current application of handwriting technology in the cultural heritage sector.

Article
Publication date: 14 May 2018

Wendy Walker and Teressa Keenan

Abstract

Purpose

The purpose of this paper is to describe methods for restructuring workflows and efficiently using staff members and volunteers to continue work on multiple, simultaneous digital collections as budgets and resources decline.

Design/methodology/approach

This paper describes one library’s varied approaches to several digital collections, supported by literature on volunteers in libraries.

Findings

In the face of continually declining resources and new, time-sensitive priorities and compliance responsibilities, librarians can continue to maintain digital collections by modifying workflows, using the services of volunteers and communicating strategically.

Practical implications

This paper is relevant to librarians, archivists and others who are looking for ways to justify and capitalize on the use of unconventional personnel in digital collections programs.

Originality/value

This paper presents a case of the successful use of volunteers to accomplish digital collections-related tasks in an academic library and provides a communication-based strategy for addressing some of the challenges related to volunteers in academic libraries.

Details

Digital Library Perspectives, vol. 34 no. 2
Type: Research Article
ISSN: 2059-5816

Article
Publication date: 17 August 2021

Nathan Moles

Abstract

Purpose

Conventional approaches to digital preservation posit that archives should define a Designated Community, or future user group, for whom they preserve digital information. Archivists can then use their knowledge of these users as a reference to help them deliver digital information that is intelligible and usable. However, this approach is challenging for archives with mandates to serve wide and diverse audiences; these archives risk undermining their efforts by focusing on the interests of a narrow user group.

Design/methodology/approach

A unique approach to this challenge was developed in the context of a project to build a digital preservation program at the Ontario Jewish Archives (OJA). It draws from previous research on this topic and is based on a combination of practical and theoretical considerations.

Findings

The approach described here replaces the reference of a Designated Community with three core components: a re-articulation of the Open Archival Information System (OAIS) mandatory responsibilities; the identification of three distinct tiers of access for digital records; and the implementation of an access portal that allows digital records to be accessed and rendered online. Together with supplemental shifts in reference points, they provide an alternative to the concept of a Designated Community in the determination of preservation requirements, the identification of significant properties, the creation of Representation Information and in the evaluation of success.

Originality/value

This article contributes a novel approach to the ongoing conversation about the Designated Community in digital preservation, its application and its limitations in an archival context.

Details

Journal of Documentation, vol. 78 no. 3
Type: Research Article
ISSN: 0022-0418
