Search results

1 – 10 of 42

Open Access

Article

Publication date: 18 April 2024

The implications of handwritten text recognition for accessing the past at scale

Joseph Nockels, Paul Gooding and Melissa Terras

Downloads

401

Abstract

Purpose

This paper focuses on image-to-text manuscript processing through Handwritten Text Recognition (HTR), a Machine Learning (ML) approach enabled by Artificial Intelligence (AI). With HTR now achieving high levels of accuracy, we consider its potential impact on our near-future information environment and knowledge of the past.

Design/methodology/approach

In undertaking a more constructivist analysis, we identified gaps in the current literature through a Grounded Theory Method (GTM). This guided an iterative process of concept mapping through writing sprints in workshop settings. We identified, explored and confirmed themes through group discussion and a further interrogation of relevant literature, until reaching saturation.

Findings

Catalogued as part of our GTM, 120 published texts underpin this paper. We found that HTR facilitates accurate transcription and dataset cleaning while widening access to a variety of historical material. HTR contributes to a virtuous cycle of dataset production and can inform the development of online cataloguing. However, current limitations include dependency on digitisation pipelines, potential omission of archival history and entrenchment of bias. We also cite near-future HTR considerations: encouraging open access; integrating advanced AI processes and metadata extraction; legal and moral issues surrounding copyright and data ethics; crediting individuals’ transcription contributions; and HTR’s environmental costs.

Originality/value

Our research produces a set of best practice recommendations for researchers, data providers and memory institutions, surrounding HTR use. This forms an initial, though not comprehensive, blueprint for directing future HTR research. In pursuing this, the narrative that HTR’s speed and efficiency will simply transform scholarship in archives is deconstructed.

Details

Journal of Documentation, vol. 80 no. 7

Type: Research Article

ISSN: 0022-0418

Open Access

Article

Publication date: 23 July 2019

Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study

Guenter Muehlberger, Louise Seaward, Melissa Terras, Sofia Ares Oliveira, Vicente Bosch, Maximilian Bryan, Sebastian Colutto, Hervé Déjean, Markus Diem, Stefan Fiel, Basilis Gatos, Albert Greinoecker, Tobias Grüning, Guenter Hackl, Vili Haukkovaara, Gerhard Heyer, Lauri Hirvonen, Tobias Hodel, Matti Jokinen, Philip Kahle, Mario Kallio, Frederic Kaplan, Florian Kleber, Roger Labahn, Eva Maria Lang, Sören Laube, Gundram Leifert, Georgios Louloudis, Rory McNicholl, Jean-Luc Meunier, Johannes Michael, Elena Mühlbauer, Nathanael Philipp, Ioannis Pratikakis, Joan Puigcerver Pérez, Hannelore Putz, George Retsinas, Verónica Romero, Robert Sablatnig, Joan Andreu Sánchez, Philip Schofield, Giorgos Sfikas, Christian Sieber, Nikolaos Stamatopoulos, Tobias Strauß, Tamara Terbul, Alejandro Héctor Toselli, Berthold Ulreich, Mauricio Villegas, Enrique Vidal, Johanna Walcher, Max Weidemann, Herbert Wurster and Konstantinos Zagoris

Downloads

10844

Abstract

Purpose

This paper gives an overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020-funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. The paper aims to discuss these issues.

Design/methodology/approach

This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material.

Findings

Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified.

Research limitations/implications

The paper presents results from projects: further user studies could be undertaken involving interviews, surveys, etc.

Practical implications

Only HTR provided via Transkribus is covered: however, this is the only publicly available platform for HTR on individual collections of historical documents at time of writing and it represents the current state-of-the-art in this field.

Social implications

The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals.

Originality/value

This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current application of handwriting technology in the cultural heritage sector.

Details

Journal of Documentation, vol. 75 no. 5

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 1 November 2019

How open is OpenGLAM? Identifying barriers to commercial and non-commercial reuse of digitised art images

Foteini Valeonti, Melissa Terras and Andrew Hudson-Smith

Downloads

768

Abstract

Purpose

In recent years, OpenGLAM and the broader open license movement have been gaining momentum in the cultural heritage sector. The purpose of this paper is to examine OpenGLAM from the perspective of end users, identifying barriers for commercial and non-commercial reuse of openly licensed art images.

Design/methodology/approach

Following a review of the literature, the authors scope out how end users can discover institutions participating in OpenGLAM, and use case studies to examine the process they must follow to find, obtain and reuse openly licensed images from three art museums.

Findings

Academic literature has so far focussed on examining the risks and benefits of participation from an institutional perspective, with little done to assess OpenGLAM from the end users’ standpoint. The authors reveal that end users have to overcome a series of barriers to find, obtain and reuse open images. The three main barriers relate to image quality, image tracking and the difficulty of distinguishing open images from those that are bound by copyright.

Research limitations/implications

This study focusses solely on the examination of art museums and galleries. Libraries, archives and also other types of OpenGLAM museums (e.g. archaeological) stretch beyond the scope of this paper.

Practical implications

The authors identify practical barriers of commercial and non-commercial reuse of open images, outlining areas of improvement for participant institutions.

Originality/value

The authors contribute to the understudied field of research examining OpenGLAM from the end users’ perspective, outlining recommendations for end users, as well as for museums and galleries.

Details

Journal of Documentation, vol. 76 no. 1

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 31 March 2021

Identifying the future direction of legal deposit in the United Kingdom: The Digital Library Futures approach

Paul Gooding, Melissa Terras and Linda Berube

Downloads

574

Abstract

Purpose

To date, there has been little research into users of the Legal Deposit Libraries (Non-Print Works) Regulations 2013. This paper addresses that gap by presenting key findings from the AHRC-funded Digital Library Futures project. Its purpose is to present a “user-centric” perspective on the potential future impact of the digital collections that are being created under electronic legal deposit regulations.

Design/methodology/approach

The study utilises a mixed methods case study of two academic legal deposit libraries in the United Kingdom: The Bodleian Libraries, University of Oxford; and Cambridge University Library. It combines surveys of users, web log analysis and expert interviews with librarians and cognate professionals.

Findings

User perspectives on NPLD were not fully considered in the planning and implementation of the 2013 regulations. The authors present findings from their user survey to show how contemporary tensions between user behaviour and access protocols risk limiting the instrumental value of NPLD collections, which have high perceived legacy value.

Originality/value

This is the first study to address the user context for UK Non-Print Legal Deposit. Its value lies in presenting a research-led user assessment of NPLD and in proposing “user-centric” analysis as an addition to the existing “four pillars” of legal deposit research.

Details

Journal of Documentation, vol. 77 no. 5

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 8 May 2017

What people study when they study Tumblr: Classifying Tumblr-related academic research

Rose Attu and Melissa Terras

Downloads

2503

Abstract

Purpose

Since its launch in 2007, research has been carried out on the popular social networking website Tumblr. The purpose of this paper is to identify published Tumblr-based research, classify it to understand approaches and methods, and provide methodological recommendations for others.

Design/methodology/approach

Research regarding Tumblr was identified. Following a review of the literature, a classification scheme was adapted and applied, to understand research focus. Papers were quantitatively classified using open coded content analysis of method, subject, approach, and topic.

Findings

The majority of published work relating to Tumblr concentrates on conceptual issues, followed by aspects of the messages sent. This has evolved over time. Perceived benefits are the platform’s long-form text posts, ability to track tags, and the multimodal nature of the platform. Severe research limitations are caused by the lack of demographic, geo-spatial, and temporal metadata attached to individual posts, the limited Application Programming Interface (API), restricted access to data, and the large number of ephemeral posts on the site.

Research limitations/implications

This study focusses on Tumblr: the applicability of the approach to other media is not considered. The authors focus on published research and conference papers: there will be book content which was not found using the method. Tumblr as a platform has falling user numbers which may be of concern to researchers.

Practical implications

The authors identify practical barriers to research on the Tumblr platform including lack of metadata and access to big data, explaining why Tumblr is not as popular as Twitter in academic studies.

Social implications

This paper highlights the breadth of topics covered by social media researchers, which allows us to understand popular online platforms.

Originality/value

There has not yet been an overarching study of the methods and purposes of those who study Tumblr. The authors identify Tumblr-related research papers from the first, which appeared in July 2011, until July 2015. The classification derived here provides a framework that can be used to analyse social media research, and in which to position Tumblr-related work, with recommendations on the benefits and limitations of the platform for researchers.

Details

Journal of Documentation, vol. 73 no. 3

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 10 May 2013

What do people study when they study Twitter? Classifying Twitter related academic papers

Shirley A. Williams, Melissa M. Terras and Claire Warwick

Downloads

5540

Abstract

Purpose

Since its introduction in 2006, messages posted to the microblogging system Twitter have provided a rich dataset for researchers, leading to the publication of over a thousand academic papers. This paper aims to identify this published work and to classify it in order to understand Twitter based research.

Design/methodology/approach

Firstly, the papers on Twitter were identified. Secondly, following a review of the literature, a classification of the dimensions of microblogging research was established. Thirdly, papers were qualitatively classified using open coded content analysis, based on the paper's title and abstract, in order to analyze method, subject, and approach.

Findings

The majority of published work relating to Twitter concentrates on aspects of the messages sent and details of the users. A variety of methodological approaches is used across a range of identified domains.

Research limitations/implications

This work reviewed the abstracts of all papers available via database search on the term “Twitter” and this has two major implications: the full papers are not considered and so works may be misclassified if their abstract is not clear; publications not indexed by the databases, such as book chapters, are not included. The study is focussed on microblogging; the applicability of the approach to other media is not considered.

Originality/value

To date there has not been an overarching study to look at the methods and purpose of those using Twitter as a research subject. The paper's major contribution is to scope out papers published on Twitter until the close of 2011. The classification derived here will provide a framework within which researchers studying Twitter related topics will be able to position and ground their work.

Details

Journal of Documentation, vol. 69 no. 3

Type: Research Article

ISSN: 0022-0418

Content available

Article

Publication date: 7 August 2009

Digital Images for the Information Professional

Philip Calvert

Downloads

140

Details

The Electronic Library, vol. 27 no. 4

Type: Research Article

ISSN: 0264-0473

Content available

Article

Publication date: 4 September 2009

Digital Images for the Information Professional

Zinaida Manžuch

Downloads

167

Details

Journal of Documentation, vol. 65 no. 5

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 24 August 2021

Research output, intellectual structures and contributors of digital humanities research: a longitudinal analysis 2005–2020

Fangli Su and Yin Zhang

Downloads

561

Abstract

Purpose

This study aims to update and extend previous efforts gauging the status of the quickly evolving field of digital humanities (DH). Based on a sample of directly relevant DH literature during 2005–2020 from Web of Science, the study conducts a longitudinal examination of the research output, intellectual structures and contributors.

Design/methodology/approach

The study applies bibliometric methods, social network analysis and visualization tools to conduct a longitudinal examination.

Findings

The research output and scope of DH topics have grown over time, with the field widening and deepening across four major development stages. Through both term frequency and term co-occurrence relationship networks, this study further identifies four major recurring topics and themes of DH research: (1) collections and contents; (2) technologies, techniques, theories and methods; (3) collaboration, interdisciplinarity and support and (4) DH evolution. Finally, leading DH research contributors (authors, institutions and nations) are also identified.

Originality/value

This study utilizes a greater number of and richer subject sources than previous efforts to identify the overall intellectual structures of DH research based on key terms from titles, abstracts and author keywords. It expands on previous efforts and furthers our understanding of DH research with more recent DH literature and richer subject sources from the literature.

Details

Journal of Documentation, vol. 78 no. 3

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 30 May 2008

Library as virtual abbey

Robert Fox

Downloads

316

Abstract

Purpose

The purpose of this paper is to explore the current state of the Text Encoding Initiative (TEI) community and to suggest directions in which that community should strive, based on recommendations from experts in the field.

Design/methodology/approach

The column looks at the history, present state and future of TEI.

Findings

This column is simply exploratory, and examines issues regarding the TEI and the TEI consortium.

Practical implications

TEI is a very robust and expressive markup language used in the analysis of literature in the humanities fields. The community is encouraged to take proactive steps to ensure TEI remains a viable markup language for at least the next 20 years.

Originality/value

This column examines the enormous contribution that TEI has made to the humanities fields and explores ways in which the usage of TEI, even by non‐experts, can be expanded in order to enrich scholarship.

Details

OCLC Systems & Services: International digital library perspectives, vol. 24 no. 2

Type: Research Article

ISSN: 1065-075X
