Search results

1 – 3 of 3
Open Access
Article
Publication date: 18 April 2024

Joseph Nockels, Paul Gooding and Melissa Terras

This paper focuses on image-to-text manuscript processing through Handwritten Text Recognition (HTR), a Machine Learning (ML) approach enabled by Artificial Intelligence (AI)…

Abstract

Purpose

This paper focuses on image-to-text manuscript processing through Handwritten Text Recognition (HTR), a Machine Learning (ML) approach enabled by Artificial Intelligence (AI). With HTR now achieving high levels of accuracy, we consider its potential impact on our near-future information environment and knowledge of the past.

Design/methodology/approach

In undertaking a more constructivist analysis, we identified gaps in the current literature through a Grounded Theory Method (GTM). This guided an iterative process of concept mapping through writing sprints in workshop settings. We identified, explored and confirmed themes through group discussion and a further interrogation of relevant literature, until reaching saturation.

Findings

Catalogued as part of our GTM, 120 published texts underpin this paper. We found that HTR facilitates accurate transcription and dataset cleaning, while facilitating access to a variety of historical material. HTR contributes to a virtuous cycle of dataset production and can inform the development of online cataloguing. However, current limitations include dependency on digitisation pipelines, potential archival history omission and entrenchment of bias. We also cite near-future HTR considerations. These include encouraging open access, integrating advanced AI processes and metadata extraction; legal and moral issues surrounding copyright and data ethics; crediting individuals’ transcription contributions and HTR’s environmental costs.

Originality/value

Our research produces a set of best practice recommendations for researchers, data providers and memory institutions, surrounding HTR use. This forms an initial, though not comprehensive, blueprint for directing future HTR research. In pursuing this, the narrative that HTR’s speed and efficiency will simply transform scholarship in archives is deconstructed.

Open Access
Article
Publication date: 11 October 2023

Jin Gao, Julianne Nyhan, Oliver Duke-Williams and Simon Mahony

This paper presents a follow-on study that quantifies geolingual markers and their apparent connection with authorship collaboration patterns in canonical Digital Humanities (DH…

Abstract

Purpose

This paper presents a follow-on study that quantifies geolingual markers and their apparent connection with authorship collaboration patterns in canonical Digital Humanities (DH) journals. In particular, it seeks to detect patterns in authors' countries of work and languages in co-authorship networks.

Design/methodology/approach

Through an in-depth co-authorship network analysis, this study analysed bibliometric data from three canonical DH journals over a range of 52 years (1966–2017). The results are presented as visualised networks with centrality calculations.

Findings

The results suggest that while DH scholars may not collaborate as frequently as those in other disciplines, when they do so their collaborations tend to be more international than in many Science and Engineering, and Social Sciences disciplines. DH authors in some countries (e.g. Spain, Finland, Australia, Canada, and the UK) have the highest international co-author rates, while others have high national co-author rates but low international rates (e.g. Japan, the USA, and France).

Originality/value

This study is the first DH co-authorship network study that explores the apparent connection between language and collaboration patterns in DH. It contributes to ongoing debates about diversity, representation, and multilingualism in DH and academic publishing more widely.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Keywords

Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different…

Abstract

Purpose

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users made searches to an article collection of Finnish newspaper Uusi Suometar 1869–1918 which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different quality OCR. Each user performed six pre-formulated and six self-formulated short queries and evaluated subjectively the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences of the otherwise identical articles.

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Keywords

Access

Year

Last 12 months (3)

Content type

1 – 3 of 3