Search results
1 – 10 of 42
Joseph Nockels, Paul Gooding and Melissa Terras
Abstract
Purpose
This paper focuses on image-to-text manuscript processing through Handwritten Text Recognition (HTR), a Machine Learning (ML) approach enabled by Artificial Intelligence (AI). With HTR now achieving high levels of accuracy, we consider its potential impact on our near-future information environment and knowledge of the past.
Design/methodology/approach
In undertaking a more constructivist analysis, we identified gaps in the current literature through a Grounded Theory Method (GTM). This guided an iterative process of concept mapping through writing sprints in workshop settings. We identified, explored and confirmed themes through group discussion and a further interrogation of relevant literature, until reaching saturation.
Findings
Catalogued as part of our GTM, 120 published texts underpin this paper. We found that HTR facilitates accurate transcription and dataset cleaning, while widening access to a variety of historical material. HTR contributes to a virtuous cycle of dataset production and can inform the development of online cataloguing. However, current limitations include dependency on digitisation pipelines, potential archival history omission and entrenchment of bias. We also cite near-future HTR considerations. These include encouraging open access; integrating advanced AI processes and metadata extraction; legal and moral issues surrounding copyright and data ethics; crediting individuals’ transcription contributions; and HTR’s environmental costs.
Originality/value
Our research produces a set of best practice recommendations for researchers, data providers and memory institutions, surrounding HTR use. This forms an initial, though not comprehensive, blueprint for directing future HTR research. In pursuing this, the narrative that HTR’s speed and efficiency will simply transform scholarship in archives is deconstructed.
Guenter Muehlberger, Louise Seaward, Melissa Terras, Sofia Ares Oliveira, Vicente Bosch, Maximilian Bryan, Sebastian Colutto, Hervé Déjean, Markus Diem, Stefan Fiel, Basilis Gatos, Albert Greinoecker, Tobias Grüning, Guenter Hackl, Vili Haukkovaara, Gerhard Heyer, Lauri Hirvonen, Tobias Hodel, Matti Jokinen, Philip Kahle, Mario Kallio, Frederic Kaplan, Florian Kleber, Roger Labahn, Eva Maria Lang, Sören Laube, Gundram Leifert, Georgios Louloudis, Rory McNicholl, Jean-Luc Meunier, Johannes Michael, Elena Mühlbauer, Nathanael Philipp, Ioannis Pratikakis, Joan Puigcerver Pérez, Hannelore Putz, George Retsinas, Verónica Romero, Robert Sablatnig, Joan Andreu Sánchez, Philip Schofield, Giorgos Sfikas, Christian Sieber, Nikolaos Stamatopoulos, Tobias Strauß, Tamara Terbul, Alejandro Héctor Toselli, Berthold Ulreich, Mauricio Villegas, Enrique Vidal, Johanna Walcher, Max Weidemann, Herbert Wurster and Konstantinos Zagoris
Abstract
Purpose
This paper presents an overview of the current use of handwritten text recognition (HTR) on archival manuscript material, as provided by the EU H2020 funded Transkribus platform. It explains HTR, demonstrates Transkribus, gives examples of use cases, highlights the effect HTR may have on scholarship, and evidences this turning point in the advanced use of digitised heritage content. The paper aims to discuss these issues.
Design/methodology/approach
This paper adopts a case study approach, using the development and delivery of the one openly available HTR platform for manuscript material.
Findings
Transkribus has demonstrated that HTR is now a useable technology that can be employed in conjunction with mass digitisation to generate accurate transcripts of archival material. Use cases are demonstrated, and a cooperative model is suggested as a way to ensure sustainability and scaling of the platform. However, funding and resourcing issues are identified.
Research limitations/implications
The paper presents results from projects: further user studies could be undertaken involving interviews, surveys, etc.
Practical implications
Only HTR provided via Transkribus is covered: however, this is the only publicly available platform for HTR on individual collections of historical documents at time of writing and it represents the current state-of-the-art in this field.
Social implications
The increased access to information contained within historical texts has the potential to be transformational for both institutions and individuals.
Originality/value
This is the first published overview of how HTR is used by a wide archival studies community, reporting and showcasing current application of handwriting technology in the cultural heritage sector.
Foteini Valeonti, Melissa Terras and Andrew Hudson-Smith
Abstract
Purpose
In recent years, OpenGLAM and the broader open license movement have been gaining momentum in the cultural heritage sector. The purpose of this paper is to examine OpenGLAM from the perspective of end users, identifying barriers for commercial and non-commercial reuse of openly licensed art images.
Design/methodology/approach
Following a review of the literature, the authors scope out how end users can discover institutions participating in OpenGLAM, and use case studies to examine the process they must follow to find, obtain and reuse openly licensed images from three art museums.
Findings
Academic literature has so far focussed on examining the risks and benefits of participation from an institutional perspective, with little done to assess OpenGLAM from the end users’ standpoint. The authors reveal that end users have to overcome a series of barriers to find, obtain and reuse open images. The three main barriers relate to image quality, image tracking and the difficulty of distinguishing open images from those that are bound by copyright.
Research limitations/implications
This study focusses solely on the examination of art museums and galleries. Libraries, archives and also other types of OpenGLAM museums (e.g. archaeological) stretch beyond the scope of this paper.
Practical implications
The authors identify practical barriers of commercial and non-commercial reuse of open images, outlining areas of improvement for participant institutions.
Originality/value
The authors contribute to the understudied field of research examining OpenGLAM from the end users’ perspective, outlining recommendations for end users, as well as for museums and galleries.
Paul Gooding, Melissa Terras and Linda Berube
Abstract
Purpose
To date, there has been little research into users of the Legal Deposit Libraries (Non-Print Works) Regulations 2013. This paper addresses that gap by presenting key findings from the AHRC-funded Digital Library Futures project. Its purpose is to present a “user-centric” perspective on the potential future impact of the digital collections that are being created under electronic legal deposit regulations.
Design/methodology/approach
The study utilises a mixed methods case study of two academic legal deposit libraries in the United Kingdom: The Bodleian Libraries, University of Oxford; and Cambridge University Library. It combines surveys of users, web log analysis and expert interviews with librarians and cognate professionals.
Findings
User perspectives on NPLD were not fully considered in the planning and implementation of the 2013 regulations. The authors present findings from their user survey to show how contemporary tensions between user behaviour and access protocols risk limiting the instrumental value of NPLD collections, which have high perceived legacy value.
Originality/value
This is the first study to address the user context for UK Non-Print Legal Deposit. Its value lies in presenting a research-led user assessment of NPLD and in proposing “user-centric” analysis as an addition to the existing “four pillars” of legal deposit research.
Abstract
Purpose
Since its launch in 2007, research has been carried out on the popular social networking website Tumblr. The purpose of this paper is to identify published Tumblr-based research, classify it to understand approaches and methods, and provide methodological recommendations for others.
Design/methodology/approach
Research regarding Tumblr was identified. Following a review of the literature, a classification scheme was adapted and applied, to understand research focus. Papers were quantitatively classified using open coded content analysis of method, subject, approach, and topic.
Findings
The majority of published work relating to Tumblr concentrates on conceptual issues, followed by aspects of the messages sent. This has evolved over time. Perceived benefits are the platform’s long-form text posts, ability to track tags, and the multimodal nature of the platform. Severe research limitations are caused by the lack of demographic, geo-spatial, and temporal metadata attached to individual posts, the limited Application Programming Interface (API), restricted access to data, and the large number of ephemeral posts on the site.
Research limitations/implications
This study focusses on Tumblr: the applicability of the approach to other media is not considered. The authors focus on published research and conference papers: there will be book content which was not found using the method. Tumblr as a platform has falling user numbers which may be of concern to researchers.
Practical implications
The authors identify practical barriers to research on the Tumblr platform including lack of metadata and access to big data, explaining why Tumblr is not as popular as Twitter in academic studies.
Social implications
This paper highlights the breadth of topics covered by social media researchers, which allows us to understand popular online platforms.
Originality/value
There has not yet been an overarching study to look at the methods and purpose of those who study Tumblr. The authors identify Tumblr-related research papers from the first, which appeared in July 2011, until July 2015. The classification derived here provides a framework that can be used to analyse social media research, and in which to position Tumblr-related work, with recommendations on benefits and limitations of the platform for researchers.
Shirley A. Williams, Melissa M. Terras and Claire Warwick
Abstract
Purpose
Since its introduction in 2006, messages posted to the microblogging system Twitter have provided a rich dataset for researchers, leading to the publication of over a thousand academic papers. This paper aims to identify this published work and to classify it in order to understand Twitter-based research.
Design/methodology/approach
Firstly, the papers on Twitter were identified. Secondly, following a review of the literature, a classification of the dimensions of microblogging research was established. Thirdly, papers were qualitatively classified using open coded content analysis, based on each paper's title and abstract, in order to analyze method, subject, and approach.
Findings
The majority of published work relating to Twitter concentrates on aspects of the messages sent and details of the users. A variety of methodological approaches is used across a range of identified domains.
Research limitations/implications
This work reviewed the abstracts of all papers available via database search on the term “Twitter” and this has two major implications: the full papers are not considered and so works may be misclassified if their abstract is not clear; publications not indexed by the databases, such as book chapters, are not included. The study is focussed on microblogging; the applicability of the approach to other media is not considered.
Originality/value
To date, there has not been an overarching study to look at the methods and purpose of those using Twitter as a research subject. The paper's major contribution is to scope out papers published on Twitter until the close of 2011. The classification derived here will provide a framework within which researchers studying Twitter-related topics will be able to position and ground their work.
Abstract
Purpose
This study aims to update and extend previous efforts gauging the status of the quickly evolving field of digital humanities (DH). Based on a sample of directly relevant DH literature during 2005–2020 from Web of Science, the study conducts a longitudinal examination of the research output, intellectual structures and contributors.
Design/methodology/approach
The study applies bibliometric methods, social network analysis and visualization tools to conduct a longitudinal examination.
Findings
The research output and scope of DH topics have grown over time, with the field widening and deepening across four major development stages. Through both term frequency and term co-occurrence relationship networks, this study further identifies four major recurring topics and themes of DH research: (1) collections and contents; (2) technologies, techniques, theories and methods; (3) collaboration, interdisciplinarity and support and (4) DH evolution. Finally, leading DH research contributors (authors, institutions and nations) are also identified.
Originality/value
This study utilizes a greater number of and richer subject sources than previous efforts to identify the overall intellectual structures of DH research based on key terms from titles, abstracts and author keywords. It expands on previous efforts and furthers our understanding of DH research with more recent DH literature and richer subject sources from the literature.
Abstract
Purpose
The purpose of this paper is to explore the current state of the Text Encoding Initiative (TEI) community and to suggest directions that community should pursue, based on recommendations from experts in the field.
Design/methodology/approach
The column looks at the history, present state and future of TEI.
Findings
This column is simply exploratory, and examines issues regarding the TEI and the TEI consortium.
Practical implications
TEI is a very robust and expressive markup language used in the analysis of literature in the humanities fields. The community is encouraged to take proactive steps to ensure TEI remains a viable markup language for at least the next 20 years.
Originality/value
This column examines the enormous contribution that TEI has made to the humanities fields and explores ways in which the usage of TEI, even by non‐experts, can be expanded in order to enrich scholarship.