Search results

1 – 10 of 41
Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen

Abstract

Purpose

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar 1869–1918, which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different quality OCR. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences of the otherwise identical articles.
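As an illustration only, the comparison of graded relevance scores between two OCR versions of the same articles can be sketched as follows. The abstract does not say how the scores were analysed, so the function names, the paired-difference summary and any input values are assumptions for this sketch, not the authors' method.

```python
from statistics import mean
from typing import Sequence

VALID_GRADES = (0, 1, 2, 3)  # graded relevance scale of 0-3, as in the abstract


def mean_relevance(scores: Sequence[int]) -> float:
    """Average graded relevance for one OCR version (hypothetical helper)."""
    for score in scores:
        if score not in VALID_GRADES:
            raise ValueError(f"score {score!r} is outside the 0-3 scale")
    return mean(scores)


def paired_difference(old_ocr: Sequence[int], new_ocr: Sequence[int]) -> float:
    """Mean per-article difference (new minus old) for the same articles.

    A positive value means the improved-OCR versions were rated higher.
    """
    if len(old_ocr) != len(new_ocr):
        raise ValueError("both versions must cover the same articles")
    return mean(new - old for old, new in zip(old_ocr, new_ocr))
```

A positive paired difference would be consistent with the reported finding, but a real analysis would also need a significance test suited to ordinal data.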

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 10 January 2024

Mario Gonzalez-Fuentes, Jonathan Ross Gilbert, Robert F. Scherer and Carlos Iglesias-Fernandez

Abstract

Purpose

A pronounced rise in postpandemic immigration is creating consumption opportunities and challenges for countries worldwide. Past research has shown that immigrant homeownership indicates advanced consumer acculturation. However, critical factors which differentiate immigrant decisions to purchase a home remain underexplored. This study aims to examine the importance of different identity resources in determining homeownership gaps between immigrant groups in Spain during a dynamic decade.

Design/methodology/approach

A mixed methods research design with triangulation was used. First, the critical “historical research method” was used to empirically assess 15,465 household-level microdata files from the National Immigrant Survey of Spain. Second, the analysis was corroborated through informant interviews, an evaluation of digital news archives and other historical traces such as relevant advertisements in Spain from 2000 to 2009.

Findings

Results provided an account of immigrant homeownership whereby foreign-born consumers leveraged resources to promote social identities aligned with an advanced level of acculturation through housing investment during this period. Furthermore, marketing focused on specific targets of ethnic minority consumers coupled with government policies to promote immigrant homeownership reinforced the “Spanish Dream” as a new paradigm for housing market integration.

Originality/value

Spain provides an unprecedented historical context to explain marketing-related phenomena due to a perfect storm of immigration, job availability and integration supports. Contrary to popular wisdom, immigrant consumer homeownership gaps are not solely a result of differences in income and economic mobility, but rather an advanced acculturation outcome driven by personal and social investments in resources that lead to consumer identities.

Details

Journal of Historical Research in Marketing, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1755-750X

Article
Publication date: 9 April 2024

Pia Borlund, Nils Pharo and Ying-Hsang Liu

Abstract

Purpose

The PICCH research project contributes to opening a dialogue between cultural heritage archives and users. Hence, the users are identified and their information needs, the search strategies they apply and the search challenges they experience are uncovered.

Design/methodology/approach

A combination of questionnaires and interviews is used for collection of data. Questionnaire data were collected from users of three different audiovisual archives. Semi-structured interviews were conducted with two user groups: (1) scholars searching information for research projects and (2) archivists who perform their own scholarly work and search information on behalf of others.

Findings

The questionnaire results show that the archive users mainly have an academic background. Hence, scholars and archivists constitute the target group for in-depth interviews. The interviews reveal that their information needs are multi-faceted and match the information need typology by Ingwersen. The scholars mainly apply collection-specific search strategies but share a primary reliance on keyword searching, which they typically plan in advance. The archivists do less planning owing to their knowledge of the collections. All interviewees demonstrate domain knowledge, archival intelligence and artefactual literacy in their use and mastering of the archives. The search challenges they experience can be characterised as search system complexity challenges, material challenges and metadata challenges.

Originality/value

The paper provides a rare insight into the complexity of the search situation of cultural heritage archives and the users’ multi-faceted information needs, and hence contributes to the dialogue between the archives and the users.

Details

Journal of Documentation, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 14 November 2023

Shaodan Sun, Jun Deng and Xugong Qin

Abstract

Purpose

This paper aims to amplify the retrieval and utilization of historical newspapers through the application of semantic organization, all from the vantage point of a fine-grained knowledge element perspective. This endeavor seeks to unlock the latent value embedded within newspaper contents while simultaneously furnishing invaluable guidance within methodological paradigms for research in the humanities domain.

Design/methodology/approach

Following the semantic organization process and the knowledge element concept, this study proposes a holistic framework comprising four pivotal stages: knowledge element description, extraction, association and application. Initially, a semantic description model dedicated to knowledge elements is devised. Subsequently, harnessing advanced deep learning techniques, the study delves into the realm of entity recognition and relationship extraction. These techniques are instrumental in identifying entities within the historical newspaper contents and capturing the interdependencies that exist among them. Finally, an online platform based on Flask is developed to enable the recognition of entities and relationships within historical newspapers.

Findings

This article utilized the Shengjing Times·Changchun Compilation as the dataset for describing, extracting, associating and applying newspaper contents. Regarding knowledge element extraction, BERT + BS consistently outperforms Bi-LSTM, CRF++ and even BERT in terms of Recall and F1 scores, making it a favorable choice for entity recognition in this context. Particularly noteworthy is the Bi-LSTM-Pro model, which stands out with the highest scores across all metrics, notably achieving an exceptional F1 score in knowledge element relationship recognition.
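For readers unfamiliar with the metrics named above, the following minimal sketch shows how precision, Recall and F1 are commonly computed for entity recognition under exact-match scoring. The span representation and the function name are assumptions for illustration; the abstract does not describe the authors' evaluation protocol.

```python
from typing import Set, Tuple

# An entity is represented here as (start offset, end offset, label).
# This representation is an assumption made for the sketch.
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between gold and predicted entities."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Exact matching counts a prediction as correct only when both the span boundaries and the label agree; more lenient partial-match schemes exist and would give different scores.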

Originality/value

Historical newspapers transcend their status as mere artifacts, evolving into invaluable reservoirs safeguarding the societal and historical memory. Semantic organization from a fine-grained knowledge element perspective can facilitate semantic retrieval, semantic association, information visualization and knowledge discovery services for historical newspapers. In practice, it can empower researchers to unearth profound insights within the historical and cultural context, broadening the landscape of digital humanities research and practical applications.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 16 April 2024

Abdel K. Halabi

Abstract

Purpose

This paper presents the biography of one of Australia’s earliest female accountants, Miss Evelyn Maude West (aka Eva). The paper uses this history sub-genre to understand the significant impacts Eva West made across several fields. Eva West was not only a pioneer woman accountant but also an active philanthropist with an interest in social issues and a nature lover who promoted and encouraged an appreciation of the environment.

Design/methodology/approach

The paper leverages a diverse array of qualitative resources, responding to Carnegie and Napier's (1996) call to expand the concept of the accounting-based archive. Notably, rare nature study diaries and a book detailing camping adventures serve as poignant examples, illustrating Eva West's profound social and environmental engagement. Additionally, personal and business letters, digitised newspapers, pamphlets, annual reports, minute books and even poems contribute to the comprehensive exploration of Eva West's life and impact. Collectively, these varied sources offer a rich tapestry of evidence, facilitating the documentation of this unique narrative.

Findings

Throughout her life, Eva West made significant contributions as a pioneering woman in the field of accounting, a dedicated philanthropist and a passionate environmentalist. Together, these offer a multifaceted portrait of a well-rounded individual. With a solid foundation in accounting, Eva utilized her expertise to benefit numerous charitable organisations, leaving a lasting impact on the community. Moreover, her deep love for the environment is illustrated in nature study diaries and books documenting her camping adventures, highlighting the interconnectedness between her accounting pursuits and her commitment to environmental stewardship.

Practical implications

While previous studies briefly mention the additional contributions of early women to various organisations and movements, none provide the depth of insight seen in the portrayal of Miss Eva West. Rather than critiquing these earlier narratives, this observation presents an opportunity for further research to honour pioneering individuals for their multifaceted roles beyond accounting. Future studies could spotlight trailblazers as accountants with diverse interests and societal contributions, whether in social or environmental spheres. Additionally, this paper demonstrates how archives maintained by individuals, such as nature or travel diaries and camping books, can enrich accounting and accountability-based historical research.

Originality/value

Biographical studies in accounting have played a significant role in advancing historical research, yet there remains a call for additional studies to gain deeper insights into specific individuals. Few biographical narratives have explored how accountants integrate their professional careers with other interests, particularly highlighting the well-roundedness of individuals, especially women. Furthermore, this paper contributes to filling the gap in research that examines the intersection of accounting professionals and environmental concerns.

Details

Accounting, Auditing & Accountability Journal, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0951-3574

Open Access
Article
Publication date: 31 July 2023

Sara Lafia, David A. Bleckley and J. Trent Alexander

Abstract

Purpose

Many libraries and archives maintain collections of research documents, such as administrative records, with paper-based formats that limit the documents' access to in-person use. Digitization transforms paper-based collections into more accessible and analyzable formats. As collections are digitized, there is an opportunity to incorporate deep learning techniques, such as Document Image Analysis (DIA), into workflows to increase the usability of information extracted from archival documents. This paper describes the authors' approach using digital scanning, optical character recognition (OCR) and deep learning to create a digital archive of administrative records related to the mortgage guarantee program of the Servicemen's Readjustment Act of 1944, also known as the G.I. Bill.

Design/methodology/approach

The authors used a collection of 25,744 semi-structured paper-based records from the administration of G.I. Bill Mortgages from 1946 to 1954 to develop a digitization and processing workflow. These records include the name and city of the mortgagor, the amount of the mortgage, the location of the Reconstruction Finance Corporation agent, one or more identification numbers and the name and location of the bank handling the loan. The authors extracted structured information from these scanned historical records in order to create a tabular data file and link the records to other authoritative individual-level data sources.

Findings

The authors compared the flexible character accuracy of five OCR methods. The authors then compared the character error rate (CER) of three text extraction approaches (regular expressions, DIA and named entity recognition (NER)). The authors were able to obtain the highest quality structured text output using DIA with the Layout Parser toolkit by post-processing with regular expressions. Through this project, the authors demonstrate how DIA can improve the digitization of administrative records to automatically produce a structured data resource for researchers and the public.
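The character error rate mentioned above is conventionally defined as the edit distance between the OCR output and a reference transcription, divided by the length of the reference. The sketch below follows that common definition; the function names are illustrative, and it is not the authors' evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Because insertions are counted, CER can exceed 1.0 when the OCR output is much longer than the reference.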

Originality/value

The authors' workflow is readily transferable to other archival digitization projects. Through the use of digital scanning, OCR and DIA processes, the authors created the first digital microdata file of administrative records related to the G.I. Bill mortgage guarantee program available to researchers and the general public. These records offer research insights into the lives of veterans who benefited from loans, the impacts on the communities built by the loans and the institutions that implemented them.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 20 February 2024

Alenka Kavčič Čolić and Andreja Hari

Abstract

Purpose

The current predominant delivery format resulting from digitization is PDF, which is not appropriate for the blind, partially sighted and people who read on mobile devices. To meet the needs of both communities, as well as broader ones, alternative file formats are required. With the findings of the eBooks-On-Demand-Network Opening Publications for European Netizens project research, this study aims to improve access to digitized content for these communities.

Design/methodology/approach

In 2022, the authors conducted research on the digitization experiences of 13 EODOPEN partners at their organizations. The authors distributed the same sample of scans in English with different characteristics, and in accordance with Web content accessibility guidelines, the authors created 24 criteria to analyze their digitization workflows, output formats and optical character recognition (OCR) quality.

Findings

In this contribution, the authors present the results of a trial implementation among EODOPEN partners regarding their digitization workflows, the delivery file formats used and the resulting OCR quality, depending on the type of digitization output file format. It was shown that partners using the OCR tool ABBYY FineReader Professional and producing scanning outputs in tagged PDF and PDF/UA formats achieved better results according to the set criteria.

Research limitations/implications

The trial implementations were limited to 13 project partners’ organizations only.

Originality/value

This research paper can be a valuable contribution to the field of massive digitization practices, particularly in terms of improving the accessibility of the output delivery file formats.

Details

Digital Library Perspectives, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2059-5816

Open Access
Article
Publication date: 18 April 2024

Joseph Nockels, Paul Gooding and Melissa Terras

Abstract

Purpose

This paper focuses on image-to-text manuscript processing through Handwritten Text Recognition (HTR), a Machine Learning (ML) approach enabled by Artificial Intelligence (AI). With HTR now achieving high levels of accuracy, we consider its potential impact on our near-future information environment and knowledge of the past.

Design/methodology/approach

In undertaking a more constructivist analysis, we identified gaps in the current literature through a Grounded Theory Method (GTM). This guided an iterative process of concept mapping through writing sprints in workshop settings. We identified, explored and confirmed themes through group discussion and a further interrogation of relevant literature, until reaching saturation.

Findings

Catalogued as part of our GTM, 120 published texts underpin this paper. We found that HTR facilitates accurate transcription and dataset cleaning, while also widening access to a variety of historical material. HTR contributes to a virtuous cycle of dataset production and can inform the development of online cataloguing. However, current limitations include dependency on digitisation pipelines, potential archival history omission and entrenchment of bias. We also cite near-future HTR considerations. These include encouraging open access, integrating advanced AI processes and metadata extraction; legal and moral issues surrounding copyright and data ethics; crediting individuals’ transcription contributions and HTR’s environmental costs.

Originality/value

Our research produces a set of best practice recommendations for researchers, data providers and memory institutions, surrounding HTR use. This forms an initial, though not comprehensive, blueprint for directing future HTR research. In pursuing this, the narrative that HTR’s speed and efficiency will simply transform scholarship in archives is deconstructed.

Article
Publication date: 18 March 2024

Shiv Shakti Ghosh and Sunil Kumar Chatterjee

Abstract

Purpose

This study presents a review-based research framework that aims to influence memory institutions in their projects on digital storytelling from digitized ancient travel records. This study aims to influence research and policymaking related to design and delivery of services based on memory institutions’ collections of historical records.

Design/methodology/approach

The demonstrated research framework has been synthesized using inputs from a review of existing studies on the domain accompanied by a short survey created for collecting the opinion of selected experts. Studies demonstrating utilization of semantic web technologies and those that can influence policymaking related to digital storytelling were primarily reviewed.

Findings

The core tasks behind digital storytelling vary depending on the project goals. A two-part framework was therefore proposed that covers the generic fundamental tasks with diverse applicability and the specific tasks related to digital storytelling separately. The review also found that studies demonstrating the use of travel records for digital storytelling were fewer than studies using digital storytelling for tourism in general.

Originality/value

The demonstrated research framework can guide memory institutions in exposing their travel-related holdings to a wider audience using innovative semantic web technologies and open up avenues for future empirical research, thereby adding to the novelty of the presented research. Also, reviews of articles on digital storytelling or digital humanities in general exist, but reviews of digital storytelling initiatives focusing specifically on tourism and travel literature are scarce.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9342

Article
Publication date: 9 November 2023

Gustavo Candela, Nele Gabriëls, Sally Chambers, Milena Dobreva, Sarah Ames, Meghan Ferriter, Neil Fitzgerald, Victor Harbo, Katrine Hofmann, Olga Holownia, Alba Irollo, Mahendra Mahey, Eileen Manchester, Thuy-An Pham, Abigail Potter and Ellen Van Keer

Abstract

Purpose

The purpose of this study is to offer a checklist that can be used for both creating and evaluating digital collections, which are also sometimes referred to as data sets as part of the collections as data movement, suitable for computational use.

Design/methodology/approach

The checklist was built by synthesising and analysing the results of relevant research literature, articles and studies and the issues and needs obtained in an observational study. The checklist was tested and applied both as a tool for assessing a selection of digital collections made available by galleries, libraries, archives and museums (GLAM) institutions as proof of concept and as a supporting tool for creating collections as data.

Findings

Over the past few years, there has been a growing interest in making available digital collections published by GLAM organisations for computational use. Based on previous work, the authors defined a methodology to build a checklist for the publication of collections as data. The authors’ evaluation showed several examples of applications that can be useful to encourage other institutions to publish their digital collections for computational use.

Originality/value

While some work exists on making digital collections suitable for computational use, giving particular attention to data quality, planning and experimentation, to the best of the authors’ knowledge, none of the work to date provides an easy-to-follow and robust checklist to publish collection data sets in GLAM institutions. This checklist intends to encourage small- and medium-sized institutions to adopt the collections as data principles in daily workflows following best practices and guidelines.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9342
