Search results
Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen
Abstract
Purpose
This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different OCR quality levels on users' subjective perception through an interactive information retrieval task using a collection from a single digitized historical Finnish newspaper.
Design/methodology/approach
This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar (1869–1918), which consists of ca. 1.45 million automatically segmented articles. The article search database contained two versions of each article, differing in OCR quality. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results on a graded relevance scale of 0–3. Users were not informed that the otherwise identical articles differed in OCR quality.
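The abstract does not say how the graded judgments were aggregated. As a minimal sketch, assuming judgments are stored as hypothetical (query, OCR version, rank, grade) tuples, the following averages the 0–3 grades within the top 10 ranks for each OCR version. The function name, record layout and toy data are invented for illustration; the study's actual analysis may differ.

```python
from collections import defaultdict
from statistics import mean


def mean_relevance_by_version(judgments, cutoff=10):
    """Average 0-3 relevance grades per OCR version over the top `cutoff` ranks.

    `judgments` is an iterable of (query_id, ocr_version, rank, grade) tuples;
    this layout is a hypothetical illustration, not the study's data format.
    """
    grades = defaultdict(list)
    for _query_id, version, rank, grade in judgments:
        if not 0 <= grade <= 3:
            raise ValueError(f"grade {grade} is outside the 0-3 scale")
        if rank <= cutoff:
            grades[version].append(grade)
    return {version: mean(values) for version, values in grades.items()}


# Toy data invented for illustration only.
sample = [
    ("q1", "original_ocr", 1, 2),
    ("q1", "original_ocr", 2, 1),
    ("q1", "improved_ocr", 1, 3),
    ("q1", "improved_ocr", 2, 2),
    ("q1", "improved_ocr", 11, 0),  # ranked below the top 10, so ignored
]
print(mean_relevance_by_version(sample))
```

Comparing per-version means is only the simplest summary; because each article existed in both OCR versions, a paired comparison per query would be a natural refinement.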
Findings
The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.
Originality/value
To the best of the authors’ knowledge, this simulated interactive work task experiment is the first to show empirically that users' subjective relevance assessments are affected by a change in the quality of optically read text.
Mario Gonzalez-Fuentes, Jonathan Ross Gilbert, Robert F. Scherer and Carlos Iglesias-Fernandez
Abstract
Purpose
A pronounced rise in postpandemic immigration is creating consumption opportunities and challenges for countries worldwide. Past research has shown that immigrant homeownership indicates advanced consumer acculturation. However, critical factors that differentiate immigrant decisions to purchase a home remain underexplored. This study aims to examine the importance of different identity resources in determining homeownership gaps between immigrant groups in Spain during a dynamic decade.
Design/methodology/approach
A mixed methods research design with triangulation was used. First, the critical “historical research method” was used to empirically assess 15,465 household-level microdata files from the National Immigrant Survey of Spain. Second, the analysis was corroborated through informant interviews, an evaluation of digital news archives and other historical traces, such as relevant advertisements in Spain from 2000 to 2009.
Findings
Results provide an account of immigrant homeownership in which foreign-born consumers leveraged resources to promote social identities aligned with an advanced level of acculturation through housing investment during this period. Furthermore, marketing aimed at specific ethnic minority consumer segments, coupled with government policies promoting immigrant homeownership, reinforced the “Spanish Dream” as a new paradigm for housing market integration.
Originality/value
Spain provides an unprecedented historical context to explain marketing-related phenomena due to a perfect storm of immigration, job availability and integration supports. Contrary to popular wisdom, immigrant consumer homeownership gaps are not solely a result of differences in income and economic mobility, but rather an advanced acculturation outcome driven by personal and social investments in resources that lead to consumer identities.
Pia Borlund, Nils Pharo and Ying-Hsang Liu
Abstract
Purpose
The PICCH research project contributes to opening a dialogue between cultural heritage archives and their users. To this end, it identifies the users and uncovers their information needs, the search strategies they apply and the search challenges they experience.
Design/methodology/approach
A combination of questionnaires and interviews was used for data collection. Questionnaire data were collected from users of three different audiovisual archives. Semi-structured interviews were conducted with two user groups: (1) scholars searching information for research projects and (2) archivists who perform their own scholarly work and search information on behalf of others.
Findings
The questionnaire results show that the archive users mainly have an academic background. Hence, scholars and archivists constitute the target group for in-depth interviews. The interviews reveal that their information needs are multifaceted and match the information need typology by Ingwersen. The scholars mainly apply collection-specific search strategies but share a primary reliance on keyword searching, which they typically plan in advance. The archivists plan less owing to their knowledge of the collections. All interviewees demonstrate domain knowledge, archival intelligence and artefactual literacy in their use and mastering of the archives. The search challenges they experience can be characterised as search system complexity challenges, material challenges and metadata challenges.
Originality/value
The paper provides rare insight into the complexity of the search situation in cultural heritage archives and into users’ multifaceted information needs, and hence contributes to the dialogue between the archives and their users.
Shaodan Sun, Jun Deng and Xugong Qin
Abstract
Purpose
This paper aims to improve the retrieval and use of historical newspapers through semantic organization from a fine-grained knowledge element perspective. In doing so, it seeks to unlock the latent value of newspaper content and to offer methodological guidance for research in the humanities.
Design/methodology/approach
Drawing on the semantic organization process and the knowledge element concept, this study proposes a holistic framework with four stages: knowledge element description, extraction, association and application. First, a semantic description model for knowledge elements is devised. Next, deep learning techniques are applied to entity recognition and relationship extraction, which identify entities within historical newspaper content and capture the relationships among them. Finally, an online platform based on Flask is developed to enable the recognition of entities and relationships within historical newspapers.
Findings
This article used the Shengjing Times·Changchun Compilation as the dataset for describing, extracting, associating and applying newspaper content. For knowledge element extraction, BERT + BS consistently outperforms Bi-LSTM, CRF++ and even BERT in recall and F1 score, making it a favorable choice for entity recognition in this context. The Bi-LSTM-Pro model is particularly noteworthy: it achieves the highest scores across all metrics, notably an exceptional F1 score in knowledge element relationship recognition.
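The abstract reports recall and F1 scores without stating the matching protocol. A minimal sketch of entity-level precision, recall and F1, assuming exact matching on (start, end, label) spans (one common convention; partial-match schemes also exist), is shown below. The function name and the toy spans are invented for illustration.

```python
def prf1(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    Entities are (start, end, label) tuples; exact span-and-label matching
    is an assumption here, not necessarily the paper's protocol.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Toy spans: two of the four predictions match the three gold entities exactly.
gold = {(0, 2, "PER"), (5, 8, "LOC"), (10, 12, "ORG")}
predicted = {(0, 2, "PER"), (5, 8, "LOC"), (13, 15, "ORG"), (20, 22, "PER")}
precision, recall, f1 = prf1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a model can lead on recall and F1 while trailing on precision, which is consistent with reporting those two metrics separately.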
Originality/value
Historical newspapers are more than mere artifacts; they are reservoirs of societal and historical memory. Semantic organization from a fine-grained knowledge element perspective can support semantic retrieval, semantic association, information visualization and knowledge discovery services for historical newspapers. In practice, it can help researchers uncover deeper insights within historical and cultural contexts, broadening the scope of digital humanities research and its practical applications.
Abstract
Purpose
This paper presents the biography of one of Australia’s earliest female accountants, Miss Evelyn Maude West (aka Eva). The paper uses this history sub-genre to understand the significant impacts Eva West made across several fields. Eva West was not only a pioneer woman accountant but also an active philanthropist with an interest in social issues and a nature lover who promoted and encouraged an appreciation of the environment.
Design/methodology/approach
The paper leverages a diverse array of qualitative resources, responding to Carnegie and Napier's (1996) call to expand the concept of the accounting-based archive. Notably, rare nature study diaries and a book detailing camping adventures serve as poignant examples, illustrating Eva West's profound social and environmental engagement. Additionally, personal and business letters, digitised newspapers, pamphlets, annual reports, minute books and even poems contribute to the comprehensive exploration of Eva West's life and impact. Collectively, these varied sources offer a rich tapestry of evidence, facilitating the documentation of this unique narrative.
Findings
Throughout her life, Eva West made significant contributions as a pioneering woman in the field of accounting, a dedicated philanthropist and a passionate environmentalist. Together, these offer a multifaceted portrait of a well-rounded individual. With a solid foundation in accounting, Eva utilized her expertise to benefit numerous charitable organisations, leaving a lasting impact on the community. Moreover, her deep love for the environment is illustrated in nature study diaries and books documenting her camping adventures, highlighting the interconnectedness between her accounting pursuits and her commitment to environmental stewardship.
Practical implications
While previous studies briefly mention the additional contributions of early women to various organisations and movements, none provide the depth of insight seen in the portrayal of Miss Eva West. Rather than critiquing these earlier narratives, this observation presents an opportunity for further research to honour pioneering individuals for their multifaceted roles beyond accounting. Future studies could spotlight trailblazers as accountants with diverse interests and societal contributions, whether in social or environmental spheres. Additionally, this paper demonstrates how archives maintained by individuals, such as nature or travel diaries and camping books, can enrich accounting and accountability-based historical research.
Originality/value
Biographical studies in accounting have played a significant role in advancing historical research, yet there remains a call for additional studies to gain deeper insights into specific individuals. Few biographical narratives have explored how accountants integrate their professional careers with other interests or highlighted the well-roundedness of such individuals, especially women. Furthermore, this paper contributes to filling the gap in research that examines the intersection of accounting professionals and environmental concerns.
Sara Lafia, David A. Bleckley and J. Trent Alexander
Abstract
Purpose
Many libraries and archives maintain collections of research documents, such as administrative records, in paper-based formats that restrict access to in-person use. Digitization transforms paper-based collections into more accessible and analyzable formats. As collections are digitized, there is an opportunity to incorporate deep learning techniques, such as Document Image Analysis (DIA), into workflows to increase the usability of information extracted from archival documents. This paper describes the authors' approach, which uses digital scanning, optical character recognition (OCR) and deep learning to create a digital archive of administrative records related to the mortgage guarantee program of the Servicemen's Readjustment Act of 1944, also known as the G.I. Bill.
Design/methodology/approach
The authors used a collection of 25,744 semi-structured paper-based records from the administration of G.I. Bill mortgages from 1946 to 1954 to develop a digitization and processing workflow. These records include the name and city of the mortgagor, the amount of the mortgage, the location of the Reconstruction Finance Corporation agent, one or more identification numbers and the name and location of the bank handling the loan. The authors extracted structured information from these scanned historical records to create a tabular data file and link the records to other authoritative individual-level data sources.
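A minimal sketch of turning OCR'd record lines into a tabular file with regular expressions follows. It assumes a hypothetical line layout ("Surname, Given names", city, dollar amount and bank, separated by two or more spaces) covering only some of the fields listed above; the pattern, the helper names `parse_line` and `lines_to_csv`, and the sample lines are invented for illustration and are not the authors' code. Real scans would need a pattern tuned to the actual record layout.

```python
import csv
import io
import re
from decimal import Decimal

# Hypothetical OCR line layout (not taken from the G.I. Bill records):
#   "<Surname>, <Given names>  <City>  $<Amount>  <Bank>"
# with fields separated by two or more spaces.
RECORD_PATTERN = re.compile(
    r"^(?P<name>[A-Z][\w'.-]*,\s*[\w'. -]+?)\s{2,}"
    r"(?P<city>[\w'. -]+?)\s{2,}"
    r"\$(?P<amount>[\d,]+(?:\.\d{2})?)\s{2,}"
    r"(?P<bank>.+?)\s*$"
)
FIELDS = ["name", "city", "amount", "bank"]


def parse_line(line):
    """Return a dict of fields for one OCR line, or None if it does not match."""
    match = RECORD_PATTERN.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    # Normalise "6,500.00" to Decimal("6500.00") so amounts form a numeric column.
    row["amount"] = Decimal(row["amount"].replace(",", ""))
    return row


def lines_to_csv(lines):
    """Parse OCR lines into CSV text; unmatched lines are returned for manual review."""
    rows, needs_review = [], []
    for line in lines:
        row = parse_line(line)
        if row is None:
            needs_review.append(line)
        else:
            rows.append(row)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue(), needs_review


sample_lines = [
    "Doe, John  Springfield  $6,500.00  First National Bank",
    "Roe, Mary Ann  Dayton  $7,200  Dayton Savings Bank",
    "D0e J0hn Spr1ngf1eld 6500",  # garbled OCR: no field structure survives
]
csv_text, needs_review = lines_to_csv(sample_lines)
print(csv_text)
print("Needs review:", needs_review)
```

Routing unmatched lines to a review list, rather than guessing, keeps OCR errors from silently entering the tabular file that is later linked to other data sources.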
Findings
The authors compared the flexible character accuracy of five OCR methods and then compared the character error rate (CER) of three text extraction approaches: regular expressions, DIA and named entity recognition (NER). The highest-quality structured text output was obtained using DIA with the Layout Parser toolkit, followed by post-processing with regular expressions. Through this project, the authors demonstrate how DIA can improve the digitization of administrative records to automatically produce a structured data resource for researchers and the public.
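CER is conventionally the character-level edit (Levenshtein) distance between extracted text and a ground-truth transcription, divided by the length of the ground truth. A minimal sketch follows; it implements plain CER only, not flexible character accuracy, which is designed to be less sensitive to reading-order differences. The function names and sample strings are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the ground-truth reference."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Two OCR substitutions (i -> 1, o -> 0) in a 19-character reference.
print(character_error_rate("First National Bank", "F1rst Nati0nal Bank"))
```

Lower CER is better, and because insertions count as errors, CER can exceed 1.0 when the extracted text is much longer than the reference.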
Originality/value
The authors' workflow is readily transferable to other archival digitization projects. Through the use of digital scanning, OCR and DIA processes, the authors created the first digital microdata file of administrative records related to the G.I. Bill mortgage guarantee program available to researchers and the general public. These records offer research insights into the lives of veterans who benefited from loans, the impacts on the communities built by the loans and the institutions that implemented them.
Alenka Kavčič Čolić and Andreja Hari
Abstract
Purpose
The current predominant delivery format resulting from digitization is PDF, which is not well suited to blind and partially sighted people or to people who read on mobile devices. Meeting the needs of both communities, as well as broader ones, requires alternative file formats. Drawing on the findings of the eBooks-On-Demand-Network Opening Publications for European Netizens (EODOPEN) project, this study aims to improve access to digitized content for these communities.
Design/methodology/approach
In 2022, the authors studied the digitization experiences of 13 EODOPEN partners within their organizations. The authors distributed the same set of sample scans in English, with varying characteristics, and, in accordance with the Web Content Accessibility Guidelines, created 24 criteria to analyze the partners' digitization workflows, output formats and optical character recognition (OCR) quality.
Findings
In this contribution, the authors present the results of a trial implementation among EODOPEN partners covering their digitization workflows, the delivery file formats they use and the resulting OCR quality for each type of digitization output file format. Partners that used the OCR tool ABBYY FineReader Professional and produced scanning outputs in tagged PDF and PDF/UA formats achieved better results against the set criteria.
Research limitations/implications
The trial implementations were limited to 13 project partners’ organizations only.
Originality/value
This research paper can make a valuable contribution to mass digitization practices, particularly by improving the accessibility of output delivery file formats.
Joseph Nockels, Paul Gooding and Melissa Terras
Abstract
Purpose
This paper focuses on image-to-text manuscript processing through Handwritten Text Recognition (HTR), a Machine Learning (ML) approach enabled by Artificial Intelligence (AI). With HTR now achieving high levels of accuracy, we consider its potential impact on our near-future information environment and knowledge of the past.
Design/methodology/approach
In undertaking a more constructivist analysis, we identified gaps in the current literature through a Grounded Theory Method (GTM). This guided an iterative process of concept mapping through writing sprints in workshop settings. We identified, explored and confirmed themes through group discussion and a further interrogation of relevant literature, until reaching saturation.
Findings
Catalogued as part of our GTM, 120 published texts underpin this paper. We found that HTR enables accurate transcription and dataset cleaning while widening access to a variety of historical material. HTR contributes to a virtuous cycle of dataset production and can inform the development of online cataloguing. However, current limitations include dependency on digitisation pipelines, potential omission of archival history and the entrenchment of bias. We also identify near-future HTR considerations. These include encouraging open access; integrating advanced AI processes and metadata extraction; legal and moral issues surrounding copyright and data ethics; crediting individuals’ transcription contributions; and HTR’s environmental costs.
Originality/value
Our research produces a set of best practice recommendations for researchers, data providers and memory institutions, surrounding HTR use. This forms an initial, though not comprehensive, blueprint for directing future HTR research. In pursuing this, the narrative that HTR’s speed and efficiency will simply transform scholarship in archives is deconstructed.
Shiv Shakti Ghosh and Sunil Kumar Chatterjee
Abstract
Purpose
This study presents a review-based research framework intended to guide memory institutions in their digital storytelling projects based on digitized ancient travel records. It also aims to inform research and policymaking on the design and delivery of services built on memory institutions’ collections of historical records.
Design/methodology/approach
The research framework was synthesized from a review of existing studies in the domain, complemented by a short survey collecting the opinions of selected experts. The review focused primarily on studies demonstrating the use of semantic web technologies and studies that can inform policymaking related to digital storytelling.
Findings
The core tasks behind digital storytelling vary with project goals. Consequently, a two-part framework is proposed that separately covers generic fundamental tasks with broad applicability and tasks specific to digital storytelling. The review also found that studies using travel records for digital storytelling were fewer than studies applying digital storytelling to tourism in general.
Originality/value
The research framework can guide memory institutions in exposing their travel-related holdings to a wider audience using innovative semantic web technologies, and it opens avenues for future empirical research, which adds to the novelty of the presented research. Moreover, although reviews of articles on digital storytelling, or on digital humanities in general, exist, reviews of digital storytelling initiatives focusing specifically on tourism and travel literature are scarce.
Gustavo Candela, Nele Gabriëls, Sally Chambers, Milena Dobreva, Sarah Ames, Meghan Ferriter, Neil Fitzgerald, Victor Harbo, Katrine Hofmann, Olga Holownia, Alba Irollo, Mahendra Mahey, Eileen Manchester, Thuy-An Pham, Abigail Potter and Ellen Van Keer
Abstract
Purpose
The purpose of this study is to offer a checklist for both creating and evaluating digital collections suitable for computational use, which are sometimes referred to as data sets within the collections as data movement.
Design/methodology/approach
The checklist was built by synthesising and analysing the results of relevant research literature, articles and studies and the issues and needs obtained in an observational study. The checklist was tested and applied both as a tool for assessing a selection of digital collections made available by galleries, libraries, archives and museums (GLAM) institutions as proof of concept and as a supporting tool for creating collections as data.
Findings
Over the past few years, there has been growing interest in making digital collections published by GLAM organisations available for computational use. Building on previous work, the authors defined a methodology for building a checklist for the publication of collections as data. The authors’ evaluation showed several examples of applications that can help encourage other institutions to publish their digital collections for computational use.
Originality/value
While some work exists on making digital collections available for computational use, with particular attention to data quality, planning and experimentation, to the best of the authors’ knowledge none to date provides an easy-to-follow and robust checklist for publishing collection data sets in GLAM institutions. The checklist is intended to encourage small- and medium-sized institutions to adopt the collections as data principles in their daily workflows, following best practices and guidelines.