Search results

1 – 10 of 70
Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen

Abstract

Purpose

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched a collection of articles from the Finnish newspaper Uusi Suometar (1869–1918), which consists of ca. 1.45 million auto-segmented articles. The article search database held two versions of each article, each with a different OCR quality. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results on a graded relevance scale of 0–3. Users were not informed about the OCR quality differences between the otherwise identical articles.
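As a rough illustration of how graded judgments like these could be summarized, the sketch below averages 0–3 relevance grades separately for two OCR versions. It is not the authors' analysis: the function name `mean_relevance`, the version labels and every judgment value are invented for illustration.

```python
from statistics import mean

VALID_GRADES = {0, 1, 2, 3}  # graded relevance scale of 0-3


def mean_relevance(grades):
    """Average a list of graded relevance judgments on the 0-3 scale."""
    if not grades:
        raise ValueError("at least one judgment is required")
    if any(g not in VALID_GRADES for g in grades):
        raise ValueError("grades must be integers from 0 to 3")
    return mean(grades)


# Invented judgments for one query's top-10 results, for illustration only.
judgments = {
    "ocr_version_a": [0, 1, 1, 2, 0, 1, 3, 2, 1, 0],
    "ocr_version_b": [1, 1, 2, 2, 0, 2, 3, 2, 1, 1],
}

for version, grades in judgments.items():
    print(version, mean_relevance(grades))
```

A real comparison would also need a significance test across users and queries, which this sketch leaves out.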

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 31 July 2023

Sara Lafia, David A. Bleckley and J. Trent Alexander

Abstract

Purpose

Many libraries and archives maintain collections of research documents, such as administrative records, with paper-based formats that limit the documents' access to in-person use. Digitization transforms paper-based collections into more accessible and analyzable formats. As collections are digitized, there is an opportunity to incorporate deep learning techniques, such as Document Image Analysis (DIA), into workflows to increase the usability of information extracted from archival documents. This paper describes the authors' approach using digital scanning, optical character recognition (OCR) and deep learning to create a digital archive of administrative records related to the mortgage guarantee program of the Servicemen's Readjustment Act of 1944, also known as the G.I. Bill.

Design/methodology/approach

The authors used a collection of 25,744 semi-structured paper-based records from the administration of G.I. Bill Mortgages from 1946 to 1954 to develop a digitization and processing workflow. These records include the name and city of the mortgagor, the amount of the mortgage, the location of the Reconstruction Finance Corporation agent, one or more identification numbers and the name and location of the bank handling the loan. The authors extracted structured information from these scanned historical records in order to create a tabular data file and link them to other authoritative individual-level data sources.

Findings

The authors compared the flexible character accuracy of five OCR methods. The authors then compared the character error rate (CER) of three text extraction approaches (regular expressions, DIA and named entity recognition (NER)). The authors were able to obtain the highest quality structured text output using DIA with the Layout Parser toolkit by post-processing with regular expressions. Through this project, the authors demonstrate how DIA can improve the digitization of administrative records to automatically produce a structured data resource for researchers and the public.
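For readers unfamiliar with the metric, character error rate is commonly defined as the edit distance between a reference transcription and the OCR output, divided by the length of the reference. The sketch below assumes that common definition; it is not the authors' evaluation code, it does not implement flexible character accuracy, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions and substitutions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character in an 8-character reference gives 0.125.
print(character_error_rate("mortgage", "mortqage"))
```

Because the denominator is the reference length, CER can exceed 1.0 when the OCR output contains many spurious characters.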

Originality/value

The authors' workflow is readily transferable to other archival digitization projects. Through the use of digital scanning, OCR and DIA processes, the authors created the first digital microdata file of administrative records related to the G.I. Bill mortgage guarantee program available to researchers and the general public. These records offer research insights into the lives of veterans who benefited from loans, the impacts on the communities built by the loans and the institutions that implemented them.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 10 April 2023

Evagelos Varthis and Marios Poulos

Abstract

Purpose

This study aims to present metaGraphos, a crowdsourcing system that aids in the transcription and semantic enhancement of scanned documents by using a pool of volunteers or people willing to participate in exchange for a financial reward.

Design/methodology/approach

metaGraphos can be used in circumstances where optical character recognition fails to produce satisfactory results, where semantic tagging or assigning thematic headings to texts is considered necessary, or when ground-truth data has to be collected in raw form.

Findings

The system automatically provides a Web-based interface comprising a static HTML page and JavaScript code that displays the scanned images of the document, coupled with the corresponding incomplete texts side by side, allowing users to correct or complete the texts in parallel.

Social implications

By assisting the parallel transcription and the semantic enhancement of difficult scanned documents, the system further reveals the hidden cultural wealth and aids in knowledge dissemination, a fact that contributes significantly to the academic-scientific dialog and feedback.

Originality/value

Individual researchers, libraries and organizations in general may benefit from the system because it is a cost-effective, practical and simple-to-set-up client–server architecture that provides a reliable way to transcribe texts or revise transcriptions on a large scale.

Details

Collection and Curation, vol. 42 no. 4
Type: Research Article
ISSN: 2514-9326

Open Access
Article
Publication date: 20 February 2024

Alenka Kavčič Čolić and Andreja Hari

Abstract

Purpose

The current predominant delivery format resulting from digitization is PDF, which is not appropriate for the blind, partially sighted and people who read on mobile devices. To meet the needs of both communities, as well as broader ones, alternative file formats are required. With the findings of the eBooks-On-Demand-Network Opening Publications for European Netizens project research, this study aims to improve access to digitized content for these communities.

Design/methodology/approach

In 2022, the authors conducted research on the digitization experiences of 13 EODOPEN partners at their organizations. The authors distributed the same sample of scans in English with different characteristics, and in accordance with Web content accessibility guidelines, the authors created 24 criteria to analyze their digitization workflows, output formats and optical character recognition (OCR) quality.

Findings

In this contribution, the authors present the results of a trial implementation among EODOPEN partners regarding their digitization workflows, the delivery file formats used and the quality of the resulting OCR, depending on the type of digitization output file format. Partners using the OCR tool ABBYY FineReader Professional and producing scanning outputs in tagged PDF and PDF/UA formats achieved better results according to the set criteria.

Research limitations/implications

The trial implementations were limited to 13 project partners’ organizations only.

Originality/value

This research paper can be a valuable contribution to the field of massive digitization practices, particularly in terms of improving the accessibility of the output delivery file formats.

Details

Digital Library Perspectives, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2059-5816

Article
Publication date: 14 November 2023

Shaodan Sun, Jun Deng and Xugong Qin

Abstract

Purpose

This paper aims to amplify the retrieval and utilization of historical newspapers through the application of semantic organization, all from the vantage point of a fine-grained knowledge element perspective. This endeavor seeks to unlock the latent value embedded within newspaper contents while simultaneously furnishing invaluable guidance within methodological paradigms for research in the humanities domain.

Design/methodology/approach

Following the semantic organization process and the knowledge element concept, this study proposes a holistic framework with four stages: knowledge element description, extraction, association and application. First, a semantic description model dedicated to knowledge elements is devised. Next, advanced deep learning techniques are applied to entity recognition and relationship extraction, identifying entities within the historical newspaper contents and capturing the interdependencies among them. Finally, an online platform based on Flask is developed to enable the recognition of entities and relationships within historical newspapers.

Findings

This article used the Shengjing Times·Changchun Compilation as the dataset for describing, extracting, associating and applying newspaper contents. Regarding knowledge element extraction, BERT + BS consistently outperforms Bi-LSTM, CRF++ and even BERT in terms of Recall and F1 scores, making it a favorable choice for entity recognition in this context. Particularly noteworthy is the Bi-LSTM-Pro model, which stands out with the highest scores across all metrics, notably achieving an exceptional F1 score in knowledge element relationship recognition.
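For reference, Recall and F1 follow from counts of true positives, false positives and false negatives. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the article's experiments.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from entity-level counts.

    tp: predicted entities that match a gold entity
    fp: predicted entities with no gold match
    fn: gold entities the model missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is often reported alongside Recall.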

Originality/value

Historical newspapers transcend their status as mere artifacts, evolving into invaluable reservoirs safeguarding the societal and historical memory. Through semantic organization from a fine-grained knowledge element perspective, it can facilitate semantic retrieval, semantic association, information visualization and knowledge discovery services for historical newspapers. In practice, it can empower researchers to unearth profound insights within the historical and cultural context, broadening the landscape of digital humanities research and practical applications.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 10 April 2023

Santosh Abaji Kharat, Shubhada Nagarkar and Bhausaheb Panage

Abstract

Purpose

The purpose of this research is to introduce the Layar augmented reality (AR) application among library users and to understand the user’s satisfaction towards the information services provided by the Layar application with the help of the structural equation model (SEM).

Design/methodology/approach

According to Thomas (2016), action research is mainly undertaken to develop new skills or new approaches and to solve issues and problems with direct application to an applied setting. The present study helps to develop new skills and approaches to repackaging information using AR applications. The researchers asked what could be done to increase awareness of the Layar AR application among students, because the application is one of the new tools with which an academic library can repackage information for mass accessibility. The present action research approach therefore encompasses two activities: action and research. The researchers used participatory action research methods, collecting data from 17 MBA institute libraries affiliated with Savitribai Phule Pune University. They used the Layar application systematically in each library after obtaining permission from the relevant higher authority, and designed a Layar satisfaction model using SEM with AMOS and SPSS.

Findings

The researchers found that the relationship between experience, performance and service quality is positively significant. Users are satisfied with their experience of the Layar application, but not with its service quality and performance.

Research limitations/implications

This study tested Layar AR application in MBA libraries affiliated with Savitribai Phule Pune University in the Pune and Pimpri Chinchwad areas.

Practical implications

The Layar app helps the academic library to give selected print collections an AR feel for library users. This is an additional method of providing information services to users through mobile devices. A total of 157 students downloaded the Layar application to their handsets and provided feedback through a questionnaire. The researchers found that the relationships between user experience, performance and service quality are positively significant. Users are satisfied with their experience of the Layar application, but not with its service quality and performance.

Originality/value

This study examined the performance, service quality and user experience of the Layar application. Structural equation modelling was used to examine the relationship between user satisfaction and information services delivered through the Layar application.

Article
Publication date: 25 January 2024

Yaolin Zhou, Zhaoyang Zhang, Xiaoyu Wang, Quanzheng Sheng and Rongying Zhao

Abstract

Purpose

The digitalization of archival management has rapidly developed with the maturation of digital technology. With data's exponential growth, archival resources have transitioned from single modalities, such as text, images, audio and video, to integrated multimodal forms. This paper identifies key trends, gaps and areas of focus in the field. Furthermore, it proposes a theoretical organizational framework based on deep learning to address the challenges of managing archives in the era of big data.

Design/methodology/approach

Via a comprehensive systematic literature review, the authors investigate the field of multimodal archive resource organization and the application of deep learning techniques in archive organization. A systematic search and filtering process is conducted to identify relevant articles, which are then summarized, discussed and analyzed to provide a comprehensive understanding of existing literature.

Findings

The authors' findings reveal that most research on multimodal archive resources predominantly focuses on aspects related to storage, management and retrieval. Furthermore, the utilization of deep learning techniques in image archive retrieval is increasing, highlighting their potential for enhancing image archive organization practices; however, practical research and implementation remain scarce. The review also underscores gaps in the literature, emphasizing the need for more practical case studies and the application of theoretical concepts in real-world scenarios. In response to these insights, the authors' study proposes an innovative deep learning-based organizational framework. This proposed framework is designed to navigate the complexities inherent in managing multimodal archive resources, representing a significant stride toward more efficient and effective archival practices.

Originality/value

This study comprehensively reviews the existing literature on multimodal archive resources organization. Additionally, a theoretical organizational framework based on deep learning is proposed, offering a novel perspective and solution for further advancements in the field. These insights contribute theoretically and practically, providing valuable knowledge for researchers, practitioners and archivists involved in organizing multimodal archive resources.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 11 September 2023

Ying Gao, Qiang Zhang, Xiaoran Wang, Yanmei Huang, Fanshuang Meng and Wan Tao

Abstract

Purpose

Currently, the Tang tomb mural cultural relic resources are presented in a multi-source and heterogeneous manner, with a lack of effective organization and sharing between resources. Therefore, this study aims to propose a multidimensional knowledge discovery solution for Tang tomb mural cultural relic resources.

Design/methodology/approach

Taking the Tang tomb murals collected by the Shaanxi History Museum as an example, the study first clarified the relevant concepts of Tang tomb mural resources and, considering both dynamic and static dimensions, adopted a top-down approach to construct an ontology model of Tang tomb mural cultural relic resources. The actual case data was then imported into the Neo4j graph database according to the defined pattern hierarchy to complete the static organization of knowledge, and was presented in multimodal form in knowledge reasoning and retrieval. In addition, geographic information system (GIS) technology was used to dynamically display the spatiotemporal distribution of Tang tomb mural resources, and the distribution trend was analysed from a digital humanities perspective.

Findings

The multi-dimensional knowledge discovery of Tang tomb mural cultural relics resources can help establish the correlation and spatiotemporal relationship between resources, providing support for semantic retrieval and navigation, knowledge discovery and visualization and so on.

Originality/value

This study takes the murals in the collection of the Shaanxi History Museum as an example. It reveals potential knowledge associations in a static and intelligent way, achieves knowledge discovery and management of Tang tomb murals, and dynamically presents the spatial distribution of Tang tomb murals through GIS technology, meeting the knowledge presentation needs of different users and opening up new ideas for the study of Tang tomb murals.

Details

The Electronic Library, vol. 42 no. 1
Type: Research Article
ISSN: 0264-0473

Open Access
Article
Publication date: 20 June 2023

William R. Illsley

Abstract

Purpose

By reconsidering the concept of the historic environment, the aim of this study is to better understand how heritage is expressed by examining the networks within which the cultural performances of the historic environment take place. The goal is to move beyond a purely material expression and seek the expansion of the cultural dimension of the historic environment.

Design/methodology/approach

Conceptually, the historic environment is considered a valuable resource for heritage expression and exploration. The databases and records that house historic environment data are venerated and frequented entities for archeologists, but arguably less so for non-specialist users. In inventorying the historic environment, databases fulfill a major role in the planning process and asset management that is often considered to be more than just perfunctory. This paper approaches historic environment records (HERs) from an actor network perspective, particularizing the social foundation and relationships within the networks governing the historic environment and the environment's associated records.

Findings

The paper concludes that the performance of HERs from an actor-network perspective is a hegemonic process that is biased toward the supply and input to and from professional users. Furthermore, the paper provides a schematic for how many of the flaws in heritage transmission have come about.

Originality/value

The relevance here is largely belied by the fact that HERs, as both public digital resources and heritage networks, had yet to be addressed in depth from a theoretical point of view.

Details

Journal of Cultural Heritage Management and Sustainable Development, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2044-1266

Article
Publication date: 16 February 2022

Pragati Agarwal, Sanjeev Swami and Sunita Kumari Malhotra

Abstract

Purpose

The purpose of this paper is to give an overview of artificial intelligence (AI) and other AI-enabled technologies and to describe how COVID-19 affects various industries such as health care, manufacturing, retail, food services, education, media and entertainment, banking and insurance, travel and tourism. Furthermore, the authors discuss the tactics in which information technology is used to implement business strategies to transform businesses and to incentivise the implementation of these technologies in current or future emergency situations.

Design/methodology/approach

The review covers the rapidly growing literature on the use of smart technology during the current COVID-19 pandemic.

Findings

The 127 empirical articles the authors have identified suggest that 39 forms of smart technologies have been used, ranging from artificial intelligence to computer vision technology. Eight different industries have been identified that are using these technologies, primarily food services and manufacturing. Further, the authors list 40 generalised types of activities that are involved including providing health services, data analysis and communication. To prevent the spread of illness, robots with artificial intelligence are being used to examine patients and give drugs to them. The online execution of teaching practices and simulators have replaced the classroom mode of teaching due to the epidemic. The AI-based Blue-dot algorithm aids in the detection of early warning indications. The AI model detects a patient in respiratory distress based on face detection, face recognition, facial action unit detection, expression recognition, posture, extremity movement analysis, visitation frequency detection, sound pressure detection and light level detection. The above and various other applications are listed throughout the paper.

Research limitations/implications

Research is largely delimited to the area of COVID-19-related studies. Also, bias of selective assessment may be present. In the Indian context, advanced technology is yet to be harnessed to its full extent. Also, the educational system is yet to be upgraded to deliver these technologies' potential benefits on a wider basis.

Practical implications

First, leveraging insights and smart technology across various industry sectors to battle the global threat is one of the key takeaways in this field. Second, an integrated framework is recommended for policy making in this area. Lastly, the authors recommend that an internet-based repository should be developed, keeping all the ideas, databases, best practices, dashboards and real-time statistical data.

Originality/value

As COVID-19 is a relatively recent phenomenon, such a comprehensive review does not exist in the extant literature to the best of the authors’ knowledge. The review covers the rapidly emerging literature on smart technology use during the current COVID-19 pandemic.

Details

Journal of Science and Technology Policy Management, vol. 15 no. 3
Type: Research Article
ISSN: 2053-4620
