Search results

1 – 10 of 49
Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen

Abstract

Purpose

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different levels of OCR quality on users' subjective perception through an interactive information retrieval task with a collection drawn from a single digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar 1869–1918, which consists of ca. 1.45 million automatically segmented articles. The article search database contained two versions of each article, differing in OCR quality. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results on a graded relevance scale of 0–3. Users were not informed about the OCR quality differences between the otherwise identical articles.
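
Graded 0–3 judgments of a top-10 result list are often summarized with normalized discounted cumulative gain (nDCG@10). The abstract does not say which measure the authors used, so the Python sketch below only illustrates that standard measure, assuming a log2 rank discount and, as a simplification, an ideal ranking built from the same ten judgments rather than from every relevant article in the collection.

```python
import math
from typing import List


def dcg(gains: List[int], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k graded judgments (log2 rank discount)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg(gains: List[int], k: int = 10) -> float:
    """DCG divided by the DCG of the same judgments sorted into ideal (descending) order.

    Simplifying assumption: the ideal list is built only from the judged results,
    not from all relevant documents in the collection.
    """
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg([3, 2, 1, 0])` returns 1.0 because the list is already in ideal order, while moving a 0-graded article to the top, as in `ndcg([0, 3, 2, 1])`, lowers the score.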

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 31 July 2023

Sara Lafia, David A. Bleckley and J. Trent Alexander

Abstract

Purpose

Many libraries and archives maintain collections of research documents, such as administrative records, with paper-based formats that limit the documents' access to in-person use. Digitization transforms paper-based collections into more accessible and analyzable formats. As collections are digitized, there is an opportunity to incorporate deep learning techniques, such as Document Image Analysis (DIA), into workflows to increase the usability of information extracted from archival documents. This paper describes the authors' approach using digital scanning, optical character recognition (OCR) and deep learning to create a digital archive of administrative records related to the mortgage guarantee program of the Servicemen's Readjustment Act of 1944, also known as the G.I. Bill.

Design/methodology/approach

The authors used a collection of 25,744 semi-structured paper-based records from the administration of G.I. Bill Mortgages from 1946 to 1954 to develop a digitization and processing workflow. These records include the name and city of the mortgagor, the amount of the mortgage, the location of the Reconstruction Finance Corporation agent, one or more identification numbers and the name and location of the bank handling the loan. The authors extracted structured information from these scanned historical records in order to create a tabular data file and link them to other authoritative individual-level data sources.

Findings

The authors compared the flexible character accuracy of five OCR methods. They then compared the character error rate (CER) of three text extraction approaches: regular expressions, DIA and named entity recognition (NER). The highest-quality structured text output was obtained using DIA with the Layout Parser toolkit, followed by post-processing with regular expressions. Through this project, the authors demonstrate how DIA can improve the digitization of administrative records to automatically produce a structured data resource for researchers and the public.
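
Character error rate is conventionally the character-level edit (Levenshtein) distance between the extracted text and a ground-truth transcription, divided by the length of the ground truth. The minimal Python sketch below implements that definition; it is not the authors' code, and it computes plain CER, not flexible character accuracy, which is designed to be less sensitive to reading-order differences.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

For instance, `cer("Mortgage", "Mortqage")` returns 0.125: one substitution over eight reference characters.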

Originality/value

The authors' workflow is readily transferable to other archival digitization projects. Through the use of digital scanning, OCR and DIA processes, the authors created the first digital microdata file of administrative records related to the G.I. Bill mortgage guarantee program available to researchers and the general public. These records offer research insights into the lives of veterans who benefited from loans, the impacts on the communities built by the loans and the institutions that implemented them.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 20 February 2024

Alenka Kavčič Čolić and Andreja Hari

Abstract

Purpose

The current predominant delivery format resulting from digitization is PDF, which is not appropriate for blind and partially sighted people or for people who read on mobile devices. To meet the needs of both communities, as well as broader ones, alternative file formats are required. Drawing on the research findings of the eBooks-On-Demand-Network Opening Publications for European Netizens (EODOPEN) project, this study aims to improve access to digitized content for these communities.

Design/methodology/approach

In 2022, the authors conducted research on the digitization experiences of 13 EODOPEN partners at their organizations. The authors distributed the same sample of English-language scans with varying characteristics and, in accordance with the Web Content Accessibility Guidelines, created 24 criteria to analyze the partners' digitization workflows, output formats and optical character recognition (OCR) quality.

Findings

In this contribution, the authors present the results of a trial implementation among EODOPEN partners covering their digitization workflows, the delivery file formats they used and the resulting OCR quality, depending on the digitization output file format. Partners using the OCR tool ABBYY FineReader Professional and producing scanning outputs in tagged PDF and PDF/UA formats achieved better results according to the set criteria.

Research limitations/implications

The trial implementations were limited to 13 project partners’ organizations only.

Originality/value

This research paper can be a valuable contribution to the field of mass digitization practices, particularly in terms of improving the accessibility of output delivery file formats.

Details

Digital Library Perspectives, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2059-5816

Article
Publication date: 14 November 2023

Shaodan Sun, Jun Deng and Xugong Qin

Abstract

Purpose

This paper aims to amplify the retrieval and utilization of historical newspapers through semantic organization from a fine-grained knowledge element perspective. This endeavor seeks to unlock the latent value embedded within newspaper contents while also offering methodological guidance for research in the humanities.

Design/methodology/approach

Following the semantic organization process and the knowledge element concept, this study proposes a holistic framework with four pivotal stages: knowledge element description, extraction, association and application. First, a semantic description model dedicated to knowledge elements is devised. Next, advanced deep learning techniques are applied to entity recognition and relationship extraction, identifying entities within the historical newspaper contents and capturing the interdependencies among them. Finally, an online platform based on Flask is developed to enable the recognition of entities and relationships within historical newspapers.

Findings

This article used the Shengjing Times·Changchun Compilation as the dataset for describing, extracting, associating and applying newspaper contents. For knowledge element extraction, BERT + BS consistently outperforms Bi-LSTM, CRF++ and even BERT in terms of recall and F1 score, making it a favorable choice for entity recognition in this context. Particularly noteworthy is the Bi-LSTM-Pro model, which achieves the highest scores across all metrics, including an exceptional F1 score in knowledge element relationship recognition.
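
Entity recognition scores such as recall and F1 are usually computed at the entity (span) level. The Python sketch below assumes BIO-style tags (`B-`, `I-`, `O`), which the abstract does not specify, converts tag sequences into typed spans and counts only exact span-and-type matches as correct. It illustrates the metric only, not the models compared in the paper.

```python
from typing import List, Optional, Set, Tuple

# A span is (start token index, end token index exclusive, entity type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from a BIO tag sequence.

    An I- tag that does not continue an open span of the same type is
    treated leniently as the start of a new span.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes any open span
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if continues:
            continue
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith(("B-", "I-")):
            start, etype = i, tag[2:]
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With illustrative gold tags `["B-PER", "I-PER", "O", "B-LOC"]` and predicted tags `["B-PER", "I-PER", "O", "O"]`, precision is 1.0, recall is 0.5 and F1 is about 0.67.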

Originality/value

Historical newspapers transcend their status as mere artifacts, evolving into invaluable reservoirs safeguarding the societal and historical memory. Through semantic organization from a fine-grained knowledge element perspective, it can facilitate semantic retrieval, semantic association, information visualization and knowledge discovery services for historical newspapers. In practice, it can empower researchers to unearth profound insights within the historical and cultural context, broadening the landscape of digital humanities research and practical applications.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 10 April 2023

Santosh Abaji Kharat, Shubhada Nagarkar and Bhausaheb Panage

Abstract

Purpose

The purpose of this research is to introduce the Layar augmented reality (AR) application to library users and to understand users' satisfaction with the information services provided by the Layar application with the help of a structural equation model (SEM).

Design/methodology/approach

According to Thomas (2016), action research is mainly undertaken to develop new skills or new approaches and to solve issues and problems with direct application to an applied setting. The present study helps to develop new skills and approaches to repackaging information using AR applications. The researchers identified the question of what could be done to increase awareness of Layar AR applications among students, because the Layar application is one of the new tools an academic library can use to repackage information for mass accessibility. The present action research approach therefore combines two activities: action and research. The researchers used participatory action research methods, collecting data from 17 MBA institute libraries affiliated with Savitribai Phule Pune University. They systematically deployed the Layar application in each library after obtaining permission from the relevant higher authority and designed a Layar satisfaction model using SEM with AMOS and SPSS.

Findings

The researchers found that the relationships between experience, performance and service quality are positive and significant. Users are satisfied with their experience of the Layar application, but they are not satisfied with its service quality and performance.

Research limitations/implications

This study tested Layar AR application in MBA libraries affiliated with Savitribai Phule Pune University in the Pune and Pimpri Chinchwad areas.

Practical implications

The Layar app helps the academic library give selected print collections an AR experience for library users. This is an additional method of providing information services to users through mobile devices. A total of 157 students downloaded the Layar application onto their handsets and provided feedback through a questionnaire. The researchers found that the relationships between users and Layar experience, performance and service quality are positive and significant. Users are satisfied with their experience of the Layar application, but they are not satisfied with its service quality and performance.

Originality/value

This study examined the performance, service quality and user experience of Layar applications. Structural equation modelling was used to examine the relationship between user satisfaction and the information services provided through the Layar application.

Article
Publication date: 25 January 2024

Yaolin Zhou, Zhaoyang Zhang, Xiaoyu Wang, Quanzheng Sheng and Rongying Zhao

Abstract

Purpose

The digitalization of archival management has rapidly developed with the maturation of digital technology. With data's exponential growth, archival resources have transitioned from single modalities, such as text, images, audio and video, to integrated multimodal forms. This paper identifies key trends, gaps and areas of focus in the field. Furthermore, it proposes a theoretical organizational framework based on deep learning to address the challenges of managing archives in the era of big data.

Design/methodology/approach

Via a comprehensive systematic literature review, the authors investigate the field of multimodal archive resource organization and the application of deep learning techniques in archive organization. A systematic search and filtering process is conducted to identify relevant articles, which are then summarized, discussed and analyzed to provide a comprehensive understanding of existing literature.

Findings

The authors' findings reveal that most research on multimodal archive resources predominantly focuses on aspects related to storage, management and retrieval. Furthermore, the utilization of deep learning techniques in image archive retrieval is increasing, highlighting their potential for enhancing image archive organization practices; however, practical research and implementation remain scarce. The review also underscores gaps in the literature, emphasizing the need for more practical case studies and the application of theoretical concepts in real-world scenarios. In response to these insights, the authors' study proposes an innovative deep learning-based organizational framework. This proposed framework is designed to navigate the complexities inherent in managing multimodal archive resources, representing a significant stride toward more efficient and effective archival practices.

Originality/value

This study comprehensively reviews the existing literature on multimodal archive resources organization. Additionally, a theoretical organizational framework based on deep learning is proposed, offering a novel perspective and solution for further advancements in the field. These insights contribute theoretically and practically, providing valuable knowledge for researchers, practitioners and archivists involved in organizing multimodal archive resources.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 11 September 2023

Ying Gao, Qiang Zhang, Xiaoran Wang, Yanmei Huang, Fanshuang Meng and Wan Tao

Abstract

Purpose

Currently, the Tang tomb mural cultural relic resources are presented in a multi-source and heterogeneous manner, with a lack of effective organization and sharing between resources. Therefore, this study aims to propose a multidimensional knowledge discovery solution for Tang tomb mural cultural relic resources.

Design/methodology/approach

Taking the Tang tomb murals collected by the Shaanxi History Museum as an example, and after clarifying the relevant concepts of Tang tomb mural resources in both dynamic and static dimensions, the authors adopted a top-down approach to first construct an ontology model of Tang tomb mural cultural relic resources. The actual case data were then imported into the Neo4j graph database according to the defined pattern hierarchy to complete the static organization of knowledge, which is presented in multimodal form for knowledge reasoning and retrieval. In addition, geographic information system (GIS) technology is used to dynamically display the spatiotemporal distribution of Tang tomb mural resources, and the distribution trend is analysed from a digital humanities perspective.

Findings

The multi-dimensional knowledge discovery of Tang tomb mural cultural relics resources can help establish the correlation and spatiotemporal relationship between resources, providing support for semantic retrieval and navigation, knowledge discovery and visualization and so on.

Originality/value

This study takes the murals in the collection of the Shaanxi History Museum as an example, revealing potential knowledge associations in a static and intelligent way, achieving knowledge discovery and management of Tang tomb murals, and dynamically presenting the spatial distribution of Tang tomb murals through GIS technology. It thereby meets the knowledge presentation needs of different users and opens up new ideas for the study of Tang tomb murals.

Details

The Electronic Library, vol. 42 no. 1
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 16 February 2022

Pragati Agarwal, Sanjeev Swami and Sunita Kumari Malhotra

Abstract

Purpose

The purpose of this paper is to give an overview of artificial intelligence (AI) and other AI-enabled technologies and to describe how COVID-19 affects various industries such as health care, manufacturing, retail, food services, education, media and entertainment, banking and insurance, and travel and tourism. Furthermore, the authors discuss the ways in which information technology is used to implement business strategies that transform businesses, and how to incentivise the implementation of these technologies in current or future emergency situations.

Design/methodology/approach

The paper reviews the rapidly growing literature on the use of smart technology during the current COVID-19 pandemic.

Findings

The 127 empirical articles the authors identified suggest that 39 forms of smart technology have been used, ranging from artificial intelligence to computer vision technology. Eight different industries using these technologies have been identified, primarily food services and manufacturing. Further, the authors list 40 generalised types of activities involved, including providing health services, data analysis and communication. To prevent the spread of illness, robots with artificial intelligence are being used to examine patients and administer drugs to them. Because of the pandemic, online teaching and simulators have replaced classroom teaching. The AI-based Blue-dot algorithm aids in detecting early warning signs. One AI model detects a patient in respiratory distress based on face detection, face recognition, facial action unit detection, expression recognition, posture, extremity movement analysis, visitation frequency detection, sound pressure detection and light level detection. These and various other applications are listed throughout the paper.

Research limitations/implications

Research is largely delimited to COVID-19-related studies. Also, selective assessment bias may be present. In the Indian context, advanced technology is yet to be harnessed to its full extent, and the educational system is yet to be upgraded to realise these technologies' potential benefits on a wider basis.

Practical implications

First, a key takeaway in this field is the value of leveraging insights and smart technology across various industry sectors to battle the global threat. Second, an integrated framework is recommended for policy making in this area. Lastly, the authors recommend developing an internet-based repository that keeps all the ideas, databases, best practices, dashboards and real-time statistical data.

Originality/value

As COVID-19 is a relatively recent phenomenon, no such comprehensive review exists in the extant literature, to the best of the authors’ knowledge. The review covers the rapidly emerging literature on smart technology use during the current COVID-19 pandemic.

Details

Journal of Science and Technology Policy Management, vol. 15 no. 3
Type: Research Article
ISSN: 2053-4620

Article
Publication date: 24 November 2023

Ernesto William De Luca, Francesca Fallucchi, Bouchra Ghattas and Riem Spielhaus

Abstract

Purpose

This article aims to explore how strategies for mapping the user requirements expressed by humanities researchers can lead to better customization of user-driven digital humanities tools and to the creation of innovative functionalities, which can directly affect the way research is done in a digital context.

Design/methodology/approach

It describes the user-driven development of a tool that helps researchers in the quantitative and qualitative analysis of large textbook collections.

Findings

This article presents an exemplary user journey map, which shows the different steps of the digital transformation process and how humanities researchers are involved in (1) producing innovative research solutions and comprehensive, personalized reports and (2) customizing access to the content data used for the analysis of digital documents. The article is based on a case study of a German textbook collection and its content analysis functionalities.

Originality/value

The focus of this article is the iterative research process, in which humanists (from a human-centred point of view) start from an initial research question, use quantitative and qualitative data, and develop both the research question and its answers with the aim of finding patterns in the content and structure of educational media. Thus, from the viewpoint of digital transformation, the humanist is part of the interaction between digitization and digitalization processes, using digital data, metadata, reports and findings created and supported by the digital tools for research analysis.

Details

Journal of Documentation, vol. 80 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 24 April 2023

Priya Garg and Shivarama Rao K.

Abstract

Purpose

This paper aims to discuss the process of building a 24×7 reference platform that gives farmers easy access to information at any time and from any location. The platform takes a text string as input and processes it to respond to the user with the desired result.

Design/methodology/approach

An interactive web-based chatbot named AgriRef was developed using the free version of Dialogflow. The intents were defined based on a conversation flow diagram. Furthermore, the application was integrated with a website on a local server and with the Telegram application.

Findings

With this chatbot application, farmers will be able to get answers to their queries through a human-like conversational interface. It will also help librarians of agricultural libraries save time in answering common queries.

Originality/value

This paper describes the various steps involved in developing the chatbot application using Dialogflow.

Details

Library Hi Tech News, vol. 41 no. 2
Type: Research Article
ISSN: 0741-9058
