Search results

1 – 10 of 449
Article
Publication date: 8 September 2023

Oussama Ayoub, Christophe Rodrigues and Nicolas Travers

Abstract

Purpose

This paper aims to manage the word gap in information retrieval (IR), especially for long documents belonging to specific domains. With the continuous growth of text data that modern IR systems have to manage, efficient solutions are needed to find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Even when the meanings are close, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.

Design/methodology/approach

To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry common or domain-specific knowledge to propose a better document representation.
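As a rough illustration of this kind of document expansion, the sketch below masks each token in turn, asks a predictor for candidate words and appends the new ones to the document. It is not the authors' implementation: `expand_document`, `predict_masked`, `toy_predict`, `TOY_PREDICTIONS` and the `[MASK]` placeholder are all hypothetical names, and a fixed lookup table stands in for a pretrained masked language model so that the example runs on its own.

```python
from typing import Callable, List

MASK = "[MASK]"  # placeholder token (assumption); the real token depends on the model


def expand_document(
    tokens: List[str],
    predict_masked: Callable[[List[str], int], List[str]],
    top_k: int = 2,
) -> List[str]:
    """Return the original tokens followed by new words proposed for masked positions."""
    seen = set(tokens)
    added: List[str] = []
    for i in range(len(tokens)):
        # Hide one token at a time and ask the predictor what could fill the gap.
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        for candidate in predict_masked(masked, i)[:top_k]:
            if candidate not in seen:  # keep only words the document lacks
                seen.add(candidate)
                added.append(candidate)
    return tokens + added


# Toy stand-in for a pretrained model (hypothetical data): candidates are
# looked up from the word immediately to the left of the masked position.
TOY_PREDICTIONS = {"heart": ["attack", "failure"], "cardiac": ["arrest"]}


def toy_predict(masked: List[str], position: int) -> List[str]:
    left = masked[position - 1] if position > 0 else ""
    return TOY_PREDICTIONS.get(left, [])


expanded = expand_document(["heart", "attack", "risk"], toy_predict)
print(expanded)
```

In a real system the predictor would wrap a pretrained general or domain-specific model, and the expanded token list would be indexed in place of the original document.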

Findings

The authors evaluate the approach on specific IR domains with long documents to show the genericity of the proposed model, and they report encouraging results.

Originality/value

In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.

Details

International Journal of Web Information Systems, vol. 19 no. 5/6
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 31 August 2023

Faycal Touazi and Amel Boustil

Abstract

Purpose

The purpose of this paper is to address the need for new approaches in locating items that closely match user preference criteria due to the rise in data volume of knowledge bases resulting from Open Data initiatives. Specifically, the paper focuses on evaluating qualitative preference queries over user preferences in SPARQL.

Design/methodology/approach

The paper outlines a novel approach for handling SPARQL preference queries by representing preferences through symbolic weights using the possibilistic logic (PL) framework. This approach allows for the management of symbolic weights without relying on numerical values, using a partial ordering system instead. The paper compares this approach with numerous other approaches, including those based on skylines, fuzzy sets and conditional preference networks.

Findings

The paper highlights the advantages of the proposed approach, which enables the representation of preference criteria through symbolic weights and qualitative considerations. This approach offers a more intuitive way to convey preferences and manage rankings.

Originality/value

The paper demonstrates the usefulness and originality of the proposed SPARQL language in the PL framework. The approach extends SPARQL by incorporating symbolic weights and qualitative preferences.

Details

International Journal of Web Information Systems, vol. 19 no. 5/6
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 6 November 2023

Daniel Coughlin, Andrew Dudash and Jacob Gordon

Abstract

Purpose

The purpose of this paper is to investigate the feasibility of automating Google Scholar searching to harvest citation data of monographs for collection analysis.

Design/methodology/approach

This study discusses the creation and refinement of a Scraper application programming interface query structure created to match library collection inventories to their Google Scholar listings to retrieve citation counts.

Findings

This paper indicates that Google Scholar is a feasible and usable tool for retrieving monograph citation data.

Originality/value

This study shows that Google Scholar citation data can be harvested for monographs in an automated fashion to serve as a source of bibliographic data, something not typically done outside of individual academics and writers tracking their personal academic impact factors.

Details

Library Hi Tech News, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0741-9058

Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen

Abstract

Purpose

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users ran searches in an article collection of the Finnish newspaper Uusi Suometar 1869–1918, which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different quality OCR. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences of the otherwise identical articles.

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 18 March 2024

Raj Kumar Bhardwaj, Ritesh Kumar and Mohammad Nazim

Abstract

Purpose

This paper evaluates the precision of four metasearch engines (MSEs) – DuckDuckGo, Dogpile, Metacrawler and Startpage, to determine which metasearch engine exhibits the highest level of precision and to identify the metasearch engine that is most likely to return the most relevant search results.

Design/methodology/approach

The research is divided into two parts: the first phase involves four queries categorized into two segments (4-Q-2-S), while the second phase includes six queries divided into three segments (6-Q-3-S). These queries vary in complexity, falling into three types: simple, phrase and complex. The precision, average precision and the presence of duplicates across all the evaluated metasearch engines are determined.
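For readers unfamiliar with these measures, the sketch below shows one common way to compute precision, average precision and a duplicate count for a single ranked result list. The abstract does not give the study's exact formulas, so the binary relevance judgments, the normalization of average precision by the number of relevant results retrieved and the function names are all assumptions made for illustration.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k results judged relevant (1 = relevant, 0 = not)."""
    if k <= 0:
        return 0.0
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision@i over the ranks i that hold a relevant result.

    Assumption: normalized by the number of relevant results retrieved,
    not by the number of relevant documents in the whole collection.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def duplicate_count(urls: Sequence[str]) -> int:
    """Number of results whose URL already appeared earlier in the list."""
    return len(urls) - len(set(urls))


# Hypothetical judgments for the top five results of one query.
judgments = [1, 1, 0, 1, 0]
print(precision_at_k(judgments, 5))   # 3 relevant out of 5
print(average_precision(judgments))   # (1/1 + 2/2 + 3/4) / 3
print(duplicate_count(["a.com", "b.com", "a.com"]))
```

Averaging these per-query values over all queries for an engine gives a single figure that can be compared across engines.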

Findings

The study clearly demonstrated that Startpage returned the most relevant results and achieved the highest precision (0.98) among the four MSEs. Conversely, DuckDuckGo exhibited consistent performance across both phases of the study.

Research limitations/implications

The study only evaluated four metasearch engines, which may not be representative of all available metasearch engines. Additionally, a limited number of queries were used, which may not be sufficient to generalize the findings to all types of queries.

Practical implications

The findings of this study can be valuable for accreditation agencies in managing duplicates, improving their search capabilities and obtaining more relevant and precise results. These findings can also assist users in selecting the best metasearch engine based on precision rather than interface.

Originality/value

The study is the first of its kind to evaluate the four metasearch engines. No similar study has been conducted in the past to measure the performance of metasearch engines.

Details

Performance Measurement and Metrics, vol. 25 no. 1
Type: Research Article
ISSN: 1467-8047

Article
Publication date: 19 April 2024

Hui-Min Lai, Shin-Yuan Hung and David C. Yen

Abstract

Purpose

Seekers who visit professional virtual communities (PVCs) are usually motivated by knowledge-seeking, which is a complex cognitive process. How do seekers search for knowledge, and how is their search linked to prior knowledge or PVC situation factors? From the cognitive process and interactional psychology perspectives, this study investigated the three-way interactions between seekers’ expertise, task complexity, and perceptions of PVC features (i.e. knowledge quality and system quality) on knowledge-seeking strategies and resultant outcomes.

Design/methodology/approach

A field experiment was conducted with 119 seekers in a PVC using a 2 × 2 factorial design of seekers’ expertise (i.e. expert versus novice) and task complexity (i.e. low versus high).

Findings

The study reveals three significant insights: (1) For a high-complexity task, experts adopt an ask-directed searching strategy compared to novices, whereas novices adopt a browsing strategy; (2) For a high-complexity task, experts who perceive a high system quality are more likely than novices to adopt an ask-directed searching strategy; and (3) Task completion time and task quality are associated with the adoption of ask-directed searching strategies, whereas knowledge seekers’ satisfaction is more associated with the adoption of browsing strategy.

Originality/value

We draw on the perspectives of cognitive process and interactional psychology to explore potential two- and three-way interactions of seekers’ expertise, task complexity, and PVC features on the adoption of knowledge-seeking strategies in a PVC context. Our findings provide deep insights into seekers’ behavior in a PVC, given the popularity of the search for knowledge in PVCs.

Details

Information Technology & People, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0959-3845

Details

Technology vs. Government: The Irresistible Force Meets the Immovable Object
Type: Book
ISBN: 978-1-83867-951-4

Article
Publication date: 11 September 2023

Ying Gao, Qiang Zhang, Xiaoran Wang, Yanmei Huang, Fanshuang Meng and Wan Tao

Abstract

Purpose

Currently, the Tang tomb mural cultural relic resources are presented in a multi-source and heterogeneous manner, with a lack of effective organization and sharing between resources. Therefore, this study aims to propose a multidimensional knowledge discovery solution for Tang tomb mural cultural relic resources.

Design/methodology/approach

Taking the Tang tomb murals collected by the Shaanxi History Museum as an example, and after clarifying the relevant concepts of Tang tomb mural resources and considering both dynamic and static dimensions, a top-down approach was adopted to first construct an ontology model of Tang tomb mural cultural relic resources. Then, the actual case data were imported into the Neo4J graph database according to the defined pattern hierarchy to complete the static organization of knowledge, and presented in a multimodal form in knowledge reasoning and retrieval. In addition, geographic information system (GIS) technology was used to dynamically display the spatiotemporal distribution of Tang tomb mural resources, and the distribution trend was analysed from a digital humanistic perspective.

Findings

The multi-dimensional knowledge discovery of Tang tomb mural cultural relics resources can help establish the correlation and spatiotemporal relationship between resources, providing support for semantic retrieval and navigation, knowledge discovery and visualization and so on.

Originality/value

This study takes the murals in the collection of the Shaanxi History Museum as an example, revealing potential knowledge associations in a static and intelligent way and achieving knowledge discovery and management of Tang tomb murals. It also dynamically presents the spatial distribution of Tang tomb murals through GIS technology, meeting the knowledge presentation needs of different users and opening up new ideas for the study of Tang tomb murals.

Details

The Electronic Library, vol. 42 no. 1
Type: Research Article
ISSN: 0264-0473

Open Access
Article
Publication date: 18 January 2024

Puyu Yang and Giovanni Colavizza

Abstract

Purpose

Wikipedia's inclusive editorial policy permits unrestricted participation, enabling individuals to contribute and disseminate their expertise while drawing upon a multitude of external sources. News media outlets constitute nearly one-third of all citations within Wikipedia. However, embracing such a radically open approach also poses the challenge of the potential introduction of biased content or viewpoints into Wikipedia. The authors conduct an investigation into the integrity of knowledge within Wikipedia, focusing on the dimensions of source political polarization and trustworthiness. Specifically, the authors delve into the conceivable presence of political polarization within the news media citations on Wikipedia, identify the factors that may influence such polarization within the Wikipedia ecosystem and scrutinize the correlation between political polarization in news media sources and the factual reliability of Wikipedia's content.

Design/methodology/approach

The authors conduct a descriptive and regression analysis, relying on Wikipedia Citations, a large-scale open dataset of nearly 30 million citations from English Wikipedia. Additionally, this dataset has been augmented with information obtained from the Media Bias Monitor (MBM) and the Media Bias Fact Check (MBFC).

Findings

The authors find a moderate yet significant liberal bias in the choice of news media sources across Wikipedia. Furthermore, the authors show that this effect persists when accounting for the factual reliability of the news media.

Originality/value

The results contribute to Wikipedia’s knowledge integrity agenda in suggesting that a systematic effort would help to better map potential biases in Wikipedia and find means to strengthen its neutral point of view policy.

Details

Online Information Review, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 25 January 2024

Yaolin Zhou, Zhaoyang Zhang, Xiaoyu Wang, Quanzheng Sheng and Rongying Zhao

Abstract

Purpose

The digitalization of archival management has rapidly developed with the maturation of digital technology. With data's exponential growth, archival resources have transitioned from single modalities, such as text, images, audio and video, to integrated multimodal forms. This paper identifies key trends, gaps and areas of focus in the field. Furthermore, it proposes a theoretical organizational framework based on deep learning to address the challenges of managing archives in the era of big data.

Design/methodology/approach

Via a comprehensive systematic literature review, the authors investigate the field of multimodal archive resource organization and the application of deep learning techniques in archive organization. A systematic search and filtering process is conducted to identify relevant articles, which are then summarized, discussed and analyzed to provide a comprehensive understanding of existing literature.

Findings

The authors' findings reveal that most research on multimodal archive resources predominantly focuses on aspects related to storage, management and retrieval. Furthermore, the utilization of deep learning techniques in image archive retrieval is increasing, highlighting their potential for enhancing image archive organization practices; however, practical research and implementation remain scarce. The review also underscores gaps in the literature, emphasizing the need for more practical case studies and the application of theoretical concepts in real-world scenarios. In response to these insights, the authors' study proposes an innovative deep learning-based organizational framework. This proposed framework is designed to navigate the complexities inherent in managing multimodal archive resources, representing a significant stride toward more efficient and effective archival practices.

Originality/value

This study comprehensively reviews the existing literature on multimodal archive resources organization. Additionally, a theoretical organizational framework based on deep learning is proposed, offering a novel perspective and solution for further advancements in the field. These insights contribute theoretically and practically, providing valuable knowledge for researchers, practitioners and archivists involved in organizing multimodal archive resources.

Details

Aslib Journal of Information Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2050-3806
