Search results
1 – 10 of 733
Oussama Ayoub, Christophe Rodrigues and Nicolas Travers
Abstract
Purpose
This paper aims to manage the word gap in information retrieval (IR), especially for long documents belonging to specific domains. With the continuous growth of text data that modern IR systems have to manage, solutions are needed to efficiently find the best set of documents for a given request. The words used to describe a query can differ from those used in related documents. Even when their meanings are close, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.
Design/methodology/approach
To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry common or domain-specific knowledge, which allows them to propose a better document representation.
Findings
The authors evaluate the approach of this study on specific IR domains with long documents to show the genericity of the proposed model and achieve encouraging results.
Originality/value
In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.
Faycal Touazi and Amel Boustil
Abstract
Purpose
The purpose of this paper is to address the need for new approaches to locating items that closely match user preference criteria, a need driven by the rise in data volume of knowledge bases resulting from Open Data initiatives. Specifically, the paper focuses on evaluating qualitative preference queries over user preferences in SPARQL.
Design/methodology/approach
The paper outlines a novel approach for handling SPARQL preference queries by representing preferences through symbolic weights using the possibilistic logic (PL) framework. This approach allows for the management of symbolic weights without relying on numerical values, using a partial ordering system instead. The paper compares this approach with numerous other approaches, including those based on skylines, fuzzy sets and conditional preference networks.
Findings
The paper highlights the advantages of the proposed approach, which enables the representation of preference criteria through symbolic weights and qualitative considerations. This approach offers a more intuitive way to convey preferences and manage rankings.
Originality/value
The paper demonstrates the usefulness and originality of the proposed SPARQL language in the PL framework. The approach extends SPARQL by incorporating symbolic weights and qualitative preferences.
Daniel Coughlin, Andrew Dudash and Jacob Gordon
Abstract
Purpose
The purpose of this paper is to investigate the feasibility of automating Google Scholar searching to harvest citation data of monographs for collection analysis.
Design/methodology/approach
This study discusses the creation and refinement of a Scraper application programming interface query structure created to match library collection inventories to their Google Scholar listings to retrieve citation counts.
Findings
This paper indicates that Google Scholar is a feasible and usable tool for retrieving monograph citation data.
Originality/value
This study shows that Google Scholar citation data can be harvested for monographs in an automated fashion to serve as a source of bibliographic data, something not typically done outside of individual academics and writers tracking their personal academic impact factors.
Yi-Hung Liu, Sheng-Fong Chen and Dan-Wei (Marian) Wen
Abstract
Purpose
Online medical repositories provide a platform for users to share information and dynamically access abundant electronic health data. It is important to determine whether case report information can assist the general public in appropriately managing their diseases. Therefore, this paper aims to introduce a novel deep learning-based method that allows non-professionals to make inquiries using ordinary vocabulary, retrieving the most relevant case reports for accurate and effective health information.
Design/methodology/approach
The dataset of case reports was collected from both the patient-generated research network and the digital medical journal repository. To enhance the accuracy of obtaining relevant case reports, the authors propose a retrieval approach that combines BERT and BiLSTM methods. The authors identified representative health-related case reports and analyzed the retrieval performance, as well as user judgments.
Findings
This study aims to provide the necessary functionalities to deliver relevant health case reports based on input from ordinary terms. The proposed framework includes features for health management, user feedback acquisition and ranking by weights to obtain the most pertinent case reports.
Originality/value
This study contributes to health information systems by analyzing patients' experiences and treatments with the case report retrieval model. The results of this study can provide immense benefit to the general public who intend to find treatment decisions and experiences from relevant case reports.
Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen
Abstract
Purpose
This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.
Design/methodology/approach
This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar (1869–1918), which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different quality OCR. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences of the otherwise identical articles.
Findings
The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.
Originality/value
To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.
Raj Kumar Bhardwaj, Ritesh Kumar and Mohammad Nazim
Abstract
Purpose
This paper evaluates the precision of four metasearch engines (MSEs), namely DuckDuckGo, Dogpile, Metacrawler and Startpage, to determine which one exhibits the highest level of precision and is therefore most likely to return the most relevant search results.
Design/methodology/approach
The research is divided into two parts: the first phase involves four queries categorized into two segments (4-Q-2-S), while the second phase includes six queries divided into three segments (6-Q-3-S). These queries vary in complexity, falling into three types: simple, phrase and complex. The precision, average precision and the presence of duplicates across all the evaluated metasearch engines are determined.
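The two measures named above can be illustrated with a minimal sketch. It assumes binary relevance judgments (1 = relevant, 0 = not relevant) listed in rank order, and it normalizes average precision by the number of relevant results actually retrieved, since the total number of relevant pages on the web is unknown. The function names and the sample judgments are hypothetical and are not taken from the study.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k results judged relevant (1 = relevant, 0 = not)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision@rank over the ranks where a relevant result appears.

    Assumption: normalized by the number of relevant results retrieved,
    not by the (unknown) total number of relevant documents.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical judgments for the top five results of one query.
judgments = [1, 1, 0, 1, 0]
p_at_5 = precision_at_k(judgments, 5)
ap = average_precision(judgments)
```

Average precision rewards engines that place relevant results near the top, whereas precision at a fixed cutoff treats every position within the cutoff equally.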
Findings
The study clearly demonstrated that Startpage returned the most relevant results and achieved the highest precision (0.98) among the four MSEs. Conversely, DuckDuckGo exhibited consistent performance across both phases of the study.
Research limitations/implications
The study only evaluated four metasearch engines, which may not be representative of all available metasearch engines. Additionally, a limited number of queries were used, which may not be sufficient to generalize the findings to all types of queries.
Practical implications
The findings of this study can be valuable for accreditation agencies in managing duplicates, improving their search capabilities and obtaining more relevant and precise results. These findings can also assist users in selecting the best metasearch engine based on precision rather than interface.
Originality/value
The study is the first of its kind to evaluate these four metasearch engines. No similar study has been conducted in the past to measure the performance of metasearch engines.
Hui-Min Lai, Shin-Yuan Hung and David C. Yen
Abstract
Purpose
Seekers who visit professional virtual communities (PVCs) are usually motivated by knowledge-seeking, which is a complex cognitive process. How do seekers search for knowledge, and how is their search linked to prior knowledge or PVC situation factors? From the cognitive process and interactional psychology perspectives, this study investigated the three-way interactions between seekers’ expertise, task complexity, and perceptions of PVC features (i.e. knowledge quality and system quality) on knowledge-seeking strategies and resultant outcomes.
Design/methodology/approach
A field experiment was conducted with 119 seekers in a PVC using a 2 × 2 factorial design of seekers’ expertise (i.e. expert versus novice) and task complexity (i.e. low versus high).
Findings
The study reveals three significant insights: (1) For a high-complexity task, experts adopt an ask-directed searching strategy compared to novices, whereas novices adopt a browsing strategy; (2) For a high-complexity task, experts who perceive a high system quality are more likely than novices to adopt an ask-directed searching strategy; and (3) Task completion time and task quality are associated with the adoption of ask-directed searching strategies, whereas knowledge seekers’ satisfaction is more associated with the adoption of browsing strategy.
Originality/value
We draw on the perspectives of cognitive process and interactional psychology to explore potential two- and three-way interactions of seekers’ expertise, task complexity, and PVC features on the adoption of knowledge-seeking strategies in a PVC context. Our findings provide deep insights into seekers’ behavior in a PVC, given the popularity of the search for knowledge in PVCs.
Frendy and Fumiko Takeda
Abstract
Purpose
Partners are responsible for allocating audit tasks and facilitating knowledge sharing among team members. This study considers changes in the composition of partners to proxy for the continuity of the audit team. This study examines the effect of audit team continuity on audit outcomes (audit quality and report lags), pricing and its determinant (lead partner experience), which have not been thoroughly examined in previous studies.
Design/methodology/approach
This study employs string similarity metrics to measure audit team continuity. It estimates multivariate panel data regression models on a sample of 26,007 firm-years of listed Japanese companies from 2008 to 2019.
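As a rough illustration of how a string similarity metric could quantify continuity between consecutive engagements, the sketch below compares the partner lists of two years. The abstract does not say which metrics the authors use, so the matching ratio from Python's standard `difflib.SequenceMatcher` is only a stand-in. The helper names and the partner names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable


def team_string(partners: Iterable[str]) -> str:
    """Canonical string for a team: normalized names, sorted, joined by '|'."""
    return "|".join(sorted(name.strip().lower() for name in partners))


def team_continuity(previous: Iterable[str], current: Iterable[str]) -> float:
    """Similarity in [0, 1] between two consecutive partner line-ups.

    Assumption: difflib's ratio is a stand-in for whichever string
    similarity metrics the study actually employs.
    """
    matcher = SequenceMatcher(None, team_string(previous), team_string(current))
    return matcher.ratio()


# Hypothetical partner line-ups for one client in consecutive years.
unchanged = team_continuity(["A. Sato", "B. Tanaka"], ["B. Tanaka", "A. Sato"])
one_replaced = team_continuity(["A. Sato", "B. Tanaka"], ["A. Sato", "C. Suzuki"])
```

Sorting the names before joining makes the score independent of the order in which partners are listed, so only actual changes in composition lower it.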
Findings
The study reveals that audit team continuity is negatively associated with audit fees, regardless of the auditor’s size. This finding contributes to the existing literature by showing that audit team continuity represents one of the determinant factors of audit fee. For clients of large audit firms, companies with higher (lower) audit team continuity issue audit reports in less (more) time. The experience of lead partners is a strong predictor of audit team continuity, irrespective of audit firm size. Audit quality is not associated with audit team continuity for either large or small audit firms.
Originality/value
This study proposes and examines audit team continuity measures that employ string similarity metrics to quantify changes in the composition of partners in consecutive audit engagements. Audit team continuity expands upon the tenure of individual audit partners, which is commonly used in prior literature as a measure of client–partner relationships.
Ying Gao, Qiang Zhang, Xiaoran Wang, Yanmei Huang, Fanshuang Meng and Wan Tao
Abstract
Purpose
Currently, the Tang tomb mural cultural relic resources are presented in a multi-source and heterogeneous manner, with a lack of effective organization and sharing between resources. Therefore, this study aims to propose a multidimensional knowledge discovery solution for Tang tomb mural cultural relic resources.
Design/methodology/approach
Taking the Tang tomb murals collected by the Shaanxi History Museum as an example, the study first clarifies the relevant concepts of Tang tomb mural resources and, considering both dynamic and static dimensions, adopts a top-down approach to construct an ontology model of Tang tomb mural cultural relic resources. The actual case data were then imported into the Neo4J graph database according to the defined pattern hierarchy to complete the static organization of knowledge, and presented in multimodal form for knowledge reasoning and retrieval. In addition, geographic information system (GIS) technology was used to dynamically display the spatiotemporal distribution of Tang tomb mural resources, and the distribution trend was analysed from a digital humanities perspective.
Findings
The multi-dimensional knowledge discovery of Tang tomb mural cultural relics resources can help establish the correlation and spatiotemporal relationship between resources, providing support for semantic retrieval and navigation, knowledge discovery and visualization and so on.
Originality/value
This study takes the murals in the collection of the Shaanxi History Museum as an example, revealing potential knowledge associations in a static and intelligent way and achieving knowledge discovery and management of Tang tomb murals. It also dynamically presents the spatial distribution of Tang tomb murals through GIS technology, meeting the knowledge presentation needs of different users and opening up new ideas for the study of Tang tomb murals.