Search results

1 – 10 of over 12000
Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen

Abstract

Purpose

This study aims to identify how users perceive texts produced with different qualities of optical character recognition (OCR). It examines the effect of OCR quality on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection from the Finnish newspaper Uusi Suometar (1869–1918), which consists of ca. 1.45 million autosegmented articles. The article search database held two versions of each article, each with different OCR quality. Each user ran six pre-formulated and six self-formulated short queries and subjectively rated the top 10 results on a graded relevance scale of 0–3. Users were not told about the OCR quality differences between the otherwise identical articles.
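As a rough illustration of how graded judgements like these could be summarised, the sketch below averages 0–3 relevance grades per OCR version. The version labels, the function name and the sample grades are all hypothetical; they are not data or code from the study, and the study's actual analysis may differ.

```python
from statistics import mean

# Hypothetical assessments as (ocr_version, relevance grade on the 0-3 scale).
# These values are invented for illustration and are NOT data from the study.
assessments = [
    ("old_ocr", 1), ("old_ocr", 2), ("old_ocr", 0),
    ("new_ocr", 2), ("new_ocr", 3), ("new_ocr", 1),
]


def mean_relevance_by_version(rows):
    """Group graded relevance scores by OCR version and average each group."""
    grouped = {}
    for version, grade in rows:
        if not 0 <= grade <= 3:
            raise ValueError(f"grade {grade} is outside the 0-3 scale")
        grouped.setdefault(version, []).append(grade)
    return {version: mean(grades) for version, grades in grouped.items()}


scores = mean_relevance_by_version(assessments)
print(scores)
```

A higher mean for one version would only suggest a difference; a real analysis would also need a significance test over the paired judgements.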

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 14 October 2020

Haihua Chen, Yunhan Yang, Wei Lu and Jiangping Chen

Abstract

Purpose

Citation contexts have been found useful in many scenarios. However, existing context-based recommendation approaches overlook the role of diversity in reducing redundancy and thus cannot cover the broad range of user interests. To address this gap, the paper proposes a novel task: recommending a set of diverse citation contexts extracted from a list of citing articles. This helps users understand how other scholars have cited an article and decide which articles to cite in their own writing.

Design/methodology/approach

This research combines three semantic distance algorithms and three diversification re-ranking algorithms for the diversifying recommendation based on the CiteSeerX data set and then evaluates the generated citation context lists by applying a user case study on 30 articles.
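To make the idea of diversification re-ranking concrete, the sketch below uses a greedy maximal marginal relevance (MMR) style selection over toy vectors. This is a swapped-in, simpler technique, not the Integer Linear Programming strategy the paper evaluates, and the context ids, relevance scores, two-dimensional vectors and `trade_off` parameter are all assumptions made for illustration. In practice the vectors might come from an embedding model such as word2vec.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_rerank(candidates, k, trade_off=0.5):
    """Greedily pick k items, trading relevance against similarity to earlier picks.

    candidates: list of (item_id, relevance, vector) tuples.
    trade_off:  1.0 ranks by relevance only; lower values favour diversity.
    """
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(item):
            _, relevance, vector = item
            # Penalise items that resemble anything already selected.
            max_sim = max(
                (cosine_similarity(vector, chosen[2]) for chosen in selected),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * max_sim

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [item[0] for item in selected]


# Hypothetical citation contexts: c1 and c2 are near-duplicates, c3 is different.
contexts = [
    ("c1", 0.9, [1.0, 0.0]),
    ("c2", 0.8, [1.0, 0.1]),
    ("c3", 0.5, [0.0, 1.0]),
]
print(mmr_rerank(contexts, k=2))
```

With the default trade-off, the second pick skips the near-duplicate `c2` in favour of the less relevant but dissimilar `c3`, which is the behaviour a diversified list is meant to show.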

Findings

Results show that a diversification strategy combining “word2vec” and “Integer Linear Programming” leads to a better reading experience for participants than other strategies, such as the CiteSeerX approach of sorting the list by citation counts.

Practical implications

This diversifying recommendation task is valuable for developing better systems in information retrieval, automatic academic recommendations and summarization.

Originality/value

The originality of the research lies in proposing a novel task that recommends a diversified context list describing how other scholars have cited an article, thereby making citing decisions easier. A novel mixed approach is explored to generate the most efficient diversifying strategy. In addition, rather than relying on traditional information retrieval evaluation, a user evaluation framework is introduced to reflect user information needs more objectively.

Article
Publication date: 9 March 2010

Isabella Peters and Wolfgang G. Stock

Abstract

Purpose

Many Web 2.0 services (including Library 2.0 catalogs) make use of folksonomies. The purpose of this paper is to cut off all tags in the long tail of a document‐specific tag distribution. The remaining tags at the beginning of a tag distribution are considered power tags and form a new, additional search option in information retrieval systems.

Design/methodology/approach

In a theoretical approach the paper discusses document‐specific tag distributions (power law and inverse‐logistic shape), the development of such distributions (Yule‐Simon process and shuffling theory) and introduces search tags (besides the well‐known index tags) as a possibility for generating tag distributions.

Findings

Search tags are compatible with broad and narrow folksonomies and with all knowledge organization systems (e.g. classification systems and thesauri), while index tags are only applicable in broad folksonomies. Based on these findings, the paper presents a sketch of an algorithm for mining and processing power tags in information retrieval systems.
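The cut-off idea behind power tags can be sketched as follows: count how often each tag was assigned to one document and keep only the head of the distribution. This is a minimal sketch under stated assumptions, not the algorithm the paper outlines. The fixed `max_tags` threshold, the function name and the sample tags are hypothetical, and a real system might derive the cut-off from the shape of the distribution instead.

```python
from collections import Counter


def power_tags(tag_assignments, max_tags=3):
    """Return the most frequently assigned tags for one document.

    tag_assignments: every tag assignment made to the document, with repeats.
    max_tags:        hypothetical cut-off; tags beyond it count as the long tail.
    """
    counts = Counter(tag_assignments)
    # most_common() orders tags by descending assignment frequency.
    return [tag for tag, _ in counts.most_common(max_tags)]


# Invented tag assignments for a single document.
tags_for_document = [
    "python", "code", "python", "tutorial",
    "python", "code", "howto", "snippets",
]
print(power_tags(tags_for_document, max_tags=2))
```

Restricting a search to documents whose power tags match the query is then a plain lookup against this shortened list.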

Research limitations/implications

This conceptual approach is in need of empirical evaluation in a concrete retrieval system.

Practical implications

Power tags are a new search option for retrieval systems to limit the number of hits.

Originality/value

The paper introduces power tags as a means for enhancing the precision of search results in information retrieval systems that apply folksonomies, e.g. catalogs in Library 2.0 environments.

Details

Library Hi Tech, vol. 28 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 8 May 2017

Christiane Behnert and Dirk Lewandowski

Abstract

Purpose

The purpose of this paper is to demonstrate how to apply traditional information retrieval (IR) evaluation methods based on standards from the Text REtrieval Conference and web search evaluation to all types of modern library information systems (LISs) including online public access catalogues, discovery systems, and digital libraries that provide web search features to gather information from heterogeneous sources.

Design/methodology/approach

The authors apply conventional procedures from IR evaluation to the LIS context considering the specific characteristics of modern library materials.

Findings

The authors introduce a framework consisting of five parts: search queries, search results, assessors, testing, and data analysis. The authors show how to deal with comparability problems resulting from diverse document types (e.g., electronic articles vs printed monographs) and which issues need to be considered for retrieval tests in the library context.

Practical implications

The framework can be used as a guideline for conducting retrieval effectiveness studies in the library context.

Originality/value

Although a considerable amount of research has been done on IR evaluation, and standards for conducting retrieval effectiveness studies do exist, to the authors’ knowledge this is the first attempt to provide a systematic framework for evaluating the retrieval effectiveness of twenty-first-century LISs. The authors demonstrate which issues must be considered and what decisions must be made by researchers prior to a retrieval test.

Article
Publication date: 1 February 2000

Pia Borlund

Abstract

This paper presents a set of basic components which constitutes the experimental setting intended for the evaluation of interactive information retrieval (IIR) systems, the aim of which is to facilitate evaluation of IIR systems in a way which is as close as possible to realistic IR processes. The experimental setting consists of three components: (1) the involvement of potential users as test persons; (2) the application of dynamic and individual information needs; and (3) the use of multidimensional and dynamic relevance judgements. Hidden under the information need component is the essential central sub‐component, the simulated work task situation, the tool that triggers the (simulated) dynamic information needs. This paper also reports on the empirical findings of the meta‐evaluation of the application of this sub‐component, the purpose of which is to discover whether the application of simulated work task situations to future evaluation of IIR systems can be recommended. Investigations are carried out to determine whether any search behavioural differences exist between test persons’ treatment of their own real information needs versus simulated information needs. The hypothesis is that if no difference exists one can correctly substitute real information needs with simulated information needs through the application of simulated work task situations. The empirical results of the meta‐evaluation provide positive evidence for the application of simulated work task situations to the evaluation of IIR systems. The results also indicate that tailoring work task situations to the group of test persons is important in motivating them. Furthermore, the results of the evaluation show that different versions of semantic openness of the simulated situations make no difference to the test persons’ search treatment.

Details

Journal of Documentation, vol. 56 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 5 February 2018

Sanjeev K. Sunny and Mallikarjun Angadi

Abstract

Purpose

The purpose of this study is to carry out a systematic literature review for evidence-based assessment of the effectiveness of thesauri in digital information retrieval systems. It also aims to identify the evaluation methods, evaluation measures and data collection tools that may be used in evaluating such systems.

Design/methodology/approach

A systematic literature review (SLR) of 344 publications from LISA and 238 from Scopus has been carried out to identify the evaluation studies for analysis, and 15 evaluation studies have been analyzed.

Findings

This study presents evidence for the effectiveness of thesauri in digital information retrieval systems. Various methods for evaluating digital information systems have been identified, along with a wide range of evaluation measures and data collection tools.

Research limitations/implications

The study was limited to literature published in the English language and indexed in LISA and Scopus. The evaluation methods, evaluation measures and data collection tools identified in this study may be used to design more cognizant evaluation studies for digital information retrieval systems.

Practical implications

The findings have significant implications for administrators of any type of digital information retrieval system, helping them make more informed decisions about implementing thesauri in resource description and access to digital collections.

Originality/value

This study extends our knowledge on the potentials of thesauri in digital information retrieval systems. It also provides cues for designing more cognizant evaluation studies for digital information systems.

Book part
Publication date: 13 December 2017

Qiongwei Ye and Baojun Ma

Abstract

Internet + and Electronic Business in China is a comprehensive resource that provides insight and analysis into E-commerce in China and how it has revolutionized and continues to revolutionize business and society. Split into four distinct sections, the book first lays out the theoretical foundations and fundamental concepts of E-Business before moving on to look at internet+ innovation models and their applications in different industries such as agriculture, finance and commerce. The book then provides a comprehensive analysis of E-business platforms and their applications in China before finishing with four comprehensive case studies of major E-business projects, providing readers with successful examples of implementing E-Business entrepreneurship projects.

Details

Internet+ and Electronic Business in China: Innovation and Applications
Type: Book
ISBN: 978-1-78743-115-7

Details

Automated Information Retrieval: Theory and Methods
Type: Book
ISBN: 978-0-12266-170-9

Book part
Publication date: 10 February 2012

Ben Carterette, Evangelos Kanoulas and Emine Yilmaz

Abstract

Purpose — The overall quality of an information retrieval system depends on many different aspects of the system and its users' information seeking behaviour, such as the speed of the system, the user interface, the query language and the features provided by the engine. One of the most important aspects is the effectiveness of the retrieval system, i.e. its ability to retrieve items that are relevant to the information need of an end user. This chapter focuses on methods for measuring effectiveness, in particular focusing on recent work that more directly models the utility of an engine to its users.

Methodology/approach — We discuss traditional approaches to effectiveness evaluation based on test collections, then transition to approaches based on test collections along with explicit models of user interaction with search results. We contrast this with approaches for which the user is ‘in the loop’, such as user studies and online evaluations.
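Two standard effectiveness measures used with test collections are precision at k and average precision. The sketch below computes both for a single query; it is an illustration rather than code from the chapter, and the document ids, ranking and binary relevance judgements are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


# Hypothetical system output and judgements for one query.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(precision_at_k(ranking, relevant, k=2))
print(average_precision(ranking, relevant))
```

Averaging `average_precision` over all queries in a test collection gives mean average precision. Neither measure models user behaviour explicitly, which is the gap that user model-based measures aim to fill.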

Research limitations/implications — If it were possible to model users perfectly, we could directly estimate the utility of a search engine to its users; this would undoubtedly have a transformative effect on information retrieval and web search research. In practice, this goal will never be achievable because users exhibit far too much variability in how they approach the search engine, and furthermore provide valuable feedback that models and simulations cannot provide. Nevertheless, better models of user interaction will help develop better web search engines for a wider variety of tasks more rapidly.

Originality/value of paper — This is the first work that surveys recent work on user model-based evaluation and places it in a context with traditional evaluation based on the Cranfield paradigm.

Details

Web Search Engine Research
Type: Book
ISBN: 978-1-78052-636-2

Article
Publication date: 18 November 2013

Jorge Luis Morato, Sonia Sanchez-Cuadrado, Christos Dimou, Divakar Yadav and Vicente Palacios

Abstract

Purpose

This paper seeks to analyze and evaluate different types of semantic web retrieval systems, with respect to their ability to manage and retrieve semantic documents.

Design/methodology/approach

The authors provide a brief overview of knowledge modeling and semantic retrieval systems in order to identify their major problems. They classify a set of characteristics to evaluate the management of semantic documents. To this end, the authors select 12 retrieval systems classified according to these features. The evaluation methodology followed in this work is the one used in the DESMET project for the evaluation of qualitative characteristics.

Findings

A review of the literature shows deficiencies in the ability of the current semantic web to cope with known problems. Additionally, semantic retrieval systems show discrepancies in how they are implemented. The authors analyze the presence of a set of functionalities in different types of semantic retrieval systems and find a low degree of implementation of important specifications and of the criteria to evaluate them. The results of this evaluation indicate that, at the moment, the semantic web is characterized by a lack of usability that stems from problems related to the management of semantic documents.

Originality/value

This proposal shows a simple way to qualitatively compare the requirements of semantic retrieval systems based on the DESMET methodology. The functionalities chosen to test the methodology are based on the problems as well as relevant criteria discussed in the literature. This work provides functionalities to design semantic retrieval systems in different scenarios.

Details

Library Hi Tech, vol. 31 no. 4
Type: Research Article
ISSN: 0737-8831
