Search results

1 – 10 of over 13000
Article
Publication date: 1 April 1993

Niël van der Merwe

This paper will discuss the integration of document image processing and text retrieval principles in order to process and load existing paper documents automatically in an…

Abstract

This paper will discuss the integration of document image processing and text retrieval principles in order to process and load existing paper documents automatically in an electronic document database that broadens the user's capability to retrieve relevant information more accurately, without going through costly processes to get paper documents into electronic text. The principles of document image processing systems, as well as the problems and shortcomings of most of today's document image processing systems, will be discussed. Then concept retrieval as the latest development in text retrieval will be discussed, with specific reference to the ability of the TOPIC intelligent text retrieval system to allow users to build up a knowledge base of search objects or concepts that can be used at any point in time by all users for the system. This paper will further specifically look at the automatic processing of paper documents by converting the scanned document image pages through to electronic text. The use of optical character recognition technology, the indexing and loading of the documents in a text database, the automatic linking of the documents to the related document images and the retrieval technology available in TOPIC, specifically the TYPO operator that was developed to handle so‐called dirty data such as the common misspellings, character transpositions and ‘dirty’ text received as output from the OCR process, will be discussed. A possible solution to load paper documents quickly and cost‐effectively into an electronic document database will be discussed and demonstrated in detail. The advantages and disadvantages of this approach will be discussed with specific reference to an electronic news clipping service application.

Details

The Electronic Library, vol. 11 no. 4/5
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 March 1991

Suliman Al‐Hawamdeh, Rachel de Vere, Geoff Smith and Peter Willett

Full‐text documents are usually searched by means of a Boolean retrieval algorithm that requires the user to specify the logical relationships between the terms of a query. In…

Abstract

Full‐text documents are usually searched by means of a Boolean retrieval algorithm that requires the user to specify the logical relationships between the terms of a query. In this paper, we summarise the results to date of a continuing programme of research at the University of Sheffield to investigate the use of nearest‐neighbour retrieval algorithms for full‐text searching. Given a natural‐language query statement, our methods result in a ranking of the paragraphs comprising a full‐text document in order of decreasing similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has in common with the query. A full‐text document test collection has been created to allow systematic tests of retrieval effectiveness to be carried out. Experiments with this collection demonstrate that nearest‐neighbour searching provides a means for paragraph‐based access to full‐text documents that is of comparable effectiveness to both Boolean and hypertext searching and that index term weighting schemes which have been developed for the searching of bibliographical databases can also be used to improve the effectiveness of retrieval from full‐text databases. A current project is investigating the extent to which a paragraph‐based full‐text retrieval system can be used to augment the explication facilities of an expert system on welding.

Details

Online Review, vol. 15 no. 3/4
Type: Research Article
ISSN: 0309-314X

Article
Publication date: 1 January 1996

PETER INGWERSEN

The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a…

2458

Abstract

The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a cognitive framework, the paper outlines the concept of polyrepresentation applied to both the user's cognitive space and the information space of IR systems. The concept seeks to represent the current user's information need, problem state, and domain work task or interest in a structure of causality. Further, it implies that we should apply different methods of representation and a variety of IR techniques of different cognitive and functional origin simultaneously to each semantic full‐text entity in the information space. The cognitive differences imply that by applying cognitive overlaps of information objects, originating from different interpretations of such objects through time and by type, the degree of uncertainty inherent in IR is decreased. Polyrepresentation and the use of cognitive overlaps are associated with, but not identical to, data fusion in IR. By explicitly incorporating all the cognitive structures participating in the interactive communication processes during IR, the cognitive theory provides a comprehensive view of these processes. It encompasses the ad hoc theories of text retrieval and IR techniques hitherto developed in mainstream retrieval research. It has elements in common with van Rijsbergen and Lalmas' logical uncertainty theory and may be regarded as compatible with that conception of IR. Epistemologically speaking, the theory views IR interaction as processes of cognition, potentially occurring in all the information processing components of IR, that may be applied, in particular, to the user in a situational context. The theory draws upon basic empirical results from information seeking investigations in the operational online environment, and from mainstream IR research on partial matching techniques and relevance feedback. By viewing users, source systems, intermediary mechanisms and information in a global context, the cognitive perspective attempts a comprehensive understanding of essential IR phenomena and concepts, such as the nature of information needs, cognitive inconsistency and retrieval overlaps, logical uncertainty, the concept of ‘document’, relevance measures and experimental settings. An inescapable consequence of this approach is to rely more on sociological and psychological investigative methods when evaluating systems and to view relevance in IR as situational, relative, partial, differentiated and non‐linear. The lack of consistency among authors, indexers, evaluators or users is of an identical cognitive nature. It is unavoidable, and indeed favourable to IR. In particular, for full‐text retrieval, alternative semantic entities, including Salton et al.'s ‘passage retrieval’, are proposed to replace the traditional document record as the basic retrieval entity. These empirically observed phenomena of inconsistency and of semantic entities and values associated with data interpretation support strongly a cognitive approach to IR and the logical use of polyrepresentation, cognitive overlaps, and both data fusion and data diffusion.

Details

Journal of Documentation, vol. 52 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 May 1985

C.R. Watters, M.A. Shepherd, E.W. Grundke and P. Bodorik

Although the Boolean combination of keywords and/or subject codes is the predominant access method for the retrieval of passages from full‐text databases, menu access is an…

Abstract

Although the Boolean combination of keywords and/or subject codes is the predominant access method for the retrieval of passages from full‐text databases, menu access is an attractive alternative. The selection of an access method and the ensuing satisfaction with the results is based on the type of query and on the experience and knowledge of the user. This paper describes a prototype system which has integrated Boolean, menu, and direct access methods for the retrieval of passages from full‐text databases. The integration is based on the hierarchical structure inherent in such databases as legal statutes and regulations and engineering standards. The user may switch freely among access methods in order to develop the most appropriate search strategy. The retrieved passages are presented to the user within the context of the hierarchical structure.

Details

Online Review, vol. 9 no. 5
Type: Research Article
ISSN: 0309-314X

Article
Publication date: 1 August 1997

A. Macfarlane, S.E. Robertson and J.A. Mccann

The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular we stress the importance of the motivation in using parallel computing for text

Abstract

The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular we stress the importance of the motivation in using parallel computing for text retrieval. We analyse parallel IR systems using a classification defined by Rasmussen and describe some parallel IR systems. We give a description of the retrieval models used in parallel information processing. We describe areas of research which we believe are needed.

Details

Journal of Documentation, vol. 53 no. 3
Type: Research Article
ISSN: 0022-0418

Keywords

Article
Publication date: 1 January 1993

Ankie Visschedijk and Forbes Gibb

This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional…

Abstract

This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional retrieval by using either innovative software or hardware to increase retrieval speed or functionality, precision or recall. The software systems reviewed are: AIDA, CLARIT, Metamorph, SIMPR, STATUS/IQ, TCS, TINA and TOPIC. The hardware systems reviewed are: CAFS‐ISP, the Connection Machine, GESCAN,HSTS,MPP, TEXTRACT, TRW‐FDF and URSA.

Details

Online and CD-Rom Review, vol. 17 no. 1
Type: Research Article
ISSN: 1353-2642

Keywords

Article
Publication date: 28 September 2021

Zahra Alvandi Poor, Mahdieh Mirzabeigi and Majid Nabavi

The purpose of this study aims to identify the impact of verbal-visual cognitive styles on the level of satisfaction and behavior in the textual and content search of Google…

Abstract

Purpose

The purpose of this study aims to identify the impact of verbal-visual cognitive styles on the level of satisfaction and behavior in the textual and content search of Google Images.

Design/methodology/approach

“Riding” cognitive style test and satisfaction questionnaire were used as data collection tools. Also, to collect data related to the image search behavior, the subjects’ transaction files were recorded using Camtasia software and then the files observed and reviewed. The research sample was 90 postgraduate students of Shiraz University.

Findings

The results showed that cognitive styles in interaction with the text-based and content-based search system of “Google Images” affected user’s satisfaction. Text-based image retrieval, in which vocabulary-based information needs were expressed, was more compatible with the verbal cognitive style and resulted in greater satisfaction. In contrast, in content-based image retrieval, where it was possible to express information needs in the form of images, users were more satisfied with the visual cognitive style. Verbal users performed more positively in text-based search and visual users in content-based search.

Originality/value

Considering the research gap, which has identified the performance of visual text-based and content-based systems in terms of satisfaction and cognitive style search behavior, the present study could be considered a small effort to promote science.

Details

Aslib Journal of Information Management, vol. 74 no. 1
Type: Research Article
ISSN: 2050-3806

Keywords

Article
Publication date: 6 November 2017

Yanti Idaya Aspura M.K. and Shahrul Azman Mohd Noah

The purpose of this study is to reduce the semantic distance by proposing a model for integrating indexes of textual and visual features via a multi-modality ontology and the use…

Abstract

Purpose

The purpose of this study is to reduce the semantic distance by proposing a model for integrating indexes of textual and visual features via a multi-modality ontology and the use of DBpedia to improve the comprehensiveness of the ontology to enhance semantic retrieval.

Design/methodology/approach

A multi-modality ontology-based approach was developed to integrate high-level concepts and low-level features, as well as integrate the ontology base with DBpedia to enrich the knowledge resource. A complete ontology model was also developed to represent the domain of sport news, with image caption keywords and image features. Precision and recall were used as metrics to evaluate the effectiveness of the multi-modality approach, and the outputs were compared with those obtained using a single-modality approach (i.e. textual ontology and visual ontology).

Findings

The results based on ten queries show a superior performance of the multi-modality ontology-based IMR system integrated with DBpedia in retrieving correct images in accordance with user queries. The system achieved 100 per cent precision for six of the queries and greater than 80 per cent precision for the other four queries. The text-based system only achieved 100 per cent precision for one query; all other queries yielded precision rates less than 0.500.

Research limitations/implications

This study only focused on BBC Sport News collection in the year 2009.

Practical implications

The paper includes implications for the development of ontology-based retrieval on image collection.

Originality value

This study demonstrates the strength of using a multi-modality ontology integrated with DBpedia for image retrieval to overcome the deficiencies of text-based and ontology-based systems. The result validates semantic text-based with multi-modality ontology and DBpedia as a useful model to reduce the semantic distance.

Details

The Electronic Library, vol. 35 no. 6
Type: Research Article
ISSN: 0264-0473

Keywords

Article
Publication date: 1 April 2001

D.C. Veal Doverton

Present and possible future developments in the techniques of document management are reviewed, the major ones being text retrieval and scanning and OCR. Acquisition, indexing and…

1504

Abstract

Present and possible future developments in the techniques of document management are reviewed, the major ones being text retrieval and scanning and OCR. Acquisition, indexing and thesauri, publishing and dissemination and the document management industry are also addressed. The emerging standards are reviewed and the impact of the Internet is analysed.

Details

Journal of Documentation, vol. 57 no. 2
Type: Research Article
ISSN: 0022-0418

Keywords

Article
Publication date: 1 February 1993

BRIAN VICKERY and ALINA VICKERY

There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely…

Abstract

There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely held that less use is made of these databases than could or should be the case, and that one reason for this is that potential users find it difficult to identify which databases to search, to use the various command languages of the hosts and to construct the Boolean search statements required. This reasoning has stimulated a considerable amount of exploration and development work on the construction of search interfaces, to aid the inexperienced user to gain effective access to these databases. The aim of our paper is to review aspects of the design of such interfaces: to indicate the requirements that must be met if maximum aid is to be offered to the inexperienced searcher; to spell out the knowledge that must be incorporated in an interface if such aid is to be given; to describe some of the solutions that have been implemented in experimental and operational interfaces; and to discuss some of the problems encountered. The paper closes with an extensive bibliography of references relevant to online search aids, going well beyond the items explicitly mentioned in the text. An index to software appears after the bibliography at the end of the paper.

Details

Journal of Documentation, vol. 49 no. 2
Type: Research Article
ISSN: 0022-0418

1 – 10 of over 13000