Search results
Ankie Visschedijk and Forbes Gibb
Abstract
This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional retrieval by using either innovative software or hardware to increase retrieval speed or functionality, precision or recall. The software systems reviewed are: AIDA, CLARIT, Metamorph, SIMPR, STATUS/IQ, TCS, TINA and TOPIC. The hardware systems reviewed are: CAFS‐ISP, the Connection Machine, GESCAN, HSTS, MPP, TEXTRACT, TRW‐FDF and URSA.
Abstract
Current approaches to text retrieval based on indexing by words or index terms and on retrieving by specifying a Boolean combination of keywords are well known, as are their limitations. Statistical approaches to retrieval, as exemplified in commercial products like STATUS/IQ and Personal Librarian, are slightly better but still have their own weaknesses. Approaches to the indexing and retrieval of text based on techniques of automatic natural language processing (NLP) may soon start to realise their undoubted potential in terms of improving the quality and effectiveness of information retrieval. In this article we will explore what that potential is. We will divide information retrieval functionality into conceptual and traditional information retrieval and we will examine some of the current attempts at using various NLP techniques in both the indexing and retrieval operations.
Abstract
In the third paragraph, the author states that ‘Conventional text retrieval systems suffer from a number of problems. First, indexing terms and / or classificators have normally to be assigned manually, which is a very time‐consuming process and can lead to severe problems with regard to inter‐indexer consistency.’ To what types of systems does this refer? From a content perspective it would appear to be addressing the problems of a keyword system, also referred to as a document coding system. Yet, they are referred to as ‘conventional text retrieval systems.’ Manual indexing is not a component of today's text retrieval systems, elementary or advanced.
Abstract
Text retrieval is not a new technology — it has been familiar to library and information professionals for many years. For example, among the vendors we will look at here, PLS dates from 1983 and Fulcrum from 1984 — children compared to, say, IBM and Microsoft, but venerable in the general terms of the IT industry. Recently, however, several companies offering text retrieval services have begun to raise their profile — so much so that Delphi Consulting reports the text retrieval market has recently broken the half billion dollar barrier. Many of these companies are gaining the financial clout to add features to their products, or diversify, or head off in a completely new direction. Here we give a round‐up of some of them.
Abstract
This paper will discuss the development of an information management system providing access to radio archive material. A formal television and radio archive agreement between the BBC in Northern Ireland and the Ulster Folk and Transport Museum in 1989 led to the decision to use STATUS for the management of the radio archive. An example of the record structure is given and some details of ways of searching.
Abstract
UK body for Internet registration. A new national body in the UK responsible for registering Internet names has held its first meeting. Nominet UK is a not‐for‐profit company set up with the support of all sections of the UK Internet industry and which derives its authority from the Internet Assigned Numbers Authority. Until its creation Internet name registration was done on a voluntary basis by the UK Education & Research Networking Association but the Internet's increasing popularity, with 200 new registrations per week, put a strain on this arrangement.
Brian Vickery and Alina Vickery
Abstract
There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely held that less use is made of these databases than could or should be the case, and that one reason for this is that potential users find it difficult to identify which databases to search, to use the various command languages of the hosts and to construct the Boolean search statements required. This reasoning has stimulated a considerable amount of exploration and development work on the construction of search interfaces, to aid the inexperienced user to gain effective access to these databases. The aim of our paper is to review aspects of the design of such interfaces: to indicate the requirements that must be met if maximum aid is to be offered to the inexperienced searcher; to spell out the knowledge that must be incorporated in an interface if such aid is to be given; to describe some of the solutions that have been implemented in experimental and operational interfaces; and to discuss some of the problems encountered. The paper closes with an extensive bibliography of references relevant to online search aids, going well beyond the items explicitly mentioned in the text. An index to software appears after the bibliography at the end of the paper.
Abstract
The AIDA project is a program of research being carried out by Computer Power in Canberra, Australia, in collaboration with the Australian Parliament. Its primary objective is to develop practical methods for carrying out document content analysis with minimal human intervention. Following a very successful independent assessment of the techniques, the first commercial‐strength tool has now been developed. It links the different AIDA analyses (point form summary, keywords, and so on) with the original document to form a “complete” hyperdocument. The different techniques employed by AIDA to achieve its results are described.
Abstract
Term position information, as provided in some Boolean systems in the form of field restriction and term proximity, is reviewed and its value assessed. Non‐Boolean retrieval in the form of the ranked output experiment has not so far used term position information but has concentrated on schemes of term weighting. The use of term proximity devices is proposed here by analogy with Boolean techniques and seven algorithms are devised to incorporate the ideas of sentence matching, proximate terms, term order specification and term distance computations. It is hypothesised that term position will act as a precision device. A new search experiment is then described in which a test collection is processed into sentences and then output ranking using term position is obtained. Results are given for five algorithms compared against quorum searching as the benchmark. The best result increased the precision ratio by 18% and used proximate matching term pairs in sentences plus a distance component.
Abstract
The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a cognitive framework, the paper outlines the concept of polyrepresentation applied to both the user's cognitive space and the information space of IR systems. The concept seeks to represent the current user's information need, problem state, and domain work task or interest in a structure of causality. Further, it implies that we should apply different methods of representation and a variety of IR techniques of different cognitive and functional origin simultaneously to each semantic full‐text entity in the information space. The cognitive differences imply that by applying cognitive overlaps of information objects, originating from different interpretations of such objects through time and by type, the degree of uncertainty inherent in IR is decreased. Polyrepresentation and the use of cognitive overlaps are associated with, but not identical to, data fusion in IR. By explicitly incorporating all the cognitive structures participating in the interactive communication processes during IR, the cognitive theory provides a comprehensive view of these processes. It encompasses the ad hoc theories of text retrieval and IR techniques hitherto developed in mainstream retrieval research. It has elements in common with van Rijsbergen and Lalmas' logical uncertainty theory and may be regarded as compatible with that conception of IR. Epistemologically speaking, the theory views IR interaction as processes of cognition, potentially occurring in all the information processing components of IR, that may be applied, in particular, to the user in a situational context. The theory draws upon basic empirical results from information seeking investigations in the operational online environment, and from mainstream IR research on partial matching techniques and relevance feedback. 
By viewing users, source systems, intermediary mechanisms and information in a global context, the cognitive perspective attempts a comprehensive understanding of essential IR phenomena and concepts, such as the nature of information needs, cognitive inconsistency and retrieval overlaps, logical uncertainty, the concept of ‘document’, relevance measures and experimental settings. An inescapable consequence of this approach is to rely more on sociological and psychological investigative methods when evaluating systems and to view relevance in IR as situational, relative, partial, differentiated and non‐linear. The lack of consistency among authors, indexers, evaluators or users is of an identical cognitive nature. It is unavoidable, and indeed favourable to IR. In particular, for full‐text retrieval, alternative semantic entities, including Salton et al.'s ‘passage retrieval’, are proposed to replace the traditional document record as the basic retrieval entity. These empirically observed phenomena of inconsistency and of semantic entities and values associated with data interpretation support strongly a cognitive approach to IR and the logical use of polyrepresentation, cognitive overlaps, and both data fusion and data diffusion.