There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely held that less use is made of these databases than could or should be the case, and that one reason for this is that potential users find it difficult to identify which databases to search, to use the various command languages of the hosts and to construct the Boolean search statements required. This reasoning has stimulated a considerable amount of exploration and development work on the construction of search interfaces, to aid the inexperienced user to gain effective access to these databases. The aim of our paper is to review aspects of the design of such interfaces: to indicate the requirements that must be met if maximum aid is to be offered to the inexperienced searcher; to spell out the knowledge that must be incorporated in an interface if such aid is to be given; to describe some of the solutions that have been implemented in experimental and operational interfaces; and to discuss some of the problems encountered. The paper closes with an extensive bibliography of references relevant to online search aids, going well beyond the items explicitly mentioned in the text. An index to software appears after the bibliography at the end of the paper.
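As a minimal illustration of the kind of aid such an interface can offer an inexperienced searcher, the sketch below (hypothetical, not drawn from any system described here) assembles a host-ready Boolean statement from a user's concept groups, ORing synonyms within each group and ANDing the groups together:

```python
def build_boolean_query(concept_groups):
    """Combine concept groups into a Boolean search statement.

    Each inner list holds synonyms for one concept (ORed together);
    the groups themselves are ANDed.
    """
    clauses = ["(" + " OR ".join(group) + ")" for group in concept_groups]
    return " AND ".join(clauses)

# e.g. two concepts, each with a synonym the searcher supplied
print(build_boolean_query([["database", "databank"], ["interface", "front-end"]]))
# (database OR databank) AND (interface OR front-end)
```

An interface built on this idea shields the user from the host's command language while still producing the Boolean statements the host requires.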
Current OPACs show their weakness in terms of ease of use and comprehension of user requests, and more generally in man/machine dialogue. Most OPAC searches are for subjects, and these give the worst results. Natural language processing techniques exist to reduce these difficulties. In France, natural language processing has been used to access the yellow pages (headings) of the French phone directory and the telematics services directory; examples are included. No doubt future library systems will use these techniques to make the new OPACs really Open, Public, Accessible and Co‐operative (user‐friendly).
The field of natural language processing (NLP) demonstrates rapid changes in the design of information retrieval systems and human‐computer interaction. While natural language is being looked on as the most effective tool for information retrieval in a contemporary information environment, the systems using it are only beginning to emerge. This study attempts to evaluate the current state of NLP IR systems from the user’s point of view: what techniques are used by these systems to guide their users through the search process? The analysis focused on the structure and components of the systems’ help mechanisms. Results of the study demonstrated that systems which claimed to be using natural language searching in fact used a wide range of information retrieval techniques from real natural language processing to Boolean searching. As a result, the user assistance mechanisms of these systems also varied. While pseudo‐NLP systems would suit a more traditional method of instruction, real NLP systems primarily utilised the methods of explanation and user‐system “dialogue”.
This article is a contribution to the development of a comprehensive interdisciplinary theory of LIS, in the hope of giving a more precise evaluation of its current problems. The article describes an interdisciplinary framework for LIS, especially information retrieval (IR), in a way that goes beyond the cognitivist ‘information processing paradigm’. The main problem of this paradigm is that its concept of information and language does not deal in a systematic way with how social and cultural dynamics set the contexts that determine the meaning of those signs and words that are the basic tools for the organisation and retrieval of documents in LIS. The paradigm does not distinguish clearly enough between how the computer manipulates signs and how librarians work with meaning in practice when they design and run document mediating systems. The ‘cognitive viewpoint’ of Ingwersen and Belkin makes clear that information is not objective, but rather only potential, until it is interpreted by an individual mind with its own internal mental world view and purposes. It facilitates further study of the social pragmatic conditions for the interpretation of concepts. This approach is not yet fully developed. The domain analytic paradigm of Hjørland and Albrechtsen is a conceptual realisation of an important aspect of this area. In the present paper we further develop a non‐reductionistic and interdisciplinary view of information and human social communication by texts in the light of second‐order cybernetics, where information is seen as ‘a difference which makes a difference’ for a living autopoietic (self‐organised, self‐creating) system. Other key ideas come from the semiotics of Peirce and of Warner: the understanding of signs as a triadic relation between an object, a representation and an interpretant. Information is the interpretation of signs by living, feeling, self‐organising, biological, psychological and social systems.
Signification is created and controlled in a cybernetic way within social systems and is communicated through what Luhmann calls generalised media, such as science and art. The modern socio‐linguistic concept of ‘discourse communities’ and Wittgenstein's ‘language game’ concept give a further pragmatic description of the self‐organising system's dynamic that determines the meaning of words in a social context. As Blair and Liebenau and Backhouse point out in their work, it is these semantic fields of signification that are the true pragmatic tools of knowledge organisation and document retrieval. Methodologically, they are the first systems to be analysed when designing document mediating systems, as they set the context for the meaning of concepts. Several practical and analytical methods from linguistics and the sociology of knowledge can be used in combination with standard methodology to reveal the significant language games behind document mediation.
In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre‐specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
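The core of the IE task can be shown in miniature. In the toy sketch below a single regular-expression pattern stands in for the much richer linguistic analysis real IE systems perform; the "acquisition" event type, its slots and the example sentence are invented for illustration:

```python
import re

# One hand-written pattern stands in for the linguistic analysis a real
# IE system would perform; event type and slot names are invented here.
ACQUISITION = re.compile(
    r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+) for (?P<price>\$[\d.]+bn)"
)

def extract(text):
    """Fill one template (a structured record) per matched event."""
    return [{"event": "acquisition", **m.groupdict()}
            for m in ACQUISITION.finditer(text)]

print(extract("Acme acquired Widgetco for $2.5bn last year."))
# [{'event': 'acquisition', 'buyer': 'Acme', 'target': 'Widgetco', 'price': '$2.5bn'}]
```

The pre-specified slots (buyer, target, price) are what distinguishes IE from full text understanding: everything outside the template is deliberately ignored.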
This review reports on the current state and the potential of tools and systems designed to aid online searching, referred to here as online searching aids. Intermediary mechanisms are examined in terms of the two stage model, i.e. end‐user, intermediary, ‘raw database’, and different forms of user-system interaction are discussed. The evolution of the terminology of online searching aids is presented with special emphasis on the expert/non‐expert division. Terms defined include gateways, front‐end systems, intermediary systems and post‐processing. The alternative configurations that such systems can have and the approaches to the design of the user interface are discussed. The review then analyses the functions of online searching aids, i.e. logon procedures, access to hosts, help features, search formulation, query reformulation, database selection, uploading, downloading and post‐processing. Costs are then briefly examined. The review concludes by looking at future trends following recent developments in computer science and elsewhere. Distributed expert based information systems (DEBIS), the standard generalised mark‐up language (SGML), the client‐server model, object‐orientation and parallel processing are expected to influence, if they have not done so already, the design and implementation of future online searching aids.
The purpose of this paper is to describe the creation and exploitation of a historical corpus in an attempt to contribute to the preservation and availability of cultural heritage documents.
First, the digitization process and the efforts to make the books and manuscripts of a historical library in Greece available and better known are presented. Then, the processing and exploitation of the digitized corpus by means of natural language processing techniques are discussed.
In the course of the project, methods that take into account the state of the documents and the particularities of the Greek language were developed.
In its present state, the use of the corpus facilitates the work of theologians, historians, philologists, paleographers, etc. and, at the same time, protects the original documents from further damage.
The results of this undertaking can give useful insights into the creation of corpora of cultural heritage documents and into methods for the processing and exploitation of digitized documents that take into account the language in which the documents are written.
Current approaches to text retrieval based on indexing by words or index terms and on retrieving by specifying a Boolean combination of keywords are well known, as are their limitations. Statistical approaches to retrieval, as exemplified in commercial products like STATUS/IQ and Personal Librarian, are slightly better but still have their own weaknesses. Approaches to the indexing and retrieval of text based on techniques of automatic natural language processing (NLP) may soon start to realise their undoubted potential in terms of improving the quality and effectiveness of information retrieval. In this article we will explore what that potential is. We will divide information retrieval functionality into conceptual and traditional information retrieval and we will examine some of the current attempts at using various NLP techniques in both the indexing and retrieval operations.
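The contrast between Boolean and statistical retrieval can be sketched in a few lines. This toy example is not how STATUS/IQ or Personal Librarian actually work; the documents are invented and the weighting is a plain TF-IDF sum, chosen only to show that a strict Boolean AND returns nothing but exact matches while a statistical approach ranks every document:

```python
import math
from collections import Counter

docs = {
    "d1": "natural language processing for text retrieval",
    "d2": "boolean retrieval of text by keyword combination",
    "d3": "statistical language models rank text by term weight",
}

def boolean_and(query):
    """Strict Boolean AND: a document matches only if every term occurs."""
    terms = query.split()
    return [d for d, text in docs.items() if all(t in text.split() for t in terms)]

def tfidf_rank(query):
    """Rank all documents by the summed tf-idf weight of the query terms."""
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(text.split()))
    scores = {
        d: sum(Counter(text.split())[t] * math.log(n / df[t])
               for t in query.split() if df[t])
        for d, text in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print(boolean_and("language retrieval"))  # ['d1'] - d2 and d3 are lost entirely
print(tfidf_rank("language retrieval"))   # d1 first, but d2 and d3 still scored
```

The Boolean search silently discards the two partially relevant documents; the ranked search degrades gracefully, which is precisely the behaviour the statistical products exploit.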
The recent report for the Commission of the European Communities on current multilingual activities in the field of scientific and technical information and the 1977 conference on the same theme both included substantial sections on operational and experimental machine translation systems, and in its Plan of action the Commission announced its intention to introduce an operational machine translation system into its departments and to support research projects on machine translation. This revival of interest in machine translation may well have surprised many who have tended in recent years to dismiss it as one of the ‘great failures’ of scientific research. What has changed? What grounds are there now for optimism about machine translation? Or is it still a ‘utopian dream’? The aim of this review is to give a general picture of present activities which may help readers to reach their own conclusions. After a sketch of the historical background and general aims (section I), it describes operational and experimental machine translation systems of recent years (section II), continues with descriptions of interactive (man‐machine) systems and machine‐assisted translation (section III), and concludes with a general survey of present problems and future possibilities (section IV).
The paper describes techniques developed by Tome Associates to process natural language queries into search statements suitable for transmission to online text database…
The paper describes techniques developed by Tome Associates to process natural language queries into search statements suitable for transmission to online text database systems. The problems discussed include word identification, the handling of unknown words, the contents and structure of system dictionaries, the use of semantic categories and classification, disambiguation of multi‐meaning words, stemming and truncation, noun compounds and indications of relationship between search terms.
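A hypothetical sketch of such a pipeline follows; the stopword list, the suffix-stripping rules and the `?` truncation symbol are invented for illustration and are not Tome Associates' actual dictionaries or rules:

```python
STOPWORDS = {"the", "a", "an", "of", "for", "in", "on", "about", "and"}

def stem(word):
    # crude suffix stripping as a stand-in for real stemming rules
    for suffix in ("ing", "tion", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 3:
            return word[: -len(suffix)]
    return word

def to_search_statement(query):
    """Identify words, drop stopwords, stem, and mark each term for truncation."""
    words = [w.lower().strip(".,?") for w in query.split()]
    terms = [stem(w) for w in words if w not in STOPWORDS]
    return " AND ".join(t + "?" for t in terms)  # '?' as the truncation symbol

print(to_search_statement("Information about retrieval of online databases"))
# informa? AND retrieval? AND online? AND database?
```

Even this crude version shows where the paper's harder problems arise: unknown words fall through the stopword and stemming steps untouched, and a flat AND of all terms captures none of the relationships between them.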