Search results

11 – 20 of 48
Article
Publication date: 1 May 1970

KAREN SPARCK JONES

Abstract

My research over the last few years has been concerned with the use of automatically‐obtained keyword classifications for information retrieval. Such a classification can be described as a thesaurus, but those classifications which have been most successful in my experiments do not resemble the normal kind of manually‐constructed thesaurus, and the bases on which automatic and manual thesauri are constructed are quite different. Human beings explicitly consider the meanings of words in grouping them, but word meanings are not accessible to computers. Automatic word classification is therefore based on information about the distributional behaviour of words in documents, on the assumption that words which behave in similar ways in terms of document occurrences are semantically related. That is to say, groups of words which are based on the statistical associations of their members in documents should reflect their meaning relations, at least sufficiently for the purposes of retrieval.

Details

Aslib Proceedings, vol. 22 no. 5
Type: Research Article
ISSN: 0001-253X
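
The distributional assumption in the abstract above (words that occur in similar sets of documents are treated as related) can be illustrated with a small sketch. This is not the classification procedure used in the paper: the toy corpus, the function names and the choice of Tanimoto (Jaccard) overlap with a 0.5 threshold are all assumptions made for illustration.

```python
from itertools import combinations

def document_sets(docs):
    """Map each keyword to the set of document ids in which it occurs."""
    occurrences = {}
    for doc_id, keywords in docs.items():
        for word in keywords:
            occurrences.setdefault(word, set()).add(doc_id)
    return occurrences

def tanimoto(a, b):
    """Overlap of two document sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(docs, threshold=0.5):
    """Return keyword pairs whose document sets overlap at or above threshold."""
    occ = document_sets(docs)
    pairs = []
    for w1, w2 in combinations(sorted(occ), 2):
        score = tanimoto(occ[w1], occ[w2])
        if score >= threshold:
            pairs.append((w1, w2, score))
    return pairs

# Hypothetical three-document collection, keywords only.
docs = {
    "d1": {"boundary", "layer", "flow"},
    "d2": {"boundary", "layer", "wing"},
    "d3": {"flow", "wing"},
}
pairs = similar_pairs(docs)
```

In this toy collection only "boundary" and "layer" occur in exactly the same documents, so they are the only pair grouped together; no word meanings are consulted at any point.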

Article
Publication date: 1 April 1979

W.B. CROFT and D.J. HARPER

Abstract

Most probabilistic retrieval models incorporate information about the occurrence of index terms in relevant and non‐relevant documents. In this paper we consider the situation where no relevance information is available, that is, at the start of the search. Based on a probabilistic model, strategies are proposed for the initial search and an intermediate search. Retrieval experiments with the Cranfield collection of 1,400 documents show that this initial search strategy is better than conventional search strategies both in terms of retrieval effectiveness and in terms of the number of queries that retrieve relevant documents. The intermediate search is shown to be a useful substitute for a relevance feedback search. Experiments with queries that do not retrieve relevant documents at high rank positions indicate that a cluster search would be an effective alternative strategy.

Details

Journal of Documentation, vol. 35 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 January 1972

KAREN SPARCK JONES

Abstract

The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing in particular that frequently‐occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.

Details

Journal of Documentation, vol. 28 no. 1
Type: Research Article
ISSN: 0022-0418
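
The collection-frequency weighting argued for in the abstract above can be sketched as follows. The abstract does not give a formula, so the logarithmic form `log2(N / n)` used here is an assumption (it is the commonly quoted form of inverse document frequency, not necessarily the exact function in the paper), and the document collection and function names are hypothetical.

```python
import math

def idf_weights(docs):
    """Weight each term by log2(N / n): N documents, n of them containing the term."""
    n_docs = len(docs)
    doc_freq = {}
    for terms in docs.values():
        for term in set(terms):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log2(n_docs / n) for term, n in doc_freq.items()}

def score(query, terms, weights):
    """Sum the weights of the query terms that the document matches."""
    return sum(weights.get(t, 0.0) for t in set(query) & set(terms))

# Hypothetical four-document collection of index terms.
docs = {
    "d1": ["retrieval", "index", "term"],
    "d2": ["retrieval", "thesaurus"],
    "d3": ["retrieval", "index"],
    "d4": ["retrieval", "cluster"],
}
weights = idf_weights(docs)
query = ["retrieval", "thesaurus", "index"]
ranking = sorted(docs, key=lambda d: score(query, docs[d], weights), reverse=True)
```

Here "retrieval" occurs in every document and so contributes nothing to a match, while a match on the rarer "thesaurus" outweighs a match on "index". Under this scheme d2 ranks first even though d1 and d3 match the same number of query terms.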

Article
Publication date: 1 October 2004

Stephen Robertson

Abstract

The term‐weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon's Information Theory) seeking to establish some theoretical basis for it. Some of these attempts are reviewed, and it is shown that the Information Theory approaches are problematic, but that there are good theoretical justifications of both IDF and TF*IDF in the traditional probabilistic model of information retrieval.

Details

Journal of Documentation, vol. 60 no. 5
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 October 2004

Karen Spärck Jones

Abstract

Robertson comments on the theoretical status of IDF term weighting. Its history illustrates how ideas develop in a specific research context, in theory/experiment interaction, and in operational practice.

Details

Journal of Documentation, vol. 60 no. 5
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 February 1970

KAREN SPARCK JONES

Abstract

The suggestion that classifications for retrieval should be constructed automatically raises some serious problems concerning the sorts of classification which are required, and the way in which formal classification theories should be exploited, given that a retrieval classification is required for a purpose. These difficulties have not been sufficiently considered, and the paper therefore attempts an analysis of them, though no solutions of immediate application can be suggested. Starting with the illustrative proposition that a polythetic, multiple, unordered classification is required in automatic thesaurus construction, this is considered in the context of classification in general, where eight sorts of classification can be distinguished, each covering a range of class definitions and class‐finding algorithms. The problem which follows is that since there is generally no natural or best classification of a set of objects as such, the evaluation of alternative classifications requires either formal criteria of goodness of fit, or, if a classification is required for a purpose, a precise statement of that purpose. In any case a substantive theory of classification is needed, which does not exist; and since sufficiently precise specifications of retrieval requirements are also lacking, the only currently available approach to automatic classification experiments for information retrieval is to do enough of them.

Details

Journal of Documentation, vol. 26 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 January 1996

MARISTELLA AGOSTI, MICHELINE BEAULIEU, CYRIL CLEVERDON, HANS‐PETER FREI, NORBERT FUHR, DAVID HARPER, PETER INGWERSEN, MICHAEL KEEN, RAINER KUHLEN, STEPHEN ROBERTSON, ALAN SMEATON, KAREN SPARCK JONES, KEITH VAN RIJSBERGEN and PETER WILLETT

Abstract

Sir, We write to record our debt, and that of our colleagues, to one of the founding fathers of information retrieval, Gerard (Gerry) Salton, who died on 28th August 1995 in Ithaca, NY at the age of 68. Information retrieval was established as a new academic discipline by a small number of pioneers, Gerry among them, who recognised the need for, and the research challenges presented by, the automated indexing, storage and retrieval of text documents. He brought academic rigour and scholarship to establishing the foundations of this discipline, and we acknowledge his influential contributions to the theory, experimental methods, and practice of information retrieval.

Details

Journal of Documentation, vol. 52 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 July 1972

Karen Sparck Jones

Abstract

An information retrieval thesaurus is a familiar object: it consists of a set of more or less controlled terms functioning as conceptual labels for sets of natural language entry words, with some indication of relations between terms in the form of references from terms to other BT's, NT's or RT's. Such an indexing language differs from a set of natural language keywords in involving vocabulary normalization; from a list of subject headings in being designed for post‐co‐ordination; and from a classification in having a limited structure.

Details

Aslib Proceedings, vol. 24 no. 7
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 1 February 1983

KAREN SPARCK JONES

Abstract

Major initiatives in advanced information technology are under way or have been proposed in the UK, Europe and Japan. ‘Information technology’ is an umbrella expression with different interpretations: it has been adopted in these policy contexts to refer to all areas of computer (and communications) technology, including hardware and software and both to computing in itself and to the applications of computers; it refers in the broadest sense to the technology of and for information processing.

Details

Journal of Documentation, vol. 39 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 October 2004

Karen Spärck Jones

Abstract

The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing, in particular, that frequently‐occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.

Details

Journal of Documentation, vol. 60 no. 5
Type: Research Article
ISSN: 0022-0418
