Image Retrieval: Theory and Research

Peter Enser (School of Computing, Mathematical and Information Sciences, University of Brighton, Brighton, UK)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 1 June 2004




Enser, P. (2004), "Image Retrieval: Theory and Research", Journal of Documentation, Vol. 60 No. 3, pp. 321-323.



Emerald Group Publishing Limited

Copyright © 2004, Emerald Group Publishing Limited

This work is a most important contribution to the literature of visual image retrieval. Its special significance lies in the author's recognition of the various communities of researchers and practitioners who contribute to this vibrant topic within the broad field of information retrieval. The research endeavours of each such community have been investigated with commendable thoroughness, and reported with exemplary clarity.

The work takes the form of a research review, the provenance of which is described as a prelude to the author's dissertation research, in which she sought to discover the range of attributes recognised by humans when describing images. A succinct and scholarly preface establishes that the primacy of human communication through the visual medium was lost only during the last 500 of the 30,000 years of recorded human experience, but that there has been a resurgence due to technological developments. This is followed by a short chapter that gives focussed consideration to still images and the distinction between manual and automatic methods for their subject indexing. Attention is drawn to the lack of collaboration between the practitioner community who are concerned with manual indexing methods, and the research communities in computer science and computer vision who focus on automatic methods.

This short chapter sits a little oddly between the preface and the succeeding, substantive chapters, and leads one to wonder if there would not have been some advantage in blending the preface with this first chapter.

There follows a sequence of chapters that review research in image retrieval from a number of different viewpoints. In the first of these chapters the cognitive foundations of image processing are considered, and a comprehensive account is provided of the relationship between the physiological processes of visual perception and the cognitive and contextual processes which produce image comprehension and understanding in the human.

The juxtaposition of such material with a detailed treatment of image retrieval as exercised by the information science and computer science communities is valuable. Theories of human perception and preconceptual vision, by explaining our perception of low‐level features, such as colour, texture and shape, inform our view on content‐based retrieval techniques that operate on these quantifiable attributes of the image. Interestingly, mid‐level processing, which builds mental constructs from visual components, and global perceptual processing, by which we recognise the semantics of a viewed artefact, have no adequate theory of vision to explain them.

The roles of context, attention and emotion are considered, and eye‐tracking experiments are given quite extensive treatment, as are theories of image understanding and human image memory. There is also a whole section on the psychology of art.

The following chapter, which deals with organising and providing access to images, is an excellent, highly researched review of techniques for the manual processing of still image material. Vocabulary tools for image indexing are evaluated using a standardised procedure previously published by the author. The problems inherent in textual indexing are discussed, including consideration of the need to take context into account. The unpredictability of retrieval utility and the innately entropic nature of images underpin the argument, although the author does not express it in those terms.

There are guidelines for standardisation and metadata initiatives, including coverage of VRA Core, Dublin Core and mark‐up languages. The theoretical basis for image classification is also considered, and the chapter contains an exhaustive consideration of user studies and of studies of queries and searches reported in the literature.

In the fourth chapter the author enters the realm of content‐based image retrieval. The evolution of this approach is discussed, together with a thoroughly researched overview of new approaches at the cutting edge of research endeavour. This is impressive, not only because the volume of publication is very high, but also because it is of a relentlessly technical nature that the author has managed to encompass without obfuscation. There is also some cross‐referencing to the human perceptual processes discussed in the second chapter.

The author takes a balanced view of the contribution that the current generation of content‐based retrieval (CBR) techniques can make to the broad field of image retrieval. Having observed that user testing of CBR features has demonstrated that users do not find low‐level features particularly intuitive to search or relevant to their queries, she goes on to consider cutting‐edge techniques in “semantic” retrieval, and the need to integrate text‐based retrieval and CBR in order to achieve adequate system performance.

The chapter then moves on to image databases, considering how to adopt basic database architecture and querying techniques for image applications. Evaluation methods and exemplars are given, and the chapter concludes with an excellent summary, which offers a critical appraisal of progress and a clear‐sighted view of the way ahead. The call for cross‐disciplinary awareness of pertinent work and incorporation of a “common, sensible and easily understood framework that will facilitate these new research directions”, although worthy, seems a little optimistic.

The fifth chapter appears to be a reprise of the author's research dissertation, concerned with the discovery of typical image attributes reflecting the different tasks of describing and sorting. I do feel that the material in this chapter does not sit comfortably with the rest of the book. The treatment is over‐extensive, given that the work has all been published previously; citation and much more determined summarisation of findings would have sufficed. The experiments certainly have their points of interest in that they provide insights into what people see in images, and the connotations reflected in their verbalisation. The sort task is interesting because it reflects categorical thinking and the preferential viewing order of objects/concepts within images.

The material in the sixth chapter draws heavily on that presented in the earlier chapters, and is a scholarly discussion of the “very incomplete match” between the attributes addressed by major textual indexing systems and the attributes described by participants in empirical research.

The work concludes with a research agenda for the future, including a wish list in which an evaluation testbed of images drawn from diverse domains and a meaningful set of benchmark queries feature prominently.

The scope of this book is refreshingly broad, to which a bibliography containing no fewer than 759 entries, complemented by copious notes and Web site addresses, stands testament. At present it must stand as the definitive work in the field. One can but sympathise with the author's realisation that, as a review of research in a particularly dynamic subject, her monumental effort will have a limited shelf life. As she says, “writing it has most certainly been pursuing a moving target”.

This impressive, clearly written and interdisciplinary review of cutting‐edge theory and practice confirms Corinne Jörgensen as one of today's leading authorities in the field of visual information retrieval.
