Multiple View Geometry in Computer Vision

Kybernetes

ISSN: 0368-492X

Article publication date: 1 December 2001

Citation

Andrew, A.M. (2001), "Multiple View Geometry in Computer Vision", Kybernetes, Vol. 30 No. 9/10, pp. 1333-1341. https://doi.org/10.1108/k.2001.30.9_10.1333.2

Publisher: Emerald Group Publishing Limited

Copyright © 2001, MCB UP Limited


Recent developments in the more speculative areas of robotics, as described by Brooks (1999) and by Pfeifer and Scheier (1999), with implications for theories of biological processing, have tended to de‐emphasise the rigorous geometric analysis of images. An earlier view of biological processing, particularly associated with David Marr, assumed rather complete reconstruction of the visible environment at an early stage of processing. It is not difficult to show that something other than this is needed in an animal or a robot operating in real time in a non‐static environment, and these workers describe schemes having relatively direct coupling between sensory inputs and effector mechanisms.

The new methods have allowed robots to drive vehicles, deliver mail in offices, collect empty drinks cans, and perform various other useful tasks. This certainly does not mean that rigorous geometric analysis plays no part in biological processing, nor that it has no value in robotics. There are important areas of activity in which it cannot be assumed, to quote Brooks (1999), that “the world is its own best model” for all purposes. The versatility of biological processing is summed up by a quotation from Fischler and Firschein (1987):

I suspect that the representational system with which we think, if that’s the right way to describe it, is so rich that if you think up any form of symbolism at all, it probably plays some role in thinking.

Given that rigorous geometric analysis is wanted, it is difficult to imagine how it could be treated more comprehensively than in the book being reviewed, which will certainly become a standard work of reference. As the title indicates, it is particularly concerned with interpretation of multiple views. The coverage is indicated by the following two paragraphs on the opening page:

A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Techniques used in the book for solving this are taken from projective geometry and photogrammetry. The distinctive flavour here, though, is that the approach is uncalibrated – it is not necessary to know or to have to compute the camera’s internal parameters before getting an answer to the problem. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework.

The book covers the geometric principles and their algebraic representation in terms of camera projection matrices, the fundamental matrix and the trifocal tensor. The theory and methods of computation of these entities is discussed with real examples, as is their use in the reconstruction of scenes from multiple images. The authors provide comprehensive background material, so a reader familiar with linear algebra and basic numerical methods will be able to understand the projective geometry and estimation algorithms presented, and implement the algorithms directly from the book.

The treatment is the culmination of studies spanning the whole history of AI. A comment in the Foreword notes that in the 1960s the difficulty of making a computer see was enormously underestimated. It is acknowledged that even now it is impossible to be sure that this work is a step in the right direction in pursuit of this “holy grail”, but it is certainly a major contribution that no one concerned with computer vision can afford to ignore.

If the “holy grail” is seen as the imitation of all aspects of human visual perception, there are obvious ways in which a purely geometrical and purely visual treatment falls short. The real situation is even more complicated, since visual perception often operates in conjunction with other senses and utilises a variety of clues, such as information from shadows, the need for solid objects to have support, and a general knowledge of forms such that, as has been said, “if we see a leg we know where to look for a foot”. These considerations undoubtedly complicate the theoretical picture, while possibly relaxing the demands that have to be made on the purely geometrical approach.

The material is developed systematically, starting with projection in two dimensions then extending to three, and so on. The presentation is supplemented by high‐quality figures, some of them line drawings or graphs, and others showing images as processed. In five appendices, special mathematical aspects are treated, including an introduction to the use of tensors and a review of necessary statistical theory. Algorithms that can be implemented are also set out in detail, though not in a programming language. This is a very important and well‐prepared book.

References

Brooks, R.A. (1999), Cambrian Intelligence: The Early History of the New AI, MIT Press, Cambridge, MA.

Fischler, M.A. and Firschein, O. (1987), Intelligence: The Eye, the Brain and the Computer, Addison‐Wesley, Reading, MA, p. 308.

Pfeifer, R. and Scheier, C. (1999), Understanding Intelligence, MIT Press, Cambridge, MA.
