Emerald Group Publishing Limited
Copyright © 1998, MCB UP Limited
"New" vision algorithms and tools
Don Braggins is the Guest Associate Editor for this issue and is based at Machine Vision Systems Consultancy, Royston, Herts, UK. Tel + 44 (0) 1763 260333; Fax: +44 (0) 1763 261961; E-mail: firstname.lastname@example.org
Keywords Algorithms, Computers, Sensors
Bill Silver, Chief Technology Officer of Cognex Corporation, speaking on "Vision technology in the twenty-first century" at the Automated Imaging Association's annual Business Conference in Orlando in February, made the rather confrontational claim that qualitative improvement in industrial machine vision is rare. He claimed that image capture and image processing have hardly changed in 15 years, and image analysis has hardly changed in ten years. Can this really be true? Surely things have advanced enormously in that period? Well, it all depends on your definitions. He argued, for instance, that because you could buy a vidicon camera in 1981 with an effective resolution of around 640 × 480 pixels (of course there are no actual pixels in a thermionic imaging device), and many of today's CCD cameras are designed to give the same resolution, there has been "no advance". He admitted that processing speeds for tasks such as histogramming, image arithmetic, edge detection, morphology and convolution have risen from a maximum of 30 frames per second in 1985 to 200fps (if you can use that speed!) in 1998, at much lower cost thanks to advances such as MMX technology from Intel. Indeed, a demonstration of "real time" edge detection with different colours for different edge directions, seen at the Gothenburg Robotics conference and exhibition in 1984, remains etched on the author's memory, though the company showing it has long since disappeared.
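That 1984 demonstration is easily reproduced today: the standard Sobel operator yields both an edge magnitude and an edge direction at every pixel, and the direction can be mapped to a hue. A minimal sketch (an illustration in Python with NumPy, not the original exhibitor's method; the test image and mapping are the author's own choices):

```python
import numpy as np

def sobel_edges(img):
    """Compute gradient magnitude and direction with 3x3 Sobel kernels.

    img: 2-D float array (grey-level image). Returns (magnitude, angle)
    for the interior pixels; angle is in radians.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def direction_to_hue(angle):
    """Map edge direction to a hue in [0, 1) so each direction gets its
    own colour; directions 180 degrees apart share a colour."""
    return (angle % np.pi) / np.pi

# A vertical step edge: the gradient points along +x, so angle is ~0.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
mag, ang = sobel_edges(img)
```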
The choice of ten years for "no advance" in image analysis was not accidental; it was Cognex which first found how to perform normalised correlation at speeds useful enough for industrial vision, and this really was a breakthrough. The principle was not new; it was just that the process was so computationally expensive that it had not been practical to use it for industrial applications. The important point about this algorithm is that it operates on the grey-level image directly: there is no binarisation or edge finding involved, and all the intensity and position information from every pixel is used, or capable of being used. The normalisation process removes from the comparison any differences in the absolute intensity levels between the search subject and the "master" image of what is being sought, and it also removes the effect of differing contrast between the two. Unlike methods based on the centre of mass of a fiducial mark, normalised correlation is quite robust against missing parts of such marks, though it can warn that the fit is not as good as could be expected.
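For readers unfamiliar with the algorithm, here is a minimal sketch of normalised correlation in Python (an illustration of the textbook formula, not Cognex's implementation; real systems use far faster search strategies than this exhaustive loop):

```python
import numpy as np

def ncc(patch, template):
    """Normalised correlation of two equal-size grey-level arrays.

    Subtracting each mean removes any difference in absolute intensity
    between the search subject and the "master" image; dividing by the
    product of the root sums of squares removes any difference in
    contrast, so the score depends only on the spatial pattern.
    """
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return (p * t).sum() / denom if denom else 0.0

def find_template(image, template):
    """Exhaustive search: slide the template everywhere, keep best fit."""
    th, tw = template.shape
    best, best_pos = -2.0, (0, 0)
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            s = ncc(image[i:i + th, j:j + tw], template)
            if s > best:
                best, best_pos = s, (i, j)
    return best_pos, best

# A copy of the template embedded at (10, 5) with altered brightness
# and halved contrast still scores a perfect 1.0: the normalisation
# at work, exactly the robustness described above.
rng = np.random.default_rng(1)
template = rng.random((8, 8))
scene = rng.random((32, 32))
scene[10:18, 5:13] = 0.5 * template + 20.0
pos, score = find_template(scene, template)
```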
With advances in processing power, and successful efforts by other companies to find ways of rapidly performing the necessary algorithms, Cognex lost their practical monopoly of normalised correlation and have now developed a proprietary technique (described in detail elsewhere in this issue) which is even more robust than normalised correlation and which is easier to apply to situations where rotation is part of the problem.
It is difficult to challenge the assertion that there have not been any major developments in image analysis algorithms for industrial vision applications since the use of normalised correlation became practical. (In the wider field of computer vision, for tasks such as medical imaging and the tracking of people and vehicles, there have been significant advances in the period.) If we look at the more general field of image understanding, there has been a significant development (not by any means confined to image-related applications) with the emergence of "synergetic computing". This has been described as a mathematical-physical modelling of a natural phenomenon, namely self-organisation: the principles responsible for the spontaneous forming of structures and higher-order organisations from an unstructured collection of similar, basically independent systems such as molecules, cells or organisms. One can see this when an audience applauds after a performance and the irregular clapping of individuals switches over to a rhythmic beat with no external influence.
Synergetic computing shares some of the characteristics of artificial neural networks (which have been applied to image processing and vision for well over 20 years) in that it enables systems to "learn by seeing", but it has some distinctive characteristics which particularly suit its application to vision. Unlike neural networks, synergetic computing systems have very short training and recognition times; they suppress all information common to all training objects and emphasise the differences between them (see Plate 1). They are well suited to handling the high-dimensional feature vectors commonly found in image and sound processing. The Fraunhofer Institute for Integrated Circuits at Erlangen in Germany has been particularly active in applying synergetic computing in these fields, producing prototype access control systems making use of both facial images and voice analysis, and applying the technique to vision tasks such as recognition of different types of returned empty bottles and analysis of X-ray tests of cast aluminium wheel rims. Their work has been licensed to a local vision supplier, Cam Control of Nuremberg.
Plate 1 Synergetic computing being used to identify keys as distinct (in this case) from mugs and banknotes. Only a few examples were needed for the system to learn the different classes generically
There have of course been many other advances in vision technology over the past ten to 15 years, but many of these are associated with improved tools and techniques for image acquisition rather than the algorithms used to analyse the images once one has got them into the computer. One area where this is particularly noticeable is in colour image analysis. The cost of colour cameras, even three-chip ones which ensure that each pixel contains colour information from absolutely identical parts of the object, has become much more affordable, and it is now possible to convert from RGB colour data to HSI (hue, saturation, intensity) at frame rates (see Figure 1). Pekka Parnanen of the Finnish company Temet Vision Industry Oy (which makes some unique three-chip linear array cameras) pointed out at a conference on "New image processing techniques and applications" in Munich, June 1997, that, provided one has genuine "pixel identical" colour information, it can in fact be far easier to isolate areas of interest by hue and/or saturation than on the intensity information which would be found in a monochrome image of the same scene; he cited detection of knots in timber as being particularly simple when using colour information compared with trying to do the same thing using a monochrome image.
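The RGB-to-HSI conversion follows a standard formula; a sketch for a single pixel (the classic textbook definition; camera and framegrabber vendors may differ in detail):

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB triple (components in [0, 1]) to HSI.

    Hue is returned in degrees [0, 360); saturation and intensity lie
    in [0, 1].
    """
    i = (r + g + b) / 3.0
    if i == 0.0:
        return 0.0, 0.0, 0.0           # black: hue/saturation undefined
    s = 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0.0:
        h = 0.0                         # grey: hue undefined
    else:
        h = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        if b > g:
            h = 360.0 - h
    return h, s, i
```

Pure red, green and blue come out at hues of 0, 120 and 240 degrees, all with saturation 1, which is why a knot that differs in hue from surrounding timber separates so cleanly in this space.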
Figure 1 Principles of three-chip cameras
Advances in camera technology have been quite amazing and continue apace. With the vidicon tube camera, and indeed with many of the CCD cameras offered as direct replacements for tube cameras, you have to take the image when the system wants to give it to you, and the exposure period is the same as the readout period: typically 20 milliseconds for a single-field acquisition from an interlaced European-standard camera. Today, as a vision system builder, you can select from a variety of cameras offering triggering of image acquisition when you want it (so that the object fills the field of view, making best use of resolution); the exposure can be adjusted independently of the frame repetition rate (down to microseconds if needs be, and if you have enough illumination); and "progressive scan" means that you no longer need to discard one of the interlaced fields, or painstakingly combine two such fields only to find "jagged edges" from motion between fields.
Being able to trigger image acquisition at the required moment is helpful when using light emitting diodes as the illumination source. An array of such diodes can be energised very briefly to give much higher light outputs than they could give when running continuously; if the camera is triggered simultaneously and then electronically shuttered to prevent further exposure, moving objects can be imaged with effectively "stop motion" capability. If the LEDs emit near-infra-red light and the camera sensor is not prevented from "seeing" this infra-red (a filter is often incorporated because of problems of differential focus for IR wavelengths), the "flash" illumination will not be annoying to personnel working in proximity to the system. A particularly striking use of this infra-red illumination approach can be seen if retroreflective car number plates or traffic signs are imaged in this way.
Although "CCD" has become a kind of shorthand for any solid state camera, there are other technologies which have some very useful characteristics. The charge injection device (CID) differs from the CCD in several respects. It will never "bloom" through overexposure, and can be read out selectively because each pixel is individually addressed. One can even look at the signal which has built up at a given sensor site, in a non-destructive way, and decide whether or not to re-set the site to zero or to continue to receive photons for a longer exposure.
Very high dynamic range image sensors, whose construction also provides pixel-addressable capability, have been developed independently by IMS Chips, Stuttgart and IMEC, Leuven, Belgium, and cameras containing these chips are now being sold commercially by other organisations. These can, for instance, capture an image of the printing on the end of a clear-envelope light bulb while also showing the lit filament. The sensors are based on CMOS technology which means that any "silicon foundry" can be used to make them; they do not require the special fabrication processes needed for the manufacture of CCD sensors.
The use of CMOS construction for sensors also allows other processing circuitry to be built into the sensor chip. VLSI Vision Ltd of Edinburgh was one of the first companies to exploit this capability in its Imputer, an "imaging computer on a chip". The Swedish firm Integrated Vision Products has applied these principles to incorporate processing of light-stripe position information on the sensor, giving very rapid 3-D image capture, and other functions such as edge detection and run length encoding have been integrated at chip level.
The inspection of fast-moving web materials has been a challenge to vision technology for many years. If you reduce the exposure time of a linear array to prevent motion blur, you do not get enough light onto the sensor sites to be sure of detecting flaws. Efforts starting in the 1970s led to the development of laser scanners which could use large photon-collecting devices and which enabled images to be constructed because one knows where the laser beam is pointing at any instant, analogous to the imaging principles used in the scanning electron microscope. However, laser scanners are bulky, mechanically complex, and expensive. Technology originally developed for use in satellite imaging has helped solid state sensors to replace laser scanners for many applications where previously it was impossible to provide sufficient illumination to work at the required speeds. Time delay and integration (TDI) sensors look like area arrays, but are used as if they were a series of adjacent linear arrays. As a given line on the fast-moving web passes under a TDI camera, its image moves down the array at a lower speed dictated by the de-magnification factor between scene and image (see Figure 2). By matching the readout rate of individual lines to the speed of movement in the image, it is possible to capture the image data from the same line on the web many times in successive lines of the array, and these can be summed to improve the signal-to-noise ratio, just as one might do for a "noisy" static image of any kind. Some blurring is bound to occur as the web material moves fractionally from side to side, and imperfections in speed synchronisation can also contribute to blurring, so a practical limit seems to be the addition of about 100 successive lines, which should give a tenfold improvement in signal-to-noise ratio.
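The tenfold figure follows directly from the statistics: summing N independently noisy exposures of the same line multiplies the signal by N but the noise only by the square root of N, so 100 stages give a factor of ten. A small simulation (illustrative Python with hypothetical numbers, not a model of any particular sensor):

```python
import numpy as np

# Simulation of the TDI principle: each line of the web is exposed in
# N successive stages of the array and the results are summed.
rng = np.random.default_rng(0)
true_line = np.full(4096, 10.0)   # hypothetical line of the moving web
sigma = 1.0                       # per-exposure noise, standard deviation

def acquire(n_stages):
    """Sum n_stages independently noisy exposures of the same line."""
    acc = np.zeros_like(true_line)
    for _ in range(n_stages):
        acc += true_line + rng.normal(0.0, sigma, true_line.shape)
    return acc

def empirical_snr(line):
    """Estimate signal-to-noise ratio as mean over standard deviation."""
    return line.mean() / line.std()

snr_1 = empirical_snr(acquire(1))      # a single exposure
snr_100 = empirical_snr(acquire(100))  # 100 summed TDI stages
```

The measured ratio `snr_100 / snr_1` comes out close to ten, as the text predicts.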
Another challenge concerning web material has been the astonishing ability of the human visual system to perceive, almost instantly, flaws in patterned or textured fabrics, wallpapers, and similar decorative materials, even though the information content of the pattern overwhelms that of the flaw, and even in the presence of inconsistencies in the pattern due to stretching and shrinkage. In the past two years, work at the University of Darmstadt in conjunction with the local firm ISRA has resulted in algorithms capable of detecting flaws in such materials with no foreknowledge of the underlying pattern.
Figure 2 Principles of TDI
The principles of acquisition of 3-D information by 2-D camera systems using triangulation and "structured light" have been known and used in vision for well over 15 years. A robot guidance system installed at General Motors in St Catharines, Ontario, used a form of triangulation as long ago as 1981. Thanks to developments in liquid crystal light valve technology, in recent years it has become possible to convert virtually any 2-D vision system into one capable of yielding the 3-D position of every point in the field of view represented by a pixel. A series of about ten successive projections of nominally black and white grid lines at varying grid spacings allows an unambiguous "z" position to be gathered corresponding to the x,y position of every pixel, a far more rapid process than the alternative of scanning with a single light edge or stripe. Augmented accuracy in the "z" direction can be achieved by subsequently projecting a sinusoidally varying (multi-tonal) grid and using phase-shifting Moiré techniques, with any ambiguities being overcome by use of the coarser data provided by the black and white grid projections.
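The unambiguous "z" labelling from about ten black-and-white projections is typically achieved with a Gray code, in which adjacent stripes differ in a single bit. A sketch of the encoding and decoding (an illustration of the coding principle only; a real system must also threshold each pixel reliably as bright or dark):

```python
def gray_encode(n):
    """Binary-reflected Gray code: adjacent stripe indices differ in
    exactly one bit, so a mis-read at a stripe boundary costs at most
    one stripe of error."""
    return n ^ (n >> 1)

def gray_decode(g):
    """Invert the Gray code back to an ordinary integer."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def projector_patterns(width, n_bits=10):
    """The n_bits black-and-white grids to project: pattern k holds bit
    k (most significant first) of the Gray code of each column."""
    return [[(gray_encode(x) >> k) & 1 for x in range(width)]
            for k in reversed(range(n_bits))]

def stripe_index(bits_seen):
    """Recover the unambiguous stripe index from the sequence of
    bright/dark observations at one camera pixel."""
    g = 0
    for b in bits_seen:
        g = (g << 1) | b
    return gray_decode(g)

# Ten projections suffice to label 2**10 = 1024 distinct stripes.
patterns = projector_patterns(1024)
```

Once each pixel knows which projector stripe illuminated it, triangulation between camera and projector gives its "z" coordinate.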
Progress in industrial machine vision is not measured by algorithms alone; there have been many small steps in hardware and software which have brought us to the current position where (almost) anything is possible!
Contact telephone numbers
Cognex UK +44 (0) 1707 828 018.
CID Technologies Inc +1 315 451 9410.
IMEC vzw +32 16 281 518.
C-Cam Technologies (VHDR cameras) +32 16 398300.
IMS Chips +49 711 685 7333.
Kamera Werke Noble GmbH (VHDR cameras) +49 351 2806 0.
Image Industries Ltd (LED) +44 (0) 1372 726150.
Dalsa Europe (TDI) +49 8142 46770.
Fraunhofer-Institut IIS +49 9131 776 0.
Cam-Control GmbH +49 911 616 0233.
Temet Vision Industry Oy +358 9 759001.
VLSI Vision Ltd +44 (0) 131 539 7111.
Integrated Vision Products (IVP) +46 13 21 15 00.
ISRA Systemtechnik GmbH +49 6151 9480.