Machine vision in assembly automation

Assembly Automation

ISSN: 0144-5154

Article publication date: 1 September 2005


Citation

Davies, E.R. (2005), "Machine vision in assembly automation", Assembly Automation, Vol. 25 No. 3. https://doi.org/10.1108/aa.2005.03325caa.002

Publisher: Emerald Group Publishing Limited

Copyright © 2005, Emerald Group Publishing Limited



Machine vision has had a long association with assembly automation. From the 1980s and earlier it was envisaged as helping both with the process of assembly and with the incidental process of automated visual inspection. Many thought at that time that inspection and assembly required different sorts of vision process, but it soon came to be realised that common image analysis tools such as edge detection, fiducial point identification and location, and associated tasks such as boundary pattern analysis and graph matching, were near enough identical in the two cases. In any case, a necessary part of assembly is to check that the components being assembled are the right ones, that they are correctly orientated, and also (at least as important) that they are not defective in any way. For example, it clearly makes sense to check that all screws have threads, that the hole is in the right place on the workpiece, and that no burr is present. It is far better to check the components before and during assembly than to add a lot of value to the workpiece only to have to reject it at the very end of the process.
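To make one of these shared tools concrete, the sketch below implements Sobel edge detection, the first of the common image analysis operations mentioned above. It is a minimal illustration in Python with NumPy; the synthetic test image and all parameter choices are my own, not drawn from the article.

```python
# Minimal Sobel edge detection sketch (illustrative only).
import numpy as np

def sobel_edges(image):
    """Return the gradient magnitude of a greyscale image."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # horizontal-gradient kernel
    ky = kx.T                                  # vertical-gradient kernel
    h, w = image.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Naive convolution over the interior; the one-pixel border stays zero.
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = image[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(kx * patch)
            gy[i, j] = np.sum(ky * patch)
    return np.hypot(gx, gy)

# Synthetic test object: a bright square on a dark background.
img = np.zeros((64, 64))
img[16:48, 16:48] = 255.0
edges = sobel_edges(img)
print(edges.max())  # the strongest responses lie on the square's boundary
```

In an inspection or assembly cell the same edge map would feed the later stages – boundary tracking, fiducial location and so on – which is precisely why the tool set is shared between the two tasks.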

Over these last 25 years much has been achieved. Many assembly systems are in place in industry, and many also have inspection systems attached to them. But progress may not have been quite as rapid as had been envisaged. One of the problems of applying machine vision to the assembly task is that each case seems different – so different, indeed, that there is a tendency to have to start afresh each time (at least with the vision part of the task), which makes the whole process very manpower intensive. This has the consequence that fewer applications can be tackled, and those that are tackled cannot be taken as far as might be hoped; some in the past have reeked of short-cuts that lack robustness.

One reason for this problem arises from the needs of real-time processing: for assembly is by its nature a viciously real-time task – and within it vision, with its huge data rate, is an exceptionally computation-intensive process. As a result, in the 1980s we had to slave away producing lots of dedicated hardware to implement our vision algorithms. Not only did this take something like three quarters of the development manpower, but the cost of the final vision system was high – typically in the £20,000 bracket. Not least, the frame grabbers used to acquire the images were expensive, and many had all-too-limited lifetimes. Gradually the situation improved, with the advent of bit-slices, then DSP chips, then ASICs, and most recently ASICs with several microprocessors on the same chip. And this is no idle technology: at a price it is dynamically reconfigurable in real time, and it is the way of the future. As a result of these advances, the manpower needed for the real-time implementation is down to something like a quarter of the overall vision-system effort, and the cost is down to the £5,000 bracket. (Naturally, such figures depend on the particular task.) Likewise, frame grabbers and cameras are now so cheap that we hardly have to consider the cost, and can throw in additional cameras and processing without too much worry; reliability is also so high that it hardly deserves a mention.
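To put a rough number on that "huge data rate", here is a back-of-the-envelope calculation; the 512 × 512 resolution, 25 fps frame rate and 8-bit depth are my own assumptions for a typical frame grabber of the period, not figures from the text.

```python
# Illustrative arithmetic only: resolution, frame rate and bit depth are
# assumed values for a typical 1980s-era frame grabber.
width, height = 512, 512      # pixels (assumed)
frames_per_second = 25        # PAL-rate video (assumed)
bytes_per_pixel = 1           # 8-bit greyscale (assumed)

bytes_per_second = width * height * frames_per_second * bytes_per_pixel
print(f"{bytes_per_second / 1e6:.1f} MB/s of raw pixels")  # about 6.6 MB/s
```

Several megabytes of pixels arriving every second, each needing neighbourhood operations, is what forced the dedicated hardware of the period.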

With all these advances, and many more to come by the end of the decade, we are at the close of the era when real-time processing was a limitation. However, this leaves us with other problems. It brings home the fact that we still cannot do all we want to: specifically, we are limited by our knowledge of how to produce highly accurate, robust, adaptable vision algorithms at anything like an ideal rate. Flexible automation is an important subject, and it is limited in its turn by the flexibility of the vision algorithms, which still have to be engineered individually. What ought to have happened long ago is that such systems should have become trainable, so that they are intrinsically flexible and do not need the expert vision programmer to help them along. While this has proved possible to a worthwhile extent (particularly with the help of artificial neural networks and other types of learning machine), the problem has by no means been solved. We need AI, but more than AI, we need cognitive vision systems. The trouble is that the problem is actually quite complicated. It is not for nothing that babies spend something like a year learning about the world – feeling objects, fondling pieces of material, beads, utensils, their food, and even soil – getting a first-hand understanding of what the world is like, with vision taking only a minor place, though eventually it is probably the strongest of our senses. And until robots can experience the world and learn about it in the same way as babies, how can they learn enough of the real world to eliminate vision programmers?
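To illustrate the "learning machine" idea in its simplest form, the sketch below trains a single perceptron to separate good parts from defective ones. The two features and the synthetic data are entirely hypothetical – a stand-in for whatever measurements a real trainable inspection system would extract.

```python
# Minimal perceptron sketch of a trainable pass/fail classifier.
# The features (say, hole-position offset and a burr score) and the
# training data are hypothetical, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
good = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))   # class 0
bad = rng.normal(loc=[1.5, 1.5], scale=0.3, size=(50, 2))    # class 1
X = np.vstack([good, bad])
y = np.array([0] * 50 + [1] * 50)

w = np.zeros(2)
b = 0.0
for _ in range(20):                 # a few passes over the training set
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        update = yi - pred          # zero whenever the sample is correct
        w += update * xi
        b += update

preds = (X @ w + b > 0).astype(int)
print(f"training accuracy: {(preds == y).mean():.2f}")
```

The point is not the perceptron itself but the workflow: show the system labelled examples and let it set its own parameters, instead of having a vision programmer hand-craft the decision rule for every new product.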

The purpose of the present discussion is not to be negative or depressing, but to see just why progress has been limited; with that understanding, maybe we shall find the best ways forward. Meanwhile, the scope of the work has been accelerating and broadening. The articles in this special issue show how much we have been able to advance. Whereas in the 1980s we were doing boundary tracking of binary images and checking flat objects and surfaces, today we are attacking 3D objects and scenes with gusto, in real time, and solving much more grown-up tasks – and finding excellent solutions. There is still the question of the extent to which any new work represents trainable solutions, or classes of solutions, rather than “mere” parochial results. I am hoping for breakthroughs (and trying to produce them!), but wonder if there are ways of solving classes of problems in the way the human visual system (HVS) does. There are undoubtedly surprising things to learn from the way the HVS solves problems: witness the fact that when a fielder catches a cricket ball, he is apparently not calculating trajectories in 3D and solving complicated trigonometric problems – he is simply moving in such a way as to keep the ball at a constant bearing until it ends up in his hands. Is this a trick, or evidence of a systematic approach the like of which we have not yet managed to focus upon?
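The constant-bearing observation can at least be made concrete. In the ground plane, a fielder who runs so that the ball's bearing never changes is, by construction, on a collision course with it. The toy sketch below (all positions and speeds are invented for illustration) solves for such a course and confirms that the bearing stays fixed throughout.

```python
# Toy demonstration that a constant bearing implies a collision course.
# All positions and speeds are invented for illustration.
import math
import numpy as np

B0 = np.array([0.0, 0.0])    # ball's ground-plane position at launch (assumed)
Vb = np.array([12.0, 5.0])   # ball's horizontal velocity, m/s (assumed)
F0 = np.array([50.0, -10.0]) # fielder's starting position (assumed)
s = 8.0                      # fielder's running speed, m/s (assumed)

# For interception at time t, the fielder needs velocity Vf = Vb + R0/t,
# where R0 is the initial fielder-to-ball vector. Imposing |Vf| = s gives
# a quadratic in u = 1/t; take the positive root.
R0 = B0 - F0
a = R0 @ R0
b = 2 * (Vb @ R0)
c = Vb @ Vb - s**2
u = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
Vf = Vb + u * R0             # collision-course velocity for the fielder
t_int = 1.0 / u

# Along this course the relative position stays parallel to R0, so the
# bearing of the ball as seen by the fielder never changes.
for t in np.linspace(0.0, 0.99 * t_int, 5):
    R = (B0 + Vb * t) - (F0 + Vf * t)
    print(f"t = {t:4.2f} s  bearing = {math.degrees(math.atan2(R[1], R[0])):7.2f} deg")
```

Whether the fielder's brain does anything like solving that quadratic is exactly the author's point: the heuristic delivers the answer without the trigonometry.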

E.R. Davies is based at Royal Holloway, University of London, Egham, UK.
