Second‐order Methods for Neural Networks: Fast and Reliable Training Methods for Multi‐Layer Perceptrons (Perspectives in Neural Computing Series)

Kybernetes

ISSN: 0368-492X

Article publication date: 1 March 1998


Citation

Andrew, A.M. (1998), "Second‐order Methods for Neural Networks: Fast and Reliable Training Methods for Multi‐Layer Perceptrons (Perspectives in Neural Computing Series)", Kybernetes, Vol. 27 No. 2, pp. 201-203. https://doi.org/10.1108/k.1998.27.2.201.3

Publisher

Emerald Group Publishing Limited


This book treats the subject‐area denoted by its title in a meticulous and thorough fashion. It is a little startling to find that the term “neural net” is used to refer purely to artificial nets, with no attempt to spell out any connection with biological studies. The fast and reliable training methods do not seem plausible as models of nervous‐system functioning (though it is dangerous to be dogmatic on such issues!).

The concentration on artificial networks is surprising in view of the author’s affiliation to a department of Biochemistry and Molecular Biology. On the other hand, in the spirit of much work in artificial intelligence, a case can be made for tackling a problem without any restriction to biologically‐plausible models. The insight that is gained may still eventually lead, indirectly, to better understanding of biological processes. The emphasis in this book is strongly on methods having direct practical utility.

The perceptron principle, as originally devised by Rosenblatt, involved changes in weight values at only a single functional layer, that of the synaptic connections to output neurons. A layer of “association units” was placed between the input or sensory units and the output or response ones but in the standard perceptron these were not modified by learning. The suggestion that the network might include feedback paths allowing useful adjustment of the “hidden units” is implicit in the Pandemonium scheme of Selfridge (1959), and was generalised by Andrew (1965) to include a mechanism corresponding to the now‐popular backpropagation‐of‐errors algorithm.

In the present book the operation of the backpropagation algorithm is analysed in detail, with the assumption, customary in the theoretical treatment of perceptrons, of a finite set of training patterns. The training process can be seen as automatic optimisation, where the independent variables are the synaptic weights throughout the net and the quantity to be optimised (here, minimised) is the mean (or mean‐square, etc.) error of the network, over a batch of input patterns.
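In symbols (taking the mean‐square case mentioned above, and using standard notation rather than the book's own), the quantity to be minimised over a batch of P training patterns can be written

E(\mathbf{w}) \;=\; \frac{1}{2P} \sum_{p=1}^{P} \bigl\lVert \mathbf{t}_p - \mathbf{y}(\mathbf{x}_p; \mathbf{w}) \bigr\rVert^{2},

where \mathbf{w} collects every synaptic weight in the net, \mathbf{x}_p and \mathbf{t}_p are the p-th input pattern and its target, and \mathbf{y}(\mathbf{x}_p; \mathbf{w}) is the corresponding network output.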

What the author calls classic methods for function optimisation are discussed in considerable detail. They depend on the local expansion of the function to be minimised as a Taylor’s series of terms including derivatives, except that for multivariate optimisation the first derivative becomes a vector and the second a matrix. The backpropagation algorithm amounts to steepest‐descent optimisation, based on the Taylor’s expansion taken only to the first derivative. Such operation constitutes a first‐order method.
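To make the distinction concrete (again in standard notation), the local Taylor expansion about the current weights \mathbf{w} is

E(\mathbf{w} + \Delta\mathbf{w}) \;\approx\; E(\mathbf{w}) + \mathbf{g}^{\mathsf{T}} \Delta\mathbf{w} + \tfrac{1}{2}\, \Delta\mathbf{w}^{\mathsf{T}} \mathbf{H}\, \Delta\mathbf{w}, \qquad \mathbf{g} = \nabla E(\mathbf{w}), \quad \mathbf{H} = \nabla^{2} E(\mathbf{w}).

A first‐order method keeps only the gradient term and steps against it, \Delta\mathbf{w} = -\eta\, \mathbf{g} with \eta a learning rate (steepest descent); a second‐order method also exploits the matrix of second derivatives \mathbf{H} to scale and orient the step.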

First‐order methods can be slow to converge, especially where the multidimensional response surface contains ravines (in the case of minimisation) or ridges (in the case of maximisation). The difficulties are discussed at length by Rosenbrock (1960) and Box (1965), both of whom give optimisation methods not using derivatives. For the purpose of neural‐net adjustment, however, the backpropagation algorithm allows evaluation of first derivatives, and the second‐order methods advocated in the book utilise these along with second‐derivative values. Just as the backpropagation method requires a reverse activation of the network following each forward activation, the second‐order methods require two or more such reverse activations, in some versions interleaved with additional forward activations.
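The difference is readily seen on Rosenbrock's own ravine‐shaped test function. The sketch below is purely illustrative: the pure Newton step \Delta\mathbf{w} = -\mathbf{H}^{-1}\mathbf{g} stands in for the more careful second‐order schemes developed in the book, and the function is a two‐variable test surface rather than a neural‐net error.

```python
import numpy as np

# Rosenbrock's "banana" function: a curved ravine that slows steepest descent.
def f(w):
    x, y = w
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def grad(w):                      # first derivatives (gradient vector)
    x, y = w
    return np.array([-2 * (1 - x) - 400 * x * (y - x ** 2),
                     200 * (y - x ** 2)])

def hess(w):                      # second derivatives (Hessian matrix)
    x, y = w
    return np.array([[2 - 400 * (y - 3 * x ** 2), -400 * x],
                     [-400 * x, 200.0]])

w_first = np.array([-1.2, 1.0])   # customary starting point
w_second = w_first.copy()

for _ in range(50):
    # First-order step: a short move against the gradient.
    w_first = w_first - 1e-3 * grad(w_first)
    # Second-order (Newton) step: solve H * dw = g and subtract.
    w_second = w_second - np.linalg.solve(hess(w_second), grad(w_second))

print("first-order after 50 steps :", w_first, " value =", f(w_first))
print("second-order after 50 steps:", w_second, " value =", f(w_second))
```

After fifty iterations the first‐order trajectory is still creeping along the floor of the ravine, whereas the Newton iterates have settled on the minimum at (1, 1); on a real network the practical question is how the second‐derivative information is to be obtained and used economically, which the book addresses by way of the additional forward and reverse activations mentioned above.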

Numerical results are given for experiments with specified small networks using the various methods, on a variety of tasks, and the effectiveness of second‐order methods is amply demonstrated. The operation is directed at weight‐adjustment in networks whose patterns are preset, although there is mention of schemes allowing dynamic change of the number of hidden nodes. Rearrangement of the network by elimination and introduction of elements is one of the main features of the Pandemonium scheme of Selfridge (1959) and would probably be needed in a network capable of, for example, learning as a general principle the equivalence of inputs under particular transformations. An interesting future development is likely to be the appropriate combination of a feedback of “worth” as postulated by Selfridge and backpropagation of error.

In his final chapter the present author tackles the thorny question of global and local optima. Rosenbrock and Box offer no solution to this, except to suggest that the optimisation process be restarted from a number of different initial conditions, so that if there are multiple optima successive runs may converge on distinct choices. For optimisation in general, it is difficult to see how to improve on this, but for the special case of optimisation of neural‐net weight values, especially by second‐order methods, means of moving from one minimum to another have been devised and are reviewed here.
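A minimal sketch of that restart strategy (hypothetical code, using a general‐purpose quasi‐Newton routine from SciPy rather than any algorithm from the book) might run as follows:

```python
import numpy as np
from scipy.optimize import minimize

# An error surface with several local minima (purely illustrative).
def error(w):
    return np.sin(3 * w[0]) + 0.1 * w[0] ** 2

rng = np.random.default_rng(0)
best = None
for _ in range(10):                          # ten restarts from random initial weights
    w0 = rng.uniform(-5.0, 5.0, size=1)
    result = minimize(error, w0, method="BFGS")
    if best is None or result.fun < best.fun:
        best = result                        # keep the lowest minimum found so far

print("best minimum found at w =", best.x, "error =", best.fun)
```

Successive runs converge on different local minima and the best of them is retained; as noted, the schemes reviewed in the final chapter go further than this for the special case of neural‐net weights.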

The treatment in the book is an admirable combination of rigorous mathematics with helpful comments and advice on method selection obviously stemming from practical experience and from acquaintance with what is seen as the state‐of‐the‐art. The focus is definitely on artificial nets rather than live ones, and problems of computer implementation are treated, including the effect of finite digital resolution. All of the test runs reported are with double precision (15‐digit) on a PC. Matters of computer run time, storage requirements, and program complexity are taken into consideration. The small book is crammed with important and useful material, including many references to the findings of other workers and, supporting this, an extensive bibliography.

References

Andrew, A.M. (1965), Significance Feedback in Neural Nets, Report of Biological Computer Laboratory, University of Illinois; reprinted with additions in International Journal of Systems Research and Information Science, Vol. 6, 1993, pp. 59-67.

Box, M.J. (1965), "A new method of constrained optimization and a comparison with other methods", Computer Journal, Vol. 8, pp. 42-52.

Rosenbrock, H.H. (1960), "An automatic method for finding the greatest or least value of a function", Computer Journal, Vol. 3, pp. 175-84.

Selfridge, O.G. (1959), "Pandemonium: a paradigm for learning", Mechanisation of Thought Processes, HMSO, London, pp. 511-31.
