Reinforcement Learning

Kybernetes

ISSN: 0368-492X

Article publication date: 1 July 1999

Citation

Andrew, A.M. (1999), "Reinforcement Learning", Kybernetes, Vol. 28 No. 5, pp. 70-72. https://doi.org/10.1108/k.1999.28.5.70.1

Publisher

Emerald Group Publishing Limited


This book is an exact reprint of a journal special issue, with the pages carrying double numbering: starting from one in the lower outer corners, as appropriate for the book, and from 225 in the upper corners, to allow reference to the journal. There is a short introduction by the editor, followed by seven papers, all dealing with important aspects of the current “hot topic” of reinforcement learning.

In the introduction it is stated that there is as yet no good tutorial textbook on reinforcement learning. This, of course, was written before the appearance of the very fine book by Sutton and Barto (1998), recently reviewed in these pages. Even so, the present book is well worth perusing, as it treats a number of important topics that are either omitted or not fully developed in the later text. The editor sees the book as filling the role of an introductory text, and in his introduction he cites a number of other papers, several with Andrew Barto as one of the authors, that should be read alongside the present volume to gain a comprehensive overview.

Rather surprisingly, the first of the seven papers treats reinforcement learning in the case where only an immediate return need be considered. It is on “Simple statistical gradient‐following algorithms for connectionist reinforcement learning”. The algorithms derived thus serve the same purpose as the widely‐used backpropagation method, and the author acknowledges compatibility between the two. However, the new algorithms operate without the knowledge of network connectivity and feed‐forward transformations that backpropagation assumes. The network elements are treated as totally stochastic, with only the parameters of each element's output distribution controlled as a function of its inputs. A comprehensive mathematical analysis is given, though without the convergence proof that the author would clearly have liked to provide. The type of algorithm is indicated by the acronym REINFORCE, formed by taking letters from the equation: “REward Increment = Nonnegative Factor times Offset Reinforcement times Characteristic Eligibility”.
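
Spelled out in symbols, the update named by the acronym takes roughly the following form (a sketch in generic notation, not copied from the paper): for the weight $w_{ij}$ on the $j$th input of element $i$,

$$\Delta w_{ij} = \alpha_{ij}\,(r - b_{ij})\,e_{ij}, \qquad e_{ij} = \frac{\partial \ln g_i}{\partial w_{ij}},$$

where $r$ is the received reinforcement, $b_{ij}$ is a baseline (the “offset”), $\alpha_{ij}$ is a nonnegative learning‐rate factor, $g_i$ is the probability that element $i$ emits its output given its inputs and weights, and the derivative of $\ln g_i$ is the “characteristic eligibility”.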

The second paper focuses on delayed reward and gives a very thorough discussion of the difficulties that may be encountered in applying reinforcement learning in this case. The application of the Temporal Difference method in a program that learns to play backgammon is described in detail. The program performed extremely well: after developing its playing strategy by self‐play, starting with no information except the rules of the game, it played much better than existing commercial programs, including one based on an elaborate analysis of human play.
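
The temporal difference idea at the heart of such a program can be indicated briefly (in a generic textbook form; the backgammon paper's own network and notation are not reproduced here): the estimated value $V(s_t)$ of the current position is adjusted towards the reward‐corrected estimate for the following position,

$$V(s_t) \leftarrow V(s_t) + \alpha\left[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\right],$$

so that the eventual outcome of each game is propagated backwards through the sequence of positions without any explicit teacher.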

Two other papers contribute to the basic theory by offering, in each case for the first time, convergence proofs for reinforcement learning methods applicable with delayed reward. One confers this accolade on the variant termed Q‐learning. The other confers it on the Temporal Difference method, for which a proof had been given earlier by Sutton for the TD(0) case, i.e. where the time horizon is just one step; the paper here extends the result to the general TD($\lambda$) case.
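
For reference, the Q‐learning variant keeps an estimate $Q(s,a)$ of the return to be expected from taking action $a$ in state $s$, and updates it after each step according to the standard rule (quoted here in its usual textbook form rather than from the paper itself):

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right],$$

while TD($\lambda$) generalises the one‐step update by spreading each correction over recently visited states with weights that decay geometrically in $\lambda$, TD(0) being the special case $\lambda = 0$.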

Two other papers acknowledge that the established methods of reinforcement learning with delayed reward, while effective, can be slow to converge on complicated tasks. In a paper by Lin, a number of composite techniques are discussed, including combining the methods with teaching by example and with the re‐use of past experience.

The other paper, by Singh, gives another approach based on the decomposition of long sequential tasks into smaller units. As he points out, everyday activity of a person involves repetition of many sub‐tasks such as lifting an object, opening a door, sitting down, walking, and so on. The sub‐tasks have features in common and in Singh's method the learning process is merged usefully across them. His illustrative examples refer to robot navigation on a grid, and so the component tasks are less varied than the example of everyday activities suggests, but a general method is developed. Both of these papers receive favourable comment in the introduction, where that of Lin is said to be one of the largest systematic comparisons of machine learning methods, and that of Singh is said to open up important new directions in which to extend reinforcement learning methods.

In the final paper in the book, reinforcement learning methods are applied in a novel way to robot path finding, in what is termed a non‐maze‐like environment. The environment is a square area, unobstructed except for a number of circular obstacles, and the paths are traversed by a point robot. The system is able to form situation‐action rules that allow the choice of efficient paths. Besides being interesting for its potential application to robotics, this study breaks new ground by being the first that the editor knows of that combines reinforcement learning with continuous actions.

There is a great deal in this book that continues to be strongly relevant even though the introductory text of Sutton and Barto (1998) has appeared in the meantime. For one thing, convergence proofs are not given in the later text, although their existence is mentioned. These authors attach quite a lot of importance to convergence proofs, but it is easy to feel sceptical about their significance when one remembers that the “simple perceptron” principle was sometimes considered to be vindicated because its convergence had been proved in numerous different ways. The significance of such proofs depends strongly on the assumptions made, and the proofs are needed in full in order to judge it.

In detailing the convergence proofs, and in many other ways, it is clear that this book contains many pointers to future developments of the subject area, as well as first steps towards achieving them.

Reference

Sutton, R.S. and Barto, A.G. (1998), Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA (reviewed in Kybernetes, Vol. 27 No. 9, 1998, pp. 093‐6).
