Boosting: Foundations and Algorithms

Kybernetes

ISSN: 0368-492X

Article publication date: 4 January 2013


Citation

Schapire, R.E. and Freund, Y. (2013), "Boosting: Foundations and Algorithms", Kybernetes, Vol. 42 No. 1, pp. 164-166. https://doi.org/10.1108/03684921311295547

Publisher

Emerald Group Publishing Limited

Copyright © 2013, Emerald Group Publishing Limited


The term “boosting” denotes a powerful means of facilitating machine learning that was invented by the book's authors 20 years ago and intensively developed since. Despite this, the term, with this connotation, is not as widely known as it should be and in fact was not known to the present reviewer prior to receiving the review copy. A glance at the extensive bibliography, of 238 items, shows such ignorance to be reprehensible (and I apologise!) since an enormous number of significant papers have been published with the term in their titles, many of them in the journals Machine Learning and Journal of Machine Learning Research, and others in conference proceedings and in journals concerned with statistical theory. A short introduction by the book's authors is available online[1].

As explained in the introductory chapter of the book, the term “boosting” refers to a general and provably effective method of producing a very accurate prediction rule by combining rough and moderately inaccurate “rules of thumb”. The result is achieved by making successive scans of a training set of data, and forcing the learning program to focus on the examples that have proved to be “hardest” in the sense of being misclassified in previous scans. This allows weighted combination of the decisions from the weak “rules of thumb” to produce a result having greater accuracy.
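The reweighting scheme described above can be made concrete in a short sketch. The following is a minimal, illustrative implementation (not the authors' own code) of the boosting loop, using one-split decision stumps from scikit-learn as the weak "rules of thumb"; each round up-weights the examples the previous rounds misclassified, and the final rule is a weighted vote:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=20):
    """Minimal AdaBoost sketch; labels y must be in {-1, +1}.

    Each round fits a weak learner (a one-split decision stump) under the
    current example weights, then increases the weights of the misclassified
    ("hardest") examples so that later rounds focus on them.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])             # weighted training error
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against 0 or 1
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's vote weight
        w *= np.exp(-alpha * y * pred)         # up-weight misclassified examples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted majority vote of the weak rules of thumb."""
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(score)
```

On a one-dimensional target that no single stump can represent (labels positive only inside an interval), the weighted combination of several stumps recovers the rule, illustrating how moderately inaccurate learners are boosted into an accurate one.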

The idea is embodied in a fairly simple algorithm called AdaBoost (where the first syllable is an abbreviation of “adaptive” and not meant to refer to Lady Lovelace or the computing language named in her honour). The working of the algorithm is considered in detail, using a variety of approaches, for example with attention to the possibility of overfitting to the training set, a danger with any scheme of this sort. One of the possible enhancements is the evaluation of measures of “confidence” for the contributing weak classifiers and attention to these allows the discrimination rule to be improved even after the training set comes to be correctly classified. There is advantage in allowing the formation of decision trees, following the example of Quinlan[2].
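AdaBoost is also available off the shelf; a brief sketch using scikit-learn's `AdaBoostClassifier` (whose default weak learner is a depth-one decision stump) shows how the accuracy after each boosting round can be tracked on held-out data, the practical way to watch for the overfitting danger mentioned above. The synthetic dataset here is purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary task standing in for real features (spam words, etc.).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Default weak learner is a one-split decision stump.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X_tr, y_tr)

# staged_score yields accuracy after each boosting round: training accuracy
# rises round by round, while the test curve reveals any overfitting.
train_curve = list(clf.staged_score(X_tr, y_tr))
test_curve = list(clf.staged_score(X_te, y_te))
print(f"round 1 train acc: {train_curve[0]:.2f}, "
      f"round {len(train_curve)} train acc: {train_curve[-1]:.2f}")
```

A notable empirical finding discussed in the book is that test error often continues to fall even after the training set is perfectly classified, which a plot of these two curves makes visible.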

As an illustration of use of the method, a scheme to detect spam in received e‐mails is considered. Reference is also made to identification of letters of the alphabet from handwritten copies, and to face recognition and medical diagnosis. Applications to game‐playing are also considered, with reference to penny‐matching and stone‐scissors‐paper, both simple games but allowing complex analyses of strategy. It is acknowledged that the illustrations using AdaBoost have the limitation of depending on binary or yes/no responses from the environment or “teacher” and this is remedied in a later algorithm called RankBoost that deals with rankings. An application of this is to directing incoming queries received by a large organisation towards what are judged to be appropriate departments, depending on words in the query. The parsing of natural‐language sentences is considered, with comments on language ambiguity.

The book deals with its topic with great thoroughness and with attention to matters of efficiency and accuracy and risk. It is essentially mathematical, with a sprinkling of equations on most pages, though no highly advanced mathematical knowledge is required. It is prepared in a textbook fashion with exercises for the student at the end of each chapter and there is clearly the intention that it should be the basis of taught courses. A student who had worked through all of it would be well equipped to make fresh applications as well as perhaps extending the theory. The moderate selling price, for a book of the size and quality, suggests that the publisher expects substantial sales. The authors acknowledge that the treatment is not all based on their own work and that some chapters stem entirely from work of others, always with acknowledgement in a bibliographic note. It means that this is an extremely valuable compilation of the “state of the art” in this area.

Impressive though this is, there is something strange about the sharpness with which the limits of the area of interest seem to be drawn. Even within the field of machine learning attention is restricted to the “with a teacher” situation where training depends on availability of a “correct answer” or a ranking of possible answers. There are forms of learning, such as skill acquisition through practice and the “genetic learning” implicit in evolution, that have a different character. Obviously one book cannot cover everything but the existence of other aspects might usefully have been acknowledged.

Although the bibliography of 238 items is impressive there are surprising omissions. What has been termed GOFAI or “good old‐fashioned AI” hardly appears, nor does the enormous literature of two or three decades back on Expert Systems, even though medical diagnosis is discussed in the text. Neural nets feature only slightly and the word “fuzzy” does not appear. What is perhaps most surprising of all is that there is no reference to learning other than machine learning.

To some extent these gaps in coverage can be defended by again observing that one book cannot cover everything, but it is easy to get the feeling that there is an attempt to make a new beginning with a new and restricted practical emphasis in AI studies. Whether or not this is so, this impressive book will certainly remain a standard work of reference in the area it covers, for a long time to come.

It is in fact easy to feel that an important part of the book's value will result from extending the region of interest beyond the limits that the authors seem to have set themselves. Within GOFAI the pioneering work of Samuel (1963) allows a messy game situation to be transformed into a learning task to which AdaBoost would seem to be ideally suited, and it would be surprising if no improvement accrued from its use. The identification of difficult examples for weighting by AdaBoost has some similarity to the identification of “turbulent” situations in the lookahead explorations of chess‐playing programs, and the two might be considered jointly. It also seems rather likely that principles of boosting could play a part in the backpropagation schemes that are central to most applications of artificial neural nets.

It is surprising that no mention is made of biological learning, especially since invention of the boosting principle must have involved introspection. The book will provide much food for thought for experimental psychologists, and ideas for research projects. It is even possible that the restriction to “with a teacher” situations could be overcome and that boosting could prove to be relevant to the study of evolution and skill acquisition. Criteria for identifying a difficult decision point, where attention has to be boosted, can be on grounds other than accuracy of classification.

However, this is rather wild speculation. The essential message is that this is an important and valuable book, perhaps a good deal more thought‐provoking than its authors realised.

Alex M. Andrew

Reading University (retired), Reading, UK

Notes

1. www.yorku.ca/gisweb/eats4400/boost.pdf (accessed 24 October 2012).

2. http://en.wikipedia.org/wiki/Ross_Quinlan (accessed 24 October 2012).

Further Reading

Samuel, A.L. (1963), “Some studies in machine learning using the game of checkers”, in Feigenbaum, E.A. and Feldman, J. (Eds), Computers and Thought, McGraw‐Hill, New York, NY, pp. 71‐105.
