Friday, February 03, 2017

Adaptive Boosting

Machine learning studies the design of automatic methods for making predictions about the future based on past experience. In the context of classification problems, machine-learning methods attempt to learn to predict the correct grouping of unseen examples. In the mid-1990s, Freund and Schapire introduced the meta-heuristic Adaptive Boosting (AdaBoost), “…an approach to machine learning based on the idea of creating a highly accurate prediction rule by combining many relatively weak and inaccurate rules… Indeed, at its core, boosting solves hard machine-learning problems by forming a very smart committee of grossly incompetent but carefully selected members…” (Boosting: Foundations And Algorithms).

As context for how boosting might work, the authors open A Short Introduction To Boosting with the following toy problem: “A horse-racing gambler, hoping to maximize his winnings, decides to create a computer program that will accurately predict the winner of a horse race based on the usual information...” As the early pioneers of expert systems discovered in the 70s and 80s, and as the authors acknowledge, the biggest stumbling block to using experts is that many of them are unable to articulate their decision process or even to rank the importance of key variables. In light of this issue, Freund and Schapire point out that the beauty of boosting is that it builds on what experts can do, not on what they cannot: given a specific scenario, they are usually able to make a judgment in favor of or against a particular outcome. For instance, we can ask an expert handicapper whether a specific scenario - a course and distance winner whose last outing was a week ago - would lead him to believe the horse is more likely to win again or to finish out of the money. Note the phrase “more likely to” - this is a key strength of boosting: it asks only for a judgment on the balance of probabilities, not for the highly probable.

The boosting phase combines many such simple, scenario-based rules into an overall weighted decision for an upcoming event. In its original specification, the defining quality of boosting was that it aggregated an incomplete set of “if-then” rules (decision stumps) that iteratively address unexplored regions of the existing data set - areas for which previously chosen rules would give incorrect predictions. The inherent strength of this approach is that it automatically leverages the key dimensions of Wisdom Of Crowds, namely diversity, independence, decentralization, and aggregation.

For a worked example applied to NFL prediction, see James McCaffrey’s Classification And Prediction Using Adaptive Boosting. For the underlying theory of why wisdom of crowds works, see Scott Page’s excellent The Difference. And finally, Malacaria and Smeraldi explore the relationship between the AdaBoost weight-update procedure and Kelly’s theory of betting, and also establish a connection between AdaBoost and information theory, in On AdaBoost And Optimal Betting Strategies.
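
To make the “weighted committee” idea concrete, here is a minimal from-scratch sketch of the AdaBoost loop using one-feature threshold stumps as the weak rules: each round picks the stump with the lowest weighted error, gives it a vote proportional to its accuracy, and up-weights the examples it got wrong so the next rule concentrates on them. The handicapping features (days since last race, last finish position) and the labels are hypothetical placeholders invented purely for illustration, not data or code from the referenced sources; in practice one would reach for a library implementation such as scikit-learn’s AdaBoostClassifier.

```python
# Minimal AdaBoost sketch with decision stumps (illustrative only).
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Train AdaBoost with one-feature threshold stumps. Labels y must be in {-1, +1}."""
    n_samples, n_features = X.shape
    w = np.full(n_samples, 1.0 / n_samples)        # start with uniform example weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        # Search every (feature, threshold, polarity) stump for the lowest weighted error.
        for j in range(n_features):
            for thresh in np.unique(X[:, j]):
                for polarity in (1, -1):
                    pred = np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thresh, polarity)
        err, j, thresh, polarity = best
        err = max(err, 1e-10)                      # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)      # vote weight for this weak rule
        pred = np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)             # up-weight the examples this rule got wrong
        w /= w.sum()
        stumps.append((alpha, j, thresh, polarity))
    return stumps

def predict_adaboost(stumps, X):
    """Weighted majority vote of all stumps: the committee's overall decision."""
    score = np.zeros(X.shape[0])
    for alpha, j, thresh, polarity in stumps:
        score += alpha * np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)
    return np.sign(score)

# Hypothetical handicapping data: [days since last race, last finish position];
# +1 = finished in the money, -1 = finished out of the money.
X = np.array([[7, 1], [30, 5], [10, 2], [45, 8], [5, 1], [60, 9]])
y = np.array([1, -1, 1, -1, 1, -1])
model = train_adaboost(X, y, n_rounds=5)
print(predict_adaboost(model, X))                  # recovers the training labels
```

Even in this toy setting, the mechanics mirror the prose above: no single stump needs to be very good, only better than chance on the current weighting, and the exponential re-weighting is what forces later rules into the regions the earlier ones handled badly.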