Monday, February 29, 2016

Handicapping Twenty Questions: Benford's Law And Shannon Entropy

Imagine a horse race (call it the Benford Law Stakes) with the following distribution of win probabilities.
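
As a rough sketch of what such a distribution could look like, assume a nine-runner field in which the runner ranked d (1 = favourite, 9 = outsider) wins with the Benford first-digit probability log10(1 + 1/d). The field size, the ranking and the function name below are assumptions made for illustration, not details given in the post.

```python
import math

# Assumption: nine runners, ranked 1 (favourite) to 9 (outsider),
# with win probabilities taken from Benford's first-digit law.
def benford_win_probabilities():
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

probs = benford_win_probabilities()
for rank, p in enumerate(probs, start=1):
    print(f"runner {rank}: {p:.3f}")
print(f"total: {sum(probs):.3f}")
```

Under this assumption the nine probabilities sum to one, because the product of the ratios (d + 1)/d for d = 1 to 9 is exactly 10.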

This toy example is not as unrealistic as you might expect at first glance. Consider how well Benford's Law approximates the outcome by Starting Price position in roughly 20,000 GB flat races from 2004 to 2013 inclusive.

Then, in simplest terms, the inherent uncertainty of the Benford Law Stakes outcome is best represented by Shannon's entropy: H(X) = -SUM(p(x) * log2(p(x))) = 2.87 bits, where p(x) is the win probability of runner x. Under optimal conditions this number is also a lower bound on the average number of yes/no questions (about 3, rounding up) that the handicapper should ask himself to identify a potential winner. Taking our lead from Shannon-Fano coding, we should iteratively divide the entrants into two groups of approximately equal total win probability (i.e. about 50% of the remaining probability each) and use pairwise comparison to eliminate the non-contenders using at most four questions.
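
The entropy figure and the splitting procedure can be checked with a short sketch. It assumes, as an illustration only, a nine-runner field with Benford first-digit win probabilities; the helper names entropy_bits and shannon_fano are hypothetical, and the pairwise comparison step is left out.

```python
import math

# Assumption: nine runners with Benford first-digit win probabilities.
probs = [math.log10(1 + 1 / d) for d in range(1, 10)]

def entropy_bits(p):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i))."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def shannon_fano(items):
    """Return {name: code} by recursively splitting into near-equal halves.

    `items` is a list of (name, probability) pairs sorted by descending
    probability. Each '0' or '1' in a code stands for one yes/no question.
    """
    if len(items) == 1:
        return {items[0][0]: ""}
    total = sum(p for _, p in items)
    running, best_cut, best_gap = 0.0, 1, float("inf")
    for i in range(1, len(items)):
        running += items[i - 1][1]
        gap = abs(total - 2 * running)  # |left group - right group|
        if gap < best_gap:
            best_cut, best_gap = i, gap
    codes = {}
    for bit, group in (("0", items[:best_cut]), ("1", items[best_cut:])):
        for name, code in shannon_fano(group).items():
            codes[name] = bit + code
    return codes

runners = [(f"runner {d}", p) for d, p in enumerate(probs, start=1)]
codes = shannon_fano(runners)
expected_questions = sum(p * len(codes[name]) for name, p in runners)

print(f"entropy: {entropy_bits(probs):.3f} bits")
print(f"expected questions: {expected_questions:.3f}")
for name, _ in runners:
    print(name, codes[name])
```

Under these assumptions the entropy evaluates to roughly 2.876 bits, in line with the 2.87 quoted above, and the average number of questions sits slightly above it. The longest question sequence for any single runner depends on how ties between equally balanced splits are broken, so the sketch makes no claim about it.
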
Once again, restricting the handicapper to a handful of questions is not as unrealistic as it might first appear. Slovic and Corrigan (1973), in a study of expert handicappers, found that with only five items of information the handicappers' confidence was well calibrated with their accuracy, but that they became overconfident as additional information was received. This finding was confirmed in a follow-up study by Tsai et al. (2008).