Why do some trading systems underperform their backtesting results when used for live trading? Often, it's because they were tested on too small a sample, which produced misleading results.
When it comes to backtesting your system, you want to make sure it can handle anything the markets throw at it. This means backtesting it over dozens of different markets and time periods.
The problem is that the human mind is inclined to rationalize away concerns about small sample size. Our System 1 thinking, the fast and intuitive mode of thought, is excellent at making up arguments that small sample results are likely to continue. We naturally try to convince ourselves that the current results are real and will persist.
This phenomenon shows up in a number of different fields. There are countless examples of people making decisions based on small samples and the flawed results they produce.
The Upton Start
Fantasy baseball is another passion of mine, and it is shocking how often I see trading principles applied in what most would believe to be an unrelated field.
One of the most exploitable inefficiencies in all of fantasy baseball is a general ignorance of small sample size. If a player gets off to a hot start in the first two weeks of the season, he will command a premium value. However, if the same player has the same two-week hot streak in August, hardly anyone will even notice. This is because there are four months of other stats that likely even out the player’s performance for the year.
Atlanta Braves outfielder Justin Upton provided a great example of this inefficiency this spring. At the end of April, Upton had posted a batting average of .298 and had hit 11 home runs. At that point, he was widely considered the best player in fantasy baseball and was commanding incredible trade prices. In the almost three months since then, his average has crashed to .252 and he has hit only five more home runs.
Clearly, it would have been unrealistic to expect Upton to continue that level of performance through an entire season. Despite that fact being so obvious, a huge number of people still traded for him, paying prices that indicated they believed he could keep it up. These people allowed their System 1 thinking to rationalize reasons why the hot streak would last. They ignored the fact that the sample size was only one month out of a seven-year professional career.
The Puig Call-up
This same phenomenon happened again in June when the Los Angeles Dodgers called up outfield prospect Yasiel Puig. Puig went on an incredible tear during the month of June, hitting .436 with seven home runs. With that kind of production, he was featured on SportsCenter almost every night and the hype surrounding him exploded.
The problem was that, while Puig may be a special talent, no one in baseball could possibly maintain a .436 average. (Not a single player has hit over .400 for an entire season since Ted Williams did it in 1941.) Even so, a number of people traded for him as if he could.
They chose to completely ignore the small sample size and allowed their System 1 thinking to rationalize why he was worth that price. He has since gone on to hit .269 with only one home run so far in the month of July. While he is still a very valuable player, most of the people who acquired him at the end of June paid far too much.
Small Town Cancer
Daniel Kahneman further developed this concept in his book, Thinking, Fast and Slow. He uses the following example:
“A study of new diagnoses of kidney cancer in the 3,141 counties of the United States reveals a remarkable pattern. The counties in which the incidence of kidney cancer is lowest are mostly rural, sparsely populated, and located in traditionally Republican states in the Midwest, the South, and the West.”
He continues by describing how the human mind will automatically try to use the information provided to form a reason why these counties have the lowest incidence of kidney cancer. Then he reveals that very similar counties were also found to have the highest incidence of kidney cancer.
The results actually have nothing to do with the location or political affiliation of the counties. Those counties reported both the highest and the lowest incidence of kidney cancer because their populations are small, which makes each one a small sample. A handful of cases more or fewer is enough to push a small county's rate to either extreme. Just as with Justin Upton’s and Yasiel Puig’s hot starts, there simply isn’t enough data to make a rational argument.
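To see the effect in isolation, here is a minimal simulation sketch. It is not Kahneman's data: the county counts, the populations, and the 1% rate are all made-up values chosen so the script runs quickly. Every simulated county shares exactly the same true rate, so any difference in observed rates is pure sampling noise.

```python
import random

def simulate_rates(populations, true_rate, seed=0):
    """Return (population, observed_rate) pairs when every county shares one true rate."""
    rng = random.Random(seed)
    results = []
    for pop in populations:
        # Each resident is an independent draw with the same probability.
        cases = sum(1 for _ in range(pop) if rng.random() < true_rate)
        results.append((pop, cases / pop))
    return results

# Hypothetical setup: 200 small counties and 50 large ones, identical true rate.
populations = [500] * 200 + [20_000] * 50
results = sorted(simulate_rates(populations, true_rate=0.01), key=lambda r: r[1])

lowest = results[:10]    # ten lowest observed rates
highest = results[-10:]  # ten highest observed rates
print("populations with the lowest rates: ", [pop for pop, _ in lowest])
print("populations with the highest rates:", [pop for pop, _ in highest])
```

Under these assumptions, the small counties should crowd both ends of the ranking while the large counties cluster near the true rate, even though nothing about the small counties is actually different.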
In order to trade successfully, whether in financial markets or fantasy baseball, we need to base our decisions on sufficient data. Results based on small sample sizes are far too likely to be misleading.
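The same idea can be put in rough numbers for a backtest. The sketch below assumes a hypothetical system that won 60% of its trades and uses the normal approximation to a binomial proportion, which is crude for very small samples, to show how wide the plausible range for the true win rate is at different trade counts. The function name and the trade counts are illustrative choices, not part of any particular backtesting tool.

```python
import math

def win_rate_interval(wins, trades, z=1.96):
    """Approximate 95% interval for a win rate (normal approximation)."""
    p = wins / trades
    se = math.sqrt(p * (1 - p) / trades)  # standard error shrinks as trades grow
    return max(0.0, p - z * se), min(1.0, p + z * se)

intervals = {}
for trades in (20, 200, 2000):
    wins = trades * 3 // 5  # hypothetical 60% observed win rate
    intervals[trades] = win_rate_interval(wins, trades)
    low, high = intervals[trades]
    print(f"{trades:>5} trades: observed 60%, plausible range {low:.0%} to {high:.0%}")
```

With only 20 trades, the range extends below 50%, so a system with no edge at all could plausibly have produced the same backtest. With 2,000 trades, the range stays above 50%.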