The amount of data that is available today has the ability to change how the world operates. Companies like Google are continually inventing creative new ways to collect and process the data that we submit across the web.
At the same time, creative quantitative traders are constantly looking for new ways to find an edge in today’s markets. It was only a matter of time before these two forces collided.
Damien Challet and Ahmed Bel Hadj Ayed published a paper in July of 2013 titled “Predicting financial markets with Google Trends and not so random keywords.” The goal of the paper was to evaluate the credibility of claims that search data available from Google Trends could be used to predict major market moves.
At the end of the paper, Challet and Bel Hadj Ayed conclude that groups of keywords, whether they were related to finance or not, did not offer any reasonable evidence that they could robustly predict market movements. However, there was evidence that specific keywords applied to specific assets could produce profitable trading strategies.
This is an exciting, outside-the-box approach to trading, and the paper is a fascinating read. I was particularly impressed with the attention that Challet and Bel Hadj Ayed paid to the different biases present in both their own backtesting results and the previous study that they were citing.
Here are some of the interesting biases that they addressed:
The authors referred to tool bias as “the most overlooked bias.” They explained that it describes the situation where one is using computer hardware or software that was not available at the time of the data being analyzed.
This type of bias is cited as the reason that many backtests look good before 2003, but fail to produce the same results after. The authors explain a bit further:
Finding predictability in old data with modern tools is indeed easier than it ought to be.
Challet and Bel Hadj Ayed spent a large portion of their biases section breaking down all of the different types of data biases that were affecting the backtesting of this strategy.
The most obvious data bias concerns availability. Google Trends data itself reaches back to 2004, but the authors note that the service was not reliably available before August of 2008, so any backtest before that date relies on a signal that traders could not have used at the time.
The authors also address the fact that data is often adjusted after the fact. They use the example of GDP numbers being released and then revised later. A backtest that only sees the final number ignores the original figure, which is the only one a trader would have had on the release date.
There are also cases of tweaks in the format of data over time. The authors explain that Google Trends changed the type of data that it uses in 2012. Therefore, the data from before 2012 is structured differently from the data produced after 2012.
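To make the revision problem concrete, here is a minimal sketch of a "point-in-time" lookup. It is not taken from the paper: the release dates, the values, and the helper names `value_as_of` and `final_value` are all made up for illustration. The idea is simply that a backtest should ask what was published as of the decision date, not what the final figure turned out to be.

```python
from datetime import date

# Hypothetical release history for a single quarterly figure.
# Each entry is (release_date, value); the numbers are invented.
GDP_Q1_RELEASES = [
    (date(2012, 4, 27), 2.2),  # first estimate
    (date(2012, 5, 31), 1.9),  # first revision
    (date(2012, 6, 28), 2.0),  # final figure
]


def value_as_of(releases, as_of):
    """Return the latest value published on or before `as_of`, or None."""
    known = [(released, value) for released, value in releases if released <= as_of]
    if not known:
        return None  # nothing had been published yet
    return max(known, key=lambda item: item[0])[1]


def final_value(releases):
    """Return the last published value, which is what a naive backtest sees."""
    return max(releases, key=lambda item: item[0])[1]


if __name__ == "__main__":
    decision_day = date(2012, 5, 1)
    print("Known on decision day:", value_as_of(GDP_Q1_RELEASES, decision_day))
    print("Seen by naive backtest:", final_value(GDP_Q1_RELEASES))
```

A backtest that trades on `final_value` in early May is using a number that did not exist yet, which is exactly the bias the authors describe.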
Challet and Bel Hadj Ayed also point out the impact that survivorship bias has on the backtest.
The keyword biases were the most interesting biases that the paper covered. The original paper that the authors were citing chose keywords that were intentionally biased towards the financial markets.
Challet and Bel Hadj Ayed took this a step further by creating three non-biased keyword groups to compare against the financial keywords. This led to interesting results based on random keywords like “bone cancer” and “Moon Patrol” compared to specifically selected keywords like “debt.”
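One simple way to frame this kind of comparison is to ask how often an unrelated keyword does at least as well as the hand-picked one. The sketch below is my own illustration, not the authors' procedure: the function name `empirical_p_value` and the performance scores are hypothetical, and the "+1" correction is a common convention for permutation-style tests.

```python
def empirical_p_value(candidate_score, baseline_scores):
    """Share of baseline keywords scoring at least as well as the candidate.

    Uses the (1 + k) / (1 + n) convention so the result is never exactly zero.
    """
    if not baseline_scores:
        raise ValueError("need at least one baseline score")
    at_least_as_good = sum(1 for score in baseline_scores if score >= candidate_score)
    return (1 + at_least_as_good) / (1 + len(baseline_scores))


if __name__ == "__main__":
    # Invented backtest scores (for example, annualized Sharpe ratios).
    debt_score = 1.0
    random_keyword_scores = [0.2, 0.5, 1.3, -0.1]
    print(empirical_p_value(debt_score, random_keyword_scores))
```

If a sizable share of unrelated keywords match the financial keyword's score, the apparent edge is hard to distinguish from luck.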
The authors continued by identifying a number of other biases that were present, including coding errors, future keywords, data snooping, and a lack of transaction fees.
However, despite all of the biases affecting the backtest, Challet and Bel Hadj Ayed still found that a profitable strategy could be built on the idea of using Google Trends data as a trading signal.
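For readers who want to experiment, here is a rough sketch of what a search-volume signal with transaction fees might look like. Everything in it is an assumption made for illustration rather than the rule used in the paper: the function name `backtest`, the three-week lookback, the choice to go short when search interest rises, and the cost of 0.1% per unit of position change. Note that the position for a given week only uses search data up to the previous week, which avoids peeking into the future.

```python
def backtest(search_volume, weekly_returns, lookback=3, cost_per_trade=0.001):
    """Return the weekly profit and loss of a toy search-volume strategy.

    search_volume[t]  : search interest observed during week t
    weekly_returns[t] : asset return over week t
    Assumed rule: if last week's volume exceeds the average of the
    `lookback` weeks before it, go short (-1); otherwise go long (+1).
    """
    if len(search_volume) != len(weekly_returns):
        raise ValueError("series must have the same length")

    position = 0  # start flat
    pnl = []
    for t in range(lookback + 1, len(weekly_returns)):
        latest = search_volume[t - 1]
        window = search_volume[t - 1 - lookback : t - 1]
        average = sum(window) / lookback

        new_position = -1 if latest > average else 1
        # Charge a fee proportional to how much the position changes.
        cost = cost_per_trade * abs(new_position - position)
        pnl.append(new_position * weekly_returns[t] - cost)
        position = new_position
    return pnl


if __name__ == "__main__":
    # Invented data: a spike in search interest followed by a down week.
    volume = [10, 10, 10, 20, 5, 5]
    returns = [0.0, 0.0, 0.0, 0.0, -0.02, 0.01]
    print(backtest(volume, returns))
    print(backtest(volume, returns, cost_per_trade=0.0))
```

Running the same data with and without `cost_per_trade` shows how quickly fees eat into a strategy that flips its position often.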