Significance Testing
I recently had someone from Portugal approach me with an Expert Advisor that he programmed. He felt that it was successful and wanted to get my opinion on its viability. I only know a small amount of information about the system, yet I was able to confidently reject it as unproven.
Statistics is the key. It provides a relatively straightforward toolbox for dismissing the simplest of systems. The measure that I care about in particular is called Student’s t-test.
Let’s first talk about when this test is appropriate. If you have fixed take profits and stop losses in place on every trade, then it is appropriate to use this test. The reason is that these values provide fixed limits on the outcome of trades. The distribution of your trades is likely to form a nice bell curve. In math terms, this is called a Gaussian distribution.
If your strategy uses dynamic exits based on market conditions, then this test is not appropriate. The distribution of forex, stock and futures prices does not follow a bell curve. The test depends on the assumption that the distribution under study follows a bell curve. If you make assumptions that don’t match what you’re measuring, then the results are useless at best and dangerously misleading at worst.
If you’d like to go into the mathematics involved, then you can find numerous sources on the internet that explain t-tests. I also really enjoy the book Statistics by Freedman, Pisani and Purves.
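For readers who want to see the arithmetic, here is a minimal sketch of a one-sample t statistic computed on a list of trade results. The function name, the sample trades and the choice of zero as the comparison value are assumptions made for illustration, not part of the system discussed above.

```python
import math
import statistics


def t_statistic(trade_results, expected_mean=0.0):
    """One-sample t statistic: how far the average trade sits from
    expected_mean, measured in standard errors."""
    n = len(trade_results)
    if n < 2:
        raise ValueError("need at least two trades")
    mean = statistics.mean(trade_results)
    std_dev = statistics.stdev(trade_results)  # sample standard deviation (n - 1)
    if std_dev == 0:
        raise ValueError("all trades are identical; the statistic is undefined")
    standard_error = std_dev / math.sqrt(n)
    return (mean - expected_mean) / standard_error


# Hypothetical results in pips: fixed 20-pip take profit, fixed 10-pip stop loss.
trades = [20, -10, -10, 20, -10, 20, -10, -10, 20, -10]
print(round(t_statistic(trades), 2))  # about 0.41
```

A value this close to zero says the average trade is indistinguishable from break-even at this sample size. Turning the statistic into a probability requires a t table or a statistics library, which this sketch leaves out.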
Most trading systems do not follow bell curves, which makes the details of the t-test largely irrelevant. What’s useful about it is the underlying idea: the more results a backtest contains, the more confidence you can place in its outcomes and predictions.
Restrictions and Degrees of Freedom
The goal of any test is to ensure that the results are accurate. The more frequently you test a concept, the more confident you feel about the outcome repeating itself consistently. The idea of predictability largely matches our intuitive expectations. If my co-worker shows up on time regularly, then I feel confident about him showing up on time tomorrow. If he shows up late regularly, then I know that he’s likely to show up late in the future.
The increase in experience increases the level of confidence. Eventually, the number gets so big that we feel very comfortable with the probabilities.
The Portuguese client presented a system based on moving averages with 3 filters, a stop loss and a take profit. Let’s assume that each filter only had one parameter. The number of restrictions for the buy trade is the moving average period (1), the one parameter for each of the three filters (3), the stop loss distance (1) and the take profit distance (1). This yields a total of 6 restrictions for buy trades. Assuming that sell trades use the same inputs, then we have a total of 12 restrictions.
We can’t begin counting our total number of trades (i.e., degrees of freedom) until the backtest shows at least 12 trades to account for our restrictions. It’s a good idea to not infer anything about a trading system until 30+ trades elapse. With the 12 restrictions in place, that sets the threshold for the minimum number of trades to reach a conclusion at 30 + 12 = 42.
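The counting above can be sketched in a few lines. The function names are hypothetical, and the 50-trade backtest at the end is an invented example rather than anything from the client's system.

```python
MIN_BASELINE_TRADES = 30  # the "30+ trades" rule of thumb from the text


def count_restrictions(params_per_side, sides=2):
    """Total restrictions when buy and sell trades each use the same inputs."""
    return params_per_side * sides


def minimum_trades(restrictions, baseline=MIN_BASELINE_TRADES):
    """Smallest backtest worth drawing any conclusion from."""
    return baseline + restrictions


def degrees_of_freedom(total_trades, restrictions):
    """Trades left over once the restrictions are accounted for."""
    return max(0, total_trades - restrictions)


# Moving average period, three single-parameter filters, stop loss, take profit.
per_side = 1 + 3 + 1 + 1
restrictions = count_restrictions(per_side)
print(restrictions, minimum_trades(restrictions))  # 12 42
print(degrees_of_freedom(50, restrictions))        # 38, for a hypothetical 50-trade backtest
```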
Generally, it’s a good idea to see 300-400 trades before drawing conclusions about any system. The reason for this is that some events are very rare. They may only occur once every couple of hundred trials. Allowing the amount of information to approach this threshold allows the trader to more comfortably evaluate what hidden risks may be present.
The backtest that the Portuguese individual submitted contained only 27 trades, well short of even the 42-trade minimum. Knowing what we know about basic analysis, I comfortably decided that there was nowhere near enough information on the system to consider evaluating it.
A Word of Caution
An algorithm’s trading statistics are almost certain to change with time. Unless you have a mathematically sophisticated model for evaluating volatility, best practice demands that you evaluate the trading results in light of the type of volatility experienced. Making money in 2008 when the markets nosedived does not mean that you would have made money in 2010, which was much quieter in comparison. Any signal in 2008 that indicated a short would almost certainly show returns that include a handful of monster winners. If those monsters fail to show up because the volatility does not cooperate, then your expert advisor will more than likely flop.
Backtesting for efficiency
Your forex backtests are absolutely worthless if you do not test the statistical entry efficiency and exit efficiency of the strategy. Everyone who runs a backtest inevitably reports the dollars earned as the outcome. Other measures exist, like the average win to loss, the profit factor and the Sharpe ratio, but they do not tell you anything useful until the final step of designing an automated trading system.
The correct approach to testing a strategy should focus on the question, “is my strategy a piece of garbage?” Most people try to prove themselves right. The real test is to not be able to prove yourself wrong. The only way to do that is through a statistical approach.
Entry and Exit Efficiency
Efficiency puts a hard number on the percentage of the available trading range that a strategy captures. The trading window starts on the bar where a trade entered the market. The window closes when the trade exits.
The total available window is the highest high minus the lowest low in the window. Calculating the entry and exit efficiency simply measures the percentage of that window that your strategy tends to capture. Take the average across all of the trades and you get the overall efficiency.
Entry efficiency formula
Formula for a long trade: (Highest high – entry price) ÷ (Highest high – lowest low)
Formula for a short trade: (entry price – lowest low) ÷ (Highest high – lowest low)
Exit efficiency formula
Formula for a long trade: (Exit price – lowest low) ÷ (Highest high – lowest low)
Formula for a short trade: (Highest high – exit price) ÷ (Highest high – lowest low)
Take an example where you buy a hypothetical currency at 150 and sell it at 170. The lowest low between the time of entry and exit was 140. The price then ran all the way up to 200 before settling back down to 170, which is where the exit took place.
The entry efficiency is (200-150) ÷ (200-140) = 50 ÷ 60 = 83%. Nearly anyone would agree that this makes for a great entry.
The exit efficiency is (170-140) ÷ (200-140) = 30 ÷ 60 = 50%. Most would agree that the exit would have ideally occurred sooner than it did.
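The formulas and the worked example above translate directly into code. This is a minimal sketch; the function and parameter names are my own choices for illustration.

```python
def entry_efficiency(entry_price, highest_high, lowest_low, is_long=True):
    """Fraction of the trade's high-low window that the entry left available."""
    window = highest_high - lowest_low
    if window <= 0:
        raise ValueError("highest high must be above lowest low")
    if is_long:
        return (highest_high - entry_price) / window
    return (entry_price - lowest_low) / window


def exit_efficiency(exit_price, highest_high, lowest_low, is_long=True):
    """Fraction of the trade's high-low window that the exit captured."""
    window = highest_high - lowest_low
    if window <= 0:
        raise ValueError("highest high must be above lowest low")
    if is_long:
        return (exit_price - lowest_low) / window
    return (highest_high - exit_price) / window


# The long trade from the example: buy at 150, sell at 170, low 140, high 200.
print(f"{entry_efficiency(150, 200, 140):.0%}")  # 83%
print(f"{exit_efficiency(170, 200, 140):.0%}")   # 50%
```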
Efficiencies do not change by instrument or time frame
One major problem that we encounter with forex backtests is the limited data set. This is especially true for those interested in testing long term strategies like those on the H4 or D1 charts. The wonderful thing about entry and exit efficiencies is that they do not vary from chart to chart or even period to period.
I like to jump down to M1 charts for efficiency testing. The data is nearly endless. I never have to worry about running out. The great thing is that I know when I shift back to the H4 chart, the efficiencies should not change more than ±5%.
If you see the efficiency vary too much, then you may not have enough trades to form a statistically significant group. My experience tells me that 75 trades usually gets very close to the actual efficiency. 100 trades or more is better. When I run tests on M1 charts, I often get several thousand trades over the course of a few months. Numbers that large can tell you with a great deal of confidence just how robust a strategy’s parameters truly are.
Usually, you can assume that any results that fall within 45-55% are the result of a random, stochastic process. When I see backtests that creep right up to those barriers like 54.9% or even 55.1%, the results inevitably tank back to around the 50% mark.
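As a rough sketch of that rule of thumb, the check below averages per-trade efficiencies and flags anything inside the 45-55% band as likely random. The names and the sample values are hypothetical, and a real test would use far more than five trades.

```python
from statistics import mean

RANDOM_BAND = (0.45, 0.55)  # the 45-55% range described in the text


def overall_efficiency(trade_efficiencies):
    """Average efficiency across all trades in the backtest."""
    return mean(trade_efficiencies)


def looks_random(efficiency, band=RANDOM_BAND):
    """True when the efficiency falls inside the band treated as random."""
    low, high = band
    return low <= efficiency <= high


# Hypothetical per-trade entry efficiencies, for illustration only.
sample = [0.83, 0.40, 0.55, 0.30, 0.47]
overall = overall_efficiency(sample)
print(f"{overall:.0%}", looks_random(overall))  # 51% True
```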
Random trade outcomes and dollar profits
I wish this section were about how to make money with a random efficiency. Alas, we must cover how randomness can result in unjustified euphoria.
I’ve been interested in the concept of randomness for several years now. Mathematicians refer to it with the more opaque name of a “stochastic process”. Despite the nonsensical name, it’s just a fancy way of saying the study of randomness: how it changes, its distribution, how far it “walks”, etc.
Yesterday, I used the analogy of coin flips to describe how Martingale strategies are probabilistically doomed to failure. One interesting concept that I did not mention relates to Brownian motion. Even with a set of random outcomes, trades will go on a random walk away from the starting point.
Einstein gets the real credit for solving the math behind the concept, even though his name is not on the term. He demonstrated that the distance a random process typically wanders from its starting point grows with the square root of the number of trials. If we decide to flip a coin 60 times, we expect 50% of the flips, or 30, to fall on heads and the other 30 on tails.
It actually turns out that we should expect a slight bias in the number of either winners or losers, although we do not know which one. It’s random. The typical size of the bias, whichever way it prefers to go, is about √60, which works out to ~7.7, or about 7 whole flips. The heads outcomes should typically range from 23-37, with the tails outcomes making up the difference.
Seven trades out of sixty strongly alter the percentages, even if we know that it’s really supposed to be 50%. If heads only came up 23 times out of 60, that’s 38%. The problem is not with the coin. It’s with the number of trials. As you run an increasingly large number of trials, the random bias decreases in significance in terms of the percent accuracy. 50,000 trades, for example, should show a surplus of roughly 223 trades in favor of winning or losing (√50,000 ≈ 223.6). The accuracy range falls within 1% of 50% on either side, a dramatic improvement.
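The square-root rule is easy to check numerically. The sketch below only reproduces the arithmetic from the coin-flip example; the function names are hypothetical.

```python
import math


def expected_drift(trials):
    """Typical surplus of wins or losses after this many 50/50 trials."""
    return math.sqrt(trials)


def drift_as_percent(trials):
    """The same drift expressed as a percentage of all trials."""
    return 100 * expected_drift(trials) / trials


for trials in (60, 50_000):
    print(trials, round(expected_drift(trials), 1), f"{drift_as_percent(trials):.2f}%")
# 60 trials drift by about 7.7, which is 12.91% of the sample.
# 50,000 trials drift by about 223.6, which is only 0.45% of the sample.
```

The drift keeps growing in absolute terms while shrinking as a share of the sample, which is why large trade counts pin down the true percentage so much better.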
Risks of curve fitting
Curve fitting a random efficiency relates to the idea of Brownian motion. Let’s say that we use a strategy that I know will never show an entry or exit efficiency: the moving average crossover. I’ve gone through this strategy six ways from Sunday, almost exclusively at the behest of clients. It does not work as a fully automated strategy. There is no secret set of fast and slow periods that will unlock the hidden keys to profit.
Most traders, experienced or not, abuse the backtester by searching for a set of parameters that yield the most dollar profit. They curve fit their test to optimize for maximum profit. What really happens is that the traders optimize for the amount of random drift that already occurred.
When I used the example of 50,000 trades creating a natural drift of 223, I cited it to show how small the error in the real percent accuracy becomes. The other consequence for trading systems is that as the error percentage decreases, the absolute size of the natural bias in your outcomes increases. Blindly running the optimizer only selects the set of parameters that yields the best combination of two criteria:
- The drift that happened to work out in favor of that set of parameters
- The profit and loss that varies with those parameters. The dollar profit naturally changes because the two moving averages cross at different points
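A small simulation illustrates the first criterion. It is a sketch under simplifying assumptions: every "parameter set" is a pure coin flip with no edge, every trade wins or loses exactly one unit, and the names and counts are invented for the example. It does not model the second criterion, where profit and loss shift with the crossover points.

```python
import random


def random_strategy_profit(rng, trades):
    """Net result of a strategy with no edge: each trade wins or loses 1 unit."""
    return sum(rng.choice((1, -1)) for _ in range(trades))


def optimize(parameter_sets, trades, seed=0):
    """Backtest many edge-free parameter sets and report the best and the average."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = [random_strategy_profit(rng, trades) for _ in range(parameter_sets)]
    return max(results), sum(results) / len(results)


best, average = optimize(parameter_sets=200, trades=500)
print(f"best parameter set: {best:+d} units, average of all sets: {average:+.1f} units")
```

The best set looks profitable even though no set has any edge. Picking it is simply picking the largest random drift.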
You need a tool like efficiency to guard against these types of random outcomes. It’s the only method that I know of that definitively states whether or not a strategy behaves in a random manner. I especially like the fact that it breaks those elements down into two of the three basic components of a trading strategy: the entry, the exit, and the position sizing.
Efficient strategies do not work all of the time
Position sizing marks the final obstacle to building your fully automated trading strategy. A set of rules that yields a statistically efficient entry that is paired with an efficient exit does not necessarily make money. The value of each trading setup can vary, too.
Each strategy contains different sets of winners and losers. Each winner and loser varies in its dollar value. Whatever money management approach you take requires balancing the ratio of the winners and losers in a way that normalizes the outcome of each trade. You ideally want to eliminate the variation in dollar value. 20 pip trades should earn or lose you exactly as much as the 100 pip trades.
That seems counter-intuitive. Most traders want to win in proportion with the size of the opportunity. It’s better from a system perspective to entirely ignore the size of the opportunity and to make each trade worth the same amount. Betting more or less with each trade effectively normalizes the value of each trade.
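One way to picture the normalization is to size each position so that its expected move is worth a fixed dollar amount. The sketch below assumes a hypothetical pip value of $10 per lot and a hypothetical $100 target; neither figure comes from a specific broker or strategy.

```python
def position_size(target_value, pip_distance, pip_value_per_lot):
    """Lots to trade so that a move of pip_distance is worth target_value."""
    if pip_distance <= 0 or pip_value_per_lot <= 0:
        raise ValueError("pip distance and pip value must be positive")
    return target_value / (pip_distance * pip_value_per_lot)


TARGET_DOLLARS = 100  # hypothetical value assigned to every trade
PIP_VALUE = 10        # hypothetical dollars per pip for one lot

for pips in (20, 100):
    lots = position_size(TARGET_DOLLARS, pips, PIP_VALUE)
    print(f"{pips} pip trade: {lots:.2f} lots, worth ${lots * pips * PIP_VALUE:.0f}")
# The 20 pip trade uses 0.50 lots and the 100 pip trade uses 0.10 lots; both are worth $100.
```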
Using a stop loss stands out as an obvious candidate to fix how much a trade is worth. The severe disadvantage is that it almost always negatively affects the exit efficiency. Whenever I can get away with it, I recommend using a market-based exit instead of an arbitrary stop loss. Traders usually scream at the top of their lungs when they hear me say this. I’m just speaking as a systems developer. The numbers are what they are.



