When backtesting automated trading systems, accidental data snooping or look-ahead errors are an easy mistake to make. The error, in this context, is making predictions using the very data we are trying to predict. Typically it comes from a mistake in our time-offset calculations somewhere.
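To make that concrete, here is a minimal sketch of how such an offset bug typically creeps in. It uses pandas purely for illustration (the post's own code may use something else entirely), and the price series is made up:

```python
import pandas as pd

# Toy closing prices, invented for illustration.
prices = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5], name="close")
returns = prices.pct_change()

# Correct alignment: predict tomorrow's return from today's return.
feature_ok = returns              # known at time t
target = returns.shift(-1)        # tomorrow's return, the thing to predict

# Buggy alignment: an off-by-one shift hands the model the answer,
# because the "feature" at time t is tomorrow's return itself.
feature_leaky = returns.shift(-1)

print(pd.DataFrame({"feature_ok": feature_ok,
                    "feature_leaky": feature_leaky,
                    "target": target}))
```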
However, deliberate look-ahead can be a useful tool. If we give our system perfect forward knowledge (sketched in code below):

1) We establish an upper bound for performance.
2) We get a quick read on whether something is worth pursuing further.
3) It can help highlight other coding errors.
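Here is one way to snoop deliberately, sketched with scikit-learn (an assumption; the post doesn't name its library): leak the target into the feature matrix and treat the resulting accuracy as the ceiling.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # stand-in features
y = (rng.normal(size=500) > 0).astype(int)    # stand-in up/down labels

# Deliberately snoop: the label itself becomes the first feature column.
X_snooped = np.column_stack([y, X])

model = SVC().fit(X_snooped, y)
print("snooped accuracy:", accuracy_score(y, model.predict(X_snooped)))
# Near-perfect accuracy is expected here. If a model still hovers around
# chance with the answer in hand, it is probably not worth salvaging.
```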
The first two are closely related. If our wonderful model is built using the very values it is trying to predict and still performs no better than random guessing, it is probably not worth the effort of trying to salvage it.
The flip side is that when it performs well, that is as good as it will ever get: honest, non-snooped performance can only be worse.
There are two main ways it can help identify errors. Firstly, if subsequent testing on non-snooped data delivers comparable performance, we probably have another look-ahead bug lurking somewhere.
Secondly, amazing prediction accuracy combined with poor trading performance is another sign of a bug lurking somewhere.
Example
I wanted to compare SVM models trained on actual prices versus a series of log returns, using the rolling model code I put up earlier. As a baseline, I also added a 200-day simple moving average model.
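The comparison could be set up roughly as below. This is a sketch only: the actual rolling-model code (linked at the end of the post) will differ, and the window size, lag count, and synthetic price series here are all assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def rolling_accuracy(features, target, window=200):
    """Refit an SVC each day on the trailing `window` rows, predict the
    next day, and return the out-of-sample hit rate."""
    hits = []
    for t in range(window, len(target)):
        model = SVC().fit(features[t - window:t], target[t - window:t])
        hits.append(model.predict(features[t:t + 1])[0] == target[t])
    return float(np.mean(hits))

# Synthetic daily closes, standing in for real data.
rng = np.random.default_rng(1)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=800)))
log_ret = np.diff(np.log(prices))

lags = 5
rows = range(lags, len(prices) - 1)  # sample time index t

# Same next-day-direction target, two different feature representations.
X_price = np.array([prices[t - lags + 1:t + 1] for t in rows])  # last 5 closes
X_lret = np.array([log_ret[t - lags:t] for t in rows])          # last 5 log returns
y = np.array([int(log_ret[t] > 0) for t in rows])               # next-day up/down

print("price-trained accuracy: ", rolling_accuracy(X_price, y))
print("return-trained accuracy:", rolling_accuracy(X_lret, y))
```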
[Results table; (S) indicates snooped data.]
A few things strike me about this.
For the SMA system, peeking ahead by a day provides only a small increase in accuracy. Given the longer-term nature of the 200-day SMA, this is probably to be expected: one extra day of data barely moves a 200-day average.
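That intuition is easy to check in the same sketch style (again on a synthetic series, so the numbers are illustrative only): shift the 200-day SMA signal forward by one day and compare hit rates.

```python
import numpy as np

rng = np.random.default_rng(2)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, size=2000)))

def sma(x, n):
    return np.convolve(x, np.ones(n) / n, mode="valid")

s = sma(prices, 200)           # s[i] is the 200-day SMA ending on day p[i]
p = prices[199:]               # closes aligned with the SMA
up_next = np.sign(np.diff(p))  # next-day direction

signal = np.sign(p[:-1] - s[:-1])     # long above the SMA, short below
signal_peek = np.sign(p[1:] - s[1:])  # same rule, peeking one day ahead

print("honest accuracy: ", np.mean(signal == up_next))
print("peeking accuracy:", np.mean(signal_peek == up_next))
```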
For the SVM-trained systems, the results are somewhat contradictory.
For the look-forward models, training on price data gave much lower accuracy than training on log returns. Note that both could have achieved 100% accuracy simply by predicting the first column of their training data.
However, when not snooping, the models trained on closing prices did much better than those trained on returns. I'm not 100% sure there isn't still some bug lurking somewhere, but hey, if the code were off it would've shown up in the forward-tested results, no?
Feel free to take a look and have a play around with the code, which is up here.