Sunday, May 20, 2012

Another cut at market randomness

I have some background in computer security and one day found myself tasked with assessing the quality of randomness for session id tokens generated by popular web frameworks (namely Java and .NET). As it turns out, NIST have developed a series of tests for just this purpose detailed here.

As a non-believer in the absolute randomness of markets, I thought I might take a look at the series of returns from SPX/GSPC to see how they held up.

About the tests

The NIST tests are designed to be run on bit streams from a random number generator, so some liberties had to be taken. I am mainly interested in the predictability of up days vs down days, so I first generated a series of returns using quantmod then encoded them using a 1 for an up day and -1 for a down day and run the tests on the encoded stream. The NIST tests use -1 in place of a 0.

The main test is called the monobit test, to quote from the document:
The focus of the test is the proportion of zeroes and ones for the entire sequence.  The purpose of this test is to determine whether the number of ones and zeros in a sequence are approximately the same as would be expected for a truly random sequence.  The test assesses the closeness of the fraction of ones to ½, that is, the number of ones and zeroes in a sequence should be about the same.  
The tests are done using a significance level of 0.01, so come with a good degree in confidence assuming the underlying method is sound. 

One caveat is the length of the runs compared and how it relates to the distributions used to model the results. For the monobit test, the suggested input size is 100, requiring 101 days of data to determine 100 up or down days. If we were looking at 32 bit integers, 100 bits would only be 3 "full" random numbers, so arguably we would want to look at shorter time periods (e.g. 3-5 days of data). Given the difficulties around distributions which require a large n, I thought I would vary the significance level instead of a lower n, as our requirements are not as stringent as those for cryptographic random numbers.   


At a basic level, this series does appear to be random, at least the vast majority of the time with n = 100 and alpha = 0.01. My confirmation bias was very upset with this. 

However, if we plot the proportion of runs deemed random vs the significance level, we see the proportion rising as one might expect. One thing that remains unexplained is why this appears to rise in steps rather than something more linear, though I expect this to be a side affect of either methodology or the normalisation done by the tests. I also took a look at the weekly data, which tends to a greater proportion of non-random runs quicker than daily data.

I am interested in the applications of machine learning to financial markets. Close to close returns that we have been looking at here are not the only information we have available, nor are they what I trade on a personal level. Also this is only one price series, and one could argue that in practise it is not actually a tradable series. 

Close to close returns are very useful in lots of applications, but if we are trying to build some predictive model we might need to look for more predictable pastures. Machine learning algorithms are great, but can only do so much. Finding some better potential inputs is what I will take a look at next.

The test code is available on github here: R Monobit test. Would be very interested to hear if anyone else takes a look.

I also took a visual and binomial look at randomness of the series in this post: A visual look at market randomness.

Oh and in case you were wondering, the web session tokens all turned out to be very strong.

A visual look at market randomness

I recently did some statistical testing to see if markets were random (details in the post Another cut at market randomness). It turns out they were, at least close to close returns for SPX/GSPC. My confirmation bias wasn't going to stand for that, so I thought about taking a different look.

Two things interested me. Firstly, I am looking at up vs down (i.e a higher or lower close than previous), rather than trying to predict an exact price. If markets are random then up or down have a probability of 0.5 each and are independent. A run of 5 consecutive ups or downs has a probability of 0.03125 or roughly 3%. How would that pan out looking at historical data?

Secondly, how could one visualise seemingly random data without it ending up looking like noise?

I came up with the following chart:

Each square represents one week and each line represents one year. If the close was higher than the previous week, it is blue, otherwise it is red. As the count of successive higher or lower weeks rise, the boxes get deeper in colour, up to a maximum of 5. As a side effect of date calculations and the definition of "week" some years have 53 weeks, which is why some lines are longer than others.

In total there were 138 runs of 5 weeks in the same direction out of 2208 samples, or around 6%, roughly double what we might expect.

Looking at that, I wondered what it would look like comparing weeks across years, comparing week 1 of year n with week 1 of year n + 1. That lead to the second chart:

This time we had 175 runs of length 5 out of 2208, just under 8%, again quite a bit more than the 3% we were expecting.  

That is all well and good, but these charts only represent the direction of the week to week moves, not the magnitude of the moves which is probably more important. Finally I took a look at the return over 5 periods. 

Again if it is positive the squares are blue, negative they are red. The colours are scaled as a proportion of the largest positive and negative returns for blue and red squares respectively. The very pale squares are where the returns were proportionally so close to zero they would not otherwise be visible, so I set a minimum level to ensure they displayed.

We can see that positive returns tend to follow positive returns and vice versa, at least for this 5 week look back period. This is somewhat deceptive as a negative return, though negative, may still be higher than the previous one implying a loss. 

What does all this mean? Not too much in practise, as it is another thing to know in advance if a series will have consecutive up days or down days. In this case a tradable edge is not so easily won.

However, it does reflect my understanding of how prices move a little better, in that they trend for a while then range for a while and vice versa, and things may not be as random as we might expect. My confirmation bias somewhat sated. 

The charts were done in Processing using the free weekly data from Yahoo! finance for GSPC. If you would like a chart for a given ticker, let me know.