## Friday, December 26, 2014

### Fitting a mixture of independent Poisson distributions

This is an example from Zucchini & MacDonald’s book on Hidden Markov Models for Time Series (exercise 1.3).

The data is annual counts of earthquakes of magnitude 7 or greater, which exhibits both overdispersion for a Poisson (where the mean should equal the variance) as well as serial dependence.

The aim is to fit a mixture of m independent Poisson distributions to this data, using the non-linear minimizer nlm to minimize the negative log-likelihood of the Poisson mixture.

Sounds easy right? I have not done much optimisation stuff so it took me longer than it probably should have. There’s little better for the ego then getting stuck on things by page 10.

### Constrained to unconstrained

There are two sets of parameters, the lambda parameters to the Poisson distributions and the delta mixing distribution, the latter giving the proportions of each distribution to mix.

The Poisson parameters must be greater than zero and there are m of them. The mixing distribution values must all sum to one, and the first mixing value is implied by the m-1 subsequent values.

These are the constraints (lambdas greater than zero, deltas sum to one), and there are 2m – 1 values (m lambdas, m-1 deltas). We need to transform these to unconstrained values for use with nlm (becoming eta and tau respectively).

The formulas for transformation are reproduced from the book here

These transformed values are combined in to one parameter vector and passed to nlm. We also pass a function that calculates the negative log likelihood.

In the function, we first convert the parameters from the unconstrained “working” form to the constrained “natural” form using the inverse transform, and then use these values to do the likelihood calculation.

You can see the code here.

### Outro

Now that I look over the code and write this out it does seem all fairly straightforward and relatively trivial, but it was new to me and took a while, so I figured other people might find the example useful as well.

The code has three examples for the cases m = 2, 3, and 4. Obviously feel free to mess around with the initial parameter estimates to see what comes up.

It does seem to match the fitted models from the book, which I have copied here:

For all the details, check out the book, chapter 1!

## Tuesday, December 23, 2014

### Does Trend Following Work?

Since then, he has tracked well-known trend following systems and reported monthly performance figures. These are things like moving average crossovers, Bollinger band breakouts and stuff like that.

The systems had a very good month in November, and a very good year, up ~45% YTD.

I thought I would take a look and how things have gone over the time it has been tracked. Helpfully, Jez posts annual summaries for each system and a composite average, which is what I used. You can find the links to specific posts in the script linked under the table below. N.B the first two years are the average of actual trend following funds not the generic strategies.

There are results for 6 years, from 2009 to 2014, using the YTD November figure for 2014. We end up with a series like this, starting from 100.

Source [R]

We can see that 4 of the 6 years ended under water. Over the same time there has been a huge surge in US equities. A simple buy and hold in 2009 would have more than doubled equity.

### The pain of losing

I have been following Jez posting results every month for 5 – 6 years now. When I think back to where I was in 2009 versus where I am today, five years is a really, really long time.

You can get historical data from 1920s and even further back in time. If you run a backtest over 80 – 90 years of data, a 3 – 5 year period of underperformance is barely noticeable on an equity curve.

Sometimes I see people get excited about their backtests even though they include these periods of poor performance. There is plenty of research and evidence that humans in general find it very hard to stick with a losing system.

How seriously would you take someone touting a system that is underwater after five years, but they assured you a 45% year was just around the corner?

### Professional CTA

I believe the Au.Tra.Sy data is based on commodities and futures trading, so thought I would take a look at the Barclay CTA Index as a comparison. The results seem somewhat similar, it has struggled somewhat since 2010, though both the upside and downside returns are smaller.

In the 29 years from 1980 to 2008 inclusive, there were only 3 down years in total, none of them consecutive.

In the period we are examining here, the 6 years from 2009 – 2014, four of them have been down years, including three down years in a row (2011 - 2013).

### Proselytizing

In some ways, you can think of mechanical trend following as being long tail risk. Effectively it wants a big sustained move in either direction. If you think about the distribution of returns, the big moves that are “pay days” are out in the tails.

I don’t follow commodities vol at all, but apparently it has been low for several years, according to this article from September, 2014. It probably shouldn’t be a surprise that strategies dependent on big moves have not performed well in an environment of low volatility.

I personally dislike these indicator-based trend following systems, even though I think they give entries as good as any. The problem is they are too slow to close positions and give back too much profit.

They only really make serious money when a megatrend eventuates, and these are relatively rare and IMO getting rarer. In the mean time they can get chopped around and experience significant drawdowns.

I don’t want to draw any final conclusions about classical trend following, but I know I would have a lot of trouble sticking to systems like these in practise, even though a long term backtest might look really nice.

However, if you understand the conditions under which your system does well or is likely to underperform, it is a lot easier to stick with it during periods of underperformance, which are inevitable over the long run.

It is worth spending the time thinking about the conditions in which your system does well and does poorly.

## Tuesday, December 16, 2014

### Book Review: R Object-oriented Programming

Packt sent me a copy of R Object-oriented Programming by Kelly Black to review so here we go:

The aim of this book is to provide a basic understanding of R and programming in R. It is suggested that readers are already familiar programming, but the book does seem to start from square one and I feel is probably accessible to most people using R.

The contents are roughly split 50/50 between general R stuff like working with data and the various data types R has and then OO. The first half provides good coverage of the basics, and I thought it was a good introduction to R. It includes a chapter on working with time variables which I think would be very useful to people new to R and working with time series data.

The second half is about objected orientated programming. S3 and S4 classes are covered, and some decent examples are provided that go beyond the usual trivial hello world or Shape, Circle, Square type examples, which is nice to see.

Strangely though, no mention is made of reference classes. For a book titled R Object-oriented Programming, making no mention about one of the ways to do OO in R seems like a rather large oversight.

Another nitpick I have is the indent style and formatting. It is mostly Whitesmiths style (which I personally dislike), but beyond that it is used inconsistently throughout the examples. This is generally considered a rather bad practice in programming circles. Pick one style and stick with it.

It is challenging to present source code and text together, but some of it is laid out very poorly, with lines wrapping in strange places. It doesn’t really affect the content because most people will likely download the code to experiment with it, but it does detract from what is otherwise a good book.

Arguing about whitespace may seem petty, but the reality is most programmers spend most of their day looking at source code, so it is important. There's plenty of projects that wont even look at your code unless it is properly formatted. Again its not the choice of style, its the inconsistent use and poor formatting that bothers me. There's a lot of terrible R code out there and a book about programming should do better.

Despite these issues I think it gives good coverage of relevant topics for those starting to get more into the programming side of R. If you are starting out in statistics and looking for a book on R to get going with programming (versus say how to do linear regression etc) it's a useful one.

You can see some other reviews herehere, and on Amazon.

## Sunday, September 21, 2014

### HMM example with depmixS4

On a scale of one to straight up voodoo, Hidden Markov Models (HMMs) are definitely up there for me.

They have all sorts of applications, and as the name suggests, they can be very useful when you wish to use a Markovian approach to represent some stochastic process.

In loose terms this just means we wish to represent our process as some set of states and probabilistic transitions between them.

For example if I am in state 1, there may be a 85% chance of staying in state 1, and a 15% chance of moving to state 2.

To complete this simple two state model, we would also have to define the transitions for state 2, namely what is the probability we will stay in state 2 if we are already in state 2, and what is the probability we will transition from state 2 to state 1.

But what if we don’t know this information? What if we just have a set of realizations from some process and a vague idea of the number of states that might be behind it?

Enter HMMs. Among other things, they can be used to determine the state transition probabilities that underlie some process, as well as the distributional parameters for each state.

That is, it can work out all the details that are "hidden" from an observer of a process. All we really need to do is specify how many states we think there are.

### Example

To test this out, and to get more familiar with the depmixS4 package, I made a small test program. It creates an observation series with:

a) a known number of states
b) known state transition matrices
c) known underlying distribution parameters

The aim is to use depmixS4 to estimate all this information, which should help get a grip on how to use the package, and also let us see if HMMs are actually any good.

We have two states, lets call them S1 and S2. They were setup with the following properties

Both states provide realizations from the normal distribution, in this case S1 has mean -1, standard deviation 1, and S2 has mean 1, standard deviation 1. We can also see the transition matrix between S1 and S2 in the last two columns.

A trajectory was sampled, and depmixS4 used to fit a HMM. It provided the following estimates

You can see it did a pretty good job of recovering the true values.

In the following plot, you can see the sample trajectory, the estimated state from the HMM and the actual state used to generate the sample. This example worked very well, it’s not always the case that things turn out so nicely.

Voodoo I tell you!

### Outro

There is no guarantee that a fitted HMM will be of any use, and even with this simple example, the state estimates can be wildly inaccurate. You can try this for yourself by using different seed values.

This example is based on one from the book Hidden Markov Models and Dynamical Systems, which I found to be an excellent resource on the topic. It is clearly written, covers the basic theory and some actual applications, along with some very illustrative examples. Source code is provided in python.

I've read (or at least tried to read) pretty much every book on HMMs I could find, and found this one to be the most useful if you are new to HMMs and are interested in applications.

You may also find these other posts about HMMs useful as well:

Fun With R and HMMs
Getting Started with Hidden Markov Models in R
Hidden Markov Models Examples in R, part 3/4

There is also the classic paper by Rabiner.

Hopefully this has shed a bit of light on HMMs and the depmixS4 package. Code is up here.

## Monday, July 14, 2014

### Bayesian Naive Bayes for Classification with the Dirichlet Distribution

I have a classification task and was reading up on various approaches. In the specific case where all inputs are categorical, one can use “Bayesian Naïve Bayes” using the Dirichlet distribution.

Poking through the freely available text by Barber, I found a rather detailed discussion in chapters 9 and 10, as well as example matlab code for the book, so took it upon myself to port it to R as a learning exercise.

I was not immediately familiar with the Dirichlet distribution, but in this case it appeals to the intuitive counting approach to discrete event probabilities.

In a nutshell we use the training data to learn the posterior distribution, which turns out to be counts of how often a given event occurs, grouped by class, feature and feature state.

Prediction is a case of counting events in the test vector. The more this count differs from the per-class trained counts, the lower the probability the current candidate class is a match.

Anyway, there are three files. The first is a straightforward port of Barber’s code, but this wasn’t very R-like, and in particular only seemed to handle input features with the same number of states.

I developed my own version that expects everything to be represented as factors. It is all a bit rough and ready but appears to work and there is a test/example script up here. As a bigger test I ran it on a sample  car evaluation data set from here, the confusion matrix is as follows:

testY   acc good unacc vgood
acc    83    3    29     0
good   16    5     0     0
unacc  17    0   346     0
vgood  13    0     0     6

That’s it for now. Comments/feedback appreciated. You can find me on twitter here