Monday, July 14, 2014

Bayesian Naive Bayes for Classification with the Dirichlet Distribution

I have a classification task and was reading up on various approaches. In the specific case where all inputs are categorical, one can use “Bayesian Naïve Bayes” using the Dirichlet distribution.

Poking through the freely available text by Barber, I found a rather detailed discussion in chapters 9 and 10, as well as example MATLAB code for the book, so I took it upon myself to port it to R as a learning exercise.

I was not immediately familiar with the Dirichlet distribution, but in this case it appeals to the intuitive counting approach to discrete event probabilities.

In a nutshell we use the training data to learn the posterior distribution, which turns out to be counts of how often a given event occurs, grouped by class, feature and feature state.

Prediction is a case of counting events in the test vector. The more this count differs from the per-class trained counts, the lower the probability the current candidate class is a match.
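
To make the counting idea concrete, here is a minimal sketch in Python (not my R code, and not Barber's MATLAB). It assumes a symmetric Dirichlet prior with a single pseudo-count alpha on both the class and the feature-state tables, so the posterior predictive probability of a state is just (count + alpha) / (class count + alpha * number of states). The function names and the toy data are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels):
    """Count classes and (class, feature, state) occurrences."""
    class_counts = Counter(labels)
    state_counts = defaultdict(Counter)  # (class, feature index) -> state counts
    states = defaultdict(set)            # feature index -> states seen in training
    for row, label in zip(rows, labels):
        for i, state in enumerate(row):
            state_counts[(label, i)][state] += 1
            states[i].add(state)
    return class_counts, state_counts, states

def predict(model, row, alpha=1.0):
    """Posterior class probabilities under a symmetric Dirichlet(alpha) prior (assumed)."""
    class_counts, state_counts, states = model
    n_total = sum(class_counts.values())
    log_post = {}
    for c, n_c in class_counts.items():
        # Smoothed class prior.
        lp = math.log((n_c + alpha) / (n_total + alpha * len(class_counts)))
        for i, state in enumerate(row):
            count = state_counts[(c, i)][state]
            # Smoothed per-feature state probability given the class.
            lp += math.log((count + alpha) / (n_c + alpha * len(states[i])))
        log_post[c] = lp
    # Normalise in log space to avoid underflow.
    top = max(log_post.values())
    weights = {c: math.exp(lp - top) for c, lp in log_post.items()}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}

# Toy, made-up data: two categorical features with different numbers of states.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)
probs = predict(model, ("rainy", "cool"))
print(probs)
```

Note that each feature keeps its own set of states, so features with different numbers of states are handled without any special casing.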

Anyway, there are three files. The first is a straightforward port of Barber’s code, but this wasn’t very R-like, and in particular only seemed to handle input features with the same number of states.

I developed my own version that expects everything to be represented as factors. It is all a bit rough and ready but appears to work, and there is a test/example script up here. As a bigger test I ran it on a sample car evaluation data set from here. The confusion matrix is as follows:

testY   acc good unacc vgood
  acc    83    3    29     0
  good   16    5     0     0
  unacc  17    0   346     0
  vgood  13    0     0     6

That’s it for now. Comments/feedback appreciated. You can find me on Twitter here.

Links to files:

Everything in one directory (with data) here

Sunday, June 22, 2014

Trading in a low vol world

I wanted to take a look at what works in low vol environments, such as we are currently experiencing. I am open to the idea that we have entered a period of structurally low volatility due to increased regulatory burden and flow-on effects from the decline of institutional FICC trading. Or it may just be a function of QE, and post-tapering we will see a return to higher levels.

The plan

The main idea is to compare mean reversion (MR) vs. follow through (FT). For simplicity I define mean reversion as an up day being followed by a down day, and a down day being followed by an up day. Conversely, follow through sees an up day followed by another up day, and a down day followed by a down day.

I took a look at the major US equity indices, SPX (GSPC), NDX and RUT.

For each series we calculate daily log returns for the current period and shift them forward to get the return for the next period. Then we calculate realized volatility (RV) and split the data set into "low volatility" and "high volatility" by comparing each observation with the median realized vol for the whole series.
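
As a rough sketch of that preparation step in Python (my actual source is in R): the RV definition here, a rolling standard deviation of log returns over a short window, is an assumption for illustration, as are the function names, the window length and the made-up prices.

```python
import math
import statistics

def log_returns(prices):
    """Daily log returns from a list of closing prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def rolling_vol(returns, window):
    """Rolling sample standard deviation; None until the window is full."""
    out = []
    for t in range(len(returns)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(statistics.stdev(returns[t + 1 - window:t + 1]))
    return out

def split_by_vol(prices, window=3):
    """Pair each return with the next one and split on median realized vol."""
    r = log_returns(prices)
    rv = rolling_vol(r, window)
    # (current return, next return, rv); the last return has no successor.
    rows = [(r[t], r[t + 1], rv[t]) for t in range(len(r) - 1) if rv[t] is not None]
    median_rv = statistics.median(v for _, _, v in rows)
    low = [(cur, nxt) for cur, nxt, v in rows if v <= median_rv]
    high = [(cur, nxt) for cur, nxt, v in rows if v > median_rv]
    return low, high

# Made-up prices, purely for illustration.
prices = [100, 101, 100.5, 102, 101, 103, 99, 104, 103.5, 103.6, 103.4, 103.5]
low, high = split_by_vol(prices, window=3)
print(len(low), len(high))
```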

Then, for each series, we use bootstrapped samples to simulate a number of trajectories/equity curves for each strategy (MR/FT) under the two classes of RV. Finally we take an average of the total return of each trajectory to get a ballpark idea of how they went.
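
One way to sketch the bootstrap in Python, under my simplified definitions: FT holds a position in the direction of today's move over the next day, MR holds the opposite. Sampling the per-day payoffs with replacement and summing them is an assumption about the details, and the names, seed and toy data are hypothetical.

```python
import random

def sign(x):
    return (x > 0) - (x < 0)

def strategy_pnl(pairs, kind):
    """Next-day payoff per (current return, next return) pair.

    FT follows the sign of the current return, MR fades it.
    """
    direction = 1 if kind == "FT" else -1
    return [direction * sign(cur) * nxt for cur, nxt in pairs]

def bootstrap_mean_total(pnl, n_traj=1000, sample_size=None, seed=0):
    """Average total return over bootstrapped trajectories."""
    rng = random.Random(seed)
    k = sample_size or len(pnl)
    # Each trajectory is a resample (with replacement) of the daily payoffs.
    totals = [sum(rng.choices(pnl, k=k)) for _ in range(n_traj)]
    return sum(totals) / n_traj

# Toy pairs that always reverse, so MR should win by construction.
pairs = [(0.01, -0.02), (-0.01, 0.02)]
mr_avg = bootstrap_mean_total(strategy_pnl(pairs, "MR"), n_traj=200)
ft_avg = bootstrap_mean_total(strategy_pnl(pairs, "FT"), n_traj=200)
print(mr_avg, ft_avg)
```

Run once per volatility class (the low vol pairs and the high vol pairs) and compare the MR and FT averages.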


The data is from the start of 1999 to the present, so roughly 15 years. Each run generates 1000 trajectories with a sample size of roughly 950.

For the low vol case, the results are unfortunately ambiguous. Follow through in a low vol environment seemed to do well for NDX and RUT, but the opposite was the case for SPX.

The TR column is the sum of the series over the whole period for the volatility class (i.e. a simple long only strategy), giving an idea of a directional bias that may be present in the sampling.

In the high vol environment, mean reversion was a clear winner, and consistent over the different underlyings.

The results seem relatively stable across trajectory size/sample size.


I'm not really sure what is going on with SPX. My intuition was that FT would do well in low vol environments, but that doesn't seem to be the case, at least not for SPX.

I was actually getting consistent votes for FT in the low vol case, then restarted R to run with a clean environment and started getting the above instead. You can't spell argh without R it seems.

Source is up here. As always you can find me on Twitter here. Thanks for stopping by.

Saturday, May 31, 2014

Divergence on NDX

I generally take a dim view of old-timey technical indicators. Perhaps they work for some people, but I have found there are much better tools available. One exception is divergence, which in this case is when price makes a new high but the MACD (or your favourite oscillator) does not.

There is very nice looking divergence on NDX, and it also shows up in a weaker form on SPX and INDU. I never take it as a trade signal by itself, but it does make me look a little closer. I have marked off some previous occurrences as well. It does not give any indication about when a sell off may occur, or how much of a sell off will eventuate. Pretty useful isn't it?

Another form of divergence I take note of is the marked failure of RUT to make it back to its recent highs, which differs from NDX/SPX/INDU. 

You can also see in the charts above that volume has been declining, especially over the last 4-5 weeks. 

I do think we are in a long run bull market which still has a few more years to go. In the event of a shorter term sell off I would generally be looking to buy dips. 

A good sign of a bull market is shrugging off negative events. We've had some reasonably serious geopolitical happenings: the invasion in Ukraine, a coup in Thailand, and anti-Chinese riots in Vietnam that produced a number of fatalities.

Struggling to think what a catalyst might be. Perhaps some unpleasant surprise regarding QE tapering, or an unconstrained collapse in the Chinese property market, both of which I think are pretty unlikely.

There's a bunch of macro data out next week, and Apple is having its WWDC. Apple used to make up a very large amount of NDX, something like 24% of the index value was determined by AAPL prices. I know they rebalanced it and am not up to date with where it currently stands.

A quick look at FX realized vol

Much has been said about the decline in volatility. At the moment I am very active in FX spot trading and as a generalization do better the more vol there is.

I wanted to see how things stood on the crosses I am most active in, namely EUR/USD, GBP/USD and USD/JPY.

I took hourly data from FxPro (not my broker, nor an endorsement), calculated volatility as the high minus the low, and summed the total for each day. You can think of it as how many pips were on offer if one could correctly call the high and low of each hour of each day.
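
The calculation is simple enough to sketch in a few lines of Python (my own code is in R). The bar layout of (timestamp, high, low), the function name and the sample prices are assumptions for illustration; the pip size is 0.0001 for EUR/USD and GBP/USD and 0.01 for USD/JPY.

```python
from collections import defaultdict
from datetime import datetime

def daily_range_pips(bars, pip=0.0001):
    """Sum the hourly high-low ranges for each calendar day, in pips."""
    totals = defaultdict(float)
    for ts, high, low in bars:
        totals[ts.date()] += (high - low) / pip
    return dict(sorted(totals.items()))

# Made-up hourly bars: (timestamp, high, low).
bars = [
    (datetime(2014, 5, 26, 9), 1.3650, 1.3640),
    (datetime(2014, 5, 26, 10), 1.3655, 1.3645),
    (datetime(2014, 5, 27, 9), 1.3630, 1.3625),
]
daily = daily_range_pips(bars)
for day, pips in daily.items():
    print(day, round(pips, 1))
```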

All up there is about 90 days of data, so it covers roughly the last four months. I also took the average of the last five days, which are the red X’s on the box plots. We are here.

As you can see, vol is below average. It was a quiet week overall, with bank holidays in the US and parts of Europe, and only a moderate amount of data coming out. Next week should be a bit busier I think.

Since I had all the data I also took a look at the average hourly RV per day.

I don’t want to read too much into this chart but things have been quiet. I read somewhere else that FX vol is approaching the levels of 2007, which was a very quiet time indeed.

Some R code is up here, data is here.

Saturday, May 17, 2014

RcppArmadillo cheatsheet

I have been using RcppArmadillo more and more frequently, so thought I would make a cheatsheet/cookbook type reference that translates common R operations into equivalent arma code.

I have put them up on a GitHub wiki page here.

The functions are all pretty basic and not particularly robust. In particular they do not do any bounds or sanity checking.

You might also enjoy the arma documentation, in particular the MATLAB/Octave syntax conversion example.

There is also an excellent book, Seamless R and C++ Integration with Rcpp.

Any corrections or additions are most welcome.

Sunday, May 11, 2014

Hedge Fund Managers on YouTube

Bill Ackman reads from the book of Buffett

Ray Dalio gives a run down of macroeconomics

Saturday, May 10, 2014

This has got to be some sign of a top

I'm watching Silicon Valley and can't help but think of real estate booms, when the renovation TV shows start playing. At least this is parody.

You've got Seth Klarman saying things are overvalued, Einhorn short and now a TV show showing how the current boom is entrenched in the public consciousness.