Monday, May 11, 2015

Equity Ranking Backtest with Python/Pandas

I have been looking at equities a bit of late. I am particularly interested in ranking a universe of equities for “low frequency” manual trading on a weekly or monthly basis.

Every period I would rank each name on a number of different factors, then invest in the highest-ranked ones for that period.

I was initially working in R, but the code grew unwieldy and I wanted a second opinion on my approach, so I took the time to reimplement it in Python using pandas.


For each symbol in our universe, we load the raw data and generate the information used for ranking. If we have 5 names, we end up with 5 dataframes.

Then we combine those dataframes into one big dataframe, and iterate through month by month, selecting the symbols that meet our ranking criteria. From those selected, we equally weight and sum the next period returns.
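A minimal sketch of the combine-and-iterate step might look like the following. The three frames and their values are made up for illustration, and the selection rule (avg > 0) is only a placeholder, not the ranking used in the actual demo code.

```python
import pandas as pd

# Hypothetical per-symbol frames; the values are made up for illustration.
idx = pd.to_datetime(["2015-01-31", "2015-02-28"])
frames = {
    "IEF": pd.DataFrame({"avg": [0.010, 0.006], "npr": [-0.025, 0.009]}, index=idx),
    "VTI": pd.DataFrame({"avg": [0.020, 0.009], "npr": [0.056, -0.012]}, index=idx),
    "DBC": pd.DataFrame({"avg": [-0.050, -0.039], "npr": [0.043, -0.062]}, index=idx),
}

# Tag each frame with its symbol, stack them, and sort by date so that
# every month-end appears once per symbol in the index.
combined = pd.concat(
    [frame.assign(sym=sym) for sym, frame in frames.items()]
).sort_index()

# Walk the months; within each month pick names (placeholder rule: avg > 0)
# and equally weight their next-period returns.
monthly_returns = {}
for month_end in combined.index.unique():
    month = combined.loc[[month_end]]  # list lookup always returns a DataFrame
    picks = month[month["avg"] > 0]
    monthly_returns[month_end] = picks["npr"].mean() if len(picks) else 0.0

result = pd.Series(monthly_returns)
```

Taking the mean of the selected next-period returns is the same as giving each pick a weight of 1/k and summing. A month with no picks is treated here as a flat 0.0 return, which is an assumption about holding cash.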

One thing that is really cool about the pandas dataframe is that it allows multiple rows with the same index.

This makes it easy to get the data for the month under consideration. We just pass the month to the indexer and get the subset of data for that month, e.g.

>>> df.loc['2015-02']
                 cpr       npr       avg   over  sym
2015-02-28  0.043302 -0.062449 -0.038914  False  DBC
2015-02-28 -0.025028  0.008524  0.006130   True  IEF
2015-02-28  0.056838 -0.014239  0.005434   True  VEU
2015-02-28 -0.037434  0.017171  0.015900   True  VNQ
2015-02-28  0.055832 -0.011697  0.009236   True  VTI

[5 rows x 5 columns]

In this example there are 5 symbols, and we see the ranking information for February 2015.

Another option would be to use hierarchical indexing, with a sub-index for each month, but this way worked for my needs and I think it is quite clean and simple.
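For comparison, here is a rough sketch of what the hierarchical alternative could look like, using a (date, sym) MultiIndex. The rows are made up, and the index name "date" is my own choice for the example.

```python
import pandas as pd

# Made-up rows in the same shape as the combined dataframe.
df = pd.DataFrame(
    {
        "npr": [-0.062, 0.009, -0.012, 0.031],
        "sym": ["DBC", "IEF", "DBC", "IEF"],
    },
    index=pd.to_datetime(["2015-02-28", "2015-02-28", "2015-03-31", "2015-03-31"]),
)
df.index.name = "date"

# Move the symbol into the index to get a unique (date, sym) MultiIndex.
hier = df.set_index("sym", append=True).sort_index()

# One month: take a cross-section on the date level.
feb = hier.xs(pd.Timestamp("2015-02-28"), level="date")

# One symbol across all months.
dbc = hier.xs("DBC", level="sym")
```

The index is now unique, at the cost of an extra level to think about on every lookup.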

If anyone knows an equivalent in R that is as clean and easy to work with for multiple time series, I would love to hear about it.

Code Notes

The demo code does a simple backtest of the GTAA/Relative Strength trend following system using ETFs.

I have stripped it down to the basics, so hopefully it is easy to understand: load the data, generate the dataframe with the info we want, make a combined dataframe, then go through month by month.

The ranking is done by filtering out names under their 10-month moving average, then selecting the top n based on average 3-month return.
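As a sketch, the ranking for a single month could be written like this. It assumes the “over” column flags names above their moving average and the “avg” column holds the average return used for ranking; select_top is a hypothetical helper name, not a function from the demo code.

```python
import pandas as pd

def select_top(month: pd.DataFrame, n: int) -> list:
    """Return up to n symbols above their moving average, ranked by avg."""
    eligible = month[month["over"]]  # drop names under their moving average
    ranked = eligible.sort_values("avg", ascending=False)
    return ranked["sym"].head(n).tolist()

# The February 2015 rows from the output shown earlier in the post.
feb = pd.DataFrame(
    {
        "avg": [-0.038914, 0.006130, 0.005434, 0.015900, 0.009236],
        "over": [False, True, True, True, True],
        "sym": ["DBC", "IEF", "VEU", "VNQ", "VTI"],
    },
    index=pd.to_datetime(["2015-02-28"] * 5),
)

top3 = select_top(feb, 3)  # ['VNQ', 'VTI', 'IEF']
```

If fewer than n names pass the filter, this simply returns fewer names; what to do with the unallocated weight is a separate decision.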

The “cpr” column is the current period return, and the “npr” column is the next period return, which is the return realized if we select a given security for that month.
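Here is one way these columns could be built for a single symbol, given a series of month-end closing prices. This is a sketch under assumptions: make_features is a hypothetical name, the prices are made up, and “avg” is taken as the mean of the last three monthly returns, which is only one reading of “average 3-month return”.

```python
import pandas as pd

def make_features(close: pd.Series, sym: str) -> pd.DataFrame:
    """Build ranking columns from month-end closing prices for one symbol."""
    out = pd.DataFrame(index=close.index)
    out["cpr"] = close.pct_change()                 # return over the current month
    out["npr"] = out["cpr"].shift(-1)               # return over the following month
    out["avg"] = out["cpr"].rolling(3).mean()       # mean of the last 3 monthly returns
    out["over"] = close > close.rolling(10).mean()  # above the 10-month moving average?
    out["sym"] = sym
    return out.dropna()                             # drop months with incomplete data

# Made-up month-end prices for illustration.
dates = pd.date_range("2014-01-31", periods=12, freq=pd.offsets.MonthEnd())
close = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0, 110.0, 108.0, 112.0, 115.0, 113.0, 118.0, 120.0],
    index=dates,
)
feats = make_features(close, "VTI")
```

Note that “over” is False until ten months of prices are available, because a comparison against a missing moving average evaluates to False. The last month is dropped since its next period return is not yet known.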

The data is just ETF data from Yahoo, which I have put up here. Code is here.

I found Python for Data Analysis a very useful book when working with pandas.