Often I have some idea for a trading system
that is of the form “does some particular aspect of the last n periods of data
have any predictive use for subsequent periods?”

I generally like to work with nice units of
time, such as 4 weeks or 6 months, rather than 30 or 126 days. It probably
doesn’t make a meaningful difference in most cases, but it’s nice to have the
option.

At first this seemed like something
rollapply() from the zoo package could help with, but there are a number of
preconditions that need to be met and frankly I find them to be a bit of a
pain.

In a nutshell, I have not been able to find
a clean way to have it apply a function to a rolling subset of time series data
aligned to weekly or monthly boundaries.

All is not lost, though: xts has a neat function
called endpoints(), which takes a time series and a period (e.g.
“weeks”, “months”) and returns the row indexes of the last observation in each
period, starting with a leading 0.
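
As a quick illustration of what endpoints() returns, here is a minimal sketch on made-up data (one observation per calendar day for the first quarter of 2021; the series itself is purely an assumption for the example):

```r
library(xts)

# Hypothetical data: one observation per calendar day, Jan-Mar 2021.
dates <- seq(as.Date("2021-01-01"), as.Date("2021-03-31"), by = "day")
x <- xts(seq_along(dates), order.by = dates)

# Row index of the last observation in each month, preceded by 0.
ep <- endpoints(x, on = "months")
print(ep)  # 0 31 59 90
```

The leading 0 is handy: rows (ep[i] + 1) through ep[i + 1] are exactly the i-th period.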

Using this information it becomes easy to
subset the time series data using normal row subset operations.
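
A rough sketch of that idea, again on made-up daily data, with an illustrative window length of 3 months stepping forward one month at a time (both choices are assumptions for the example, not taken from the script):

```r
library(xts)

# Hypothetical data: one observation per calendar day, Jan-Jun 2021.
dates <- seq(as.Date("2021-01-01"), as.Date("2021-06-30"), by = "day")
x <- xts(seq_along(dates), order.by = dates)

ep <- endpoints(x, on = "months")
n  <- 3  # window length in months (illustrative)

# Window i covers n whole months: rows (ep[i] + 1) through ep[i + n].
windows <- lapply(seq_len(length(ep) - n), function(i) {
  x[(ep[i] + 1):ep[i + n], ]
})

# Each window starts on the first row of a month and ends on the last row of one.
for (w in windows) cat(format(start(w)), "to", format(end(w)), "\n")
```

Consecutive windows overlap by n - 1 months, which is the rolling behaviour I am after.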

The xts package also has period.apply(), but
it runs on non-overlapping intervals, which is close but still not quite what I
want.
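
For comparison, a small sketch of period.apply() on made-up data (a series of ones, so that the sum per period is just the number of days in it):

```r
library(xts)

# Hypothetical data: a 1 for every calendar day, Jan-Mar 2021.
dates <- seq(as.Date("2021-01-01"), as.Date("2021-03-31"), by = "day")
x <- xts(rep(1, length(dates)), order.by = dates)

# One result per month; each observation is used in exactly one interval.
monthly <- period.apply(x, INDEX = endpoints(x, on = "months"), FUN = sum)
print(monthly)
```

Each month is summarised once and the intervals never overlap, so there is no way to ask for "the last 3 months, re-evaluated every month" directly.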

In the script for this post there are 4 or
so functions of note.

The main one is roll_model, which takes the
data to be subsetted and passed to the model, the sizes of the per-model
training and test sets, and the period type used to split things up, which can
be anything valid for use with endpoints().
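
The general shape of such a function might look like the sketch below. This is not the code from the script: the name roll_model_sketch, its argument names and the model_fun callback are all hypothetical, and it assumes the window moves forward one period at a time.

```r
library(xts)

# Hypothetical sketch of a rolling driver; not the script's roll_model.
# model_fun is called once per window with the window's data and the sizes.
roll_model_sketch <- function(data, train_size, test_size, period, model_fun) {
  ep    <- endpoints(data, on = period)
  width <- train_size + test_size          # periods per window
  n_win <- length(ep) - width              # number of complete windows
  if (n_win < 1) stop("not enough periods for a single window")
  lapply(seq_len(n_win), function(i) {
    window <- data[(ep[i] + 1):ep[i + width], ]
    model_fun(window, train_size, test_size, period)
  })
}

# Usage on made-up data: 2 months of training plus 1 month of testing,
# with a dummy "model" that just reports the number of rows it was given.
dates <- seq(as.Date("2021-01-01"), as.Date("2021-06-30"), by = "day")
x <- xts(rnorm(length(dates)), order.by = dates)
res <- roll_model_sketch(x, 2, 1, "months", function(w, ...) nrow(w))
print(unlist(res))
```

Passing the model in as a function keeps the windowing logic separate from whatever is being fitted.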

A utility function is train_test_split
which also uses endpoints() to split a subset of data into 2 sets, one for
training the model, one for testing. In practice it needs to be the same period
type as you expect to use with roll_model.
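
A minimal sketch of how such a split could work, assuming the sizes are counted in whole periods from the start of the subset (the function name and arguments here are hypothetical, not the script's):

```r
library(xts)

# Hypothetical sketch; not the script's train_test_split.
# The first train_size periods go to training, the next test_size to testing.
train_test_split_sketch <- function(data, train_size, test_size, period) {
  ep <- endpoints(data, on = period)
  stopifnot(length(ep) - 1 >= train_size + test_size)
  train_end <- ep[train_size + 1]
  test_end  <- ep[train_size + test_size + 1]
  list(train = data[1:train_end, ],
       test  = data[(train_end + 1):test_end, ])
}

# Usage on made-up data: January and February for training, March for testing.
dates <- seq(as.Date("2021-01-01"), as.Date("2021-03-31"), by = "day")
x <- xts(seq_along(dates), order.by = dates)
parts <- train_test_split_sketch(x, train_size = 2, test_size = 1,
                                 period = "months")
print(c(train = nrow(parts$train), test = nrow(parts$test)))
```

If the window was cut on monthly boundaries but split on weekly ones, the period counts would no longer line up, hence the need to use the same period type in both places.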

The function that actually builds the model
and returns some results is run_model(), which calls train_test_split to get
the training and test set, builds a model using ksvm in this example, and sees
how it goes based on the test set.
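
To give a feel for that step, here is a rough sketch with two deliberate substitutions: lm() stands in for ksvm so the example needs nothing beyond xts, and the evaluation metric (the fraction of test rows where the predicted sign matches the actual sign) is my assumption, since any measure could go there. The function name and the inline split are hypothetical too.

```r
library(xts)

# Hypothetical sketch; not the script's run_model. lm() replaces ksvm here.
run_model_sketch <- function(data, train_size, test_size, period) {
  # Split the window into whole-period training and test sets.
  ep    <- endpoints(data, on = period)
  cut   <- ep[train_size + 1]
  train <- data[1:cut, ]
  test  <- data[(cut + 1):ep[train_size + test_size + 1], ]

  # Fit on the training set, predict the test set.
  fit    <- lm(Y ~ ., data = as.data.frame(train))
  pred   <- predict(fit, newdata = as.data.frame(test))
  actual <- as.numeric(test$Y)

  # Assumed metric: share of test rows with the sign predicted correctly.
  list(hit_rate = mean(sign(pred) == sign(actual)), n_test = nrow(test))
}

# Usage on made-up noise with columns Y, X1, X2.
set.seed(1)
dates <- seq(as.Date("2021-01-01"), as.Date("2021-03-31"), by = "day")
d <- xts(matrix(rnorm(length(dates) * 3), ncol = 3,
                dimnames = list(NULL, c("Y", "X1", "X2"))),
         order.by = dates)
res <- run_model_sketch(d, train_size = 2, test_size = 1, period = "months")
print(res)
```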

Another utility function, data_prep, is called
before any of that and builds the main data object to be passed to roll_model.
In this example it takes a set of log close-to-close returns and sets Y to be
the log return at time t, X1 the log return at t-1, X2 the log return at t-2
and so on.
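
A sketch of that kind of lagged layout, using a handful of made-up closing prices (the function name data_prep_sketch and its arguments are hypothetical, not the script's):

```r
library(xts)

# Hypothetical sketch; not the script's data_prep.
# Column Y is the return at t, X1 the return at t-1, ..., Xk the return at t-k.
data_prep_sketch <- function(returns, n_lags) {
  # With an xts object, a positive k in lag() brings earlier values forward.
  lags <- lapply(seq_len(n_lags), function(k) stats::lag(returns, k = k))
  out  <- do.call(merge, c(list(returns), lags))
  colnames(out) <- c("Y", paste0("X", seq_len(n_lags)))
  na.omit(out)  # drop the leading rows that have no full lag history
}

# Usage: made-up closes turned into log close-to-close returns.
dates   <- seq(as.Date("2021-01-04"), by = "day", length.out = 6)
closes  <- xts(c(100, 101, 99, 102, 103, 101), order.by = dates)
returns <- diff(log(closes))[-1]   # first difference is NA, so drop it
d <- data_prep_sketch(returns, n_lags = 2)
print(d)
```

Each row then holds everything the model needs for one prediction, so the rolling code can subset rows without worrying about lags crossing a window boundary.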

The example model is not a particularly useful way of
looking at things, which is not surprising given that close-to-close returns are
effectively random noise. But perhaps the script itself is useful for other
ideas, and if anyone knows better, easier or alternative ways of doing the same
thing I would love to hear about them.

The script is available here.