But can we quantify this effect, and does it really exist? We can and it does, and it’s simple to show with fewer than 10 lines of Python.

### Methods and madness

We create a two-column data frame: one column holds the monthly return, and the other is a dummy variable that is 1 for our hold months (October – May) and 0 for our sell months (June – September).

Once we have created the dummy variable marking the events we wish to distinguish between, we run an OLS regression and look at the coefficient of that dummy. With a single 0/1 dummy, the coefficient estimates the difference in average return between the two groups of months.

If it is “significant”, we conclude there is a material difference between periods when the factor is present and periods when it is not.
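To make the recipe concrete, here is a minimal sketch of the dummy-variable regression. It is not the code behind this post: the returns are synthetic random numbers, the column names `ret` and `hold` are my own, and the regression is done by hand with NumPy least squares rather than a packaged OLS routine.

```python
import numpy as np
import pandas as pd

# Synthetic monthly returns (made-up numbers, for illustration only).
rng = np.random.default_rng(0)
months = pd.period_range("1993-02", "2012-12", freq="M")
ret = rng.normal(0.005, 0.04, len(months))

# Dummy variable: 1 for hold months (October - May), 0 for sell months (June - September).
hold = np.where(months.month.isin([6, 7, 8, 9]), 0, 1)
df = pd.DataFrame({"ret": ret, "hold": hold}, index=months)

# OLS of ret on [intercept, hold] via least squares.
X = np.column_stack([np.ones(len(df)), df["hold"].to_numpy()])
y = df["ret"].to_numpy()
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard errors and t-statistics from the usual OLS formulas.
resid = y - X @ beta
dof = len(y) - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se

intercept, coef = beta
print(f"intercept={intercept:.4f} coef={coef:.4f} t-stat={t_stats[1]:.2f}")
```

Because the data here is pure noise, the dummy coefficient should usually come out insignificant. The point is only the mechanics: the coefficient on `hold` is the gap between the two group averages, and its t-statistic tells you whether that gap is distinguishable from zero.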

If you are a commercial data scientist, you can use this same method to see if some key metric has actually changed after a marketing campaign or new release. This could be things like increasing user signups or revenue. Your dummy variable would be 0 before the campaign, and 1 afterwards.

If we can show our campaign worked, we can tell our boss how great we are and not to forget all our hard work come bonus time.
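As a rough sketch of the campaign version, assuming made-up daily signup counts and a hypothetical helper `dummy_regression` (neither comes from a real dataset or library), the same idea looks like this:

```python
import numpy as np

def dummy_regression(y, dummy):
    """Regress y on an intercept and a 0/1 dummy; return (coef, t_stat) for the dummy."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(dummy, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta[1], beta[1] / se[1]

# Hypothetical daily signups: 30 days before the campaign, 30 days after.
rng = np.random.default_rng(1)
before = rng.poisson(100, 30)
after = rng.poisson(110, 30)

signups = np.concatenate([before, after])
campaign = np.concatenate([np.zeros(30), np.ones(30)])  # 0 before, 1 after

lift, t_stat = dummy_regression(signups, campaign)
print(f"estimated lift: {lift:.1f} signups/day, t-stat: {t_stat:.2f}")
```

The estimated lift is just the after-campaign average minus the before-campaign average. A simple before/after dummy like this ignores trend and seasonality, so treat it as a first look rather than proof.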

### Example

As an example, let’s look at SPY from 1993 onwards. First we download the data from Yahoo and create a column of monthly returns. Then we code our dummy variable as described above and run the regression.
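The data-preparation step might look something like the sketch below. To keep it self-contained I use a synthetic random-walk price series as a stand-in for the downloaded SPY closes, so the numbers mean nothing; the variable names are also my own.

```python
import numpy as np
import pandas as pd

# Stand-in for downloaded daily adjusted closes (a synthetic random walk, not real SPY data).
rng = np.random.default_rng(2)
days = pd.bdate_range("1993-01-29", "1994-12-30")
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(days)))), index=days)

# Last price of each calendar month, then the month-over-month percentage change.
month_end = prices.groupby(prices.index.to_period("M")).last()
monthly_ret = month_end.pct_change().dropna()

# Two-column frame: the return and the Halloween dummy (0 for June - September, else 1).
df = pd.DataFrame({"ret": monthly_ret})
df["hold"] = np.where(df.index.month.isin([6, 7, 8, 9]), 0, 1)
print(df.head())
```

Grouping by calendar month and taking the last price avoids depending on a particular resampling alias, which has changed between pandas versions.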

Looking at the pandas OLS output, we see the following summary:

| Variable | Coef | Std Err | t-stat | p-value | CI 2.5% | CI 97.5% |
|---|---|---|---|---|---|---|
| x | 0.0141 | 0.0054 | 2.61 | 0.0096 | 0.0035 | 0.0246 |
| intercept | -0.0038 | 0.0044 | -0.87 | 0.3869 | -0.0124 | 0.0048 |

Here x is our Halloween dummy variable. Its p-value of 0.0096 is significant at the 1% level. Take that, EMH!

Looking at the data, the average monthly return is -0.28% for summer, and +1.11% for the Halloween period.

### End notes

For the Halloween effect, rather than looking at raw monthly returns, we should probably look at the excess return: the monthly return minus the risk-free rate. There’s a fairly comprehensive paper with a good historical review available here.

Also, there is a great, freely available book on working with time series data here. The examples are in R but should be pretty easy to follow.
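A small sketch of that adjustment, assuming the risk-free rate arrives as an annualized figure (for example a T-bill yield) that has to be converted to a monthly rate first. All numbers below are made up.

```python
import pandas as pd

months = pd.period_range("2000-01", periods=4, freq="M")
monthly_ret = pd.Series([0.012, -0.020, 0.031, 0.004], index=months)  # made-up returns
rf_annual = pd.Series([0.050, 0.050, 0.048, 0.048], index=months)     # made-up annualized rates

# Convert the annualized rate to a monthly rate, then subtract it from the return.
rf_monthly = (1 + rf_annual) ** (1 / 12) - 1
excess_ret = monthly_ret - rf_monthly
print(excess_ret)
```

The regression itself stays the same; `excess_ret` simply replaces the raw return as the dependent variable.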

Finally, code is up: here