Sunday, November 24, 2013

Just for fun: attractors in R

I have a borderline unhealthy obsession with attractors. I thought I got it out of my system, but here we are. For whatever reason, I felt like making some in R.

You can find the R code here. It uses the attractor function to define density in a matrix, which is how often a given point gets hit. Then we plot the log scaled version of that using image().

Be warned, with a lot of points it can be quite slow. If you play around you might want to drop the n value to 10,000 or so.

Some pictures



If you like this sort of thing, I made a realtime interactive dejong attractor, using a kinect, openFrameworks and a lot of GLSL shaders. You can see a clip of it below, the source is also on my github if you are so inclined.

Attractor from dizzy pete on Vimeo.

You can find me on twitter and G+

Book Review: Applied Predictive Modeling by Max Kuhn and Kjell Johnson

This is a gem of a book.

From the introduction:

We intend this work to be a practitioner’s guide to the predictive modeling process and a place where one can come to learn about the approach and to gain intuition about the many commonly used and modern, powerful models.
…it was our goal to be as hands-on as possible, enabling the readers to reproduce the results within reasonable precision as well as being able to naturally extend the predictive modeling approach to their own data.

The book is structured into four main sections. First is General Strategies, which provides an introduction and discusses things like pre-processing and tuning. 

The next two sections cover regression and classification, each with chapters on linear and non-linear methods, as well as tree and rule based methods, with one to two chapters on practical issues such as measuring performance. 

The final section covers feature selection, predictor importance and a discussion around model performance. 


There are a few things I really like:

It is not an academic or mathematical treatise; the emphasis is on practice, discussing the issues that commonly arise and how they can be approached. Plenty of references are provided for those wanting to dig deeper.

Every example has its data set and code available so one can work through the examples as presented. In most cases they are real world datasets and there is great discussion of the real world issues that arise, what should be considered and the various tradeoffs that can be made.

Discussion and code are separate. Aside from the excellent content, this is probably what I appreciate the most. Each chapter presents its content, with charts where appropriate, while the actual walk through of the code and raw output is in a separate section of the chapter. 

This makes it much easier to focus on the material being presented. It is always difficult to present source code along with discussion. This is not a book about programming per-se, it is about using existing tools to make intelligent and reasoned decisions about the task at hand. It makes a lot of sense to have the code presented separately.

Also, as far as I have read, each chart is at most only one page away from the text discussing it. This is a small thing but I feel there has been serious consideration about the presentation of the material and it has been done very well. [Update: This is not strictly true but is mostly the case throughout the book]

It is not a book about caret, the package of author Max Kuhn. To be honest I would be pretty happy even if it were about caret, which certainly does get some use in the code, but it is relatively package agnostic. 

Final thoughts

This is a great book, providing both the trees and the forest so to speak. I am unaware of any other book with similar content, and I wish I had something like this when I was first getting interested in machine learning. 

There are books that are very introductory, books that cover the details of the algorithms, and books that provide rigorous coverage of the theory, but these are not really accessible to those without a serious amount of mathematics. There are a few equations presented where appropriate, but it is certainly not the focus of the book.  

There are no real shortcomings, though if there were ever a second edition, coverage of time series methods and deep learning would be welcome. I appreciate they are both book worthy topics by themselves, and the latter is still very much a moving target. 

In summary: Great content, well written and well presented. This book would be my top recommendation to anyone looking to get started or working with predictive modeling. Well worth checking out. 

You can read more reviews on amazon here: Applied Predictive Modeling

You can find me on twitter and G+