Monday, July 30, 2012

Multidimensional Scaling and Company Similarity

Background and idea

Often we are looking at a particular sector and want a quick overview of how a group of companies compare with one another. I thought I might apply Multidimensional Scaling (MDS) to various financial ratios and see whether it gave us anything useful.

The premise is that companies in similar industries should share a degree of sameness, so MDS might be useful for highlighting the companies that stand out from the crowd, perhaps even in a literal sense...
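As a rough illustration of the idea, here is a minimal sketch of classical (metric) MDS in base R. It assumes the ratios have already been collected into a data frame with one row per company. The tickers `AAA` to `DDD` and all the numbers are made up, and standardising with `scale()` before taking Euclidean distances is my assumption about a sensible setup, not necessarily what the original code does.

```r
# Hypothetical ratio table: one row per company, one column per ratio.
# Tickers and values are invented purely for illustration.
ratios <- data.frame(
  roe       = c(0.35, 0.30, 0.28, 0.05),
  pe        = c(12, 14, 13, 45),
  op_margin = c(0.31, 0.29, 0.33, 0.02),
  row.names = c("AAA", "BBB", "CCC", "DDD")
)

# Standardise each ratio (mean 0, sd 1) so that large-valued ratios
# such as P/E do not dominate the distances
z <- scale(ratios)

# Euclidean distances between companies in standardised ratio space
d <- dist(z)

# Classical MDS: find 2-D coordinates whose distances approximate d
coords <- cmdscale(d, k = 2)

# Plot company labels only; axes are dropped because the coordinate
# values carry no meaning, and asp = 1 keeps distances visually honest
plot(coords, type = "n", asp = 1, axes = FALSE, xlab = "", ylab = "")
text(coords, labels = rownames(coords))
```

Here `DDD`, whose ratios differ from the others on every column, ends up well away from the other three, which is the "stand out from the crowd" effect described above.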


I mostly use the data functions from quantmod to retrieve the financial statements from Google Finance. As always with free data, the quality is variable, but it is good enough for our purposes today. Getting the market price at the time the results were released takes a bit of extra dancing and uses price data from Yahoo Finance. It was a little more work to implement, but worth it so that P/E can be included in the comparison.
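The price-matching step might look something like the sketch below. It assumes the closing prices are a plain vector with a sorted date index and, like the snippet in the comments, takes the last trading day strictly before each statement date. All dates, prices and EPS figures are made up, and `findInterval()` is my choice of tool rather than necessarily what the original code uses.

```r
# Hypothetical daily closing prices (invented), sorted oldest first
price_dates <- as.Date(c("2010-09-23", "2010-09-24", "2011-09-22", "2011-09-23"))
close_px <- c(290, 292, 400, 404)

# Hypothetical statement period-end dates (most recent first) and EPS
stmt_dates <- as.Date(c("2011-09-24", "2010-09-25", "2009-09-26"))
eps <- c(20, 16, 10)

# For each statement date, count the price dates strictly before it
# (left.open = TRUE); that count is the index of the last such price,
# and 0 means there is no earlier price at all
idx <- findInterval(stmt_dates, price_dates, left.open = TRUE)

# Turn index 0 into NA so a missing price stays NA instead of being dropped
price <- close_px[replace(idx, idx == 0, NA)]
pe <- price / eps
pe
```

The third statement date has no earlier price, so its P/E comes out as `NA` rather than silently shifting the other values.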

I looked at two groups of companies: tech stocks and financials/banks.

For the tech stocks I used ROE, EPS, P/E, Operating Margin, Current Ratio, Gearing, Asset Turnover and Debt Ratio. For the financials, I used ROE, EPS, P/E, Gearing and Debt Ratio, mainly because the data available did not have the line items required to calculate the other ratios. 
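For reference, the ratios could be computed from statement line items roughly as below. The column names and figures are hypothetical, and the definitions are common textbook ones that I am assuming; gearing and debt ratio in particular have several variants in use (for example total debt versus total liabilities in the numerator), so the original code may define them differently.

```r
# Hypothetical annual line items for two invented companies (same units)
items <- data.frame(
  net_income          = c(25, 4),
  revenue             = c(100, 80),
  operating_income    = c(30, 5),
  total_assets        = c(120, 90),
  total_liabilities   = c(40, 60),
  current_assets      = c(50, 30),
  current_liabilities = c(25, 30),
  total_debt          = c(10, 35),
  equity              = c(80, 30),
  shares              = c(10, 8),
  share_price         = c(30, 6),
  row.names = c("AAA", "BBB")
)

# Common textbook definitions (assumed; variants exist for several of these)
ratios <- with(items, data.frame(
  roe            = net_income / equity,
  eps            = net_income / shares,
  pe             = share_price / (net_income / shares),
  op_margin      = operating_income / revenue,
  current_ratio  = current_assets / current_liabilities,
  gearing        = total_debt / equity,              # one of several definitions
  asset_turnover = revenue / total_assets,
  debt_ratio     = total_liabilities / total_assets, # liabilities-based variant
  row.names      = rownames(items)
))
round(ratios, 3)
```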

The data from Google gives the last four periods, with the most recent first. It includes both annual and quarterly data; the charts below use the annual results, where Annual Period 1 is the most recent. Because of the scaling function, the actual axis values on the graphs are not particularly meaningful (only the relative distances between companies matter), so I took them out.
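The axis values can be dropped because MDS only tries to preserve the distances between companies, so any rotation or reflection of the layout is an equally valid answer. A quick check with a made-up three-company configuration standing in for real MDS output:

```r
# A small invented 2-D configuration, standing in for MDS output
coords <- matrix(c(0, 1, 3, 0, 2, 1), ncol = 2,
                 dimnames = list(c("AAA", "BBB", "CCC"), NULL))

# Rotate by 30 degrees, then reflect the x-axis
theta <- pi / 6
rot <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2)
reflect <- diag(c(-1, 1))
moved <- coords %*% rot %*% reflect

# The coordinates change, but the pairwise distances do not
all.equal(as.vector(dist(coords)), as.vector(dist(moved)))
```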


These are the charts for the most recent results (so end of year 2011). Overall, I am quite pleased with the results. We can see how most of the companies cluster together, while a few seem to be quite different. This shows at a glance the companies that might be worthy of further investigation. 
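If you want a numeric shortlist to go with the charts, one simple option (my addition, not something the original code does) is to rank companies by how far they sit from the middle of the MDS layout. Using the per-axis median rather than the mean as the centre keeps a single extreme company from dragging the centre towards itself. The coordinates below are invented.

```r
# Invented 2-D MDS coordinates for five hypothetical companies
coords <- rbind(
  AAA = c(0.1, 0.2),
  BBB = c(-0.2, 0.1),
  CCC = c(0.0, -0.1),
  DDD = c(0.2, -0.2),
  EEE = c(2.5, 1.8)
)

# Robust centre of the cloud: per-axis median
centre <- apply(coords, 2, median)

# Euclidean distance of each company from that centre
spread <- sqrt(rowSums(sweep(coords, 2, centre)^2))

# Companies furthest from the pack come first
sort(spread, decreasing = TRUE)
```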

Tech Stocks



The code is up here: MDS Company Similarity with R. It should hopefully be documented well enough for others to mess around with. Any questions, comments or suggestions are very much appreciated, as always.

As an aside, this is the first R program I have written without any for loops. I finally feel I am coming to grips with the language.


  1. In case it is of value: I receive this error (and I apologize, but I have not done any debugging ...)

    sapply(symbols, get_prices, env=finEnv)
    Error in as.Date.default(end_dates) :
    do not know how to convert 'end_dates' to class "Date"

    My setup: Windows 7, running RStudio 0.96.304

    R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
    Copyright (C) 2012 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-pc-mingw32/x64 (64-bit)

    1. Hey, thanks. I am not sure what is happening there; it works for me on OS X with R 2.15.0.

      This should reproduce the error (though it works for me):



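      # assumes AAPL (prices, from getSymbols) and AAPL.f (statements, from getFinancials) are loaded
      dates <- as.Date(colnames(AAPL.f[[1]][[2]]))  # annual income statement dates, most recent first
      didx <- index(AAPL)                            # dates with Yahoo price data
      # last trading day strictly before each statement date
      end_dates <- sapply(seq_along(dates), function(x) last(didx[didx < dates[x]]))
      end_dates
      #[1] 15240 14876 14512 14148
      as.Date(end_dates)  # error here
      #[1] "2011-09-23" "2010-09-24" "2009-09-25" "2008-09-26"
      class(end_dates)
      #[1] "numeric"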

      If it does give you an error, could you see whether running this does any better:


      It comes from the zoo library, which should be attached when quantmod loads.

      It will be a few days until I can get access to a Windows machine; I will take another look then.

      Alternatively, you can skip the P/E comparison by commenting out the following lines:

      1: the c(share_price, IS) on line 71 (also delete the comma on the line above it)
      2: the P/E ratio entry on line 83
      3: the sapply line that generates the error on line 123

      P/E is pretty important, though, so the charts will likely look quite different. Let me know how it goes.

  2. So I took a look and could reproduce it: it turns out Yahoo does not have RBS data going back to 2008. I should probably handle that better in the code; I'll take a look at it over the weekend. Thanks for letting me know!

  3. Very interesting experiment! It would be more interesting if the dataset contained more companies, say 1,000.
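On the RBS issue from the second reply, one plausible reading of the error, assuming the price lookup returns an empty result when no earlier trading day exists: `sapply()` only simplifies to a vector when every call returns exactly one value, so a single empty result makes it return a list, and `as.Date()` on a list falls through to the default method, which raises the "do not know how to convert" error. The sketch below reproduces that with invented dates standing in for the missing RBS history (using base R's `tail()` in place of `last()`), and shows a `vapply()` variant that returns `NA` for the missing period instead.

```r
# Invented price dates: history only starts in 2009, so the 2008
# statement date has no earlier price (as with RBS on Yahoo)
didx  <- seq(as.Date("2009-01-05"), as.Date("2011-12-30"), by = "day")
dates <- as.Date(c("2011-12-31", "2010-12-31", "2009-12-31", "2008-12-31"))

# Original pattern: the 2008 call returns a zero-length Date, so the
# results have unequal lengths and sapply() returns a list
bad <- sapply(seq_along(dates), function(x) tail(didx[didx < dates[x]], 1))
is.list(bad)

# vapply() insists on exactly one number per period; use NA when
# there is no earlier price date
end_dates <- vapply(seq_along(dates), function(x) {
  prior <- didx[didx < dates[x]]
  if (length(prior) == 0) NA_real_ else as.numeric(max(prior))
}, numeric(1))

# Supplying origin explicitly works whether or not zoo is attached
as.Date(end_dates, origin = "1970-01-01")
```

Passing `origin` explicitly matters because older versions of R required it for `as.Date()` on numbers, which is why the zoo version of that conversion came up earlier in the thread.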