Wednesday, May 29, 2013

Machine Learning for Hackers

I recently read "Machine Learning for Hackers" by Drew Conway and John Myles White.

I'd picked it up because I had heard it was a good way to get familiar with the data mining capabilities of R. I also expected the case-study-based approach to be a good way to see how the authors tackle a broad array of machine learning problems. In these respects I was reasonably well rewarded. You will find a bunch of R code scraps that can be reused with a little effort. Unfortunately the explanation of what the code does (and how) is often absent. In this sense the book is true to its name: you will learn some recipes for tackling certain problems, but you may not understand how the code works, let alone the technique being applied.

The one issue I found unforgivable is that in the instances where the authors talk about machine learning theory, or use its terms, they are often wrong. One example is the application of naive Bayes to spam classification. The scoring function they use is the commonly used likelihood times the prior, leaving off the evidence divisor.

As a method of scoring in Bayesian methods this is appropriate, because the score is proportional to the full posterior probability and much cheaper to compute: the evidence term is the same for every class, so dropping it does not change which class scores highest. However, the resulting score is not a probability, yet the authors continually refer to it as one. This may seem minor, but to me it undermined my confidence in their ability to communicate necessary details about the techniques they are applying.
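To make the distinction concrete, here is a minimal sketch in Python (not the book's R code). The priors, word likelihoods and function names are made up for illustration. It computes the likelihood-times-prior score for each class and then the normalised posterior obtained by dividing by the evidence.

```python
# Hypothetical toy model: every number here is invented for illustration.
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {
    "spam": {"viagra": 0.05, "meeting": 0.001},
    "ham": {"viagra": 0.0005, "meeting": 0.02},
}

def unnormalised_score(words, label):
    """Prior times the product of per-word likelihoods (naive Bayes assumption)."""
    score = priors[label]
    for word in words:
        score *= likelihoods[label][word]
    return score

def posterior(words):
    """Divide each class score by the evidence (the sum over all classes)."""
    scores = {label: unnormalised_score(words, label) for label in priors}
    evidence = sum(scores.values())
    return {label: s / evidence for label, s in scores.items()}

message = ["viagra", "meeting"]
scores = {label: unnormalised_score(message, label) for label in priors}
post = posterior(message)

# Same winner either way, but only the posterior values sum to one.
best_by_score = max(scores, key=scores.get)
best_by_posterior = max(post, key=post.get)
```

The two rankings agree, which is why the shortcut is fine for classification. The raw scores are tiny numbers that do not sum to one across classes, so calling them probabilities is the mistake.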

Another example: in the section on distance metrics the authors state that multiplying a matrix by its transpose computes “the correlation between every pair of columns in the original matrix.” This is also wrong. What they want to say is that it produces a matrix of scores that indicate the correlation between the rows. Even then it is only an approximation, because each score depends on the length of the columns and whether they have been normalised, so the values are not comparable between matrices. A correlation coefficient would be comparable between matrices, but that is not what is being computed.
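A small pure-Python sketch (again, not the book's code) shows the gap between the two quantities. The matrix and the helper names are invented for illustration. Which pairs the product compares depends on the order of multiplication: the matrix times its transpose gives dot products between rows, and the transpose times the matrix gives dot products between columns. Neither is a correlation coefficient.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def pearson(u, v):
    """Pearson correlation coefficient: centred and normalised dot product."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = math.sqrt(sum((x - mu) ** 2 for x in u))
    sv = math.sqrt(sum((y - mv) ** 2 for y in v))
    return cov / (su * sv)

def column(m, j):
    return [row[j] for row in m]

# Made-up 3x2 matrix whose second column is twice the first.
m = [[1.0, 2.0],
     [2.0, 4.0],
     [3.0, 6.0]]
scaled = [[10.0 * x for x in row] for row in m]

rows_gram = matmul(m, transpose(m))            # 3x3: dot products between rows
cols_gram = matmul(transpose(m), m)            # 2x2: dot products between columns
cols_gram_scaled = matmul(transpose(scaled), scaled)

corr = pearson(column(m, 0), column(m, 1))
corr_scaled = pearson(column(scaled, 0), column(scaled, 1))
```

Rescaling the data multiplies the raw dot products by a hundred while the correlation coefficient stays the same, which is the sense in which the raw scores cannot be compared between matrices.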

I am not suggesting that a hacker's guide to machine learning should include a thorough theoretical treatment of the subject. I think only that where terms and theory are introduced they should be used correctly. By this criterion the book is a failure. However, for my purposes (grabbing some code snippets for doing analysis with R) it was moderately successful. My largest disappointment was that, given the mistakes I noticed in topics I know reasonably well, I have no confidence in the authors' explanations of the areas where I am ignorant.
