Tiny Epiphany: February 2011

Monday, February 14, 2011

"R Graph Cookbook" - a review

A publisher asked me to write a review for the book "R Graph Cookbook". Out of curiosity (and since they didn't mind if I wrote honestly), I complied. It ended up being a good decision, since this was a pretty easy read, and there are things in base R graphics that I've missed since venturing into ggplot2-land.

The book is most appropriate for beginner (and early-intermediate) users of R and R graphics. It is very well structured: the code for generating each graph always precedes the graph, and an explanation of how everything works follows it. The examples are relevant and quite easy to follow. While the author does not explicitly include an introduction to R, he explains the functions used in each code segment.

Earlier chapters go through the commonly used charts, how to make them, and how to tweak certain parameters. The author goes very slowly over the different plots and gives many examples over different chapters. Later chapters cover recipes for more exotic plots (like 3d plots, maps, sparklines, and calendar maps), showing the true power of R graphics, as well as the author's creativity. I don't think the author intended it to be read from beginning to end: the sections are very well structured, so it's pretty easy to flip to a random plot and look at how it's made.

One thing that I didn't like (which some people might like) is that most of the earlier chapters seemed like a puffed-up version of the R documentation. While the advantage of having a book is that you get a more approachable and thorough overview of a subject, you can find information much more quickly by using a combination of Google, online R graphics tutorials, and the R documentation. For example, if you know you want a bar plot, looking up the documentation for barplot() to see all the tweakable parameters is easier than waiting until chapter 5 to learn a few more of them.

The later chapters are more interesting: they really show that R can do a lot, and they did give me ideas about how to visualize the data I am currently working with. Of course, the advanced material is far from complete, and it can't possibly be complete given the open nature of R: there are many, many packages in R, and the best thing to do if you're looking for something specific is just to use a search engine and read the documentation.

So yeah, it's a really well-written book by someone who is definitely qualified to write it (Mittal started http://www.prettygraph.com/). It's a book that does what it intends to do: good for beginners to flip through to get ideas, and well-suited for someone who finds the documentation too intimidating. If you are at the stage where you're comfortable reading the documentation, though, then maybe about 30%-40% of the book could be helpful: you may still be inspired by some examples, though in that case I'd wonder if the book has enough content to be worthwhile.

So that's it. "R Graph Cookbook" by Hrishi V. Mittal. It was strange to find a bio of my friend Paul Butler on the list of reviewers, though that only affected the degree of my amusement and not how I saw the book.

End of Entry

Sunday, February 13, 2011

Another Pascal's Wager

What is your goal in life? It's a difficult question to answer, and if you haven't found an answer yet, let me propose a temporary one to you:

Live to grow. Live to be more mature, to be able to learn more quickly, and to be able to more quickly adapt to new situations. Treat life as training.

The reason is twofold: for one, if you eventually decide to do something else with your life, then regardless of what it is, the growth you've gained will help you. The strength and maturity you gain will make you more capable of achieving other things you want. You will have more and better tools to draw from.

The second reason is that it is fulfilling. Most people would say that they want to live to be happy, but there are different forms of happiness: there's happiness arising from everyday situations, and there's the deep sense of happiness you get from reflecting on who you are and what you do. Growing and becoming better includes being able to balance between the two types of happiness, and can be immensely fulfilling.

A third reason would apply if you are agnostic about whether or not we would retain our consciousness after death. In the event that we do retain our consciousness, then who we are, the maturity and discipline of our thoughts, are the only things we might be able to take away (certainly we can't take away much else, including anything physical). Our maturity and cognitive discipline are the only things that might help us in tackling the new challenges that might lie ahead.

End of Entry

Thursday, February 10, 2011

Statistics, data mining, machine learning, and culture

"You'll realize that you need a dictionary to go between measure theory and probability theory. The underlying concept is the same, but the terminologies are different." - Kathryn Hare, PM354 Measure Theory

When I started learning about statistical techniques, I knew the textbook definitions of statistics and data mining. The more I worked in these areas, the less clear the distinctions became. Then I met people who loved machine learning but didn't know statistics, and statisticians who hadn't heard of machine learning.

After asking people and reading around, I got a partial answer as to how these fields differ. In short, doing statistics is like asking a multiple choice question, mostly with two choices (i.e. is this true, or not?), with more emphasis on using data points "efficiently", since getting data from experiments is expensive. Data mining is more like doing exploratory analysis on a big data set, usually collected for other purposes, without any guarantee of results. Machine learning deals with automating decisions to optimize something in real time, so there is a focus on iterative methods and on-line algorithms that can generate better predictions over time.
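To make the "on-line algorithms that generate better predictions over time" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class name OnlineLinearModel, the learning rate, and the made-up noise-free stream y = 2x + 1 are all hypothetical, and real systems are of course messier. The point is only that the model sees one data point at a time, predicts, and then nudges its parameters.

```python
import random


class OnlineLinearModel:
    """Hypothetical one-feature linear model, updated one observation at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0  # slope
        self.b = 0.0  # intercept
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Predict first, then take one gradient step on the squared error."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error ** 2  # squared error made *before* this update


# Made-up data stream (an assumption for the example): y = 2x + 1, no noise.
rng = random.Random(0)
model = OnlineLinearModel()
squared_errors = []
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    y = 2.0 * x + 1.0
    squared_errors.append(model.update(x, y))

# Average prediction error on the first and last 100 points of the stream.
early_error = sum(squared_errors[:100]) / 100
late_error = sum(squared_errors[-100:]) / 100
print(f"early error: {early_error:.4f}, late error: {late_error:.6f}")
print(f"learned w = {model.w:.3f}, b = {model.b:.3f}")
```

Nothing is ever stored or refit in bulk here, which is roughly what separates this style from the "collect expensive data, then ask one yes/no question" style of classical statistics.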

Overall, though, the techniques used in each of these areas are pretty much the same. However, there is a pretty important difference between them, and that difference lies in the culture of the people using the techniques.

For example, statisticians are a very different breed of people compared to people in data mining and machine learning. When I think of the word "statistician", I still somehow think of an old, bald man in his PhD suit reviewing papers and writing reports for his consulting work. "Machine learning", on the other hand, has quite a different feel to it. I think of hackers, people who just want to get something cool working -- a book recommendation, a way to predict which ads you click on -- and whose method of "reporting" primarily consists of shouting across the room. "Data mining" seems to fall somewhere in between, but I haven't met enough data miners to be sure.

No, I don't think all statisticians are old bald men (in fact my mom is a statistician, and she's neither old, bald, nor a man). I think there are very cool statisticians out there who do really interesting and useful research (e.g. mom). I do think that each field tends to have its own distinct culture, just as each company, school, or any non-random congregation of people would.

The culture of people in different fields is something pretty important to think about when we decide what to do with our lives. There are many, many interesting fields out there, and choosing one that is an epsilon "more interesting" than the others is not as fruitful as understanding the culture of the people in these fields: how they do work, how they collaborate, and what they are generally like.

End of Entry