Things I wish I’d known before I started using R

I’ve been using R for a couple of years now. This post is aimed at me a couple of years ago, or at you if you’re just starting to use R and are pressed for time. Here are some things I wish I’d known in early 2009.

  1. Use a naming convention
  2. read.csv is a great function, but be careful
  3. doBy is not just a city in the Middle East
  4. attach is more trouble than it’s worth
  5. Many packages are poor
  6. There are a lot of useful blog posts out there

Use a naming convention. You should probably have a naming convention whatever language you’re using, but you really need one with R, thanks to features like not needing to declare variables, and partial matching. Google has one. I think that putting dots in variable names is asking for trouble, but Google’s convention would be fine; the important thing is just to pick one. (And you should probably figure that Google knows more about how to write good code than I do.)
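To make the two features concrete, here is a minimal sketch of how partial matching and undeclared variables can bite. All the names (settings, threshold, row_count) are made up for illustration.

```r
# Hypothetical names, for illustration only.
settings <- list(threshold = 0.5)

# `$` on a list partially matches names, so an abbreviation silently works...
partial_hit <- settings$thresh        # 0.5

# ...until a second name with the same prefix makes the abbreviation ambiguous.
settings$threshold_max <- 0.9
ambiguous <- settings$thresh          # NULL

# `[[` matches exactly by default, so the abbreviation never matches.
exact_miss <- settings[["thresh"]]    # NULL

# No declarations: a typo on the left-hand side quietly creates a new variable.
row_count <- 10
row_cuont <- row_count + 1            # typo; row_count is still 10
```

A consistent convention with full, distinctive names makes both kinds of slip easier to spot.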

read.csv is a really great function. But it has some gotchas: for example, with the default options a column of numbers can come back as a factor, typically because of a stray non-numeric entry. Dealing with data is a whole other post, but you can always convert back using as.numeric(as.character(f)), or go through the documentation carefully (see stringsAsFactors, colClasses), or perhaps best yet, use Python or some other scripting language to pre-process the data (it’s always a mess).
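Here is a small sketch of the factor gotcha and the two fixes mentioned above. The CSV text is made up, and stringsAsFactors = TRUE is spelled out because that was the default before R 4.0.0; newer versions default to FALSE.

```r
# Made-up CSV text: one stray "n/a" in an otherwise numeric column.
csv_text <- "id,score\n1,10\n2,n/a\n3,30"

# stringsAsFactors = TRUE reproduces the pre-4.0.0 default behaviour.
raw <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(raw$score)                       # "factor"

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(raw$score)

# Going through character first recovers the numbers; "n/a" becomes NA
# (as.numeric would warn about the coercion, so the warning is suppressed here).
values <- suppressWarnings(as.numeric(as.character(raw$score)))

# Alternatively, declare the column types and the missing-value marker up front.
clean <- read.csv(text = csv_text,
                  colClasses = c("integer", "numeric"),
                  na.strings = "n/a")
```

Declaring colClasses has the side benefit that read.csv stops with an error if a column contains something that is neither a number nor the declared missing-value marker, instead of silently changing the column type.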

doBy is a great package, covering 95% of what you need in processing data by groups.  People swear by plyr too, but try doBy first.
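As a rough illustration of the kind of grouped summary I mean, here is a sketch using summaryBy from doBy on the built-in mtcars data, next to a base R equivalent. It assumes doBy is installed (install.packages("doBy")); the doBy part is skipped if it is not.

```r
# Group means of mpg and hp for each value of cyl, using doBy if available.
if (requireNamespace("doBy", quietly = TRUE)) {
  by_cyl <- doBy::summaryBy(mpg + hp ~ cyl, data = mtcars, FUN = mean)
  print(by_cyl)
}

# The same summary in base R, for comparison.
base_by_cyl <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
print(base_by_cyl)
```

The formula reads as "summarise mpg and hp by cyl"; the doBy documentation covers passing several summary functions at once.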

attach – I’ll let Google’s R style guide take this one:

The possibilities for creating errors when using attach are numerous. Avoid it.
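One such error, sketched with a made-up data frame: assigning to an attached column does not change the data frame, it just creates a stray global variable.

```r
sales <- data.frame(qty = c(1, 2, 3))   # hypothetical data

attach(sales)
qty <- qty * 10     # creates a new global `qty`; sales$qty is untouched
detach(sales)

unchanged <- sales$qty    # still 1 2 3
stray <- qty              # 10 20 30, left behind in the workspace

# with() evaluates an expression inside the data frame without touching
# the search path, and it looks in the data frame before the workspace.
total <- with(sales, sum(qty))   # 6
```

Using with(), or just writing sales$qty, keeps it obvious which object each name refers to.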

Many packages are poor.  My friend John Mount has a whole tutorial (the cranky guide to trying R packages) on this:

The summary is: expect errors, search out errors and don’t start with the built in examples or real data.

Why do I bring this up?  Well, it’s not just to criticize package designers who fail to do even minimal QA.  If you’re using a language and something doesn’t work, your first instinct is (hopefully) to think that you’ve got something wrong.  If you’re using a contributed package, give a bit more weight to the idea that they’ve got something wrong (and don’t rule it out even for something in core R).
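One way to act on that advice is to feed a function synthetic data where you already know the answer. The sketch below is my own illustration, not taken from John's tutorial; lm stands in for whatever package function you are trying out, and the sample size and tolerance are arbitrary choices.

```r
set.seed(42)                            # reproducible noise
n <- 1000
x <- runif(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.1)     # true intercept 2, true slope 3

fit <- lm(y ~ x)                        # stand-in for the function under test
est <- coef(fit)

# The estimates should land close to the known truth; if they do not,
# suspect the code before you suspect the data.
stopifnot(abs(est[["(Intercept)"]] - 2) < 0.1,
          abs(est[["x"]] - 3) < 0.1)
```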

And finally, there are a lot of useful resources out there. It’s my blog, so I’m going to point you to my “R in production systems” post, but John Mount and Nina Zumel’s posts on R are an excellent read, particularly Survive R. I’ve also liked Quick-R for SAS / SPSS / Stata users, and you will probably have your own favorites.

What am I missing?  Add it in the comments.

5 Responses to “Things I wish I’d known before I started using R”

  1. Roman Says:

    Check your links, a bunch of them are pointing to a local server.

  2. erehweb Says:

    Thanks! Should be fixed now.

  3. Scott Locklin Says:

    Thanks for the doBy pointer - somehow I never noticed that. It seems to do what their abortion of an apply function is supposed to do. Excellent!

  4. Pat Burns Says:

    First off, I think this is an absolutely wonderful topic to have in a blog. I have two suggestions, the first of which was not available in early 2009:

    For true beginners: ‘Some hints for the R beginner’: http://www.burns-stat.com/pages/Tutor/hints_R_begin.html

    The second is for once you’ve found your feet a little bit at least: ‘The R Inferno’ http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
