Reasons to use R

April 10, 2007

I use a statistical software package called R (http://www.R-project.org). I would highly recommend that anyone who needs to manipulate, analyse and visualise data check it out.

Here are some reasons to use R over other statistical software:

  1. It is completely free to download and use without restrictions.
  2. It is also open source, which means you can read the source code of any function and modify it yourself.
  3. It has good documentation for most functions. There are a number of well-written introductory R books, tutorials, reference cards, reference manuals, vignettes and newsletters.
  4. Sensible default values and error messages are available for most functions. This is because the majority of the functions and packages are written by experts in the field who are aware of the common pitfalls that a naive user might fall into.
  5. There is a large number of packages covering many areas of application. Many statistical procedures are already available, so you do not need to waste time reinventing the wheel every time. The many different ways to store, manipulate, analyse and visualise your data, though a bit overwhelming to new users at first, are intended to make you think about your data.
  6. There is an active, responsive and vibrant R community, as seen in the helpful R mailing lists. But please do check the posting guide before posting in order to elicit informative responses.
  7. You can work in an interactive, integrated development environment, which means you are able to debug errors and visualise results on the fly without having to first compile and then execute your code.
  8. Cross-platform compatibility means that you are able to develop code in one environment and then run it on many different platforms with minimal changes (if any). I usually develop and test code on either Windows or Mac with small datasets before running it on bigger datasets or bigger simulations on large UNIX machines or Linux clusters.
  9. Various benchmarks show that R is just as fast as or faster than many statistical software packages for various tasks. Here is an example of such a benchmark test, which is outdated but still indicative. The personal experience of many people who have used multiple statistical packages suggests that R code is much easier to understand (but this depends on how one writes it).
  10. One can use Sweave to automate reports that need to be compiled on a regular or frequent basis.
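
To give a flavour of points 2, 4 and 7, here is a minimal sketch of a short interactive session. It assumes only a standard R installation: `sd`, `mean`, `lm` and `coef` come with base R, and `cars` is a dataset bundled with R. The vector `x` is made up purely for illustration.

```r
# Point 2: typing a function's name without parentheses prints its source code.
sd

# Point 4: sensible defaults. mean() has na.rm = FALSE by default, so a
# missing value is propagated rather than silently dropped.
x <- c(2, 4, NA, 8)     # illustrative data, not from any real study
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 4.666667

# Point 7: fit and inspect a model on the fly, with no compile step.
# 'cars' holds speed and stopping distance for 50 cars.
fit <- lm(dist ~ speed, data = cars)
coef(fit)               # intercept and slope of the fitted line
```

Each line can be typed at the R prompt and its result inspected immediately, which is what makes exploring data this way so quick.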

BioConductor contains a collection of many R packages that specifically deal with the analysis of genomic data (e.g. SAGE, SNP, sequence, microarrays, array CGH, proteomics, biological annotations and ontologies). It is the first choice of tools for many statisticians and leading experts in the field. Thus this is where the software for many newly proposed methods first becomes available.