Anyone working with high-dimensional data has probably plotted two variables against each other at some point. The problem is that if even a small proportion of the data is noisy, this translates into a large number of points, which can obscure the visualization. Here is an example:
x <- runif(100000)
y <- -10 + 20*x + rnorm(100000, sd=2)
y[1:20000] <- rnorm(20000, sd=10) # 20% noise
plot(x, y, pch="O")
which produces the following
which I don’t think is very informative, as the very strong linear trend is now obscured by the 20% of the data that is noise. A better way to visualize such a plot is to use the hexbin package.
library(hexbin)
plot(hexbin(x, y, xbins=50))
The hexbin package divides the plotting region into a grid of hexagonal cells, counts the number of points falling in each cell (a simple estimate of the local density), and uses varying shades of grey to represent the counts. You can install the hexbin package in R using the following commands:
source("http://bioconductor.org/biocLite.R")
biocLite("hexbin")
# hexbin is also available from CRAN: install.packages("hexbin")
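To see what the plot is actually shading, it can help to look at the per-cell counts directly. The following is only a rough sketch: the seed and the variable names (n, h, counts, centres, cells) are my own choices for illustration, and it assumes the hexbin package is already installed. It rebuilds the example data, bins it, and lists the hexagons that hold the most points.

```r
library(hexbin)

set.seed(1)  # arbitrary seed (an assumption), only so the run is repeatable
n <- 100000
x <- runif(n)
y <- -10 + 20 * x + rnorm(n, sd = 2)
y[1:20000] <- rnorm(20000, sd = 10)  # 20% noise, as in the example above

# Bin the points into hexagonal cells, 50 bins along the x axis
h <- hexbin(x, y, xbins = 50)

# The count slot holds the number of points in each non-empty hexagon
counts <- h@count
sum(counts) == n  # every point falls in exactly one cell

# hcell2xy() returns the centre coordinates of the non-empty hexagons
centres <- hcell2xy(h)

# Show the most heavily populated cells first
cells <- data.frame(x = centres$x, y = centres$y, count = counts)
head(cells[order(-cells$count), ])
```

If the linear trend dominates as described, the cells with the highest counts should sit close to the line y = -10 + 20x, which is what the darkest shades in the hexbin plot correspond to.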

