Using hexbin to better visualize a dense two-dimensional plot

Anyone working with high-dimensional data has probably tried plotting two of the variables against each other at some point. The problem is that with a large dataset, even a small proportion of noisy data translates into a large number of points, which can obscure the underlying pattern. Here is an example:

x <- runif(100000)
y <- -10 + 20*x + rnorm(100000, sd=2)  # strong linear trend
y[1:20000] <- rnorm(20000, sd=10)      # replace 20% of the points with noise
plot(x, y, pch="O")

which produces the following

[Figure: scatter plot of y against x (normal_plot.jpg)]

which I don't think is very informative, as the very strong linear trend is obscured by the 20% of noisy points. A better way to visualize such a plot is to use the hexbin package:

library(hexbin)
plot(hexbin(x, y, xbins=50))  # xbins: number of bins partitioning the range of x

[Figure: hexbin plot of y against x (hexbin_plot.jpg)]

The hexbin package divides the plotting region into a regular grid of hexagonal cells, counts the number of points falling in each cell (a simple estimate of the density around each grid centre), and uses varying shades of grey to represent those counts. You can install the hexbin package in R with:

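# hexbin is distributed on CRAN; the Bioconductor biocLite() installer
# originally used here has since been deprecated in favour of BiocManager
install.packages("hexbin")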
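To see what the grey shades actually encode, you can inspect the hexbin object directly instead of plotting it. This is a minimal sketch, assuming the hexbin object's S4 slots `cell`, `count` and `ncells` and the package's `hcell2xy()` helper; the `set.seed(1)` call is an addition for reproducibility and is not part of the original example.

```r
library(hexbin)

# Same simulated data as above: a strong linear trend plus 20% noise
set.seed(1)  # assumption: added only to make this sketch reproducible
x <- runif(100000)
y <- -10 + 20*x + rnorm(100000, sd=2)
y[1:20000] <- rnorm(20000, sd=10)

hb <- hexbin(x, y, xbins=50)

# Only non-empty hexagons are stored: @cell holds their IDs, @count their counts
hb@ncells          # number of non-empty hexagons
sum(hb@count)      # total points binned (all of them, with the default bounds)
summary(hb@count)  # spread of the counts that the grey shades represent

# Centre coordinates of each non-empty hexagon, alongside its count
centres <- hcell2xy(hb)
head(data.frame(x=centres$x, y=centres$y, count=hb@count))
```

Cells along the line y = -10 + 20x should hold far larger counts than cells reached only by the noise, which is why the trend shows up as a dark band in the hexbin plot.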
