Plotting with ggplot2

For beginners just getting their bearings in R and RStudio, there are a few quick ways to translate your data into a visualization.

The Plot Command

Let’s start with some random sample data with exponential behavior:

set.seed(123)
x <- rnorm(100, 10, 1)
y <- exp(x)
plot(x, y, main = "A Sample Plot", xlab = "X-label", ylab = "Y-label")

Behold…ggplot2!

The plot command is a quick way to snapshot your data in a bare-bones manner. I tend to only use this if I want to view my data quickly in the RStudio Plots pane, but for reports, customer requests, and more complex plotting I always turn to the ggplot2 package. I highly recommend exploring it on your own, as there are so many options available and I’m always finding new ways to use it. You can check out a number of other examples at The R Graph Gallery.
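To make the comparison concrete, here is a rough sketch of the same scatter plot built with ggplot2 instead of plot(). It assumes the ggplot2 package is installed; the names sample_df and sample_plot are illustrative choices, not part of the original example.

```r
library(ggplot2)

# Same simulated data as in the base-R example
set.seed(123)
x <- rnorm(100, 10, 1)
y <- exp(x)

# ggplot2 works on data frames, so wrap the two vectors in one
# (the name `sample_df` is only an illustrative choice)
sample_df <- data.frame(x = x, y = y)

# Points layer plus the same title and axis labels as the plot() call
sample_plot <- ggplot(data = sample_df, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "A Sample Plot", x = "X-label", y = "Y-label")

print(sample_plot)
```

Storing the plot in a variable before printing it makes it easy to add further layers later with +.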

For the example below, we’ll use one of R’s built-in datasets: the classic mtcars dataset, drawn from the 1974 “Motor Trend” road tests of 32 automobiles. Calling head(mtcars) shows the first few rows and the columns available. Let’s take a quick look at two variables in the data set, mpg and wt. To build a simple plot with ggplot2:

• Set data = to the data frame you want to analyze
• Define your aesthetic mappings using aes() with x and y designations (check out what other options are available to aes, such as fill =, which is very helpful for coloring groups of data)
• From the geom_ family of functions, choose the one that best helps you visualize your data. There are a ton of options; for this example we will use geom_point().

library(ggplot2)

ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

But Could It Look Better?

Definitely! Already we can see that the default settings of ggplot2 spruce the aesthetics up a bit. Let’s take a deeper dive, though: I want to make this pop and be more visually informative to the viewer. To accomplish this I’m going to take further advantage of the aes() capabilities and add some theme settings to center and bold my title as well as draw axis lines along the bottom and left sides:

# Map both color and point size to mpg so fuel economy stands out
mtcars.plot <- ggplot(mtcars, aes(x = wt, y = mpg, color = mpg, size = mpg)) +
  geom_point(alpha = 0.6) +
  # linewidth needs ggplot2 >= 3.4.0; older versions use size here
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
        axis.line = element_line(linewidth = 3, colour = "grey80")) +
  labs(title = "mtcars Data") +
  # x is weight and y is mpg, so label the axes accordingly
  xlab("Vehicle Weight (1000 lbs)") + ylab("Miles Per Gallon")

mtcars.plot

Presto! Now we get a better visual representation of how mpg is affected by vehicle weight.
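If you want a rough numerical check on that relationship, one option is to compute the correlation and overlay a straight-line fit. This is only a sketch: the linear fit is an assumption on my part (the true relationship may well be curved), and the names wt_mpg_cor and trend_plot are illustrative.

```r
library(ggplot2)

# Pearson correlation between weight and fuel economy;
# a negative value means heavier cars tend to get lower mpg
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
print(wt_mpg_cor)

# Scatter plot with a least-squares line drawn on top
# (se = FALSE hides the confidence band around the line)
trend_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Vehicle Weight (1000 lbs)", y = "Miles Per Gallon")

print(trend_plot)
```

A fresh plot is used here rather than adding the line to mtcars.plot, so the smoother does not inherit the color and size mappings.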

Let’s Face It, It’s Facet

Let’s expand our reach one more time to include a bit more data. If we wanted to divide this up into facets by number of cylinders and analyze its effect, we could place + facet_wrap(~cyl) at the end of our ggplot call and get a nice visual of the division of 4-, 6-, and 8-cylinder vehicles with their respective mpg ~ wt relationships.

mtcars.plot + facet_wrap(~cyl)

Outside The Box

Finally, I’ll end this post on one of the more applicable data visualization methods that I’ve been asked to produce. Box plots are a great way of displaying several pieces of information about a dataset at once: medians and IQRs can be read off at a glance, and the spread of different groups can be compared side by side. Here we’ll take the mtcars dataset one last time and look at mpg as a function of the number of cylinders through use of the geom_boxplot() option (and for kicks we’ll define a character vector of pretty colors drawn from R’s built-in named colors).

colorcyl <- c("firebrick", "deepskyblue", "darkseagreen")

# One fill color per box; this relies on there being exactly three cylinder groups
ggplot(mtcars, aes(x = as.factor(cyl), y = mpg)) +
  geom_boxplot(fill = colorcyl, alpha = 0.5) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
        plot.subtitle = element_text(hjust = 0.5, face = "bold")) +
  # linewidth needs ggplot2 >= 3.4.0; older versions use size here
  theme(axis.line = element_line(linewidth = 3, colour = "grey80")) +
  # x is the cylinder count and y is mpg, so label the axes accordingly
  labs(title = "mtcars Data") + xlab("Number of Cylinders") + ylab("Miles Per Gallon") +
  theme(legend.position = "none")

That’ll conclude our first look at the extremely ground-level capabilities of ggplot2. Please explore more on your own and feel free to send me any creations you happen to produce!
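As one starting point for that exploration, here is a sketch of the fill = option mentioned earlier: instead of passing the colors straight to geom_boxplot(), the cylinder group is mapped to fill inside aes() and the colors are assigned with scale_fill_manual(). Naming the color vector by cylinder count is my own assumption about how you might want to pair colors with groups, and box_plot is an illustrative name.

```r
library(ggplot2)

# Same three colors as above, but named by cylinder count so each
# color is tied to a specific group rather than to box order
colorcyl <- c("4" = "firebrick", "6" = "deepskyblue", "8" = "darkseagreen")

# fill is mapped to the cylinder factor; scale_fill_manual() picks
# each group's color by name from colorcyl
box_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(alpha = 0.5) +
  scale_fill_manual(values = colorcyl) +
  labs(title = "mtcars Data", x = "Number of Cylinders", y = "Miles Per Gallon") +
  theme(legend.position = "none")

print(box_plot)
```

Because the mapping goes through a scale, the same pattern keeps working if the number of groups changes, as long as colorcyl has an entry for each one.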