Broken axis with ggplot2

For visualizing my data I use R and the library ggplot2. Just lately I ran some sensitivity simulations with our dynamic global vegetation model (DGVM) LPJ-GUESS. While summarizing the data per ecosystem and having a first look at it, I realized that one ecosystem has up to 10 times higher values than all the others. That made me search for “broken axis”, and since I didn’t find a satisfying solution, I had to create my own.

The data used in this example can be downloaded here (I hope I don’t delete it). First the required libraries and the data must be loaded. Then I define a function for the base plot, which I also run immediately.

library(ggplot2)
library(Cairo)
if (file.exists("data.sum.RData")) {
  load("data.sum.RData")
} else {
  load(url("https://www.dropbox.com/s/nhbqrvptnjfyz18/data.sum.RData?dl=1"))
}
base.plot <- function(data) {
  p <- ggplot(data, aes(x=value, y=name, col=sens))
  p <- p + theme_bw()
  p <- p + theme(legend.position="bottom")
  p <- p + geom_point(size=2.5, position=position_jitter(width=0, height=0.15), alpha=0.8)
  p <- p + scale_color_brewer(palette="Set1", guide=guide_legend(ncol=6, title=NULL))
  p <- p + xlab("") + ylab("")
  return(p)
}
p <- base.plot(data.sum)
CairoPNG(filename="base_plot.png", width=640, height=320)
print(p)
dev.off()

normal ggplot

Here you see the large offset between “desert” and the other ecosystems. Therefore I created a “desert mask” column in my data.frame and rescaled the desert values so that they are still larger than the maximum of the others. Then I created custom breaks and labels based on the minimum desert value and the maximum value of the others. The step between the labels should be 10 here.

# Flag the outlier ecosystem; the mask is used as the facet variable later
data.sum$mask <- 0
data.sum$mask[data.sum$name == "desert"] <- 1
max.value <- max(data.sum$value)
max.value.other <- max(data.sum$value[data.sum$name != "desert"])
min.value.desert <- min(data.sum$value[data.sum$name == "desert"])
# Shrink the desert values, keeping them above the maximum of the others
scale <- floor(min.value.desert / max.value.other) - 1
data.sum$value[data.sum$mask == 1] <- data.sum$value[data.sum$mask == 1] / scale
step <- 10
low.end <- max(data.sum$value[data.sum$name != "desert"])
up.start <- ceiling(max(data.sum$value[data.sum$name != "desert"]))
# Breaks are positions on the rescaled axis
breaks <- seq(0, max(data.sum$value), step)
# Lower labels equal their breaks; upper labels are multiplied back by the scale factor
labels <- seq(0, low.end+step, step)
labels <- append(labels, scale * seq(from=ceiling((up.start + step) / step) * step, length.out=length(breaks) - length(labels), by=step))
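
To make the arithmetic concrete, here is a minimal sketch that repeats the same steps on plain vectors. The numbers in `other` and `desert` are made up for illustration and are not taken from data.sum; the names `scale.factor` and `upper.from` are also just chosen for this example.

```r
# Hypothetical values, chosen only for illustration
other  <- c(12, 40, 75)      # all non-desert ecosystems
desert <- c(500, 620, 700)   # the outlier ecosystem

# Same scale factor as in the post: floor(500 / 75) - 1 = 5
scale.factor <- floor(min(desert) / max(other)) - 1
desert.scaled <- desert / scale.factor   # 100, 124, 140

step <- 10
breaks <- seq(0, max(desert.scaled), step)   # 0, 10, ..., 140
labels <- seq(0, max(other) + step, step)    # 0, 10, ..., 80

# First label of the upper part, rounded up to a multiple of step (here 90)
upper.from <- ceiling((ceiling(max(other)) + step) / step) * step
labels <- append(labels, scale.factor * seq(from = upper.from,
                                            length.out = length(breaks) - length(labels),
                                            by = step))

# Breaks 0..80 keep their own value as label; breaks 90..140 are labelled 450..700
print(data.frame(breaks, labels))
```

With these numbers a rescaled desert value of 100 sits on the break labelled 500, so the labels in the upper part show the original magnitudes even though the points are drawn at the shrunken positions.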

Now the rescaled data can be plotted using facet_grid to show a clear break in the axis:

p <- base.plot(data.sum)
p <- p + facet_grid(. ~ mask, scales="free", space="free")
p <- p + scale_x_continuous(breaks=breaks, labels=labels, expand=c(0.075,0))
p <- p + theme(strip.background = element_blank(), strip.text.x = element_blank())
CairoPNG(filename="broken_axis.png", width=640, height=320)
print(p)
dev.off()

broken ggplot2

UPDATE: After a comment via Twitter, I will also show a plot with a logarithmic x-axis. In my opinion the above “broken axis” still looks better, although it is not a statistically clean approach.

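# The breaks at 500 and 700 refer to the original values,
# so reload data.sum first if the desert values were rescaled above
p <- base.plot(data.sum)
CairoPNG(filename="log10_axis.png", width=640, height=320)
p <- p + scale_x_log10(breaks=c(10, 20, 30, 40, 50, 75, 500, 700))
print(p)
dev.off()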

log10 ggplot2
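
A quick way to see why the logarithmic axis fits both groups on one scale is to look at the transformed positions. The sketch below only uses the break values from the scale_x_log10() call; the variable names are chosen for this example and nothing is taken from the data itself.

```r
# Break values from the scale_x_log10() call
brk <- c(10, 20, 30, 40, 50, 75, 500, 700)

# Positions of the breaks on a log10 axis
pos <- log10(brk)
print(round(pos, 2))

# Share of the axis between 10 and 700 that is used by the range 10..75
share.log    <- (log10(75) - log10(10)) / (log10(700) - log10(10))
share.linear <- (75 - 10) / (700 - 10)
print(round(c(log = share.log, linear = share.linear), 2))
```

On the log scale the lower range gets a much larger share of the axis than on a linear one, which is why no break is needed there, at the price of unevenly spaced values.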

This post was originally published on my WordPress blog, which is no longer available.

