stat_summary operates on unique x; stat_summary_bin operates on binned x. They are more flexible versions of stat_bin: instead of just counting, they can compute any aggregate.
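Conceptually, the two stats differ only in how x is grouped before the aggregate is computed. The base R sketch below imitates that idea with tapply(). It is not how ggplot2 implements either stat, and the choice of three equal-width bins passed to cut() is an arbitrary assumption for illustration.

```r
# Sketch only: base R stand-ins for the two grouping strategies,
# using the built-in mtcars data set.
y <- mtcars$mpg

# stat_summary-style: one aggregate per unique x value (here, cyl)
by_unique_x <- tapply(y, mtcars$cyl, mean)

# stat_summary_bin-style: first cut a continuous x (here, wt) into bins,
# then compute one aggregate per bin (3 equal-width bins, chosen arbitrarily)
wt_bins <- cut(mtcars$wt, breaks = 3)
by_binned_x <- tapply(y, wt_bins, mean)

# Unlike stat_bin, the aggregate does not have to be a count:
by_unique_x_median <- tapply(y, mtcars$cyl, median)

print(by_unique_x)
print(by_binned_x)
print(by_unique_x_median)
```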
stat_summary_bin(mapping = NULL, data = NULL, geom = "pointrange",
  position = "identity", ..., fun.data = NULL, fun.y = NULL,
  fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), na.rm = FALSE,
  show.legend = NA, inherit.aes = TRUE)

stat_summary(mapping = NULL, data = NULL, geom = "pointrange",
  position = "identity", ..., fun.data = NULL, fun.y = NULL,
  fun.ymax = NULL, fun.ymin = NULL, fun.args = list(), na.rm = FALSE,
  show.legend = NA, inherit.aes = TRUE)
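mapping  Set of aesthetic mappings created by aes() or aes_(). If specified and inherit.aes = TRUE (the default), it is combined with the default mapping at the top level of the plot.
data  The data to be displayed in this layer. There are three options: If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot(). A data.frame, or other object, will override the plot data; all objects will be fortified to produce a data frame. A function will be called with a single argument, the plot data; the return value must be a data.frame and will be used as the layer data.
geom  Use to override the default connection between the geom and the stat (the default geom is "pointrange").
position  Position adjustment, either as a string, or the result of a call to a position adjustment function.
...  Other arguments passed on to layer(). These are often aesthetics set to a fixed value, such as colour = "red" or size = 2; they may also be parameters to the paired geom/stat, such as width = 0.2 for the crossbar geom.
fun.data  A function that is given the complete data and should return a data frame with variables ymin, y, and ymax.
fun.ymin, fun.y, fun.ymax  Alternatively, supply three individual functions that are each passed the vector of y values at a given x and should return a single number.
fun.args  Optional additional arguments passed on to the functions.
na.rm  If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.
show.legend  logical. Should this layer be included in the legends? NA, the default, includes it if any aesthetics are mapped; FALSE never includes it; TRUE always includes it.
inherit.aes  If FALSE, overrides the default aesthetics rather than combining with them.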
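Roughly speaking, the summary function is applied to the y values at each x, with the entries of fun.args appended as extra arguments. The sketch below imitates that with do.call(). Both `mean_mult_sd()` and `summarise_at_x()` are hypothetical names, and the forwarding mechanism is an assumption for illustration, not ggplot2's internal code.

```r
# Hypothetical fun.data-style function: mean plus/minus `mult` standard
# deviations, returned as a one-row data frame with ymin, y and ymax.
mean_mult_sd <- function(y, mult = 2) {
  m <- mean(y)
  s <- sd(y)
  data.frame(ymin = m - mult * s, y = m, ymax = m + mult * s)
}

# Hypothetical helper: apply `fun` to the y values at each unique x,
# appending the entries of `fun.args` as extra arguments (an assumption
# about how fun.args is used, not ggplot2's actual implementation).
summarise_at_x <- function(x, y, fun, fun.args = list()) {
  pieces <- lapply(split(y, x), function(y_at_x) {
    do.call(fun, c(list(y_at_x), fun.args))
  })
  out <- do.call(rbind, pieces)
  data.frame(x = names(pieces), out)
}

# Default mult = 2, versus overriding it through fun.args
default_args <- summarise_at_x(mtcars$cyl, mtcars$mpg, mean_mult_sd)
one_sd <- summarise_at_x(mtcars$cyl, mtcars$mpg, mean_mult_sd,
                         fun.args = list(mult = 1))
print(one_sd)
```

With the real stat, the analogous call would be something like stat_summary(fun.data = mean_mult_sd, fun.args = list(mult = 1)).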
stat_summary understands the following aesthetics (required aesthetics are marked):
x (required)
y (required)
group
You can either supply summary functions individually (fun.y, fun.ymax, fun.ymin), or as a single function (fun.data):
fun.data  Complete summary function. Should take a numeric vector as input and return a data frame as output.
fun.ymin  ymin summary function (should take a numeric vector and return a single number).
fun.y  y summary function (should take a numeric vector and return a single number).
fun.ymax  ymax summary function (should take a numeric vector and return a single number).
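The two styles describe the same result: three vector functions that each reduce the y values to one number, versus one function that returns all three columns at once. This base R sketch builds the same one-row summary both ways for the y values at a single x; the variable and function names are illustrative, not part of ggplot2.

```r
# y values at a single x (4-cylinder cars in mtcars)
y_at_x <- mtcars$mpg[mtcars$cyl == 4]

# Style 1: three individual functions, as with fun.ymin, fun.y, fun.ymax
min_fun <- min
y_fun <- mean
max_fun <- max
individual <- data.frame(ymin = min_fun(y_at_x),
                         y = y_fun(y_at_x),
                         ymax = max_fun(y_at_x))

# Style 2: one complete function, as with fun.data
mean_min_max <- function(y) {
  data.frame(ymin = min(y), y = mean(y), ymax = max(y))
}
complete <- mean_min_max(y_at_x)

print(complete)
```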
A simple vector function is easiest to work with, as you can return a single number, but it is somewhat less flexible. If your summary function computes multiple values at once (e.g. ymin and ymax), use fun.data.
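A typical case for fun.data is a summary whose three numbers come out of one computation, such as quantiles. `median_iqr()` below is a hypothetical function, not something ggplot2 provides; a single call to quantile() yields the median and both quartiles together.

```r
# Hypothetical fun.data-style function: median with the interquartile range.
# One quantile() call produces all three values, which would otherwise
# have to be recomputed by three separate fun.ymin / fun.y / fun.ymax functions.
median_iqr <- function(y) {
  q <- quantile(y, probs = c(0.25, 0.5, 0.75), names = FALSE)
  data.frame(ymin = q[1], y = q[2], ymax = q[3])
}

median_iqr(c(1, 2, 3, 4, 5))   # ymin = 2, y = 3, ymax = 4
```

Because fun.data accepts a function, this could be passed as stat_summary(fun.data = median_iqr).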
If no aggregation functions are supplied, stat_summary defaults to mean_se().
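mean_se() summarises with the mean plus or minus one standard error of the mean, sd(y) / sqrt(n). For checking numbers by hand, here is a base R version; `my_mean_se()` is a hypothetical name, written to mirror what mean_se() returns with its default multiplier of 1 rather than copied from ggplot2.

```r
# Hypothetical re-implementation of the mean_se() summary:
# y = mean, ymin / ymax = mean -/+ mult * standard error of the mean.
my_mean_se <- function(x, mult = 1) {
  x <- x[!is.na(x)]                      # drop missing values first
  se <- mult * sqrt(var(x) / length(x))  # sd(x) / sqrt(n), scaled by mult
  m <- mean(x)
  data.frame(y = m, ymin = m - se, ymax = m + se)
}

# var(1:5) = 2.5, so se = sqrt(2.5 / 5) = sqrt(0.5)
my_mean_se(c(1, 2, 3, 4, 5))   # y = 3, ymin = 3 - sqrt(0.5), ymax = 3 + sqrt(0.5)
```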
See geom_errorbar, geom_pointrange, geom_linerange, and geom_crossbar for geoms to display summarised data.
library(ggplot2)

d <- ggplot(mtcars, aes(cyl, mpg)) + geom_point()
d + stat_summary(fun.data = "mean_cl_boot", colour = "red", size = 2)

# You can supply individual functions to summarise the value at
# each x:
d + stat_summary(fun.y = "median", colour = "red", size = 2, geom = "point")
d + stat_summary(fun.y = "mean", colour = "red", size = 2, geom = "point")
d + stat_summary(fun.y = mean, fun.ymin = min, fun.ymax = max, colour = "red")

# Don't use ylim to zoom into a summary plot: this throws the
# data away before the summary is computed
p <- ggplot(mtcars, aes(cyl, mpg)) +
  stat_summary(fun.y = "mean", geom = "point")
p
p + ylim(15, 30)
#> Warning: Removed 9 rows containing non-finite values (stat_summary).
# Instead, zoom with coord_cartesian(), which keeps all the data
p + coord_cartesian(ylim = c(15, 30))

# A set of useful summary functions is provided from the Hmisc package
# (mean_cl_boot needs Hmisc to be installed):
stat_sum_df <- function(fun, geom = "crossbar", ...) {
  stat_summary(fun.data = fun, colour = "red", geom = geom, width = 0.2, ...)
}
d <- ggplot(mtcars, aes(cyl, mpg)) + geom_point()
# The crossbar geom needs grouping to be specified when used with
# a continuous x axis.
d + stat_sum_df("mean_cl_boot", mapping = aes(group = cyl))

# An example with highly skewed distributions:
if (require("ggplot2movies")) {
  set.seed(596)
  mov <- movies[sample(nrow(movies), 1000), ]
  m2 <- ggplot(mov, aes(x = factor(round(rating)), y = votes)) + geom_point()
  m2 <- m2 + stat_summary(fun.data = "mean_cl_boot", geom = "crossbar",
                          colour = "red", width = 0.3) + xlab("rating")
  # Inside an if block only the last value auto-prints, so print() explicitly
  print(m2)
  # Notice how the overplotting skews the visual perception of the mean;
  # supplementing the raw data with summary statistics is _very_ important.

  # Next, we'll look at votes on a log scale.

  # Transforming the scale means the data are transformed
  # first, after which statistics are computed:
  print(m2 + scale_y_log10())
  # Transforming the coordinate system occurs after the
  # statistic has been computed. This means we're calculating the summary
  # on the raw data and stretching the geoms onto the log scale. Compare
  # the widths of the confidence intervals.
  m2 + coord_trans(y = "log10")
}
#> Warning: Removed 1 rows containing missing values (geom_crossbar).
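Both the ylim warning and the log-scale comparison come down to whether the data are changed before or after the summary is computed. The arithmetic below shows the difference on three made-up values; it only models the order of operations described in the example comments (scale limits and scale transformations act before the stat, coordinate transformations after it) and does not call ggplot2.

```r
y <- c(1, 10, 100)   # made-up values for illustration

# Scale transformation (e.g. scale_y_log10): transform first, then summarise.
# The mean of log10(y) corresponds to the geometric mean on the original scale.
scale_then_stat <- mean(log10(y))    # 1

# Coordinate transformation (e.g. coord_trans(y = "log10")):
# summarise first, then transform the result.
stat_then_coord <- log10(mean(y))    # log10(37), about 1.568

# Scale limits (e.g. ylim) drop out-of-range points before the summary,
# so the summary itself changes: here 100 is discarded.
kept <- y[y >= 0 & y <= 50]
mean_after_limits <- mean(kept)      # 5.5, versus mean(y) = 37
```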