In ggplot2, a plot in constructed by adding layers to it. A layer consists of two important parts: the geometry (geoms), and statistical transformations (stats). The 'geom' part of a layer is important because it determines the looks of the data. Geoms determine how something is displayed, not what is displayed.
Specifying geoms
There are five ways in which the 'geom' part of a layer can be specified.
# 1. The geom can have a layer constructor
geom_area()
# 2. A stat can default to a particular geom
stat_density() # has `geom = "area"` as default
# 3. It can be given to a stat as a string
stat_function(geom = "area")
# 4. The ggproto object of a geom can be given
stat_bin(geom = GeomArea)
# 5. It can be given to `layer()` directly
layer(
geom = "area",
stat = "smooth",
position = "identity"
)
Many of these ways are absolutely equivalent. Using
stat_density(geom = "line")
is identical to using
geom_line(stat = "density")
. Note that for layer()
, you need to
provide the "position"
argument as well. To give geoms as a string, take
the function name, and remove the geom_
prefix, such that geom_point
becomes "point"
.
Some of the more well known geoms that can be used for the geom
argument
are: "point"
, "line"
,
"area"
, "bar"
and
"polygon"
.
Graphical display
A ggplot is build on top of the grid package. This package understands various graphical primitives, such as points, lines, rectangles and polygons and their positions, as well as graphical attributes, also termed aesthetics, such as colours, fills, linewidths and linetypes. The job of the geom part of a layer, is to translate data to grid graphics that can be plotted.
To see how aesthetics are specified, run vignette("ggplot2-specs")
. To see
what geom uses what aesthetics, you can find the Aesthetics section in
their documentation, for example in ?geom_line
.
While almost anything can be represented by polygons if you try hard enough,
it is not always convenient to do so manually. For this reason, the geoms
provide abstractions that take most of this hassle away. geom_ribbon()
for example is a special case of geom_polygon()
, where two sets of
y-positions have a shared x-position. In turn, geom_area()
is a special
case of a ribbon, where one of the two sets of y-positions is set at 0.
# A hassle to build a polygon
my_polygon <- data.frame(
x = c(economics$date, rev(economics$date)),
y = c(economics$uempmed, rev(economics$psavert))
)
ggplot(my_polygon, aes(x, y)) +
geom_polygon()
# More succinctly
ggplot(economics, aes(date)) +
geom_ribbon(aes(ymin = uempmed, ymax = psavert))
In addition to abstraction, geoms sometimes also perform composition.
A boxplot is a particular arrangement of lines, rectangles and points that
people have agreed upon is a summary of some data, which is performed by
geom_boxplot()
.
Boxplot data
value <- fivenum(rnorm(100))
df <- data.frame(
min = value[1], lower = value[2], middle = value[3],
upper = value[4], max = value[5]
)
# Drawing a boxplot manually
ggplot(df, aes(x = 1, xend = 1)) +
geom_rect(
aes(
xmin = 0.55, xmax = 1.45,
ymin = lower, ymax = upper
),
colour = "black", fill = "white"
) +
geom_segment(
aes(
x = 0.55, xend = 1.45,
y = middle, yend = middle
),
size = 1
) +
geom_segment(aes(y = lower, yend = min)) +
geom_segment(aes(y = upper, yend = max))
# More succinctly
ggplot(df, aes(x = 1)) +
geom_boxplot(
aes(ymin = min, ymax = max,
lower = lower, upper = upper,
middle = middle),
stat = "identity"
)
Under the hood
Internally, geoms are represented as ggproto
classes that
occupy a slot in a layer. All these classes inherit from the parental
Geom
ggproto object that orchestrates how geoms work. Briefly, geoms
are given the opportunity to draw the data of the layer as a whole,
a facet panel, or of individual groups. For more information on extending
geoms, see the Creating a new geom section after running
vignette("extending-ggplot2")
. Additionally, see the New geoms section
of the online book.
See also
For an overview of all geom layers, see the online reference.
Other layer documentation:
layer()
,
layer_positions
,
layer_stats