If using the same color for all bars, define the fill argument in
geom_bar() (or geom_col()). If assigning color based on another
variable, map the variable to the fill aesthetic, and if needed, use
one of the scale_fill_*() functions to set colors.
You can set all bars to be a given color with the fill argument of geom_bar().
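As a minimal sketch, assuming the mpg data that ships with ggplot2 and an arbitrary color ("steelblue" is only an illustration), a constant fill is supplied to the layer itself rather than inside aes():

```r
library(ggplot2)

# A constant color is passed to the layer, outside of aes()
p_constant <- ggplot(mpg, aes(x = drv)) +
  geom_bar(fill = "steelblue")
p_constant
```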
Alternatively, if the colors should be based on a variable, this
should happen in the aes() mapping.
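A short sketch of the variable-based case, again assuming the mpg data and using drv as the (illustrative) fill variable, might look like this:

```r
library(ggplot2)

# Mapping a variable to fill inside aes() gives each level its own color
p_mapped <- ggplot(mpg, aes(x = drv, fill = drv)) +
  geom_bar()
p_mapped
```

Because the color now comes from the data, ggplot2 also draws a legend for drv.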
And if you want to then customize the colors, one option is
scale_fill_manual(), which allows you to manually assign
colors to each bar. See other
scale_fill_*() functions for
more options for color choices.
ggplot(mpg, aes(x = drv, fill = drv)) + geom_bar() + scale_fill_manual(values = c("purple", "orange", "darkblue"))
Set the width of geom_bar() to a small
value to obtain narrower bars with more space between them.
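For instance, assuming the mpg data and a width of 0.5 (an arbitrary value chosen for illustration), narrower bars could be drawn as follows:

```r
library(ggplot2)

# width is given in units of the discrete x-axis spacing;
# values below the default of 0.9 leave more room between bars
p_narrow <- ggplot(mpg, aes(x = drv)) +
  geom_bar(width = 0.5)
p_narrow
```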
Adjust the expand argument in
scale_y_continuous(), e.g. add
scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
to remove the expansion on the lower end of the y-axis but keep the
expansion on the upper end of the y-axis at 0.05 (the default expansion
for continuous scales).
By default ggplot2 expands the axes so the geoms aren’t flush against the edges of the plot.
To remove the spacing between the bars and the x-axis, but keep the spacing between the bars and the top of the plot, use the following.
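A sketch of this adjustment, assuming the mpg data as the example dataset:

```r
library(ggplot2)

# mult = c(lower, upper): no expansion below the bars, 5% above them
p_flush_bottom <- ggplot(mpg, aes(x = drv)) +
  geom_bar() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))
p_flush_bottom
```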
To achieve the opposite, switch the values in mult. Note
that the tallest bar is now flush against the top of the plot.
To adjust spacing around the x-axis, adjust the expand argument in
scale_x_discrete(). Note that this places the
bars flush against the left side and leaves some space on the right side.
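One way to get this layout, sketched under the assumption that the mpg data is used and that 0.6 units of additive expansion are wanted on the right only:

```r
library(ggplot2)

# add = c(left, right): no extra space on the left, 0.6 units on the right
p_flush_left <- ggplot(mpg, aes(x = drv)) +
  geom_bar() +
  scale_x_discrete(expand = expansion(add = c(0, 0.6)))
p_flush_left
```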
The default look of a bar plot can be achieved with the following.
ggplot(mpg, aes(x = drv)) + geom_bar() + scale_x_discrete(expand = expansion(add = 0.6)) + scale_y_continuous(expand = expansion(mult = 0.05))
Set position = position_dodge2(preserve = "single") in geom_bar().
In the following plot the bars have differing widths within each
level of drv as there are differing levels of class represented.
You can use position_dodge2() with
preserve = "single" to address this.
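A minimal sketch of the fix, assuming the mpg data with drv on the x-axis and class as the fill variable:

```r
library(ggplot2)

# preserve = "single" keeps every bar the width of a single element,
# so groups with fewer classes do not get wider bars
p_dodged <- ggplot(mpg, aes(x = drv, fill = class)) +
  geom_bar(position = position_dodge2(preserve = "single"))
p_dodged
```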
How can I create a stacked bar plot displaying a conditional distribution where each stack is scaled to sum to 100%?
The following plot is useful for comparing counts but not as useful for comparing proportions, which is what you need if you want to be able to make statements like “in this sample, it’s more likely to have a two-seater car that has rear-wheel drive than an SUV that has rear-wheel drive”.
position = "fill" will generate a bar plot with bars of
equal length and the stacks in each bar will show the proportion of
drv for that particular class.
ggplot(mpg, aes(y = class, fill = drv)) + geom_bar(position = "fill") + scale_x_continuous(name = "percentage", labels = scales::label_percent(accuracy = 1))
How can I create a stacked bar plot based on data from a contingency table of two categorical variables?
Suppose you have the following data from an opinion poll, where the numbers in the cells represent the number of responses for each party/opinion combination.
poll <- tibble::tribble(
  ~party,        ~agree, ~disagree, ~no_opinion,
  "Democrat",    20,     30,        20,
  "Republican",  15,     20,        10,
  "Independent", 10,     5,         0
)
You can first pivot the data longer to obtain a data frame with one
row per party/opinion combination and a new column, n, for
the number of responses that fall into that category.
poll_longer <- poll %>%
  pivot_longer(
    cols = -party,
    names_to = "opinion",
    values_to = "n"
  )

poll_longer
#> # A tibble: 9 × 3
#>   party       opinion        n
#>   <chr>       <chr>      <dbl>
#> 1 Democrat    agree         20
#> 2 Democrat    disagree      30
#> 3 Democrat    no_opinion    20
#> 4 Republican  agree         15
#> 5 Republican  disagree      20
#> 6 Republican  no_opinion    10
#> 7 Independent agree         10
#> 8 Independent disagree       5
#> 9 Independent no_opinion     0
Then, you can pass this result to
ggplot() and create a
bar for each
party on the y (or
x, if you prefer vertical bars) axis and fill the bars in
with the number of responses for each opinion.
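Putting the steps together, a self-contained sketch might look like the following. It assumes the tibble and tidyr packages are installed and that geom_col() is used to draw the pre-computed counts:

```r
library(ggplot2)

# Poll counts in wide (contingency table) form
poll <- tibble::tribble(
  ~party,        ~agree, ~disagree, ~no_opinion,
  "Democrat",    20,     30,        20,
  "Republican",  15,     20,        10,
  "Independent", 10,     5,         0
)

# One row per party/opinion combination, with the count in n
poll_longer <- tidyr::pivot_longer(
  poll,
  cols = -party,
  names_to = "opinion",
  values_to = "n"
)

# geom_col() uses the values in n as bar lengths and stacks by opinion
p_poll <- ggplot(poll_longer, aes(y = party, x = n, fill = opinion)) +
  geom_col()
p_poll
```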
To plot proportions (relative frequencies) instead of counts, use
position = "fill" in geom_col().
Map the variable you want to group by to the x or y
aesthetic, map the variable you want to
color the bars by to the
fill aesthetic, and set
position = "dodge" in geom_bar().
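As an illustration, assuming the mpg data with class as the grouping variable and drv as the fill variable (both choices are only examples):

```r
library(ggplot2)

# Bars for each drv are placed side by side within each class
p_grouped <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")
p_grouped
```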
Suppose you have data from a survey with three questions, where respondents select “Agree” or “Disagree” for each question.
survey <- tibble::tribble(
  ~respondent, ~q1,        ~q2,        ~q3,
  1,           "Agree",    "Agree",    "Disagree",
  2,           "Disagree", "Agree",    "Disagree",
  3,           "Agree",    "Agree",    "Disagree",
  4,           "Disagree", "Disagree", "Agree"
)
You’ll first want to reshape these data so that each row represents a
respondent / question pair. You can do this with
tidyr::pivot_longer(). Then, pass the resulting longer data
frame to ggplot() and group the responses for each question together.
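A sketch of the reshape-then-plot approach, assuming the tibble and tidyr packages are available. The column names question and response are chosen here for illustration:

```r
library(ggplot2)

survey <- tibble::tribble(
  ~respondent, ~q1,        ~q2,        ~q3,
  1,           "Agree",    "Agree",    "Disagree",
  2,           "Disagree", "Agree",    "Disagree",
  3,           "Agree",    "Agree",    "Disagree",
  4,           "Disagree", "Disagree", "Agree"
)

# One row per respondent / question pair
survey_longer <- tidyr::pivot_longer(
  survey,
  cols = -respondent,
  names_to = "question",
  values_to = "response"
)

# Count responses per question and dodge the Agree/Disagree bars
p_survey <- ggplot(survey_longer, aes(x = question, fill = response)) +
  geom_bar(position = "dodge")
p_survey
```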
Alternatively, you can use
stat_summary() to let ggplot2
calculate and plot the means.
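As a hedged example, assuming the quantity of interest is the mean highway mileage (hwy) for each drv in the mpg data, stat_summary() can compute and draw the group means in one step:

```r
library(ggplot2)

# stat_summary() applies mean() to hwy within each drv and draws bars
p_means <- ggplot(mpg, aes(x = drv, y = hwy)) +
  stat_summary(fun = "mean", geom = "bar")
p_means
```

The fun argument is available in ggplot2 3.3.0 and later.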
Why do the bars on my plot disappear when I specify an axis range
with ylim()? How can I get the bars to show up within a
given axis range?
ylim() is a shortcut for supplying the
limits argument to individual scales. When either of these
is set, any values outside the limits specified are replaced with
NA. Since the bars naturally start at
y = 0,
replacing part of the bars with
NAs results in the bars
entirely disappearing from the plot. For changing axis limits without
dropping data observations, set limits in coord_cartesian()
instead. Also note that truncating the axis this way results in a
deceiving bar plot, which should be avoided in general.
In the following plot the y-axis is limited to 20 to 120, and hence the bars are not showing up.
In order to obtain a bar plot with limited y-axis, you need to
instead set the limits in coord_cartesian().
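A minimal sketch of the zoomed version, assuming the mpg data and the same 20 to 120 range mentioned above:

```r
library(ggplot2)

# coord_cartesian() zooms the view without discarding any data,
# so the bars are still drawn (but cut off at y = 20)
p_zoom <- ggplot(mpg, aes(x = drv)) +
  geom_bar() +
  coord_cartesian(ylim = c(20, 120))
p_zoom
```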
This is, indeed, a deceiving plot. If you’re using a bar plot to display values that cannot be 0, you might choose a different geom instead. For example, suppose you have the following data and plot.
Also suppose that you want to cut off the bars at
y = 1000 since you know that the variable you’re plotting
cannot take a value less than 1000. In that case, you might use geom_point() instead.
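To illustrate, here is a sketch with made-up data (the data frame df and its values are hypothetical, invented only for this example) that uses points as one possible alternative to bars, so the y-axis does not need to start at 0:

```r
library(ggplot2)

# Hypothetical data: every value is known to be at least 1000
df <- data.frame(
  x = c("A", "B", "C"),
  y = c(1050, 1100, 1150)
)

# Points encode value by position only, so a y-axis that
# starts near 1000 is not misleading the way truncated bars are
p_points <- ggplot(df, aes(x = x, y = y)) +
  geom_point(size = 3)
p_points
```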