# Basic bar plots

As a quick reminder, here is the two-way ANOVA on the weeds data:

```r
weeds.aov2 <- aov(flowers ~ species * soil, data = weeds)
anova(weeds.aov2)
```

```
## Analysis of Variance Table
##
## Response: flowers
##              Df Sum Sq Mean Sq F value    Pr(>F)
## species       2 2368.6 1184.31  9.1016 0.0005203 ***
## soil          1  238.5  238.52  1.8331 0.1830080
## species:soil  2  155.0   77.52  0.5958 0.5557366
## Residuals    42 5465.1  130.12
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

From this, only species was significant. For this dataset, with a continuous Y (flowers) and a categorical X (species), we would plot a bar graph.

There are three main ways to display a bar/column graph: `geom_col()`, `geom_bar()` and `stat_summary()`. I will cover each of them in some depth, showing the benefits of each. Here is a quick breakdown to begin.

| Plot | Pro | Con |
|---|---|---|
| `geom_col()` | Simple and effective; defaults to displaying data as is | Errorbars are finicky |
| `geom_bar()` | Errorbars work well; displays sample size/counts by default | Requires one extra argument (`stat = "identity"`) to match `geom_col()` |
| `stat_summary()` | Quick calculation of the mean; can be used across all geom types | Harder to code, and errorbars are awkward to get working |

I find the best way to generate the bar graph properly is to use the `summarise()` command to generate our means and standard errors before plotting. This extra step saves a lot of hassle, and you can copy this code across to any dataset by changing the column names. We can generate these within ggplot, but it leads to complications (see `stat_summary()` below).

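```r
library(dplyr)  # provides %>%, group_by(), summarise() and n()
# Mean and standard error (sd / sqrt(n)) of flowers for each species
weeds.summarise <- weeds %>% group_by(species) %>%
  summarise(mean = mean(flowers), se = sd(flowers) / sqrt(n()))
```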

This is a quick way to generate the mean and standard error (se) of flowers for each species. Now we can graph our results in a bar graph.
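To see what the summary step produces, here is a minimal sketch on a small made-up data frame. The `toy` data and its values are hypothetical and only stand in for `weeds`; the extra `n` column is my addition, included so the sample size behind each standard error is visible.

```r
library(dplyr)

# Hypothetical toy data standing in for `weeds` (values are made up).
toy <- data.frame(
  species = rep(c("A", "B"), each = 4),
  flowers = c(10, 12, 14, 16, 20, 22, 26, 28)
)

toy.summarise <- toy %>%
  group_by(species) %>%
  summarise(
    mean = mean(flowers),            # group mean
    se   = sd(flowers) / sqrt(n()),  # standard error = sd / sqrt(sample size)
    n    = n()                       # sample size per group
  )

toy.summarise
```

The result has one row per species, which is the shape `geom_col()` and `geom_bar(stat = "identity")` expect. If a real dataset contains missing values, `mean()` and `sd()` would need `na.rm = TRUE`, and `n()` would then overstate the sample size, so that case needs separate handling.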

```r
library(ggplot2)
ggplot(weeds.summarise, aes(x = species, y = mean, fill = species)) +
  geom_col()
```

This will generate a pretty basic graph. You will notice that I used `fill` instead of `colour`. If you use `colour` on a column/bar graph, it will colour only the outline. Using `fill` will fill the entire bar according to the species.
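The difference between the two aesthetics is easiest to see side by side. This is a small sketch on a hypothetical summary table (`toy.summarise`, with made-up values) shaped like `weeds.summarise`.

```r
library(ggplot2)

# Hypothetical summary table (made-up values), shaped like weeds.summarise.
toy.summarise <- data.frame(
  species = c("A", "B", "C"),
  mean    = c(13, 24, 18)
)

# colour = species: only the bar outlines vary by species;
# the inside of every bar keeps the same default fill.
p.colour <- ggplot(toy.summarise, aes(x = species, y = mean, colour = species)) +
  geom_col()

# fill = species: the inside of each bar varies by species.
p.fill <- ggplot(toy.summarise, aes(x = species, y = mean, fill = species)) +
  geom_col()

p.colour
p.fill
```

The two aesthetics can also be combined, for example mapping `fill` to the species while setting a fixed `colour = "black"` outside `aes()` to outline every bar.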

We used `geom_col()` to generate a column graph. You can use `geom_bar()` instead, but it requires a `stat =` argument. With `geom_bar()`, `stat = "identity"` tells ggplot to use the numbers in the mean column of our data, displaying the data as it is in the data frame, rather than counting the number of cases at each X position (its default behaviour).

I personally use `geom_bar()` as I find it easier to add errorbars later. Future pages use `geom_bar()`.
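A quick way to check what each version draws is to inspect the computed bar heights with `layer_data()`. The sketch below uses a hypothetical raw data frame (`toy`, made-up values) and base R's `aggregate()` to get the means, so it runs with ggplot2 alone.

```r
library(ggplot2)

# Hypothetical raw data (made-up values): 3 rows for A, 2 rows for B.
toy <- data.frame(
  species = c("A", "A", "A", "B", "B"),
  flowers = c(10, 12, 14, 20, 24)
)

# Pre-computed means, one row per species.
toy.means <- aggregate(flowers ~ species, data = toy, FUN = mean)

# Default geom_bar(): stat = "count", so bar height = number of rows per species.
p.count <- ggplot(toy, aes(x = species)) + geom_bar()

# stat = "identity": bar height = the value in the y column.
p.identity <- ggplot(toy.means, aes(x = species, y = flowers)) +
  geom_bar(stat = "identity")

# geom_col() draws the same bars without the extra argument.
p.col <- ggplot(toy.means, aes(x = species, y = flowers)) + geom_col()

# layer_data() returns the values ggplot actually draws.
layer_data(p.count)$y     # counts: 3 and 2
layer_data(p.identity)$y  # means: 12 and 22
layer_data(p.col)$y       # means: 12 and 22
```

Note that the default `geom_bar()` takes only an `x` aesthetic; supplying a `y` as well without `stat = "identity"` makes ggplot stop with an error.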

```r
ggplot(weeds.summarise, aes(x = species, y = mean, fill = species)) +
  geom_bar(stat = "identity")
```

Whichever way you graph this, the result looks the same. For now, let’s work with `geom_bar()`. Let’s fix up the graph as much as we want, until we are happy.

```r
weeds.bar <- ggplot(weeds.summarise, aes(x = species, y = mean, fill = species)) +
  # black bar outlines, no legend
  geom_bar(stat = "identity", show.legend = FALSE, colour = "black") +
  labs(x = "Weed Species", y = expression(Flowers~(m^3))) +
  theme(panel.background = element_blank(), panel.grid = element_blank(),
        # linewidth needs ggplot2 >= 3.4.0; older versions use size = 1 here
        axis.line = element_line(colour = "black", linewidth = 1),
        axis.text = element_text(colour = "lightsteelblue4", size = 12),
        axis.title = element_text(colour = "steelblue", size = 14, face = "bold")) +
  scale_fill_manual(values = c("lightblue", "steelblue", "darkslateblue"))
weeds.bar
```

So, now that we have our graph in a “nicer” format, we can see that some crucial pieces of information are missing from it, most notably the errorbars and the letters (or some other notation) that denote statistical differences between the levels (i.e. Tukey's HSD results).