# Two-factor ANOVAs

Conducting a two-factor ANOVA is pretty straightforward.

```r
weeds.aov2 <- aov(flowers ~ species + soil, data = weeds) # two-factor anova (without interaction)
summary(weeds.aov2)
```
```
##             Df Sum Sq Mean Sq F value   Pr(>F)
## species      2   2369  1184.3   9.272 0.000436 ***
## soil         1    239   238.5   1.867 0.178720
## Residuals   44   5620   127.7
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

This example fits an ANOVA with two factors but does not include the interaction term. If we want the interaction term, we simply replace the `+` sign with an asterisk (`*`).

```r
weeds.aov2 <- aov(flowers ~ species * soil, data = weeds) # two-factor anova (with interaction)
summary(weeds.aov2)
```
```
##              Df Sum Sq Mean Sq F value  Pr(>F)
## species       2   2369  1184.3   9.102 0.00052 ***
## soil          1    239   238.5   1.833 0.18301
## species:soil  2    155    77.5   0.596 0.55574
## Residuals    42   5465   130.1
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

In an R formula the asterisk does not mean arithmetic multiplication. It crosses the two factors, so `species * soil` is shorthand for `species + soil + species:soil`. The output therefore reports each factor independently as well as the interaction term.
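As a quick check of what the asterisk expands to, the sketch below compares the terms produced by the short and the long way of writing the formula. Because the `weeds` data are not reproduced here, it uses R's built-in `warpbreaks` dataset (response `breaks`, factors `wool` and `tension`) as a stand-in; that substitution is an assumption made purely for illustration.

```r
# Stand-in data: warpbreaks ships with R (breaks by wool and tension).
# It is NOT the weeds dataset; it only illustrates the formula syntax.
short_form <- breaks ~ wool * tension
long_form  <- breaks ~ wool + tension + wool:tension

# Both formulas expand to the same model terms
short_terms <- attr(terms(short_form), "term.labels")
long_terms  <- attr(terms(long_form), "term.labels")
print(short_terms)
print(identical(short_terms, long_terms))

# ...and therefore to the same F values in the ANOVA table
fit_short <- aov(short_form, data = warpbreaks)
fit_long  <- aov(long_form,  data = warpbreaks)
tables_match <- isTRUE(all.equal(
  summary(fit_short)[[1]][["F value"]],
  summary(fit_long)[[1]][["F value"]]
))
print(tables_match)
```

If you ever want the interaction without the main effects, or the reverse, writing the terms out the long way makes that choice explicit.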

Don’t forget to check your assumptions. The checks are the same as before, except for the following modifications to Bartlett’s and Levene’s tests.

```r
bartlett.test(flowers ~ interaction(species, soil), data = weeds) # wrap the factors in interaction() so each species-by-soil combination is tested as one group
```
```
##
##  Bartlett test of homogeneity of variances
##
## data:  flowers by interaction(species, soil)
## Bartlett's K-squared = 5.3304, df = 5, p-value = 0.3769
```
```r
leveneTest(flowers ~ species * soil, data = weeds) # leveneTest() is from the car package; same syntax as the ANOVA formula
```
```
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  5    0.81 0.5492
##       42
```
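To see what `interaction()` is doing, the sketch below builds the combined grouping factor by hand and counts its levels. It again uses the built-in `warpbreaks` dataset as a stand-in (an assumption for illustration, not the `weeds` data); it happens to have the same layout of 2 x 3 = 6 groups, which is why both tests above report 5 degrees of freedom for the groups.

```r
# Stand-in data: warpbreaks ships with R (2 wool types x 3 tension levels).
# It is NOT the weeds dataset; it just has the same 2 x 3 layout of groups.
cells <- with(warpbreaks, interaction(wool, tension))
print(levels(cells))          # one level per wool-by-tension combination
n_groups <- nlevels(cells)

# Bartlett's test compares the variances of those combined groups,
# so its degrees of freedom are (number of groups - 1)
bt <- bartlett.test(breaks ~ interaction(wool, tension), data = warpbreaks)
bt_df <- unname(bt$parameter)
print(c(groups = n_groups, df = bt_df))
```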

### Transformations

There are two methods to transform your response (Y) variable for an analysis.

1. Use a data manipulation technique such as `mutate()` to create a new column; or
2. Transform the variable within the analysis formula (see below)

For this example, we will log-transform the `flowers` column of the `weeds` dataset.
Note: the transformation is not actually needed here because the data are already normal. It is just an example!

```r
## Mutate option ##
library(dplyr) # provides mutate()
weeds <- mutate(weeds, logflowers = log(flowers)) # create new column called "logflowers"
# logflowers can then be used as the response: aov(logflowers ~ species * soil, data = weeds)

## Formula option ##
# log(flowers) as our Y variable tells the anova to use a log-transformed response
weeds.aov.log <- aov(log(flowers) ~ species * soil, data = weeds)

summary(weeds.aov.log)
```
```
##              Df Sum Sq Mean Sq F value  Pr(>F)
## species       2  2.842  1.4211  11.158 0.00013 ***
## soil          1  0.239  0.2387   1.874 0.17831
## species:soil  2  0.247  0.1234   0.969 0.38792
## Residuals    42  5.349  0.1274
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
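The two options fit the same model, so which one you use is a matter of convenience. A minimal sketch of that equivalence, with two substitutions of my own: the built-in `warpbreaks` dataset stands in for `weeds`, and base R's `transform()` stands in for `mutate()` so the example runs without extra packages.

```r
# Stand-in data: warpbreaks ships with R; it is NOT the weeds dataset.
# transform() is used here as a base-R substitute for dplyr::mutate().
wb <- transform(warpbreaks, logbreaks = log(breaks))

fit_column  <- aov(logbreaks   ~ wool * tension, data = wb)  # new-column option
fit_formula <- aov(log(breaks) ~ wool * tension, data = wb)  # in-formula option

# Compare the F values from the two ANOVA tables
f_column  <- summary(fit_column)[[1]][["F value"]]
f_formula <- summary(fit_formula)[[1]][["F value"]]
same_f <- isTRUE(all.equal(f_column, f_formula))
print(same_f)
```

The new-column option leaves the transformed values in the data frame for plotting, while the in-formula option keeps the data frame untouched and records the transformation in the model itself.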
```r
shapiro.test(log(weeds.aov$residuals)) #### DO NOT DO THIS!! ####
```
```
##
##  Shapiro-Wilk normality test
##
## data:  log(weeds.aov$residuals)
## W = 0.95759, p-value = 0.4422
```
```r
shapiro.test(weeds.aov.log$residuals) # Do this! #
```
```
##
##  Shapiro-Wilk normality test
##
## data:  weeds.aov.log$residuals
## W = 0.97792, p-value = 0.4951
```

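See how the two results differ? Always test the residuals of the model fitted to the transformed response, rather than transforming the residuals of the untransformed model. The same applies to square root (`sqrt`) or square/cubic transformations (`^2`, `^3`).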
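One way to see why the order matters: residuals from an untransformed model are centred on zero, so many of them are negative, and `log()` of a negative number is `NaN`. `shapiro.test()` drops missing values before testing, so the wrong approach ends up testing only part of the data. The sketch below illustrates this with the built-in `warpbreaks` dataset as a stand-in for `weeds` (an assumption for illustration only).

```r
# Stand-in data: warpbreaks ships with R; it is NOT the weeds dataset.
fit_raw <- aov(breaks ~ wool * tension, data = warpbreaks)
fit_log <- aov(log(breaks) ~ wool * tension, data = warpbreaks)

# Wrong: negative residuals become NaN under log()
# (R also emits a "NaNs produced" warning, suppressed here)
bad <- suppressWarnings(log(residuals(fit_raw)))
n_lost <- sum(is.nan(bad))
print(n_lost)   # how many residuals the wrong approach throws away

# Right: take the residuals of the model fitted to log(breaks)
good <- residuals(fit_log)
print(anyNA(good))
print(shapiro.test(good))

# The same pattern for a square-root transformation
fit_sqrt <- aov(sqrt(breaks) ~ wool * tension, data = warpbreaks)
print(shapiro.test(residuals(fit_sqrt)))
```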