
The Pipe Function

The pipe lets you run several functions on one dataset, one after another, without creating the intermediate objects you would need in base R.
You start with the data you want to apply the functions to, followed by a pipe, %>%. By convention, each pipe ends its line and the next function starts on a new line.

This is useful for long, messy calls with several nested parts. It separates each step out and makes the code easier to follow.
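
To make the contrast concrete, here is a minimal sketch of the same calculation written three ways. The small weeds data frame is made up for illustration (only the column names species, soil and flowers come from this page), and the name max_flowers is my own choice.

```r
library(dplyr)

# Hypothetical toy data standing in for the real weeds dataset
weeds <- data.frame(
  species = c("a", "a", "b", "b"),
  soil    = c("sandstone", "shale", "sandstone", "shale"),
  flowers = c(3, 7, 2, 9)
)

# 1. Nested calls: you have to read from the inside out
nested <- summarise(group_by(weeds, species), max_flowers = max(flowers))

# 2. Intermediate objects: readable, but clutters the workspace
grouped  <- group_by(weeds, species)
stepwise <- summarise(grouped, max_flowers = max(flowers))

# 3. Piped: reads top to bottom, no intermediate objects
piped <- weeds %>%
  group_by(species) %>%
  summarise(max_flowers = max(flowers))
```

All three versions should give the same two-row summary; the piped one is simply the easiest to read back later.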

A pipe is simply a > between two percent signs: %>%. It becomes available once you load dplyr. In RStudio, the keyboard shortcut for it is Ctrl + Shift + M (Cmd + Shift + M on a Mac).

sum_data <- weeds %>% 
  group_by(species, soil) %>% 
  summarise(max(flowers))

The line break after each pipe is a readability convention rather than a strict rule. If you do break the line, put the pipe at the end of the line, not at the start of the next one.
In this example, we grouped the data by species and soil, then used the summarise function to find the maximum number of flowers for each combination.
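
As a rough illustration of the line-break point, the sketch below runs the same pipeline on one line and across several lines. The data values are invented, and both the max_flowers column name and the .groups = "drop" argument (which just returns an ungrouped result) are my additions, not part of the example above.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the example above
weeds <- data.frame(
  species = c("a", "a", "b", "b", "b"),
  soil    = c("sandstone", "sandstone", "shale", "shale", "sandstone"),
  flowers = c(3, 7, 2, 9, 4)
)

# Everything on one line: valid R, just harder to read
one_line <- weeds %>% group_by(species, soil) %>% summarise(max_flowers = max(flowers), .groups = "drop")

# One step per line: each pipe sits at the END of its line
multi_line <- weeds %>%
  group_by(species, soil) %>%
  summarise(max_flowers = max(flowers), .groups = "drop")
```

Both objects should hold the same table, with one row per species and soil combination that appears in the data.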

You will notice that, because we specified the data in the first line, we did not have to specify it again in the other lines, only the columns.
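
The reason is that the pipe passes whatever is on its left as the first argument of the function on its right. A small sketch, using made-up data and a threshold picked purely for illustration:

```r
library(dplyr)

# Hypothetical toy data
weeds <- data.frame(
  species = c("a", "b", "c"),
  flowers = c(3, 7, 2)
)

# With the pipe: weeds is handed to filter() as its first argument
with_pipe <- weeds %>% filter(flowers > 2)

# Without the pipe: the data has to be written out inside the call
without_pipe <- filter(weeds, flowers > 2)
```

The two calls are equivalent, which is why the data only needs to be named once at the top of a pipeline.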

new_data <- weeds %>% 
  mutate(binary = soil == "sandstone") %>% 
  filter(weeds == "native")

As you can see, we can do this with most of the functions we have already learnt. The example above creates a binary (TRUE/FALSE) column for soil, with TRUE meaning “sandstone”, and then filters for “native” weeds. The result is a dataset of native weeds with a TRUE/FALSE column based on soil.
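
To see what this produces, here is the same pipeline run on a tiny invented dataset. I am assuming the real data has a column called weeds holding values such as "native", as the filter step implies; the rows themselves are made up.

```r
library(dplyr)

# Hypothetical toy data; the weeds column (native / invasive) is an assumption
weeds <- data.frame(
  species = c("a", "b", "c", "d"),
  soil    = c("sandstone", "shale", "sandstone", "shale"),
  weeds   = c("native", "native", "invasive", "native")
)

new_data <- weeds %>%
  mutate(binary = soil == "sandstone") %>%  # TRUE where soil is sandstone
  filter(weeds == "native")                 # keep only the native rows

new_data
```

Inside filter, weeds refers to the column rather than the data frame, so the invasive row is dropped and the remaining rows keep their binary value.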

Piping is incredibly useful and makes code much easier to read. It is something I keep forgetting to use, until I look at my code later on, full of regrets. It shortens and simplifies code a lot.