16.5 A 1st encounter with dplyr [_]

dplyr is a package that provides numerous functions for manipulating data. It is part of the expanding tidyverse of packages sponsored in large part by RStudio. Hadley Wickham is the primary architect of the tidyverse; he wrote many of the first packages in this framework and laid out the overall conceptual basis that other package authors follow.

For more on dplyr see

We will use two handy dplyr functions:

  • summarize() / summarise()
  • group_by()

dplyr can use a syntax that involves “pipes”. This is a relatively recent innovation in R coding. You can string together R commands using the pipe function, %>%.

Note that the pipe is actually implemented by the magrittr package. If you haven’t loaded dplyr or wildlifeR yet, you might have to load magrittr directly.

library(magrittr) 

For more background info on pipes see

When using pipes from magrittr, you start with data and follow it with an action you want done to it. So, for example, previously when we wanted the mean of the “mass” column we did this

mean(my.frogs$mass) #[_]

This is read like a normal mathematical equation or function, where you start from inside the parentheses and work outward.

Eg, this.is.read.2nd(this.is.read.1st)

R lets you nest as many functions as you want. If I want to round off my calculation I can wrap “mean(my.frogs$mass)” in “round(…)”

round(mean(my.frogs$mass)) #[_]

Using pipes to get the mean I write things more like a sentence

Eg, this.is.read.1st %>% this.is.read.2nd

my.frogs$mass %>% mean #[_] note: no parentheses are needed after mean!

This reads kind of like “Take the mass column of the dataframe and apply the mean() function to it.”

To round the mean we would do this

my.frogs$mass %>% mean %>% round #[_]

Read left to right like a sentence, this is “Take the mass column, calculate the mean, and then round off the mean.”

Note that the round() command has an argument for how many digits you want to round to. You include that in the parentheses

my.frogs$mass %>% mean() %>% round(digits = 2) #[_]
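To check that the nested and piped forms really do the same thing, here is a minimal sketch. The toy.frogs data frame is a made-up stand-in for my.frogs (the name and the mass values are invented for illustration), so the result is not from the real data.

```r
library(magrittr)

# Hypothetical stand-in for my.frogs; values invented for illustration
toy.frogs <- data.frame(mass = c(10.2, 11.7, 9.8, 12.4))

# Nested form: read from the inside out
nested <- round(mean(toy.frogs$mass), digits = 2)

# Piped form: read left to right
piped <- toy.frogs$mass %>% mean() %>% round(digits = 2)

# Compare the two results; both forms run the same two function calls
identical(nested, piped)
```

Which form you use is mostly a matter of readability; the pipe only changes how the code is written, not what is calculated.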

16.5.1 Optional: Piping everything [O]

This section is optional

Most people learn about pipes when doing data summarizing and cleaning with dplyr and friends. But pipes can be used in many (most?) contexts.

Try this

my.frogs$mass %>% hist

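Not everything works, though. For example, I can’t figure out how to use pipes with t.test(); the call below fails because the pipe passes my.frogs as the first argument, where t.test() expects the formula. There might be a way.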

my.frogs %>% t.test(mass ~ sex)
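One possible way, offered as a sketch rather than the definitive approach: magrittr lets you use a dot (.) as a placeholder for the piped object, so the data frame can be sent to the data argument of t.test() instead of to the first argument. The toy.frogs data frame and its values are invented for illustration.

```r
library(magrittr)

# Hypothetical data; values invented for illustration
toy.frogs <- data.frame(
  mass = c(10.2, 11.7, 9.8, 12.4, 10.9, 11.1),
  sex  = c("F", "M", "F", "M", "F", "M")
)

# The dot stands for the object on the left of the pipe,
# so this is equivalent to t.test(mass ~ sex, data = toy.frogs)
piped.test <- toy.frogs %>% t.test(mass ~ sex, data = .)
piped.test
```

Because the dot appears as an argument of t.test() itself, magrittr does not also insert the data frame as the first argument.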

End optional section


16.5.1.1 dplyr’s summarize() command [_]

Instead of mean(data$column) we can use summarise() (for the British) or summarize(), plus pipes.

We can get the grand mean of just the mass column by loading dplyr using library() and then using the summarise() command

library(dplyr)                      #[_]
my.frogs %>% summarise(mean(mass))
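As a small extension of the command above, you can give the summary column a name and calculate several summaries at once. This is a sketch using an invented data frame (toy.frogs); the column names mean.mass, sd.mass and n.frogs are my own choices for illustration, not names used elsewhere in this chapter.

```r
library(dplyr)

# Hypothetical stand-in for my.frogs; values invented for illustration
toy.frogs <- data.frame(mass = c(10.2, 11.7, 9.8, 12.4))

# Each name = function(column) pair becomes one column in the output
frog.summary <- toy.frogs %>%
  summarise(mean.mass = mean(mass),
            sd.mass   = sd(mass),
            n.frogs   = n())   # n() counts the rows

frog.summary
```

The output is itself a one-row data frame, which makes it easy to save the result or pass it along to further steps.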

This is maybe more complicated than “mean(my.frogs$mass)” or “my.frogs$mass %>% mean”, but overall the pipe framework and summarise() pay off when combined with group_by() in the next section.