16.5 A 1st encounter with dplyr [_]
dplyr is a package that provides numerous functions for manipulating data. It is part of the expanding tidyverse of packages sponsored in large part by RStudio. Hadley Wickham is the primary architect of the tidyverse; he wrote many of the first packages in this framework and laid out the overall conceptual basis that other package authors follow.
For more on dplyr see
- https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- https://dplyr.tidyverse.org/
- http://genomicsclass.github.io/book/pages/dplyr_tutorial.html
We will use two handy dplyr functions
- summarize() / summarise()
- group_by()
dplyr can use a syntax that involves “pipes”. This is a relatively recent innovation in R coding. You can string together R commands using the pipe function, %>%.
Note that the pipe is actually implemented by the magrittr package; other packages, such as dplyr, make it available when you load them. If you haven’t loaded dplyr or wildlifeR yet you might have to load magrittr directly.
library(magrittr)
For more background info on pipes see
When using pipes from magrittr, you start with the data and follow it with the action you want done to it. So, for example, previously when we wanted the mean of the “mass” column we did this
mean(my.frogs$mass) #[_]
This is read like a normal mathematical function, where you start inside the parentheses and work outward.
Eg, this.is.read.2nd(this.is.read.1st)
R lets you nest as many functions as you want. If I want to round off my calculation I can wrap “mean(my.frogs$mass)” in “round(…)”
round(mean(my.frogs$mass)) #[_]
Using pipes to get the mean I write things more like a sentence
Eg, this.is.read.1st %>% this.is.read.2nd
my.frogs$mass %>% mean() #[_] note parentheses after mean!
Which reads kind of like “Take the mass column from the dataframe and apply the mean() function to it.”
To round the mean we would do this
my.frogs$mass %>% mean %>% round #[_]
Read left to right like a sentence, this is “Take the mass column, calculate the mean, and then round off the mean.”
Note that the round() command has an argument for how many digits you want to round to. You include that in the parentheses
my.frogs$mass %>% mean() %>% round(digits = 2) #[_]
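To convince yourself that the nested version and the piped version really do the same thing, here is a small self-contained check. The data frame below is a made-up stand-in for my.frogs (the numbers are invented purely for illustration), and the object names nested and piped are just labels for this sketch.

```r
library(magrittr)

# Hypothetical stand-in for my.frogs; these values are invented
my.frogs <- data.frame(mass = c(10.2, 11.7, 9.8, 12.4),
                       sex  = c("F", "M", "F", "M"))

# Nested style: read from the inside out
nested <- round(mean(my.frogs$mass), digits = 2)

# Piped style: read from left to right
piped <- my.frogs$mass %>% mean() %>% round(digits = 2)

# Both styles should give exactly the same number
identical(nested, piped)
```

Which style you use is mostly a matter of readability; the pipe just rearranges how the same function calls are written.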
16.5.1 Optional: Piping everything [O]
This section is optional
Most people learn about pipes when doing data summarizing and cleaning with dplyr and friends. But pipes can be used in many (most?) contexts.
Try this
my.frogs$mass %>% hist
Not everything works though. For example, I can’t figure out how to use pipes and t.test(). There might be a way.
my.frogs %>% t.test(mass ~ sex)
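One approach that seems to work is magrittr's dot placeholder. By default the pipe hands my.frogs to t.test() as its first argument, but here the formula needs to come first and the data frame belongs in the data argument. Writing data = . tells the pipe where to put the data instead. This is only a sketch: the data frame is a made-up stand-in for my.frogs, and frog.test is a hypothetical name.

```r
library(magrittr)

# Hypothetical stand-in for my.frogs; these values are invented
my.frogs <- data.frame(
  mass = c(10.2, 11.7, 9.8, 12.4, 10.9, 12.1),
  sex  = c("F", "M", "F", "M", "F", "M")
)

# The dot marks where the piped-in data frame should go,
# so the formula stays in the first position
frog.test <- my.frogs %>% t.test(mass ~ sex, data = .)

# Same test written without a pipe, for comparison
plain.test <- t.test(mass ~ sex, data = my.frogs)

all.equal(frog.test$p.value, plain.test$p.value)
```

The printed output of frog.test will list the data as "." rather than "my.frogs", which is a small cosmetic cost of piping into t.test().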
End optional section
16.5.1.1 dplyr’s summarize() command [_]
Instead of mean(data$column) we can use summarise() (for the British) or summarize(), plus pipes.
We can get the grand mean of just the mass column by loading dplyr using library() and then using the summarise() command
library(dplyr) #[_]
my.frogs %>% summarise(mean(mass))
This is maybe more complicated than “mean(my.frogs$mass)” or “my.frogs$mass %>% mean”, but overall the pipe framework and summarise() pay off when combined with group_by() in the next section.
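One thing worth noticing about summarise() is that it returns a data frame rather than a bare number, and you can give the summary column a name of your choosing. The sketch below uses a made-up stand-in for my.frogs with invented values; the column name mean.mass and the object name frog.summary are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for my.frogs; these values are invented
my.frogs <- data.frame(mass = c(10, 12, 9, 13),
                       sex  = c("F", "M", "F", "M"))

# Name the summary column so the output is easier to read
frog.summary <- my.frogs %>% summarise(mean.mass = mean(mass))

frog.summary            # a one-row data frame
frog.summary$mean.mass  # pull out the number itself: 11
```

Spelling the function summarize() instead gives the same result.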