C. Data subsetting using dplyr

Make some subsets of the data without certain columns and certain rows.

Packages

We’ll use the following packages. You’ll need to download them if you haven’t already. I’d try just loading them with library() first (in the next section), then installing if needed.

Download packages

Only run this code if you haven’t already downloaded these packages. You can run it by removing the “#” in front of the code.

#install.packages("here")
#install.packages("RCurl")
#install.packages("devtools")
#install.packages("dplyr")
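As a minimal sketch of the "try loading first, install only if needed" approach, the code below checks which packages are missing before installing anything. The helper name missing_pkgs is hypothetical and not part of any package. Note that requireNamespace() loads a package's namespace as a side effect, but does not attach it the way library() does.

```r
# Packages loaded in the next section
pkgs <- c("here", "RCurl", "devtools", "dplyr")

# Hypothetical helper: return the names of packages that are not installed
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(pkgs)
to_install

# Uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```

If to_install comes back empty, everything is already available and you can go straight to the library() calls.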

Load packages

library(here)

## here() starts at C:/Users/lisanjie/Documents/1_R/git/mammalsmilk

library(RCurl)

## Loading required package: bitops

library(devtools)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Loading the mammalsmilk package

If you haven’t already, download the mammalsmilk package from GitHub (note the “S” between mammal and milk). This is commented out in the code below; only run it if you have never downloaded the package before, or haven’t done so recently (in case it’s been updated).

# install_github("brouwern/mammalsmilk")

You can then load the package with library()

library(mammalsmilk)

Load data

We’ll start with data that has the major issues cleaned up. This was done in the preceding data cleaning vignette. You can load it directly from the package (shown first) or you can try to load the data from your hard drive (shown second).

Within the package, these data are just called “milk”

data("milk")

There are many ways to get data into R. If you want to load the data from your hard drive using the here package, as discussed in the data access vignette, your code might look like this:

#file name of cleaned data we want to subset
file. <- "skibiel_mammalsmilk.csv"

#full path
## NOTE: this will be particular to YOUR computer
full.path <- here::here("inst/extdata", #folder
                        file.)  #file name

# load the data
milk <- read.csv(file = full.path)

Check

dim(milk)

## [1] 130  19

Remove columns from a dataframe

The “N” column indicates the sample size used in the original paper the data came from and isn’t needed. Also, for many analyses we’ll focus on fat, so we’ll drop the prot, sugar, and energy columns.

We can drop columns using the syntax “select(-column.name)”. Note that:

the minus sign (“-”) precedes the column name
the column name isn’t in quotes

We can drop multiple columns by separating the names with commas and including a “-” for each one. (Note: no “c(…)” needed!)

milk_fat <- milk %>% dplyr::select(-N,
                                  -prot,
                                  -sugar,
                                  -energy)
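To see what select() with minus signs does without needing the full dataset, here is a minimal sketch on a toy data frame. The data frame toy, the species column, and all of the values are made up for illustration; only the column names N, ord, fat and prot echo the real data.

```r
library(dplyr)

# Toy data frame; the values are made up for illustration only
toy <- data.frame(species = c("sp1", "sp2", "sp3"),
                  ord     = c("Primates", "Rodentia", "Carnivora"),
                  N       = c(3, 5, 2),
                  fat     = c(1.2, 10.5, 8.0),
                  prot    = c(2.0, 8.1, 7.4),
                  stringsAsFactors = FALSE)

# Drop two columns, one "-" per unquoted column name
toy_fat <- toy %>% dplyr::select(-N, -prot)

# Compare the column names before and after
names(toy)
names(toy_fat)
```

The second names() call should list only species, ord and fat; the rows themselves are untouched.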

We can save this sub-dataset for easy access

#file name
file. <- "skibiel_fat.csv"

# full path
## note: this will be particular to your computer!
full.path <- here::here("inst/extdata", #folder
                        file.)  #file name

write.csv(milk_fat,
          file = full.path,
          row.names = FALSE)

These data are also stored within the mammalsmilk package and can be accessed directly using data(“milk_fat”).
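If you do save a subset to disk, one way to check that the file was written as expected is to read it back and compare it with the original. This is a minimal sketch that uses a temporary file and a toy data frame with made-up values, so it does not touch the package folders.

```r
# Toy data frame; the values are made up for illustration only
toy <- data.frame(ord = c("Primates", "Rodentia"),
                  fat = c(1.2, 10.5),
                  stringsAsFactors = FALSE)

# Write to a temporary file rather than into inst/extdata
tmp <- tempfile(fileext = ".csv")
write.csv(toy, file = tmp, row.names = FALSE)

# Read the file back in and compare its shape with the original
toy_back <- read.csv(file = tmp)
dim(toy_back)
names(toy_back)
```

Setting row.names = FALSE matters here: with the default, the row names are written as an extra unnamed column, which read.csv() then brings back as a column called X.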

Filter by rows

For some analyses we’ll focus on just primates and their close-ish relatives. The filter() command lets us select just the rows we want.

First, let’s make a vector of the names of the taxonomic orders we want

primates.order.and.friends <- c("Rodentia","Primates","Lagomorpha")

milk_primates <-  milk_fat %>% filter(ord %in% primates.order.and.friends)
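The filter() call above relies on %in%, which checks each value of a column against a set of allowed values. A minimal sketch on a toy data frame (the values and the name orders.to.keep are made up for illustration) shows the two steps separately:

```r
library(dplyr)

# Toy data frame; the values are made up for illustration only
toy <- data.frame(ord = c("Primates", "Rodentia", "Carnivora",
                          "Lagomorpha", "Cetacea"),
                  fat = c(1.2, 10.5, 8.0, 12.9, 30.1),
                  stringsAsFactors = FALSE)

orders.to.keep <- c("Rodentia", "Primates", "Lagomorpha")

# %in% returns one TRUE/FALSE per row
toy$ord %in% orders.to.keep

# filter() keeps only the rows where the condition is TRUE
toy_subset <- toy %>% dplyr::filter(ord %in% orders.to.keep)
toy_subset$ord
```

The rows are kept in their original order; the order of the names in orders.to.keep does not matter.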

As before, we can save this.

file. <- "skibiel_primate_fat.csv"

full.path <- here::here("inst/extdata", #folder
                        file.)  #file name

write.csv(milk_primates,
          file = full.path,
          row.names = FALSE)

These data can also be accessed directly using data(“milk_primates”)


Nathan Brouwer

2018-11-27