• Data subsetting using dplyr
  • " />

    Make some subsets of the data without certain columns and certain rows.

    Packages

    We’ll use the following packages. You’ll need to download them if you haven’t already. I’d try just loading them with library() first (in the next section), then installing if needed.

    Download packages

    Only run this code if you haven’t already download these packages. You can run it by removing the “#” in front of the code.

    #install.package("here")
    #install.package("RCurl")

    Load packages

    library(here)
    ## here() starts at C:/Users/lisanjie/Documents/1_R/git/mammalsmilk
    library(RCurl)
    ## Loading required package: bitops
    library(devtools)
    library(dplyr)
    ## 
    ## Attaching package: 'dplyr'
    ## The following objects are masked from 'package:stats':
    ## 
    ##     filter, lag
    ## The following objects are masked from 'package:base':
    ## 
    ##     intersect, setdiff, setequal, union

    Loading the mammalsmilk package

    If you haven’t already, download the mammalsmilk package from GitHub (note the “S” between mammal and milk). This is commented out in the code below - only run the code if you have never downloaded it before, or haven’t recently (in case its been updated)

    # install_github("brouwern/mammalsmilk")

    You can then load the package with library()

    library(mammalsmilk)

    Load data

    We’ll start with data the has the major issues cleaned up. This was done in the preceding data cleaning vignette. You can load it just from the package (shown first) or you can try to load the data from your hard drive (shown second).

    Within the package, these data are just called “milk”

    data("milk")

    There are many ways to get data in R. If you want to load the data from your hard drive using the here package as discssed in the data access vignette, your code might look like this..

    #file name of cleaned data we want to subset
    file. <- "skibiel_mammalsmilk.csv"
    
    #full path
    ## NOTE: this will be particular to YOUR computer
    full.path <- here::here("inst/extdata", #folder
                            file.)  #file name
    
    # load the data
    milk <- read.csv(file = full.path)

    Check

    dim(milk)
    ## [1] 130  19

    Remove columns from a dataframe

    The “N” column indicates sample size used in the original paper where the data came from and and isn’t needed. Also, for many analyses we’ll fous on fat, so we’ll drop the other columns.

    We can drop columns using the synatx “select(-column.name)”; note

    1. the minus sign (“-”) preceding the column name
    2. the column name isn’t in quotes

    We can do multiple columns by just seperating the names with a comma, including a “-” for each one. (note, no “c(…)” needed!)

    milk_fat <- milk %>% dplyr::select(-N,
                                      -prot,
                                      -sugar,
                                      -energy)

    We can save this sub-dataset for easy access

    #file name
    file. <- "skibiel_fat.csv"
    
    # full path
    ## note: this will be particular to your computer!
    full.path <- here::here("inst/extdata", #folder
                            file.)  #file name
    
    write.csv(milk_fat,
              file = full.path,
              row.names =F)

    These data are also stored within the mammalsmilk package and can be accerssed directly using library(“milk_fat”)

    Filter by rows

    For some analyses we’ll focus on just primates and their close-ish relatives. The filter() command lets us just slect the rows we want.

    First, let’s make a vector of the names of the taxonomic orders we want

    primates.order.and.friends <- c("Rodentia","Primates","Lagomorpha")
    milk_primates <-  milk_fat %>% filter(ord %in% primates.order.and.friends)

    As beofre we can save this.

    file. <- "skibiel_primate_fat.csv"
    
    full.path <- here::here("inst/extdata", #folder
                            file.)  #file name
    
    write.csv(milk_primates,
              file = full.path,
              row.names =F)

    These data can also be accessed directly using data(“milk_primates”)