c-data_subsetting.Rmd
This vignette makes subsets of the data by dropping certain columns and certain rows.
We’ll use the following packages. You’ll need to download them if you haven’t already. I’d try just loading them with library() first (in the next section), then installing if needed.
Only run this code if you haven’t already downloaded these packages. You can run it by removing the “#” in front of the code.
#install.packages("here")
#install.packages("RCurl")
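One way to follow the "load first, install only if needed" advice is to check which packages are missing before installing anything. The sketch below uses base R's requireNamespace(). The helper name missing_packages() is made up for this example, and the install line is left commented out so nothing is installed by accident.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("here", "RCurl"))

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # remove the "#" to install them
}
```

If to_install is empty, everything is already available and you can go straight to the library() calls.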
library(here)
## here() starts at C:/Users/lisanjie/Documents/1_R/git/mammalsmilk
library(RCurl)
## Loading required package: bitops
library(devtools)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
If you haven’t already, download the mammalsmilk package from GitHub (note the “S” between mammal and milk). This is commented out in the code below. Only run the code if you have never downloaded the package before, or haven’t recently (in case it’s been updated).
# install_github("brouwern/mammalsmilk")
You can then load the package with library()
library(mammalsmilk)
We’ll start with data that has the major issues cleaned up. This was done in the preceding data cleaning vignette. You can load it directly from the package (shown first) or you can try to load the data from your hard drive (shown second).
Within the package, these data are just called “milk”
data("milk")
There are many ways to get data into R. If you want to load the data from your hard drive using the here package, as discussed in the data access vignette, your code might look like this.
#file name of cleaned data we want to subset
file. <- "skibiel_mammalsmilk.csv"
#full path
## NOTE: this will be particular to YOUR computer
full.path <- here::here("inst/extdata", #folder
file.) #file name
# load the data
milk <- read.csv(file = full.path)
Check the dimensions of the data (number of rows, then number of columns).
dim(milk)
## [1] 130 19
The “N” column indicates the sample size used in the original paper where the data came from and isn’t needed. Also, for many analyses we’ll focus on fat, so we’ll drop the other columns.
We can drop a column using the syntax “select(-column.name)”; note the “-” in front of the column name.
We can drop multiple columns by just separating the names with a comma, including a “-” for each one. (Note: no “c(…)” needed!)
milk_fat <- milk %>% dplyr::select(-N,
-prot,
-sugar,
-energy)
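To see what the “-” syntax does without loading the full dataset, here is a small sketch on a made-up data frame. The object names (toy, toy_fat, toy_fat_c) and the values are invented for illustration. The column names only mimic the ones used above.

```r
library(dplyr)

# Made-up data with a few columns named like those in the milk data
toy <- data.frame(spp  = c("a", "b"),
                  N    = c(3, 5),
                  fat  = c(10.2, 4.1),
                  prot = c(8.0, 6.5))

# Drop two columns, one "-" per column
toy_fat <- toy %>% dplyr::select(-N,
                                 -prot)
names(toy_fat)
## [1] "spp" "fat"

# Wrapping the names in c() is not needed, but gives the same result
toy_fat_c <- toy %>% dplyr::select(-c(N, prot))
identical(toy_fat, toy_fat_c)
## [1] TRUE
```

Checking names() after a select() call is a quick way to confirm that only the intended columns were removed.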
We can save this sub-dataset for easy access
#file name
file. <- "skibiel_fat.csv"
# full path
## note: this will be particular to your computer!
full.path <- here::here("inst/extdata", #folder
file.) #file name
write.csv(milk_fat,
file = full.path,
row.names = FALSE)
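The row.names argument matters when the file is read back in. The sketch below, using a made-up data frame and a temporary file rather than the real data, shows that leaving row names switched on adds an extra column (named “X” by read.csv()) on the round trip.

```r
# Made-up data and a temporary file, so nothing on your computer is overwritten
toy <- data.frame(spp = c("a", "b"),
                  fat = c(1.5, 2.5))
tmp <- tempfile(fileext = ".csv")

# Without row names: the columns come back unchanged
write.csv(toy, file = tmp, row.names = FALSE)
no_rn <- read.csv(file = tmp)
names(no_rn)
## [1] "spp" "fat"

# With the default (row.names = TRUE): an extra first column appears
write.csv(toy, file = tmp)
with_rn <- read.csv(file = tmp)
names(with_rn)
## [1] "X"   "spp" "fat"
```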
These data are also stored within the mammalsmilk package and can be accessed directly using data("milk_fat").
For some analyses we’ll focus on just primates and their close-ish relatives. The filter() command lets us select just the rows we want.
First, let’s make a vector of the names of the taxonomic orders we want
primates.order.and.friends <- c("Rodentia","Primates","Lagomorpha")
milk_primates <- milk_fat %>% filter(ord %in% primates.order.and.friends)
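The same filter() plus %in% pattern can be tried on a made-up data frame to see which rows survive. The object names (toy, keep.orders, toy_subset) and the fat values are invented for this sketch.

```r
library(dplyr)

# Made-up data: one row per taxonomic order
toy <- data.frame(ord = c("Primates", "Carnivora", "Rodentia", "Cetacea"),
                  fat = c(4.0, 12.5, 9.1, 30.2))

# Orders we want to keep; it is fine if one of them is absent from the data
keep.orders <- c("Rodentia", "Primates", "Lagomorpha")

# Keep only rows whose order appears in keep.orders
toy_subset <- toy %>% dplyr::filter(ord %in% keep.orders)

toy_subset$ord
## [1] "Primates" "Rodentia"
```

Rows keep their original order, and orders listed in the vector but missing from the data (here Lagomorpha) are simply ignored. A quick table() of the ord column on the real subset is a reasonable check that only the intended orders remain.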
As before, we can save this.
file. <- "skibiel_primate_fat.csv"
full.path <- here::here("inst/extdata", #folder
file.) #file name
write.csv(milk_primates,
file = full.path,
row.names = FALSE)
These data can also be accessed directly using data("milk_primates").