b1-data_access.Rmd
In this tutorial we’ll load the data from the mammalsmilk package. All the data files used in the package are stored inside the package and can be loaded with R's standard data() function, as detailed below. However, getting raw data into R is a frequent stumbling block, so I also explain how to find the raw .csv files on your hard drive or on the internet. I then outline two ways to load data into R. First I go over a basic way using the “Import Dataset” button in RStudio. Then I cover a more advanced way using the here() function.
Skibiel et al 2013. The evolution of the nutrient composition of mammalian milks. Journal of Animal Ecology 82: 1254-1264. https://doi.org/10.1111/1365-2656.12095
We’ll use the following packages. You’ll need to install them if you haven’t already. I’d first try loading them with library() (in the next section), and install them only if that fails.
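As a sketch, assuming the packages needed are devtools (for install_github()), RCurl (for getURL()) and here (for here::here()), which are the functions used later in this tutorial, you can check which ones are missing before installing anything. The install line is commented out, in the same spirit as the install_github() call below.

```r
# Packages assumed for this tutorial, based on the functions used later:
#   devtools (install_github), RCurl (getURL), here (here)
pkgs <- c("devtools", "RCurl", "here")

# Return the packages in `pkgs` that are not installed on this computer
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_pkgs(pkgs)
to_install   # character(0) means everything is already installed

# Uncomment to install only what is missing
# install.packages(to_install)
```

requireNamespace() only checks that a package can be loaded; it does not attach it, so you still call library() afterwards.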
If you haven’t already, install the mammalsmilk package from GitHub (note the “S” between mammal and milk). This is commented out in the code below; only run it if you have never installed the package, or haven’t done so recently (in case it’s been updated).
# devtools::install_github("brouwern/mammalsmilk")
You can then load the package with library():
library(mammalsmilk)
The datasets for this package can be found in several places.
Data within R or an R package is accessed using the data() command. To get the classic iris data set I do this:
data("iris")
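data() has a couple of arguments that are handy when exploring. The sketch below loads iris into a separate environment (via envir=) and lists the datasets shipped with base R's "datasets" package; calling data(package = "mammalsmilk") should list this package's datasets the same way once it is installed.

```r
# data() normally creates the object in your workspace; envir= lets you
# load it into a separate environment instead
e <- new.env()
data("iris", envir = e)
dim(e$iris)   # 150 rows, 5 columns

# data(package = ...) lists the datasets a package provides
ds <- data(package = "datasets")
head(ds$results[, "Item"])
```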
If mammalsmilk is installed and loaded with library(), I can get a dataset like this:
data("milk_raw")
This should always work. However, two things:
Data can also be loaded directly from GitHub using the getURL() function from the RCurl package.
First, we need the URL for the file
# Create an object for the URL where your data is stored.
url <- "https://raw.githubusercontent.com/brouwern/mammalsmilk/master/inst/extdata/skibiel_mammalsmilk_raw.csv"
Then we run getURL() to download the contents of the file as text:
myData <- getURL(url)
Finally, load it using read.csv():
milk_raw <- read.csv(textConnection(myData))
Check that it’s there using dim():
dim(milk_raw)
#> [1] 130 22
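getURL() returns the whole file as a single character string, which is why it is wrapped in textConnection() before read.csv() parses it. The sketch below uses a small made-up string in place of the downloaded file, so it runs offline, and shows that read.csv()'s text= argument does the same job. Base R's read.csv() can also read straight from a URL, shown commented out because it needs internet access.

```r
# Made-up stand-in for the string getURL(url) would return
myData <- "order,family,fat\nArtiodactyla,Bovidae,7.0\nRodentia,Muridae,12.1\n"

# Two equivalent ways to parse a CSV held in a character string
a <- read.csv(textConnection(myData))
b <- read.csv(text = myData)
dim(a)   # 2 rows, 3 columns

# Without RCurl, read.csv() also accepts a URL directly
# milk_raw <- read.csv(url)
```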
For more information on loading from GitHub see https://github.com/christophergandrud/Introduction_to_Statistics_and_Data_Analysis_Yonsei/wiki/Importing-Data,-Basic
Loading data into R can be difficult when you first get started. It’s good to practice this, so if you’ve had trouble in the past I recommend locating the .csv files associated with this package and trying to load them by hand, following the directions below.
How data gets loaded into R is always somewhat particular to how you are running R and where you have your data saved. I will outline a basic way to load data first, then provide code for how I use a more advanced and flexible approach.
Once you have the package installed, you should be able to find the files associated with it just by using the search function on your computer. The main starting data file for the package is “skibiel_mammalsmilk_raw.csv”; my PC found it instantly. I then right-clicked on it and selected “Open File Location” to see the directory.
This opens a directory “…/mammalsmilk/inst/extdata” that contains all the .csv files underlying the data files used in the package.
To practice loading .csv files I recommend copying these files to a new directory where they can easily be found, e.g., where you normally save all your R work.
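R can also find a file inside an installed package for you with system.file(). When a package is installed, the contents of its inst/ folder are copied to the top level of the installed package, so files from inst/extdata end up under extdata. This is a sketch; the destination folder in the file.copy() line is hypothetical.

```r
# Look for the raw .csv inside the installed mammalsmilk package;
# system.file() returns "" if the package or the file is not found
csv_path <- system.file("extdata", "skibiel_mammalsmilk_raw.csv",
                        package = "mammalsmilk")
csv_path

# Copy it into a folder of your choice (hypothetical folder name)
# file.copy(csv_path, "~/my_R_work/")

# The same call works for any installed package, e.g. base R's stats
stats_desc <- system.file("DESCRIPTION", package = "stats")
file.exists(stats_desc)
```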
If you can’t find the data on your hard drive, you can also download it directly from GitHub:
This will take you to a raw version of the data; you can then right-click and choose “Save as” to save it wherever you want.
The easiest way to load data is to figure out where you have saved the data, then use the “Import dataset” button in RStudio to navigate there. This will generate code to load the data into your current R session. You can copy this code and paste it into your script for future use.
Using the following steps I can generate code to load the milk data
The code that gets generated looks like this; it reflects the particular location of the .csv file on my hard drive:
#Note: code is particular to my hard drive
dat <- read.csv("~/1_R/git/mammalsmilk/data/skibiel_mammalsmilk_raw.csv")
Based on the file name, RStudio gives the R object the name “skibiel_mammalsmilk_raw”, which I change to just “dat” to make it easier to type.
This approach works, but has one major hang-up: if I make any changes to the folder structure of my project, such as moving it somewhere else on my hard drive, then the read.csv() code will break. In the next section, I show a more flexible technique.
For my work I usually
One important caveat: in packages, data in .csv format is usually tucked away in a sub-folder called “extdata” (“external data”) within another folder called “inst” (“installed” non-R files). The full path I will be using below is therefore not “mammalsmilk/data” but rather “mammalsmilk/inst/extdata”.
here() does 2 things
The here package is new-ish and, unfortunately, some other packages also have here() functions, so I always call it as here::here() to make sure the right one is used.
If I call here::here(), it tells me exactly where the root of my project is:
here::here()
#> [1] "C:/Users/lisanjie/Documents/1_R/git/mammalsmilk"
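Note that this is similar to the getwd() command, but there are important differences: getwd() reports the current working directory, which can change during a session, while here::here() locates the project’s root folder and builds paths from there regardless of the current working directory.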
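To make the root-finding idea concrete, here is a hypothetical, simplified find_root() function. It is not how the here package is implemented (here relies on the rprojroot package and checks a longer list of markers, such as an .Rproj file); it just walks up from a starting folder until it finds a folder containing a marker file, using a throwaway project built in a temporary directory.

```r
# Hypothetical, simplified stand-in for the idea behind here::here():
# walk up from a starting folder until one contains a marker file
find_root <- function(start = getwd(), markers = c(".here", "DESCRIPTION")) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (any(file.exists(file.path(dir, markers)))) return(dir)
    parent <- dirname(dir)
    if (identical(parent, dir)) stop("No project root found above ", start)
    dir <- parent
  }
}

# A throwaway project in a temporary folder, with a DESCRIPTION file at its root
proj <- file.path(tempdir(), "demo_project")
dir.create(file.path(proj, "inst", "extdata"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(proj, "DESCRIPTION")))

# Starting deep inside the project still finds the same root folder,
# whereas getwd() would just report whichever folder the session is in
find_root(file.path(proj, "inst", "extdata"))
find_root(proj)
```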
To load data, I usually create an object with my file name of interest:
file. <- "skibiel_mammalsmilk_raw.csv"
I then use here() to build the full file path for the data file. “inst/extdata” is the folder where the .csv file is.
full.path <- here::here("inst/extdata", #folder
                        file.)          #file name
Note that here() takes the path to the project root, appends the “/inst/extdata” folder and finally the “/skibiel_mammalsmilk_raw.csv” file name.
full.path
#> [1] "C:/Users/lisanjie/Documents/1_R/git/mammalsmilk/inst/extdata/skibiel_mammalsmilk_raw.csv"
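For comparison, base R's file.path() joins path pieces with "/" in much the same way once you know the root folder. In this sketch the root is a hypothetical stand-in for what here::here() works out automatically, and the variables are named demo.* so they don't overwrite the tutorial's full.path. Checking file.exists() first gives a clearer message than a read.csv() error if the file isn't where you expect.

```r
# Hypothetical project root; here::here() works this out for you
demo.root <- "C:/Users/me/Documents/1_R/git/mammalsmilk"
file.     <- "skibiel_mammalsmilk_raw.csv"

# file.path() joins its pieces with "/", much like here::here("inst/extdata", file.)
demo.path <- file.path(demo.root, "inst/extdata", file.)
demo.path

# Check the file is really there before handing the path to read.csv()
if (!file.exists(demo.path)) {
  message("File not found: ", demo.path)
}
```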
I can then pass this R object containing the file path to read.csv():
milk_raw <- read.csv(file = full.path)
Look at the size of the data frame just loaded:
dim(milk_raw)
#> [1] 130 22
and take a look at the first and last few rows (head() and tail()) and at a summary() of every column:
#> order family spp mass.female
#> 1 Artiodactyla Bovidae Bos frontalis 800000
#> 2 Artiodactyla Bovidae Capra ibex 53000
#> 3 Artiodactyla Bovidae Connocheates taurinus taurinus 170500
#> 4 Artiodactyla Bovidae Connocheates gnou 200000
#> 5 Artiodactyla Bovidae Damaliscus pygargus phillipsi 61000
#> 6 Artiodactyla Bovidae Gazella dorcas 20600
#> gestation.month lacatation.months mass.litter repro.output
#> 1 9.02 4.5 26949 0.03
#> 2 5.60 7.5 3489 0.07
#> 3 8.32 8.0 17717 0.10
#> 4 8.50 7.5 11110 0.06
#> 5 8.00 4.0 6500 0.11
#> 6 4.74 2.8 1771 0.09
#> dev.stage.at.birth diet arid biome N lactation.stage.orig
#> 1 3 herbivore no terrestrial 4+ <NA>
#> 2 3 herbivore no terrestrial 24 30-60
#> 3 3 herbivore yes terrestrial 5 150
#> 4 3 herbivore yes terrestrial 3 150
#> 5 3 herbivore yes terrestrial 4 150
#> 6 3 herbivore yes terrestrial 16 30-60
#> dry.matter fat protein sugar energy ref
#> 1 20.0 7.0 6.3 5.2 1.21 Oftedal & Iverson (1995)
#> 2 23.3 12.4 5.7 <NA> <NA> Oftedal & Iverson (1995)
#> 3 13.4 7.5 4.1 5.3 1.13 Osthoff, Hugo & de Wit (2009a)
#> 4 12.0 5.5 4.3 4.1 0.91 Osthoff, Hugo & de Wit (2009a)
#> 5 16.0 8.6 5.6 4.9 1.31 Osthoff, Hugo & de Wit (2009a)
#> 6 24.1 8.8 8.8 <NA> <NA> Oftedal & Iverson (1995)
#> gest.month.NUM lacat.mo.NUM
#> 1 9.02 4.5
#> 2 5.60 7.5
#> 3 8.32 8.0
#> 4 8.50 7.5
#> 5 8.00 4.0
#> 6 4.74 2.8
#> order family spp mass.female
#> 125 Rodentia Muridae Pseudomys australis 65
#> 126 Rodentia Muridae Rattus norvegicus 253
#> 127 Rodentia Octodontidae Octodon degus 235
#> 128 Rodentia Scuiridae Tamias amoenus 53
#> 129 Rodentia Scuiridae Urocitellus columbianus 406
#> 130 Soricomorpha11 Soricidae Crocidura russula 14
#> gestation.month lacatation.months mass.litter repro.output
#> 125 1.02 0.9 13 0.20
#> 126 0.71 0.8 51 0.20
#> 127 2.96 1.2 74 0.31
#> 128 0.98 1.5 14 0.26
#> 129 0.84 1.0 32 0.08
#> 130 0.97 0.8 4 0.29
#> dev.stage.at.birth diet arid biome N
#> 125 0 herbivore yes terrestrial 7-Jun
#> 126 0 omnivore no terrestrial 18-Mar
#> 127 3 herbivore yes terrestrial 7
#> 128 0 omnivore no terrestrial 11
#> 129 0 herbivore no terrestrial 26
#> 130 0 carnivore no terrestrial 3
#> lactation.stage.orig dry.matter fat protein sugar energy
#> 125 12-Jul 25.4 12.1 6.4 3.6 1.62
#> 126 17-Aug 22.1 8.8 8.1* 3.8 1.43
#> 127 15-21 30.5 20.1 4.4 2.7 2.2
#> 128 15-20 36.7 21.7 8.1 4.3 2.62
#> 129 19 29.9 9.2 10.7 3.4 1.6
#> 130 12-Aug 51.0 30.0 9.4 3 3.4
#> ref gest.month.NUM lacat.mo.NUM
#> 125 Oftedal & Iverson (1995) 1.02 0.9
#> 126 Oftedal & Iverson (1995) 0.71 0.8
#> 127 Veloso & Kenagy (2005) 2.96 1.2
#> 128 Veloso, Place & Kenagy (2003) 0.98 1.5
#> 129 Skibiel & Hood (in press) 0.84 1.0
#> 130 Oftedal & Iverson (1995) 0.97 0.8
#> order family spp
#> Artiodactyla :23 Bovidae :13 Acomys cahirinus : 1
#> Carnivora :23 Cercopithecidae: 8 Alces alces : 1
#> Primates :22 Cervidae : 7 Aloutta palliata : 1
#> Rodentia :17 Muridae : 7 Aloutta seniculus : 1
#> Chiroptera :10 Otariidae : 7 Arctocephalus australis: 1
#> Diprotodontia:10 Phocidae : 7 Arctocephalus gazella : 1
#> (Other) :25 (Other) :81 (Other) :124
#> mass.female gestation.month lacatation.months
#> Min. : 8 Min. : 0.400 Min. : 0.300
#> 1st Qu.: 857 1st Qu.: 1.405 1st Qu.: 1.625
#> Median : 5716 Median : 5.000 Median : 4.500
#> Mean : 2229475 Mean : 5.624 Mean : 6.092
#> 3rd Qu.: 107500 3rd Qu.: 8.365 3rd Qu.: 8.225
#> Max. :170000000 Max. :21.460 Max. :42.000
#>
#> mass.litter repro.output dev.stage.at.birth diet
#> Min. : 0.3 Min. :0.00003 Min. :0.000 carnivore:32
#> 1st Qu.: 42.0 1st Qu.:0.04000 1st Qu.:1.000 herbivore:61
#> Median : 423.5 Median :0.08000 Median :2.000 omnivore :37
#> Mean : 52563.8 Mean :0.10374 Mean :1.831
#> 3rd Qu.: 7038.2 3rd Qu.:0.13750 3rd Qu.:3.000
#> Max. :2272500.0 Max. :0.50000 Max. :4.000
#>
#> arid biome N lactation.stage.orig
#> no :91 aquatic : 22 4 :13 14-Aug : 3
#> yes:39 terrestrial:108 3 :11 150 : 3
#> 6 :10 21-63 : 3
#> 7 : 9 30-60 : 3
#> 5 : 8 i : 3
#> 24 : 5 (Other):113
#> (Other):74 NA's : 2
#> dry.matter fat protein sugar energy
#> Min. : 8.80 Min. : 0.200 1.6* : 4 3 : 5 1.44 : 4
#> 1st Qu.:16.27 1st Qu.: 4.575 1.5 : 3 4.5 : 5 0.81 : 3
#> Median :22.75 Median : 8.550 10.3 : 3 5.3 : 5 1.13 : 3
#> Mean :27.06 Mean :14.068 10.7 : 3 5.2 : 4 0.49 : 2
#> 3rd Qu.:32.05 3rd Qu.:17.575 6.3 : 3 6.6 : 4 0.5 : 2
#> Max. :71.10 Max. :61.100 7 : 3 (Other):92 (Other):101
#> NA's :6 NA's :2 (Other):111 NA's :15 NA's : 15
#> ref gest.month.NUM
#> Oftedal & Iverson (1995) :99 Min. : 0.400
#> Hood et al. (2001) : 4 1st Qu.: 1.405
#> Osthoff, Hugo & de Wit (2009a) : 3 Median : 5.000
#> Derrickson, Jerrard & Oftedal (1996): 2 Mean : 5.624
#> Arnould & Boyd (1995) : 1 3rd Qu.: 8.365
#> Arnould & Hindell (1999) : 1 Max. :21.460
#> (Other) :20
#> lacat.mo.NUM
#> Min. : 0.300
#> 1st Qu.: 1.625
#> Median : 4.500
#> Mean : 6.092
#> 3rd Qu.: 8.225
#> Max. :42.000
#>
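In the summary above, protein, sugar and energy are tallied as counts of values rather than given a minimum, mean and maximum, which suggests read.csv() did not read them as numbers; some protein values carry a trailing "*" (e.g. "8.1*"). A minimal sketch on two rows copied from the output (rows 126 and 127) shows one way to spot such columns and recover the numbers. The protein.NUM name just follows the pattern of the existing gest.month.NUM column and is hypothetical.

```r
# Two rows copied from the output above (rows 126 and 127)
toy <- data.frame(spp     = c("Rattus norvegicus", "Octodon degus"),
                  protein = c("8.1*", "4.4"),
                  fat     = c(8.8, 20.1),
                  stringsAsFactors = FALSE)

# Which columns came in as numbers?
sapply(toy, is.numeric)   # spp FALSE, protein FALSE, fat TRUE

# Drop the "*" and convert; as.character() first guards against factor columns,
# where as.numeric() alone would return the factor's internal codes
toy$protein.NUM <- as.numeric(gsub("*", "", as.character(toy$protein), fixed = TRUE))
toy$protein.NUM
```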