• Accessing mammalsmilk data

    Introduction

    In this tutorial we’ll load the data from the mammalsmilk package. All the data files used in the package are contained within the package itself and can be loaded (as detailed below) using the standard data() function in R. However, getting raw data into R is frequently a stumbling block, so I give instructions on how to find the raw .csv file on your hard drive or on the internet. I then outline two ways to load data into R. I first go over a basic way using the “Import Dataset” button in RStudio. I then cover a more advanced way using the here() function.

    Important functions used

    • devtools::install_github
    • read.csv

    Original Data

    Skibiel et al 2013. The evolution of the nutrient composition of mammalian milks. Journal of Animal Ecology 82: 1254-1264. https://doi.org/10.1111/1365-2656.12095

    Preliminaries

    Packages

    We’ll use the following packages. You’ll need to download them if you haven’t already. I’d try just loading them with library() first (in the next section), then installing if needed.

    Download packages

    Only run this code if you haven’t already downloaded these packages. You can run it by removing the “#” in front of the code.

    #install.packages("here")
    #install.packages("RCurl")
    #install.packages("devtools")
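    The “try loading first, install only if needed” approach can also be sketched in code. This is a minimal illustration, not part of the package: the helper name missing_packages is hypothetical, and the install line is left commented out so that nothing is installed by accident.

    ```r
    # Return the names of the packages in `pkgs` that are not installed.
    # requireNamespace() returns TRUE/FALSE instead of raising an error.
    missing_packages <- function(pkgs) {
      installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
      pkgs[!installed]
    }

    to_install <- missing_packages(c("here", "RCurl", "devtools"))
    print(to_install)

    # Uncomment to install only what is missing:
    # if (length(to_install) > 0) install.packages(to_install)
    ```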

    Load packages

    library(here)
    #> here() starts at C:/Users/lisanjie/Documents/1_R/git/mammalsmilk
    library(RCurl)
    #> Loading required package: bitops
    library(devtools)

    Loading the mammalsmilk package

    If you haven’t already, download the mammalsmilk package from GitHub (note the “S” between mammal and milk). This is commented out in the code below; only run the code if you have never downloaded the package before, or haven’t recently (in case it’s been updated).

    # install_github("brouwern/mammalsmilk")

    You can then load the package with library()

    library(mammalsmilk)

    Loading data

    Where are the datasets stored?

    The datasets for this package can be found in several places.

    1. Internal within the package and loaded into R using the data() function.
    2. On GitHub
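    3. Saved as .csv files with the package source code, under mammalsmilk/inst/extdata. R saves all your installed packages in a single directory (the library); in an installed copy the contents of inst are copied up one level, so the files may appear under mammalsmilk/extdata instead.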
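    For an installed package, base R’s system.file() can report where a bundled folder lives, which avoids searching the hard drive by hand. The sketch below assumes the .csv files sit in the installed package’s extdata folder; the helper name package_csvs is hypothetical, and it returns an empty vector if the package or folder is not found.

    ```r
    # List the .csv files bundled in an installed package's extdata folder.
    package_csvs <- function(pkg) {
      extdata_dir <- system.file("extdata", package = pkg)
      # system.file() returns "" when the package or folder does not exist
      if (!nzchar(extdata_dir)) {
        return(character(0))
      }
      list.files(extdata_dir, pattern = "\\.csv$", full.names = TRUE)
    }

    milk_csvs <- package_csvs("mammalsmilk")
    print(milk_csvs)
    ```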

    Accessing data from the package

    Data within R or an R package is accessed using the data() command. To get the classic iris data set I do this:

    data("iris")

    If mammalsmilk is installed and has been loaded using library(), I can get a dataset like this:

    data("milk_raw")

    This should normally work. However, two caveats:

    1. This package is under development, so this might not always work
    2. Loading data is a perennially difficult task, so it can be good to locate the .csv files and practice loading them by hand
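    To see what data() is doing, it can help to load a dataset into a separate environment and to list the datasets a package provides. The sketch below uses the built-in datasets package so that it runs anywhere; the same calls should apply to mammalsmilk once it is installed (that line is an assumption and is left commented out).

    ```r
    # Load iris into its own environment rather than the workspace
    data_env <- new.env()
    data("iris", package = "datasets", envir = data_env)
    ls(data_env)
    dim(data_env$iris)

    # List the names of all datasets shipped with a package
    dataset_names <- data(package = "datasets")$results[, "Item"]
    head(dataset_names)

    # Assumed equivalent for this tutorial (requires mammalsmilk to be installed):
    # data("milk_raw", package = "mammalsmilk")
    ```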

    Loading directly from GitHub

    Data can be loaded directly from GitHub using the getURL() function from the RCurl package.

    First, we need the URL for the file

    # Create an object for the URL where your data is stored.
    url <- "https://raw.githubusercontent.com/brouwern/mammalsmilk/master/inst/extdata/skibiel_mammalsmilk_raw.csv"

    Then we run getURL()

    myData <- getURL(url)

    Finally, load it using read.csv()

    milk_raw <- read.csv(textConnection(myData))

    Check that it’s there using dim()

    dim(milk_raw)
    #> [1] 130  22
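    getURL() returns the whole file as a single character string, which is why it has to be wrapped before read.csv() can parse it. The sketch below uses a short hand-typed string as a stand-in for the downloaded text (the three column names and two rows are taken from the data shown later in this tutorial) and shows two equivalent ways to parse it; the text = argument of read.csv() is a base R shortcut for the textConnection() call.

    ```r
    # Stand-in for the string that getURL() would return
    csv_text <- "spp,fat,protein\nBos frontalis,7.0,6.3\nCapra ibex,12.4,5.7\n"

    # Option 1: wrap the string in a text connection (as in the tutorial)
    dat1 <- read.csv(textConnection(csv_text))

    # Option 2: hand the string to read.csv() via its `text` argument
    dat2 <- read.csv(text = csv_text)

    dim(dat1)
    all.equal(dat1, dat2)
    ```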

    For more information on loading from GitHub see https://github.com/christophergandrud/Introduction_to_Statistics_and_Data_Analysis_Yonsei/wiki/Importing-Data,-Basic

    Load .csv files from your hard drive

    Loading data into R can be difficult when you first get started. It’s good to practice this, so if you’ve had trouble in the past I recommend locating the .csv files associated with this package and trying to load them by hand, following the directions below.

    How data gets loaded into R is always somewhat particular to how you are running R and where you have your data saved. I will outline a basic way to load data first, then provide code for how I use a more advanced and flexible approach.

    Find the .csv files

    Once you have the package downloaded and installed, you should be able to find the files associated with it just by using the search function on your computer. The main starting data file for the package is “skibiel_mammalsmilk_raw.csv”; my PC found it instantly. I then right-clicked on it and selected “Open File Location” to see the directory.

    This opens a directory “…/mammalsmilk/inst/extdata” that contains all the .csv files underlying the data files used in the package.

    To practice loading .csv files I recommend copying these files to a new directory where they can easily be found, e.g., where you normally save all your R work.

    If you can’t find the data on your hard drive, you can also download it directly from GitHub:

    https://raw.githubusercontent.com/brouwern/mammalsmilk/master/inst/extdata/skibiel_mammalsmilk_raw.csv

    This will take you to a raw version of the data; you can right-click and choose “Save as” to save it wherever you want.

    Loading data for beginning users

    The easiest way to load data is to figure out where you have saved the data, then use the “Import dataset” button in RStudio to navigate there. This will generate code to load the data into your current R session. You can copy this code and paste it into your script for future use.

    Using the following steps I can generate code to load the milk data

    1. Click on “Import Dataset”, which is on the “Environment” tab of the Environment / History / Connections panel.
    2. Select “From text (base)” (all the options work similarly)
    3. Navigate to the .csv file
    4. Click through the pop-up windows to finalize the import.

    The code that gets generated looks like this, which reflects the particular location of the .csv file on my hard drive

    #Note: code is particular to my hard drive
    dat <- read.csv("~/1_R/git/mammalsmilk/data/skibiel_mammalsmilk_raw.csv")

    Based on the file name, RStudio gives the R object the name “skibiel_mammalsmilk_raw”, which I change to just “dat” to make it easier to type.

    This approach will always work, but has one major hangup: if I make any changes to the folder structure of my project, such as where it is on my hard drive, then the read.csv() code will break. In the next section, I show a more flexible technique.
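    When a hard-coded path does break, the error from read.csv() can be hard to interpret. One way to make the failure easier to diagnose is to check that the file exists first. This is only a sketch: the helper name read_csv_checked is hypothetical, and a temporary file stands in for the real .csv (its two rows are copied from the data shown later in this tutorial).

    ```r
    # Read a .csv, but fail with a clear message if the path is wrong
    read_csv_checked <- function(path) {
      if (!file.exists(path)) {
        stop("File not found: ", path,
             "\nCurrent working directory: ", getwd(), call. = FALSE)
      }
      read.csv(path)
    }

    # Demonstrate with a small temporary file
    demo_path <- tempfile(fileext = ".csv")
    write.csv(data.frame(spp = c("Bos frontalis", "Capra ibex"),
                         fat = c(7.0, 12.4)),
              file = demo_path, row.names = FALSE)

    dat <- read_csv_checked(demo_path)
    dim(dat)
    ```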

    Loading data flexibly

    For my work I usually

    • Create a separate folder for each analysis (eg “mammals milk”)
    • Use an RStudio Project for the analysis
    • Keep code in one sub-folder (“mammalsmilk/analyses”)
    • Keep data in a separate sub-folder (“mammalsmilk/data”)
    • Use the here() function from the here package to detect where my RStudio Project is on my hard drive

    One important caveat: for packages, data in .csv format is usually hidden somewhat in a sub-folder called “extdata” (“external data”) within another folder called “inst” (“installed” non-R files). The full path I will be using below is therefore not “mammalsmilk/data” but rather “mammalsmilk/inst/extdata”.

    here() does 2 things

    1. Figures out where the top-level directory of your project is on your hard drive
    2. Builds valid file paths for locations you specify relative to that directory

    The here package is new-ish and there are unfortunately some other packages that have here() functions, so it’s safest to always write here::here().

    If I call here::here() it tells me exactly where my project is

    here::here() 
    #> [1] "C:/Users/lisanjie/Documents/1_R/git/mammalsmilk"

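    Note that this is similar to the getwd() command, but there are important differences in how these functions work: getwd() reports the current working directory, which can change during a session (for example, after setwd()), whereas here() reports the project’s top-level directory, which it determines when the package is loaded.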
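    The fragility of getwd() is easy to demonstrate with base R alone. In the sketch below the working directory is moved to a temporary folder and then restored; any path built relative to getwd() would have pointed somewhere different in between. To my understanding here::here() would keep reporting the same project directory throughout, but that is not run here so that the example does not depend on the here package.

    ```r
    original_wd <- getwd()

    # Move the working directory somewhere else
    setwd(tempdir())
    changed_wd <- getwd()

    # Restore it so the rest of the session is unaffected
    setwd(original_wd)
    restored_wd <- getwd()

    print(c(original = original_wd, changed = changed_wd))
    ```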

    To load data, I usually create an object with my file name of interest

    file. <- "Skibiel_mammalsmilk_raw.csv"

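    I then use here() to build the full file path for the data file. “inst/extdata” is the folder where the .csv file is. Note that file names are case-sensitive on some operating systems (e.g. Linux), so the name must match the file on disk exactly; the copy on GitHub is named with a lowercase “s”.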

    full.path <- here::here("inst/extdata", #folder
                            file.)  #file name

    Note that here() takes the project path, adds the “/inst/extdata” folder extension and finally the “/Skibiel_mammalsmilk_raw.csv” file name.

    full.path
    #> [1] "C:/Users/lisanjie/Documents/1_R/git/mammalsmilk/inst/extdata/Skibiel_mammalsmilk_raw.csv"
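    The path-building step resembles what base R’s file.path() does: it joins the pieces with “/”. The sketch below reproduces the path above with file.path(), with the project directory typed in by hand as a stand-in for what here::here() detects automatically (so unlike here(), this version would break if the project moved).

    ```r
    # Hand-typed stand-in for the directory that here::here() reports
    project_root <- "C:/Users/lisanjie/Documents/1_R/git/mammalsmilk"
    file_name <- "Skibiel_mammalsmilk_raw.csv"

    # Join root, folder and file name with "/" separators
    full_path <- file.path(project_root, "inst/extdata", file_name)
    print(full_path)

    # Take the path apart again
    basename(full_path)  # file name only
    dirname(full_path)   # everything before the file name
    ```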

    I can then pass this R object with the text of the file path to read.csv()

    milk_raw <- read.csv(file = full.path)

    Look at the size of dataframe just loaded

    dim(milk_raw)
    #> [1] 130  22

    and take a look at the raw numbers using head() and tail(), and at a summary using summary()

    #>          order  family                            spp mass.female
    #> 1 Artiodactyla Bovidae                  Bos frontalis      800000
    #> 2 Artiodactyla Bovidae                     Capra ibex       53000
    #> 3 Artiodactyla Bovidae Connocheates taurinus taurinus      170500
    #> 4 Artiodactyla Bovidae              Connocheates gnou      200000
    #> 5 Artiodactyla Bovidae  Damaliscus pygargus phillipsi       61000
    #> 6 Artiodactyla Bovidae                 Gazella dorcas       20600
    #>   gestation.month lacatation.months mass.litter repro.output
    #> 1            9.02               4.5       26949         0.03
    #> 2            5.60               7.5        3489         0.07
    #> 3            8.32               8.0       17717         0.10
    #> 4            8.50               7.5       11110         0.06
    #> 5            8.00               4.0        6500         0.11
    #> 6            4.74               2.8        1771         0.09
    #>   dev.stage.at.birth      diet arid       biome  N lactation.stage.orig
    #> 1                  3 herbivore   no terrestrial 4+                 <NA>
    #> 2                  3 herbivore   no terrestrial 24                30-60
    #> 3                  3 herbivore  yes terrestrial  5                  150
    #> 4                  3 herbivore  yes terrestrial  3                  150
    #> 5                  3 herbivore  yes terrestrial  4                  150
    #> 6                  3 herbivore  yes terrestrial 16                30-60
    #>   dry.matter  fat protein sugar energy                            ref
    #> 1       20.0  7.0     6.3   5.2   1.21       Oftedal & Iverson (1995)
    #> 2       23.3 12.4     5.7  <NA>   <NA>       Oftedal & Iverson (1995)
    #> 3       13.4  7.5     4.1   5.3   1.13 Osthoff, Hugo & de Wit (2009a)
    #> 4       12.0  5.5     4.3   4.1   0.91 Osthoff, Hugo & de Wit (2009a)
    #> 5       16.0  8.6     5.6   4.9   1.31 Osthoff, Hugo & de Wit (2009a)
    #> 6       24.1  8.8     8.8  <NA>   <NA>       Oftedal & Iverson (1995)
    #>   gest.month.NUM lacat.mo.NUM
    #> 1           9.02          4.5
    #> 2           5.60          7.5
    #> 3           8.32          8.0
    #> 4           8.50          7.5
    #> 5           8.00          4.0
    #> 6           4.74          2.8
    #>              order       family                     spp mass.female
    #> 125       Rodentia      Muridae     Pseudomys australis          65
    #> 126       Rodentia      Muridae       Rattus norvegicus         253
    #> 127       Rodentia Octodontidae           Octodon degus         235
    #> 128       Rodentia    Scuiridae          Tamias amoenus          53
    #> 129       Rodentia    Scuiridae Urocitellus columbianus         406
    #> 130 Soricomorpha11    Soricidae       Crocidura russula          14
    #>     gestation.month lacatation.months mass.litter repro.output
    #> 125            1.02               0.9          13         0.20
    #> 126            0.71               0.8          51         0.20
    #> 127            2.96               1.2          74         0.31
    #> 128            0.98               1.5          14         0.26
    #> 129            0.84               1.0          32         0.08
    #> 130            0.97               0.8           4         0.29
    #>     dev.stage.at.birth      diet arid       biome      N
    #> 125                  0 herbivore  yes terrestrial  7-Jun
    #> 126                  0  omnivore   no terrestrial 18-Mar
    #> 127                  3 herbivore  yes terrestrial      7
    #> 128                  0  omnivore   no terrestrial     11
    #> 129                  0 herbivore   no terrestrial     26
    #> 130                  0 carnivore   no terrestrial      3
    #>     lactation.stage.orig dry.matter  fat protein sugar energy
    #> 125               12-Jul       25.4 12.1     6.4   3.6   1.62
    #> 126               17-Aug       22.1  8.8    8.1*   3.8   1.43
    #> 127                15-21       30.5 20.1     4.4   2.7    2.2
    #> 128                15-20       36.7 21.7     8.1   4.3   2.62
    #> 129                   19       29.9  9.2    10.7   3.4    1.6
    #> 130               12-Aug       51.0 30.0     9.4     3    3.4
    #>                               ref gest.month.NUM lacat.mo.NUM
    #> 125      Oftedal & Iverson (1995)           1.02          0.9
    #> 126      Oftedal & Iverson (1995)           0.71          0.8
    #> 127        Veloso & Kenagy (2005)           2.96          1.2
    #> 128 Veloso, Place & Kenagy (2003)           0.98          1.5
    #> 129     Skibiel & Hood (in press)           0.84          1.0
    #> 130      Oftedal & Iverson (1995)           0.97          0.8
    #>            order                family                        spp     
    #>  Artiodactyla :23   Bovidae        :13   Acomys cahirinus       :  1  
    #>  Carnivora    :23   Cercopithecidae: 8   Alces alces            :  1  
    #>  Primates     :22   Cervidae       : 7   Aloutta palliata       :  1  
    #>  Rodentia     :17   Muridae        : 7   Aloutta seniculus      :  1  
    #>  Chiroptera   :10   Otariidae      : 7   Arctocephalus australis:  1  
    #>  Diprotodontia:10   Phocidae       : 7   Arctocephalus gazella  :  1  
    #>  (Other)      :25   (Other)        :81   (Other)                :124  
    #>   mass.female        gestation.month  lacatation.months
    #>  Min.   :        8   Min.   : 0.400   Min.   : 0.300   
    #>  1st Qu.:      857   1st Qu.: 1.405   1st Qu.: 1.625   
    #>  Median :     5716   Median : 5.000   Median : 4.500   
    #>  Mean   :  2229475   Mean   : 5.624   Mean   : 6.092   
    #>  3rd Qu.:   107500   3rd Qu.: 8.365   3rd Qu.: 8.225   
    #>  Max.   :170000000   Max.   :21.460   Max.   :42.000   
    #>                                                        
    #>   mass.litter         repro.output     dev.stage.at.birth        diet   
    #>  Min.   :      0.3   Min.   :0.00003   Min.   :0.000      carnivore:32  
    #>  1st Qu.:     42.0   1st Qu.:0.04000   1st Qu.:1.000      herbivore:61  
    #>  Median :    423.5   Median :0.08000   Median :2.000      omnivore :37  
    #>  Mean   :  52563.8   Mean   :0.10374   Mean   :1.831                    
    #>  3rd Qu.:   7038.2   3rd Qu.:0.13750   3rd Qu.:3.000                    
    #>  Max.   :2272500.0   Max.   :0.50000   Max.   :4.000                    
    #>                                                                         
    #>   arid            biome           N      lactation.stage.orig
    #>  no :91   aquatic    : 22   4      :13   14-Aug :  3         
    #>  yes:39   terrestrial:108   3      :11   150    :  3         
    #>                             6      :10   21-63  :  3         
    #>                             7      : 9   30-60  :  3         
    #>                             5      : 8   i      :  3         
    #>                             24     : 5   (Other):113         
    #>                             (Other):74   NA's   :  2         
    #>    dry.matter         fat            protein        sugar        energy   
    #>  Min.   : 8.80   Min.   : 0.200   1.6*   :  4   3      : 5   1.44   :  4  
    #>  1st Qu.:16.27   1st Qu.: 4.575   1.5    :  3   4.5    : 5   0.81   :  3  
    #>  Median :22.75   Median : 8.550   10.3   :  3   5.3    : 5   1.13   :  3  
    #>  Mean   :27.06   Mean   :14.068   10.7   :  3   5.2    : 4   0.49   :  2  
    #>  3rd Qu.:32.05   3rd Qu.:17.575   6.3    :  3   6.6    : 4   0.5    :  2  
    #>  Max.   :71.10   Max.   :61.100   7      :  3   (Other):92   (Other):101  
    #>  NA's   :6       NA's   :2        (Other):111   NA's   :15   NA's   : 15  
    #>                                    ref     gest.month.NUM  
    #>  Oftedal & Iverson (1995)            :99   Min.   : 0.400  
    #>  Hood et al. (2001)                  : 4   1st Qu.: 1.405  
    #>  Osthoff, Hugo & de Wit (2009a)      : 3   Median : 5.000  
    #>  Derrickson, Jerrard & Oftedal (1996): 2   Mean   : 5.624  
    #>  Arnould & Boyd (1995)               : 1   3rd Qu.: 8.365  
    #>  Arnould & Hindell (1999)            : 1   Max.   :21.460  
    #>  (Other)                             :20                   
    #>   lacat.mo.NUM   
    #>  Min.   : 0.300  
    #>  1st Qu.: 1.625  
    #>  Median : 4.500  
    #>  Mean   : 6.092  
    #>  3rd Qu.: 8.225  
    #>  Max.   :42.000  
    #>
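    In the summary above, columns such as protein are reported as counts of distinct values rather than as quantiles, which indicates they were not read in as numbers; entries like “8.1*” are the likely reason. The sketch below reproduces this on a three-row stand-in built from rows shown above. It sets stringsAsFactors = TRUE explicitly to mirror the output above, because from R 4.0.0 onward read.csv() defaults to stringsAsFactors = FALSE and would report such columns as character instead.

    ```r
    # Three rows copied from the output above; note the "*" in the last protein value
    csv_text <- paste(c("spp,fat,protein",
                        "Bos frontalis,7.0,6.3",
                        "Capra ibex,12.4,5.7",
                        "Rattus norvegicus,8.8,8.1*"),
                      collapse = "\n")

    demo <- read.csv(text = csv_text, stringsAsFactors = TRUE)

    # fat is read as numeric; protein is not, because of the "*"
    col_classes <- sapply(demo, class)
    print(col_classes)

    head(demo, n = 2)
    tail(demo, n = 1)
    summary(demo)
    ```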