Introduction to numeric data exploration

Numeric exploration is necessary both for quality control and for understanding the structure of your data. Some numeric summaries, such as correlation tables, also provide key insights into how to model the data (Zuur et al. 2010).

It is increasingly recommended that studies include numeric summaries such as these, both to facilitate meta-analysis and to help interested readers understand the structure of the data (Gerstner et al. 2017).

For your independent project (in 2018), you only need to submit a script file that carries out relevant numeric summaries.

Preliminaries

Load packages

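library(dplyr)    # for exploratory analyses: group_by(), summarize(), %>%
library(ggpubr)   # plotting using ggplot2
library(cowplot)  # arranging multiple plots
library(lme4)     # mixed-effects models
library(arm)      # regression helpers from Gelman & Hill
library(stringr)  # string manipulation
library(bbmle)    # model comparison, e.g. AICtab()
library(plotrix)  # std.error() function for the standard error (SE)
library(psych)    # descriptive statistics and correlation helpers
library(here)     # build file paths relative to the project root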

Load data

Note: the original file is called “skibiel_mammalsmilk.csv” within the mammalsmilk package.

Load data in R

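file. <- "Appendix-2-Analysis-Data_mammalsmilkRA.csv"
path. <- here("inst", "extdata", file.)  # one argument per path component

# skip = 3 skips the first three lines of the file, above the column names;
# stringsAsFactors = TRUE reproduces the factor summaries shown below in R >= 4.0
milk <- read.csv(path., skip = 3, stringsAsFactors = TRUE)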

Check input

head(milk)
#>            ord     fam                            spp mass.fem gest.mo
#> 1 Artiodactyla Bovidae                  Bos frontalis   800000    9.02
#> 2 Artiodactyla Bovidae                     Capra ibex    53000    5.60
#> 3 Artiodactyla Bovidae Connocheates taurinus taurinus   170500    8.32
#> 4 Artiodactyla Bovidae              Connocheates gnou   200000    8.50
#> 5 Artiodactyla Bovidae  Damaliscus pygargus phillipsi    61000    8.00
#> 6 Artiodactyla Bovidae                 Gazella dorcas    20600    4.74
#>   lac.mo mass.litter repro.output dev.birth      diet arid       biome  N
#> 1    4.5       26949         0.03         3 herbivore   no terrestrial 4+
#> 2    7.5        3489         0.07         3 herbivore   no terrestrial 24
#> 3    8.0       17717         0.10         3 herbivore  yes terrestrial  5
#> 4    7.5       11110         0.06         3 herbivore  yes terrestrial  3
#> 5    4.0        6500         0.11         3 herbivore  yes terrestrial  4
#> 6    2.8        1771         0.09         3 herbivore  yes terrestrial 16
#>    fat gest.month lacat.mo prot sugar energy
#> 1  7.0       9.02      4.5  6.3   5.2   1.21
#> 2 12.4       5.60      7.5  5.7    NA     NA
#> 3  7.5       8.32      8.0  4.1   5.3   1.13
#> 4  5.5       8.50      7.5  4.3   4.1   0.91
#> 5  8.6       8.00      4.0  5.6   4.9   1.31
#> 6  8.8       4.74      2.8  8.8    NA     NA
tail(milk)
#>                ord          fam                     spp mass.fem gest.mo
#> 125       Rodentia      Muridae     Pseudomys australis       65    1.02
#> 126       Rodentia      Muridae       Rattus norvegicus      253    0.71
#> 127       Rodentia Octodontidae           Octodon degus      235    2.96
#> 128       Rodentia    Scuiridae          Tamias amoenus       53    0.98
#> 129       Rodentia    Scuiridae Urocitellus columbianus      406    0.84
#> 130 Soricomorpha11    Soricidae       Crocidura russula       14    0.97
#>     lac.mo mass.litter repro.output dev.birth      diet arid       biome
#> 125    0.9          13         0.20         0 herbivore  yes terrestrial
#> 126    0.8          51         0.20         0  omnivore   no terrestrial
#> 127    1.2          74         0.31         3 herbivore  yes terrestrial
#> 128    1.5          14         0.26         0  omnivore   no terrestrial
#> 129    1.0          32         0.08         0 herbivore   no terrestrial
#> 130    0.8           4         0.29         0 carnivore   no terrestrial
#>          N  fat gest.month lacat.mo prot sugar energy
#> 125  7-Jun 12.1       1.02      0.9  6.4   3.6   1.62
#> 126 18-Mar  8.8       0.71      0.8  8.1   3.8   1.43
#> 127      7 20.1       2.96      1.2  4.4   2.7   2.20
#> 128     11 21.7       0.98      1.5  8.1   4.3   2.62
#> 129     26  9.2       0.84      1.0 10.7   3.4   1.60
#> 130      3 30.0       0.97      0.8  9.4   3.0   3.40
summary(milk)
#>             ord                  fam                          spp     
#>  Artiodactyla :23   Bovidae        :13   Acomys cahirinus       :  1  
#>  Carnivora    :23   Cercopithecidae: 8   Alces alces            :  1  
#>  Primates     :22   Cervidae       : 7   Aloutta palliata       :  1  
#>  Rodentia     :17   Muridae        : 7   Aloutta seniculus      :  1  
#>  Chiroptera   :10   Otariidae      : 7   Arctocephalus australis:  1  
#>  Diprotodontia:10   Phocidae       : 7   Arctocephalus gazella  :  1  
#>  (Other)      :25   (Other)        :81   (Other)                :124  
#>     mass.fem            gest.mo           lac.mo        mass.litter       
#>  Min.   :        8   Min.   : 0.400   Min.   : 0.300   Min.   :      0.3  
#>  1st Qu.:      857   1st Qu.: 1.405   1st Qu.: 1.625   1st Qu.:     42.0  
#>  Median :     5716   Median : 5.000   Median : 4.500   Median :    423.5  
#>  Mean   :  2229475   Mean   : 5.624   Mean   : 6.092   Mean   :  52563.8  
#>  3rd Qu.:   107500   3rd Qu.: 8.365   3rd Qu.: 8.225   3rd Qu.:   7038.2  
#>  Max.   :170000000   Max.   :21.460   Max.   :42.000   Max.   :2272500.0  
#>                                                                           
#>   repro.output       dev.birth            diet     arid   
#>  Min.   :0.00003   Min.   :0.000   carnivore:32   no :91  
#>  1st Qu.:0.04000   1st Qu.:1.000   herbivore:61   yes:39  
#>  Median :0.08000   Median :2.000   omnivore :37           
#>  Mean   :0.10374   Mean   :1.831                          
#>  3rd Qu.:0.13750   3rd Qu.:3.000                          
#>  Max.   :0.50000   Max.   :4.000                          
#>                                                           
#>          biome           N           fat          gest.month    
#>  aquatic    : 22   4      :13   Min.   : 0.20   Min.   : 0.400  
#>  terrestrial:108   3      :11   1st Qu.: 4.65   1st Qu.: 1.405  
#>                    6      :10   Median : 8.55   Median : 5.000  
#>                    7      : 9   Mean   :13.99   Mean   : 5.624  
#>                    5      : 8   3rd Qu.:16.82   3rd Qu.: 8.365  
#>                    24     : 5   Max.   :61.10   Max.   :21.460  
#>                    (Other):74                                   
#>     lacat.mo           prot            sugar           energy     
#>  Min.   : 0.300   Min.   : 1.100   Min.   : 0.02   Min.   :0.360  
#>  1st Qu.: 1.625   1st Qu.: 4.125   1st Qu.: 3.00   1st Qu.:0.965  
#>  Median : 4.500   Median : 6.750   Median : 4.70   Median :1.365  
#>  Mean   : 6.092   Mean   : 6.673   Mean   : 4.94   Mean   :1.680  
#>  3rd Qu.: 8.225   3rd Qu.: 9.200   3rd Qu.: 6.60   3rd Qu.:2.045  
#>  Max.   :42.000   Max.   :15.800   Max.   :14.00   Max.   :5.890  
#>                                    NA's   :16      NA's   :16
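The summary() output already flags two quality-control issues. First, N (sample size) was read as a factor rather than a number, because some entries are not plain integers: the head() and tail() output shows "4+", "7-Jun" and "18-Mar" (the last two may be values a spreadsheet reformatted as dates, but that is a guess; check the original file). Second, gest.mo/gest.month and lac.mo/lacat.mo have identical summaries, which suggests, but does not prove, duplicated columns. Below is a minimal base-R sketch of both checks. To run on its own it rebuilds three columns from the 12 rows printed above into `milk_demo`, a hypothetical stand-in; on the real data, use `milk` instead.

```r
# Hypothetical stand-in for `milk`: 12 rows copied from the head() and tail() output above
milk_demo <- data.frame(
  N          = c("4+", "24", "5", "3", "4", "16", "7-Jun", "18-Mar", "7", "11", "26", "3"),
  gest.mo    = c(9.02, 5.60, 8.32, 8.50, 8.00, 4.74, 1.02, 0.71, 2.96, 0.98, 0.84, 0.97),
  gest.month = c(9.02, 5.60, 8.32, 8.50, 8.00, 4.74, 1.02, 0.71, 2.96, 0.98, 0.84, 0.97),
  stringsAsFactors = TRUE
)

# 1. Which N entries are not plain whole numbers?
#    Convert factor -> character first; as.numeric() on a factor returns level codes
N_chr <- as.character(milk_demo$N)
is_integer_like <- grepl("^[0-9]+$", N_chr)
bad_N <- N_chr[!is_integer_like]
print(bad_N)

# Converting straight to numeric silently turns those entries into NA
N_num <- suppressWarnings(as.numeric(N_chr))
print(sum(is.na(N_num)))

# 2. Are two columns exact duplicates of each other?
print(identical(milk_demo$gest.mo, milk_demo$gest.month))
```

The same `identical()` check can be run on `milk$lac.mo` and `milk$lacat.mo`.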

Numeric data summaries

It can be very useful to generate numeric data summaries to help you and your readers understand the data. Summaries also let readers extract information that is of interest to them but not necessarily to you. For example, modelers and meta-analysts might need a piece of information that was not highlighted in your original analysis. Providing it upfront makes it more likely that they will cite you! It also saves them the trouble of asking for it, and saves you the trouble of digging up your project files and working out what they want.

This point is not emphasized by Zuur et al. (2010), but it is emphasized by Gerstner et al. (2017).

Gerstner et al 2017. Will your paper be used in a meta-analysis? Making the reach of your research broader and longer lasting. Methods in Ecology & Evolution.

Correlation table

Correlation tables are excellent summaries of the response and predictor variables. They were discussed previously, along with scatterplot matrices, in the context of investigating collinearity.
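A minimal sketch of a correlation table with base R's `cor()`. To run on its own it rebuilds the fat, prot, sugar and energy columns from the 12 rows shown in the head() and tail() output above (`milk_demo` is a hypothetical stand-in; with the full data, use `milk[, c("fat", "prot", "sugar", "energy")]`). Because sugar and energy contain NAs, the sketch uses pairwise-complete observations and also counts how many rows sit behind each correlation, which is worth reporting alongside the coefficients.

```r
# Hypothetical stand-in for `milk`: milk-composition columns from the 12 printed rows
milk_demo <- data.frame(
  fat    = c(7.0, 12.4, 7.5, 5.5, 8.6, 8.8, 12.1, 8.8, 20.1, 21.7, 9.2, 30.0),
  prot   = c(6.3, 5.7, 4.1, 4.3, 5.6, 8.8, 6.4, 8.1, 4.4, 8.1, 10.7, 9.4),
  sugar  = c(5.2, NA, 5.3, 4.1, 4.9, NA, 3.6, 3.8, 2.7, 4.3, 3.4, 3.0),
  energy = c(1.21, NA, 1.13, 0.91, 1.31, NA, 1.62, 1.43, 2.20, 2.62, 1.60, 3.40)
)

# Pearson correlations; each pair uses every row where both values are present
r <- cor(milk_demo, use = "pairwise.complete.obs")

# Number of rows behind each correlation: 1 where a value is present, 0 where NA,
# so crossprod() counts the rows in which both variables are non-missing
n_pairs <- crossprod(1 - is.na(milk_demo))

print(round(r, 2))
print(n_pairs)
```

The psych package loaded above also provides `corr.test()`, which reports sample sizes and p-values alongside the correlations.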

Table of means & SDs
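Before splitting the data by group, a single table with the mean, SD and sample size of every numeric variable is a compact summary that others can reuse. Below is a base-R sketch built around a hypothetical helper, `mean_sd_table()`. The demo data are the first six rows of four life-history columns (plus the species name) from the head() output above; on the real data you would call `mean_sd_table(milk)`, and the helper picks out the numeric columns itself.

```r
# Hypothetical helper: mean, SD and non-missing n for every numeric column
mean_sd_table <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]   # keep numeric columns only
  data.frame(
    variable  = names(num),
    mean      = vapply(num, mean, numeric(1), na.rm = TRUE),
    sd        = vapply(num, sd, numeric(1), na.rm = TRUE),
    n         = vapply(num, function(x) sum(!is.na(x)), integer(1)),
    row.names = NULL
  )
}

# Demo data: first six rows from the head() output above
milk_demo <- data.frame(
  spp         = c("Bos frontalis", "Capra ibex", "Connocheates taurinus taurinus",
                  "Connocheates gnou", "Damaliscus pygargus phillipsi", "Gazella dorcas"),
  mass.fem    = c(800000, 53000, 170500, 200000, 61000, 20600),
  gest.mo     = c(9.02, 5.60, 8.32, 8.50, 8.00, 4.74),
  lac.mo      = c(4.5, 7.5, 8.0, 7.5, 4.0, 2.8),
  mass.litter = c(26949, 3489, 17717, 11110, 6500, 1771)
)

tab <- mean_sd_table(milk_demo)
print(tab)
```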

Mass by diet group

milk %>% 
  group_by(diet) %>%
  summarize(mass.mean = mean(mass.fem),
            mass.sd = sd(mass.fem),
            mass.n = n()
            )
#> # A tibble: 3 x 4
#>   diet      mass.mean   mass.sd mass.n
#>   <fct>         <dbl>     <dbl>  <int>
#> 1 carnivore  8646393. 32150168.     32
#> 2 herbivore   207403.   492666.     61
#> 3 omnivore     13396.    36423.     37
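In every diet group the SD is larger than the mean, and in the summary() output above the overall mean female mass (about 2.2 million) is far above the median (5716), so a few very large species dominate these means. For such skewed variables, medians or geometric means (the back-transformed mean of log mass) may be more informative; this is a suggestion, not something the original analysis reports. Below is a base-R sketch using `tapply()` on the mass.fem and diet values from the 12 rows printed by head() and tail(). `milk_demo` is a hypothetical stand-in for `milk`, and only one carnivore appears in those rows, so its values illustrate the mechanics only.

```r
# Hypothetical stand-in for `milk`: mass.fem and diet from the 12 printed rows
milk_demo <- data.frame(
  mass.fem = c(800000, 53000, 170500, 200000, 61000, 20600,
               65, 253, 235, 53, 406, 14),
  diet = c(rep("herbivore", 6), "herbivore", "omnivore", "herbivore",
           "omnivore", "herbivore", "carnivore")
)

# Median and geometric mean (10 raised to the mean of log10 mass) by diet group
mass_median  <- tapply(milk_demo$mass.fem, milk_demo$diet, median)
mass_geomean <- tapply(milk_demo$mass.fem, milk_demo$diet,
                       function(x) 10^mean(log10(x)))

print(round(mass_median))
print(round(mass_geomean))
```

In the dplyr pipeline above, the same statistics could be added inside `summarize()`, e.g. `mass.median = median(mass.fem)`.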

Milk fat by diet group

These data appear in Figure 2a of the original publication.

milk %>% 
  group_by(diet) %>%
  summarize(fat.mean = mean(fat),
            fat.sd = sd(fat),
            fat.SE = plotrix::std.error(fat),
            fat.n = n())
#> # A tibble: 3 x 5
#>   diet      fat.mean fat.sd fat.SE fat.n
#>   <fct>        <dbl>  <dbl>  <dbl> <int>
#> 1 carnivore    32.3   16.8   2.96     32
#> 2 herbivore     8.06   5.19  0.664    61
#> 3 omnivore      7.90   6.45  1.06     37
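The standard error of the mean is the SD divided by the square root of the sample size, SE = sd / sqrt(n). The sketch below checks the fat.SE column against the rounded SDs and sample sizes printed in the table above; differences in the last digit come from the SDs being rounded for printing. It also defines `se()`, a hypothetical base-R alternative to `plotrix::std.error()` that drops NAs before counting n.

```r
# Rounded SDs and sample sizes copied from the fat-by-diet table above
fat_sd <- c(carnivore = 16.8, herbivore = 5.19, omnivore = 6.45)
fat_n  <- c(carnivore = 32,   herbivore = 61,   omnivore = 37)

# Standard error of the mean: SD / sqrt(n)
fat_se <- fat_sd / sqrt(fat_n)
print(round(fat_se, 3))

# Hypothetical helper implementing the same formula on raw values
se <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]   # count only non-missing values in n
  sd(x) / sqrt(length(x))
}
```

Inside the `summarize()` call above, `fat.SE = se(fat)` would give the same values as `plotrix::std.error(fat)` for this column, which has no missing values.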