R for Ecological Data Science: A Gentle Introduction

16.4 Summary statistics

This section is review. If you are familiar with R, you can skip ahead.

R is a giant calculator. There are commands for the mean, median, standard deviation, and more. The summary() command creates a handy summary of every column in a dataframe, including the mean and median.

16.4.1 Overall summary

Whole dataframe

summary(my.frogs)
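If you don't have my.frogs loaded, here is a minimal sketch of the same idea on a small made-up dataframe. The name toy.frogs and its values are hypothetical, not the course data. Numeric columns get six statistics (Min., 1st Qu., Median, Mean, 3rd Qu., Max.), while a character column just reports its length, class, and mode.

```r
# Hypothetical toy data standing in for my.frogs (values are made up)
toy.frogs <- data.frame(
  species = c("A", "A", "B", "B", "B"),
  mass    = c(2.1, 3.4, 2.8, 4.0, 3.2)
)

# summary() describes every column of the dataframe at once
toy.summary <- summary(toy.frogs)
toy.summary
```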

We can look at just a single column by specifying it with the syntax dataframe$column.name, where the dataframe and the column are separated by a dollar sign ($). (Note that the result prints horizontally, not vertically.)

summary(my.frogs$mass)
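The $ syntax has a close relative: double square brackets with the column name in quotes. The sketch below, again on hypothetical toy values, shows that both return the same vector. The bracket form is handy when the column name is stored in a variable.

```r
# Hypothetical toy data (values are made up)
toy.frogs <- data.frame(mass = c(2.1, 3.4, 2.8, 4.0, 3.2))

# Two equivalent ways to pull out one column as a vector
mass.dollar  <- toy.frogs$mass
mass.bracket <- toy.frogs[["mass"]]

identical(mass.dollar, mass.bracket)  # TRUE
summary(mass.dollar)
```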

We used the make_my_data2L() command to make a unique subset of the data. Compare the mass values in your subset to those in the original data:

summary(my.frogs$mass)
summary(frogarms$mass)

End review section

16.4.2 Optional: stacking things with rbind()

This section is optional

Handy trick: stack up summaries with rbind(), which stands for “row-bind”.

rbind(summary(my.frogs$mass),  #note the comma
      summary(frogarms$mass))

You can even flip them on their side.

First, make an object with your summaries:

my.summaries <- rbind(summary(my.frogs$mass),
                      summary(frogarms$mass))

Flip them with the t() command (“t” stands for “transpose”)

t(my.summaries)
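A small refinement, sketched here with made-up vectors standing in for my.frogs$mass and frogarms$mass: naming the arguments to rbind() labels the rows, so after t() the columns are labeled too. The labels subset and full are just illustrative.

```r
# Hypothetical toy vectors standing in for my.frogs$mass and frogarms$mass
subset.mass <- c(2.1, 3.4, 2.8)
full.mass   <- c(2.1, 3.4, 2.8, 4.0, 3.2, 1.9)

# Naming the arguments to rbind() gives the rows names
my.summaries <- rbind(subset = summary(subset.mass),
                      full   = summary(full.mass))
my.summaries      # 2 rows (subset, full) by 6 columns (Min. ... Max.)
t(my.summaries)   # 6 rows by 2 columns
```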

End optional section

16.4.3 Individual summary stats

This section is review. If you are familiar with R, you can skip ahead.

You can get individual summary statistics using various commands named after the statistic.

The mean of a column with mean().

mean(my.frogs$mass)

The variance with var().

var(my.frogs$mass)

Others include:

median()
min(), max(), range()
var(), sd()
nrow() (rows in a dataframe) or length() (elements in a vector), for sample size
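Here is a quick sketch of those commands on a hypothetical vector of five masses. It also checks that sd() is the square root of var(), and shows that nrow() counts the rows of a dataframe while length() counts the elements of a vector.

```r
# Hypothetical toy masses (made-up values)
mass <- c(2.1, 3.4, 2.8, 4.0, 3.2)

median(mass)   # middle value: 3.2
min(mass)
max(mass)
sd(mass)       # standard deviation
length(mass)   # sample size of a vector: 5

# The SD is the square root of the variance
all.equal(sd(mass), sqrt(var(mass)))  # TRUE

# nrow() counts the rows of a dataframe
nrow(data.frame(mass = mass))  # 5
```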

Note that range() returns 2 values in a vector: the minimum and the maximum.

range(my.frogs$mass)
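Because range() returns a vector, you can pull out each end with square brackets, and diff() gives the spread (max minus min). A minimal sketch on hypothetical toy values:

```r
mass <- c(2.1, 3.4, 2.8, 4.0, 3.2)  # hypothetical toy values

r <- range(mass)   # a vector: c(min, max)
r[1]               # the minimum
r[2]               # the maximum
diff(r)            # the spread, max - min
```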

16.4.4 The standard error (SE) in R

Note that base R doesn’t have a function for a very common statistic, the standard error (SE). The SE is the standard deviation (SD) divided by the square root of the sample size. You can get the sample size using the length() command.

You can therefore calculate the SE like this:

sd(my.frogs$mass)/sqrt(length(my.frogs$mass))
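One caution, sketched below with hypothetical values: length() counts missing values (NA), and sd() returns NA if any are present unless you set na.rm = TRUE. If your column might contain NAs, count only the non-missing values for the sample size.

```r
mass <- c(2.1, 3.4, NA, 4.0, 3.2)  # hypothetical toy values with one missing

length(mass)        # 5: counts the NA
sum(!is.na(mass))   # 4: number of non-missing values

# SE using only the non-missing values
se <- sd(mass, na.rm = TRUE) / sqrt(sum(!is.na(mass)))
se
```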

16.4.5 OPTIONAL: Find a package that calculates the SE [O]

This section is optional

In the following 2 optional sections, you can:

try to find a package with an SE function
try to write a function that calculates the SE for you

Since base R lacks an SE function, many packages include one. For example, the plotrix package has a function std.error(). See if you can install the package with install.packages(), load it with library(), and use std.error(). See the help file for more info (?std.error).
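The sketch below compares a hand-calculated SE with plotrix::std.error() on toy values. It does not assume plotrix is installed: requireNamespace() checks first, and the comparison is simply skipped if the package is missing. With no NAs, the two should agree.

```r
mass <- c(2.1, 3.4, 2.8, 4.0, 3.2)  # hypothetical toy values

# SE by hand: SD / sqrt(n)
se.hand <- sd(mass) / sqrt(length(mass))
se.hand

# install.packages("plotrix")  # run once if plotrix is not installed
if (requireNamespace("plotrix", quietly = TRUE)) {
  se.plotrix <- plotrix::std.error(mass)
  all.equal(se.hand, se.plotrix)  # expected TRUE when there are no NAs
}
```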

Try to look at the underlying code, either in the console or by running the debugger with debugonce().
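You can look at a function's code by typing its name without parentheses. This sketch uses base R's sd() so it runs without plotrix; once plotrix is loaded, typing std.error the same way should print its code. body() and formals() pull out just the code and just the arguments.

```r
# Typing a function's name without () prints its code
sd

# body() and formals() pull out pieces of a function
body(sd)
formals(sd)   # the arguments: x and na.rm

# debugonce(sd)   # uncomment to step through sd() the next time it runs
# sd(c(1, 2, 3))
```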

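16.4.6 OPTIONAL: Write your own SE function [O]

Write a function for calculating the SE. (The functions below are named my_sd1, my_sd2, and my_sd3, but each one calculates the SE, not the SD.)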

Here’s a function that takes a single argument “dat_column”

#NOTE: this is optional
# Despite its name, my_sd1() returns the standard error: SD / sqrt(n)
my_sd1 <- function(dat_column){
  sd(dat_column)/sqrt(length(dat_column))
}

To use it, give it the dataframe and the column separated by a “$”:

my_sd1(my.frogs$mass)

Here’s a function that takes 2 arguments: the dataframe and the name of the column. Note that the name of the column needs to be in quotes.

my_sd2 <- function(dat, column){
  # dat[[column]] extracts the named column as a vector (works for data.frames and tibbles)
  sd(dat[[column]])/sqrt(length(dat[[column]]))
}

You can use the function like this:

my_sd2(my.frogs, "mass") #note the use of quotes "..."

Here’s a fancier function that lets you specify how much to round off the result. I’ve set the default rounding to 3 digits.

my_sd3 <- function(dat, column, digits.round = 3){
  se <- sd(dat[[column]])/sqrt(length(dat[[column]]))  # SE = SD / sqrt(n)
  round(se, digits = digits.round)                      # round to digits.round places
}
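As a further optional exercise, here is a hypothetical my_se4() (not part of the original text) that extends my_sd3() with an na.rm argument, so missing values are dropped before counting n. It is a sketch that assumes the column is numeric.

```r
# Hypothetical extension of my_sd3(): same idea, but it can drop NAs
my_se4 <- function(dat, column, digits.round = 3, na.rm = TRUE){
  x <- dat[[column]]              # the column as a vector
  if (na.rm) x <- x[!is.na(x)]    # optionally drop missing values
  se <- sd(x) / sqrt(length(x))   # SE = SD / sqrt(n)
  round(se, digits = digits.round)
}

toy.frogs <- data.frame(mass = c(2.1, 3.4, NA, 4.0, 3.2))  # made-up values
my_se4(toy.frogs, "mass")                 # NA dropped, n = 4
my_se4(toy.frogs, "mass", na.rm = FALSE)  # NA kept, so the result is NA
```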

The function runs like this.

my_sd3(my.frogs, "mass")

Note that in all of these functions, as long as I give the arguments in the same order they are defined in the code, I don’t need to provide the argument names. This saves typing. Compare these results:

my_sd3(my.frogs, "mass")
my_sd3(dat = my.frogs, column = "mass")
my_sd3(column = "mass", dat = my.frogs)

Now try this

my_sd3("mass", my.frogs)

Can you figure out what happened with the last one?

End optional section