16.4 Summary statistics
This section is review. If you are familiar with R you can skip ahead.
R is a giant calculator. There are commands for the mean, median, standard deviation, and more. The summary() command creates a handy summary of every column in a dataframe. For numeric columns it reports the minimum, first quartile, median, mean, third quartile, and maximum.
16.4.1 Overall summary
Whole dataframe
summary(my.frogs)
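To see what summary() does with different column types, here is a minimal sketch on a small dataframe. The name `frogs` and its values are made up as a stand-in for my.frogs. Numeric columns get the quantiles and the mean, while factor columns get a count for each level.

```r
# Hypothetical toy data standing in for my.frogs (values are made up)
frogs <- data.frame(
  mass = c(2, 4, 4, 4, 5, 5, 7, 9),
  sex  = factor(c("F", "M", "F", "F", "M", "M", "F", "M"))
)

# One call summarizes every column: numeric columns get quantiles and the mean,
# factor columns get counts per level
frog_summary <- summary(frogs)
print(frog_summary)
```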
We can look at just a single column by specifying it with the syntax dataframe$column.name, where the dataframe and the column are separated by a dollar sign ($). (Note that the output prints horizontally, not vertically.)
summary(my.frogs$mass)
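The $ syntax is one of several ways to pull a single column out of a dataframe. The sketch below, using a made-up dataframe `frogs`, checks that `$`, double brackets `[[ ]]`, and `[ , ]` with a column name all return the same numeric vector for an ordinary data.frame. The bracket forms are handy when the column name is stored in a variable.

```r
# Hypothetical toy data (values are made up)
frogs <- data.frame(mass = c(2, 4, 4, 4, 5, 5, 7, 9))

by_dollar <- frogs$mass        # dataframe$column
by_double <- frogs[["mass"]]   # column name as a string
by_matrix <- frogs[, "mass"]   # [row, column]; a blank row index means all rows

col_name <- "mass"             # the name can also be stored in a variable
by_var   <- frogs[[col_name]]

identical(by_dollar, by_double)
```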
We used the make_my_data2L() command to make a unique subset of the data. Compare the mass values in your subset (my.frogs) to those in the original data (frogarms):
summary(my.frogs$mass)
summary(frogarms$mass)
End review section
16.4.2 Optional: stacking things with rbind()
This section is optional
Handy trick: stack up summaries with rbind(), which stands for “row-bind”.
rbind(summary(my.frogs$mass), #note the comma
summary(frogarms$mass))
You can even flip them on their side. First, make an object with your summaries:
my.summaries <- rbind(summary(my.frogs$mass),
summary(frogarms$mass))
Then flip them with the t() command (“t” stands for “transpose”):
t(my.summaries)
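A small refinement, sketched with two made-up vectors standing in for the subset and the original data: if you name the arguments to rbind(), the names become row labels, so it stays clear which summary is which after stacking and transposing.

```r
# Hypothetical stand-ins for my.frogs$mass and frogarms$mass (made-up values)
subset_mass   <- c(2, 4, 4, 4, 5, 5, 7, 9)
original_mass <- c(1, 2, 4, 4, 5, 5, 6, 7, 9, 10)

# Named arguments become the row names of the stacked matrix
my.summaries <- rbind(subset   = summary(subset_mass),
                      original = summary(original_mass))
my.summaries

# Transpose so each summary becomes a column instead of a row
t(my.summaries)
```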
End optional section
16.4.3 Individual summary stats
This section is review. If you are familiar with R you can skip ahead.
You can get individual summary statistics using various commands named after the statistic.
Get the mean of a column with mean():
mean(my.frogs$mass)
Get the variance with var():
var(my.frogs$mass)
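Others include:
- median()
- min(), max(), range()
- var(), sd()
- nrow() or length() (for sample size; nrow() counts the rows of a dataframe, length() counts the elements of a single column)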
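A quick sketch of the functions listed above, applied to a made-up numeric vector `mass` (hypothetical values). It also checks that sd() is the square root of var(), and shows nrow() on a dataframe next to length() on one of its columns.

```r
mass  <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical values
frogs <- data.frame(mass = mass)

median(mass)
min(mass)
max(mass)
var(mass)
sd(mass)

# The SD is the square root of the variance
all.equal(sd(mass), sqrt(var(mass)))

# Sample size: rows of the dataframe vs. elements of the column
nrow(frogs)
length(frogs$mass)
```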
Note that range() returns 2 values, the minimum and the maximum, in a vector:
range(my.frogs$mass)
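Because range() returns a vector of length 2, you can pull out each end with square brackets or get the spread with diff(). A minimal sketch with made-up values:

```r
mass <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical values

r <- range(mass)   # a vector: c(min, max)
r[1]               # the minimum
r[2]               # the maximum
diff(r)            # max minus min, i.e. the width of the range
```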
16.4.4 The standard error (SE) in R
Note that base R doesn’t have a built-in function for a very common statistic, the standard error (SE) of the mean. The SE is the standard deviation (SD) divided by the square root of the sample size. You can get the sample size using the length() command.
You can therefore calculate the SE like this:
sd(my.frogs$mass)/sqrt(length(my.frogs$mass))
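One caveat: length() counts missing values (NA), and sd() returns NA if any are present unless you set na.rm = TRUE. Here is a minimal sketch of an SE calculation that drops missing values first, using a made-up vector with one NA:

```r
mass <- c(2, 4, NA, 4, 4, 5, 5, 7, 9)   # hypothetical values with one NA

mass_ok <- mass[!is.na(mass)]           # keep only the non-missing values
se <- sd(mass_ok) / sqrt(length(mass_ok))
se

# length() on the original vector still counts the NA: 9 rather than 8
length(mass)
```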
16.4.5 OPTIONAL: Find a package that calculates the SE [O]
This section is optional
In the following 2 optional sections you can:
- try to find a package with an SE function
- try to write a function that calculates the SE for you
Since base R lacks an SE function, many packages include one. For example, the plotrix package has a function std.error(). See if you can install the package with install.packages("plotrix"), load it using library(plotrix), and use std.error(). See the help file for more info (?std.error).
Try to look at the underlying code of std.error(), either by typing its name without parentheses in the console (std.error) or by stepping through it with the debugger via debugonce(std.error).
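If you do install plotrix, one way to check its answer against the formula from the previous section is sketched below. Since plotrix may not be installed, the sketch only calls plotrix::std.error() when requireNamespace() finds the package; the vector `mass` is made up.

```r
mass <- c(2, 4, 4, 4, 5, 5, 7, 9)   # hypothetical values

by_hand <- sd(mass) / sqrt(length(mass))

if (requireNamespace("plotrix", quietly = TRUE)) {
  by_plotrix <- plotrix::std.error(mass)
  # Should agree with the hand calculation for a vector with no NAs
  print(all.equal(by_hand, by_plotrix))
} else {
  message("plotrix is not installed; try install.packages(\"plotrix\")")
}
```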
16.4.6 OPTIONAL: Write your own SE function [O]
Write a function for calculating the SE.
Here’s a function that takes a single argument, “dat_column”:
#NOTE: this is optional
# Despite the "sd" in its name, this returns the SE: SD / sqrt(n)
my_sd1 <- function(dat_column){
  sd(dat_column)/sqrt(length(dat_column))
}
To use it, give it the dataframe and the column separated by a “$”:
my_sd1(my.frogs$mass)
Here’s a function that takes 2 arguments: the dataframe and the name of the column. Note that the name of the column needs to be in quotes.
my_sd2 <- function(dat, column){
sd(dat[,column])/sqrt(length(dat[,column]))
}
You can use the function like this:
my_sd2(my.frogs, "mass") #note the use of quotes "..."
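A possible variant of my_sd2(), sketched under the assumption that you might later work with tibbles: for a tibble, dat[, column] returns a one-column tibble rather than a vector, whereas dat[[column]] returns the column as a vector for both data.frames and tibbles. The name my_se2b is hypothetical, and the dataframe `frogs` is made up.

```r
# Hypothetical variant of my_sd2(); [[ ]] always extracts the column as a vector
my_se2b <- function(dat, column) {
  x <- dat[[column]]
  sd(x) / sqrt(length(x))
}

frogs <- data.frame(mass = c(2, 4, 4, 4, 5, 5, 7, 9))  # made-up values
my_se2b(frogs, "mass")
```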
Here’s a fancier function that lets you specify how much to round off the results. I’ve set the default rounding to 3 digits.
my_sd3 <- function(dat, column, digits.round = 3){
se <- sd(dat[,column])/sqrt(length(dat[,column]))
round(se, digits = digits.round)
}
The function runs like this.
my_sd3(my.frogs, "mass")
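To override the default rounding, pass a value for digits.round. The sketch below repeats the my_sd3() definition from above so that it runs on its own, with a made-up dataframe `frogs`.

```r
my_sd3 <- function(dat, column, digits.round = 3){
  se <- sd(dat[,column])/sqrt(length(dat[,column]))
  round(se, digits = digits.round)
}

frogs <- data.frame(mass = c(2, 4, 4, 4, 5, 5, 7, 9))  # made-up values

my_sd3(frogs, "mass")                    # default: 3 digits
my_sd3(frogs, "mass", digits.round = 1)  # round to 1 digit
my_sd3(frogs, "mass", digits.round = 6)  # round to 6 digits
```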
Note that in all of these functions, as long as I give the arguments in the same order they are defined in the function, I don’t need to provide the argument names. This saves typing. If I do provide the names, the order doesn’t matter. Compare these results:
my_sd3(my.frogs, "mass")
my_sd3(dat = my.frogs, column = "mass")
my_sd3(column = "mass", dat = my.frogs)
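A quick way to confirm that the three calls above agree is identical(). This sketch re-creates my_sd3() and a made-up dataframe `frogs` so it runs on its own.

```r
my_sd3 <- function(dat, column, digits.round = 3){
  se <- sd(dat[,column])/sqrt(length(dat[,column]))
  round(se, digits = digits.round)
}
frogs <- data.frame(mass = c(2, 4, 4, 4, 5, 5, 7, 9))  # made-up values

res_a <- my_sd3(frogs, "mass")                  # matched by position
res_b <- my_sd3(dat = frogs, column = "mass")   # matched by name
res_c <- my_sd3(column = "mass", dat = frogs)   # names given, order swapped

identical(res_a, res_b) && identical(res_b, res_c)
```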
Now try this:
my_sd3("mass", my.frogs)
Can you figure out what happened with the last one?
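If you want a function to fail with a clearer message when arguments arrive in the wrong order, one option is to check their types at the top of the function with stopifnot(). The name my_sd4 is hypothetical; it is my_sd3() with one added line of checks, and `frogs` is made-up data.

```r
# Hypothetical: my_sd3() plus input checks
my_sd4 <- function(dat, column, digits.round = 3){
  stopifnot(is.data.frame(dat), is.character(column))
  se <- sd(dat[,column])/sqrt(length(dat[,column]))
  round(se, digits = digits.round)
}

frogs <- data.frame(mass = c(2, 4, 4, 4, 5, 5, 7, 9))  # made-up values

my_sd4(frogs, "mass")   # works as before
# Swapped arguments now stop with an error naming the check that failed
tryCatch(my_sd4("mass", frogs), error = function(e) conditionMessage(e))
```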
End optional section