16.2 Preliminaries

First, we need to install the necessary packages. The data are in a package stored on GitHub called wildlifeR. The devtools package is needed to download packages from GitHub. We’ll also use dplyr for grouping data and calculating summary statistics.

16.2.1 Load packages

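You might have to install or re-install wildlifeR. Because wildlifeR is stored on GitHub, it is installed with install_github() from the devtools package rather than with install.packages(). If you have done this recently you can skip this step.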

# install.packages("devtools")  # run this first if devtools is not installed yet

library(devtools)

install_github("brouwern/wildlifeR") # note that the repository name is quoted " "

Recall that downloading a package and actually loading it into R’s active memory are different things. To actually use a package you need to load it into memory with the library() command.

library(wildlifeR)
library(dplyr)
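To see the difference between "installed" and "loaded" for yourself, here is a small base-R sketch. It uses the tools package, which ships with every R installation, as a stand-in for wildlifeR so that it runs without installing anything extra.

```r
# "tools" comes with R, so its files are always installed on disk
is.installed <- nzchar(system.file(package = "tools"))

# Installed is not the same as attached: check R's search path before and after
attached.before <- "package:tools" %in% search()
library(tools)
attached.after <- "package:tools" %in% search()

is.installed      # TRUE: the package files exist on your computer
attached.before   # usually FALSE in a fresh session
attached.after    # TRUE: library() attached the package
```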

16.2.2 Load data

The data we’ll be using are in a dataset called “frogarms” in the wildlifeR package.

data(frogarms) #[_]

You can find out more about these data using the ? command, as in ?frogarms. Note that no parentheses are used (“?(frogarms)” is wrong).
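The ? shortcut is a front end to the help() function. A minimal sketch is below, using the built-in iris dataset as a stand-in so it runs without wildlifeR; the interactive lines are left as comments because they open a help page rather than return a value.

```r
# help() with the topic as a quoted string returns a help object
h <- help("iris")
class(h)   # "help_files_with_topic"

# Interactively, any of these lines opens the same help page:
#   ?iris
#   help(iris)
#   help("iris")
# For this chapter's data you could also name the package explicitly:
#   help("frogarms", package = "wildlifeR")
```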

16.2.3 Subset your data

In this lesson we’ll primarily be working with a personalized subset of the data. This will allow us to:

  1. See the effects of sample size by comparing the larger frogarms data to your subset
  2. See the effects of random variation on things like p-values

The worksheet that accompanies this chapter is meant to facilitate these comparisons between the full (frogarms) and subset datasets, and also between you and your classmates. The code that follows is focused on working with the subset we will generate below, but the same commands should also be run on the full frogarms dataset.
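To preview why these comparisons are interesting, here is a small sketch using R's built-in ToothGrowth data (60 guinea pigs, 30 per supplement type) as a stand-in for frogarms. It runs the same t-test on the full data and on five random subsets of 10 animals per group; the p-values from the subsets differ from each other and from the full-data p-value purely because of which animals happened to be drawn.

```r
# p-value using all 30 animals per supplement group
p.full <- t.test(len ~ supp, data = ToothGrowth)$p.value

# p-values from five random subsets of 10 animals per group
set.seed(1)  # fixed seed so the run is reproducible
p.subsets <- replicate(5, {
  rows.by.group <- split(seq_len(nrow(ToothGrowth)), ToothGrowth$supp)
  rows <- unlist(lapply(rows.by.group, function(r) r[sample.int(length(r), 10)]))
  t.test(len ~ supp, data = ToothGrowth[rows, ])$p.value
})

p.full
p.subsets
```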

The function make_my_data2L() in the wildlifeR package extracts a random subset of the data. Change “my.code” to your school email address, minus the “@pitt.edu” (or whatever your affiliation is).

my.frogs <- frogarms # [_]
my.frogs <- make_my_data2L(dat = frogarms,
                           my.code = "nlb24", # <=  change this!
                           cat.var = "sex",
                           n.sample = 20,
                           with.rep = FALSE)

n.sample is set to 20, so the function extracts 20 unique individuals of each sex (20 male, 20 female). Check that your dataframe has 2*20 = 40 rows using the dim() command.

dim(my.frogs) # [_]
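If you are curious what a function like make_my_data2L() has to do, here is a hypothetical base-R sketch of stratified random sampling. It is NOT the wildlifeR source code: the function name sample_by_group() and the idea of turning my.code into a random seed are assumptions made for illustration. It uses the built-in ToothGrowth data, grouped by supp, as a stand-in for frogarms grouped by sex.

```r
# Hypothetical stand-in for make_my_data2L(); NOT the actual wildlifeR code
sample_by_group <- function(dat, cat.var, n.sample = 20, with.rep = FALSE,
                            my.code = "nlb24") {
  # Assumption for this sketch: convert the personal code into an integer seed,
  # so the same code always gives the same subset
  set.seed(sum(utf8ToInt(my.code)))
  # Split the row numbers by group, then draw n.sample rows within each group
  rows.by.group <- split(seq_len(nrow(dat)), dat[[cat.var]])
  picked <- unlist(lapply(rows.by.group, function(rows) {
    rows[sample.int(length(rows), n.sample, replace = with.rep)]
  }), use.names = FALSE)
  dat[picked, ]
}

my.teeth <- sample_by_group(ToothGrowth, cat.var = "supp", n.sample = 20)
dim(my.teeth)          # 40 rows, 3 columns
table(my.teeth$supp)   # 20 of each group
```

Because the seed is derived from my.code, two students with different codes get different subsets, while rerunning with the same code reproduces the same one.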

16.2.4 An aside on R functions (Optional) [O]

The following sections are optional. The first task (looking at the code behind a function) is easy to wrap your brain around; the second (function defaults) is intermediate; the third (debugging a function) is more advanced.

16.2.4.1 OPTIONAL (easy): Looking at the code behind a function

This section is optional, but easy for beginners

Functions in R are typically written using R code. To see the underlying code, type the function name in the console without parentheses or arguments and run it:

make_my_data2L

The formatting of the output in the console might look a bit goofy; you can adjust the console size so it looks better. This code is fairly long, but it is mostly basic R commands inside a single self-contained function. In contrast, many R functions hand much of their work off to other functions.
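Besides typing the name, base R has a few functions for pulling out specific parts of a function. Here is a minimal sketch on a tiny hypothetical function, add_two(), written only for practice.

```r
# A tiny hypothetical function, used only to practice inspecting code
add_two <- function(x, y = 2) {
  x + y
}

add_two            # name alone, no parentheses: prints the whole function
body(add_two)      # just the code between the braces
formals(add_two)   # the arguments, including the default y = 2
args(add_two)      # a compact view of the argument list
```

The same calls work on make_my_data2L once wildlifeR is loaded, for example formals(make_my_data2L).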

You can’t always see all of the underlying code this way, though. Try looking at the t.test function:

t.test
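Printing t.test does show something, but it is very short: t.test is an S3 generic, and its body essentially just calls UseMethod("t.test"), which dispatches to a method such as t.test.default or t.test.formula. The sketch below uses the base functions methods() and getS3method() to find and print the code that actually does the work.

```r
# The generic: a short body that hands off to a method via UseMethod()
t.test

# List the methods that actually run the test
methods("t.test")

# Retrieve the default method's code, even though stats does not export it
t.test.default.code <- getS3method("t.test", "default")
t.test.default.code
```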

16.2.5 OPTIONAL (intermediate): Function defaults

This section is optional.

“Statistics is the science of defaults” (6 November 2012, https://andrewgelman.com)

See what happens when you run this code; note that it leaves out the my.code, n.sample, and with.rep arguments (just to simplify things).

make_my_data2L(dat = frogarms, cat.var = "sex")

Why does this work? Check out the help file using ?make_my_data2L, or look at the raw code as we just did. Notice that the function's argument list, right after the function name, contains the following:

  • dat
  • my.code = “nlb24”
  • cat.var
  • n.sample = 20
  • with.rep = FALSE

These are the arguments that make_my_data2L() takes. When an argument name is followed by “= …”, a default value has been set. If you call the function and don’t specify a value for an argument, R uses its default. Note that defaults can be problematic, since you might use a default you didn’t intend to. Most defaults are sensible, and essential inputs, such as the dataframe to analyze, rarely have defaults.
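Here is a minimal sketch of how defaults behave, using a hypothetical function describe_sample() that mimics the pattern of make_my_data2L(): a required data argument with no default, plus optional arguments with defaults. The built-in ToothGrowth data (60 rows) stands in for frogarms.

```r
# Hypothetical function: dat has no default, n.sample and with.rep do
describe_sample <- function(dat, n.sample = 20, with.rep = FALSE) {
  paste("n.sample =", n.sample, "| with.rep =", with.rep,
        "| rows in dat =", nrow(dat))
}

describe_sample(ToothGrowth)                 # both defaults are used
describe_sample(ToothGrowth, n.sample = 5)   # n.sample overrides its default
try(describe_sample())                       # fails: dat has no default
```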

16.2.6 OPTIONAL (Advanced): Debugging a function

This section is optional and not relevant for beginners.

When a function is not working, or you want to understand how it works, you can debug it. First, tell R that you want to debug the function the next time you run it:

debugonce(make_my_data2L)

Then run the function

make_my_data2L(dat = frogarms, cat.var = "sex")

In RStudio this opens the function's code in a new tab and puts the console into debugger mode. Every time you press Enter in the console you step to the next complete line of R code (note that a single R statement can span more than one line in a file or on screen). When you reach the end of the function the debugger exits and R returns to normal mode.

If you want to interact with the function while it is being debugged, you can type directly in the console. For example, try running ls() every few lines to see which objects the function has created so far. You can call dim(), summary(), is(), etc. on any object you find.
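To practice without wildlifeR, here is a sketch using a tiny hypothetical function, scale_and_sum(). The debugging steps are wrapped in if (interactive()) because the debugger needs you at the keyboard; the comments list standard commands you can type at the browser prompt.

```r
# A small hypothetical function to practice debugging on
scale_and_sum <- function(x, k = 2) {
  scaled <- x * k
  total <- sum(scaled)
  total
}

if (interactive()) {
  debugonce(scale_and_sum)  # flag the function for debugging on its next call only
  scale_and_sum(1:3)        # opens the browser; at its prompt you can type:
                            #   ls()  list the objects that exist so far
                            #   n     run the next line
                            #   s     step into a function call
                            #   c     continue to the end
                            #   Q     quit the debugger
}

scale_and_sum(1:3)          # later calls run normally: 2 + 4 + 6 = 12
```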

END OPTIONAL SECTION