4.3 Organizing scripts

“You mostly collaborate with yourself, and me-from-two-months-ago never responds to email.” (Karen Cranston, paraphrasing Mark Holder; quoted by Meghan Duffy on Dynamic Ecology)

Script files provide a record of your work so you can

  • remember what you did
  • re-run it to check things
  • re-use your code for new analyses
  • track down errors (which will happen!)
  • share with collaborators

Script files are not unique to R, but the R community seems to have built up a particularly good infrastructure for implementing them and an ethos that encourages their use. Meghan Duffy at Dynamic Ecology has an excellent post on this.

When you first start out learning R, most of your scripts will be disposable. Soon, though, you’ll want to keep the code you write in class so you can refer back to it. Once you start doing analyses, you’ll want to write comments as you go and put key details at the top of the file so you can quickly get back up to speed when you return to it.

4.3.1 What to include in a script

A good R script should be a self-sufficient document that your future self can easily make sense of, or better yet, someone starting from scratch can understand. Depending on the exact purpose, things to include might be

  • A general title, such as “R Script: data exploration & t-test for analysis of frog arm girth”
  • Who wrote it and their contact info
  • When the script was created
  • When it was most recently updated
  • What data it uses and where it comes from
  • What project or paper it relates to

A challenge when writing and maintaining R scripts is that you are often actively engaged in learning R, learning stats, and learning about or exploring your data, all at the same time. So you write a lot of code and then erase it, or scratch out code in a script and then move on. While I have written and saved many scripts that I have never re-opened, I have never re-opened a script and said “wow, I went WAY overboard annotating this thing!” Also, commenting code makes it much easier to read; I often add fairly simple comments to make code easier to navigate and to break things up into smaller chunks.

So, at a minimum I think every script should

  • Have some kind of header saying what it is and when it was made
  • Have one line of comments or annotations for every 3 to 5 lines of code.
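To give a rough sense of what that comment density looks like, here is a minimal sketch. It uses the `ToothGrowth` data set that ships with R, chosen only for illustration (it is not the frog data used later in this section), and the header details are placeholders.

```r
## Script:  quick look at tooth growth by supplement type
## Author:  (your name and contact info)
## Created: (date)

# load an example data set that ships with R
data("ToothGrowth")

# check the size and structure of the data
dim(ToothGrowth)
str(ToothGrowth)

# numeric summary of tooth length
summary(ToothGrowth$len)

# mean tooth length for each supplement type
mean_len <- tapply(ToothGrowth$len, ToothGrowth$supp, mean)
mean_len
```

Each comment says what the next line or two is for, which is usually enough to find your place again later.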

4.3.2 Formatting sections in R scripts

To make scripts easier to navigate, it’s useful to string together the comment character “#” to make dividers and boxes. This is very good practice for making code more readable. It can be a bit tedious at times; one advantage of rmarkdown, which we’ll introduce at the end of this chapter and go into further in the next, is that it makes it very easy to format section titles.
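As a small sketch of the idea, the snippet below shows a boxed title and a section divider typed by hand. It then defines `make_banner()`, a hypothetical helper written for this example (it is not part of base R or any package), which builds the box so you can paste it in rather than typing every “#”.

```r
###########################################
###
### R Script: example of comment dividers
###
###########################################

###############
## Helper
###############

# make_banner() is a hypothetical helper written for this example;
# it returns the lines of a boxed title built from "#" characters
make_banner <- function(title, width = 43) {
  rule <- strrep("#", width)   # one full-width line of "#"
  c(rule,
    "###",
    paste("###", title),
    "###",
    rule)
}

# print a banner that can be pasted at the top of a script
banner <- make_banner("R Script: Analysis of frog arm girth")
writeLines(banner)
```

The width of 43 is arbitrary; it simply matches the hand-typed box at the top of the snippet.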

4.3.3 A sample R script

On the following pages are examples of R scripts for a formal analysis of a dataset. First I’ll show what the script might look like as I write it. Then I’ll show how I fix it up once I know it’s something I am going to look back at in the future.

### R Script: Analysis of frog arm girth

## Nathan Brouwer (brouwern@gmail.com)
## 6/6/2018
## update: 8/17/2018

## I am re-running analysis from paper by Buzatto et al 2015
## I want to compare the results of a t-test with
## and w/o Welch's correction for unequal variances

## Packages
library(wildlifeR)

## Data set up
# load frogarm data from Buzatto et al 2015
data("frogarms")


## Data visualization
# histogram of all data
hist(frogarms$sv.length)

## Data analysis
# unpaired t-test NOT using Welch's correction
## NOTE: assumes variation within each group EQUAL
t.test(sv.length ~ sex,   # model formula
       var.equal = TRUE,  # assume equal variances
       data = frogarms)   # data

# unpaired t-test >>using<< Welch's correction
t.test(sv.length ~ sex,
       var.equal = FALSE, # do not assume equal variances
       data = frogarms)

4.3.4 A polished R script

###########################################
###
### R Script: Analysis of frog arm girth
###
###########################################

## Author:      Nathan Brouwer (brouwern@gmail.com)
## Creation:    6/6/2018
## Last update: 8/17/2018


###############
## Introduction
###############

# This script is an analysis of frog body size and arm girth

## I am re-running the analysis from the paper by Buzatto et al 2015
## I want to compare the results of a t-test with
## and w/o Welch's correction for unequal variances

# Data were originally from
# Buzatto et al 2015. Sperm competition and the evolution of
#       precopulatory weapons: Increasing male density promotes
#       sperm competition and reduces selection on arm strength in
#       a chorusing frog. Evolution 69: 2613-2624.
#       https://doi.org/10.1111/evo.12766

# Data originally downloaded on 6/6/2018 from
#   https://figshare.com/articles/Data_Paper_Data_Paper/3554424

# Data are included in the wildlifeR package and are loaded from it
#     https://github.com/brouwern/wildlifeR

###############
## Packages
###############

library(wildlifeR)

###############
## Data set up
###############

# load frogarm data from Buzatto et al 2015
data("frogarms")

######################
## Data visualization
######################

# histogram of all data
hist(frogarms$sv.length)

######################
## Data analysis
######################

# unpaired t-test NOT using Welch's correction
## NOTE: assumes variation within each group EQUAL
t.test(sv.length ~ sex,   # model formula
       var.equal = TRUE,  # assume equal variances
       data = frogarms)   # data

# unpaired t-test >>using<< Welch's correction
t.test(sv.length ~ sex,
       var.equal = FALSE, # do not assume equal variances
       data = frogarms)

For more on organizing scripts, see points 4 and 5 of “Eight things I do to make my open research more findable and understandable” at DataColada (http://datacolada.org/69).