15.2 Scatterplots: 2 Continuous Variables
In this lab we’ll explore how to make scatterplots using the qplot() function in ggplot2.
15.2.1 R Preliminaries
- We’ll use the qplot() function in the ggplot2 package
- The cowplot package provides nice defaults for ggplot2 IMHO
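A minimal sketch of loading the two packages named above. It assumes both are already installed. The theme_set() line is an assumption based on my understanding that cowplot 1.0 and later no longer change the default theme when the package is loaded; older versions may not need it.

```r
# Assumes both packages are installed, e.g. via
# install.packages(c("ggplot2", "cowplot"))
library(ggplot2)
library(cowplot)

# Assumption: with cowplot 1.0 or later the cowplot theme
# has to be switched on explicitly
theme_set(theme_cowplot())
```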
15.2.2 Scatterplot of Iris data
- Let’s make a scatterplot, where we plot two continuous, numeric variables against each other
- that is, both x and y variables are numbers; not categories
I’ve forgotten the names of all the iris variables, so I’ll use the names() command to see what they are
names(iris)
I’ll plot sepal length against petal length
qplot(y = Sepal.Length,
x = Petal.Length,
data = iris)
Figure 15.1: Sepals vs. Petals
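qplot() is deprecated as of ggplot2 3.4.0, so newer versions may print a deprecation warning when it is called. As a hedged sketch, the same scatterplot can be written with the full ggplot() syntax. The object name p_iris is just an illustrative choice.

```r
library(ggplot2)

# Same variables as the qplot() call above, written with ggplot().
# aes() maps columns to the x and y axes; geom_point() draws the points.
p_iris <- ggplot(data = iris,
                 aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point()

p_iris  # typing the object's name draws the plot
```

The ggplot() form is longer, but each piece (data, axes, type of plot) is spelled out separately, which makes it easier to modify later.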
15.2.3 Scatter plot of mammal brain data
Let’s look at another dataset
15.2.3.1 Preliminaries
Get the data from the ggplot2 package
data(msleep)
15.2.3.2 Look at the data
dim(msleep) #How much data is there?
head(msleep) #What does the data look like
summary(msleep) #Summary of the data
There are a number of “categorical” variables in this dataset:
- genus
- vore = carnivore, omnivore, etc.
- order = taxonomic order
- conservation = conservation status (endangered, etc)
They load as plain text (“character”) columns rather than as “factor” variables (better known as categorical or grouping variables, but called “factors” in R-land)
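One way to check how these columns are stored is to ask for the class of each one. This is a small sketch; cat_cols is just an illustrative name for the four columns listed above.

```r
library(ggplot2)
data(msleep)  # reload a fresh copy of the data

# The four categorical columns listed above
cat_cols <- c("genus", "vore", "order", "conservation")

# Report the class of each column
sapply(msleep[, cat_cols], class)
```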
We can make these factors using the factor() command
msleep$vore <- factor(msleep$vore)
Now see what happens when you call summary()
summary(msleep)
Do the same for “order”
msleep$order <- factor(msleep$order)
summary(msleep)
And “conservation”
msleep$conservation <- factor(msleep$conservation)
summary(msleep)
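The three conversions above repeat the same step, so they could also be done in a loop. This is only a sketch of one way to do it; the name cat_cols is an illustrative choice, and the loop also converts genus, which is listed as categorical above but was not converted.

```r
library(ggplot2)
data(msleep)  # start from a fresh copy of the data

cat_cols <- c("genus", "vore", "order", "conservation")

# Convert each listed column to a factor, one at a time
for (col in cat_cols) {
  msleep[[col]] <- factor(msleep[[col]])
}

# summary() now reports counts per category for these columns
summary(msleep[, cat_cols])
```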
15.2.4 Make a basic scatterplot
qplot(y = sleep_total,
x = brainwt,
data = msleep)
Figure 15.2: Mammal sleep, raw data
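That looks really, really ugly. It will work better if we “log transform the axes” by wrapping each variable in log(). Note that the code below also switches the y variable from sleep_total to sleep_rem.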
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep)
Figure 15.3: Mammal sleep, logged data
Things get logged all the time in stats. We’ll talk more about that later.
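Wrapping a variable in log() transforms the data before plotting. A hedged alternative is to leave the data alone and put the axes themselves on a log scale with scale_x_log10() and scale_y_log10(). Note that this sketch uses base-10 logs, while log() uses natural logs; the pattern of points is the same, but the axis labels stay in the original units.

```r
library(ggplot2)
data(msleep)

# Plot the raw variables, then put both axes on a log10 scale
p_log <- ggplot(data = msleep,
                aes(x = brainwt, y = sleep_rem)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

p_log  # rows with missing values are dropped with a warning
```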
15.2.5 Add color coding to scatterplot
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep,
color = vore)
Figure 15.4: Add colors with color =
15.2.6 Add color & shape coding to scatterplot
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep,
color = vore,
shape = vore)
Figure 15.5: Add shapes with shape =
15.2.7 Put different “vores” in separate panels
- Separate panels can be made using the “facets” argument within qplot()
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep,
color = vore,
shape = vore,
facets = vore ~ .)
Figure 15.6: Split into different panels w/ facets =
15.2.8 Add a “trend line” to a scatterplot
- Add the geom_smooth() function after the initial qplot() command
- This works best if we remove the “color = vore” argument, but you can see what happens if you leave it in
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep) +
geom_smooth()
Figure 15.7: Add trendline with + geom_smooth()
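With a dataset this small, geom_smooth() fits a flexible curve (a loess smoother) by default. If a straight line is wanted instead, one option is to ask for a linear model with method = "lm". This is a sketch of that variation, not a recommendation about which trend line is appropriate for these data.

```r
library(ggplot2)
data(msleep)

# Same scatterplot as above, but with a straight-line (linear model) fit
p_lm <- qplot(y = log(sleep_rem),
              x = log(brainwt),
              data = msleep) +
  geom_smooth(method = "lm")

p_lm  # the shaded band is the confidence interval around the line
```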
15.2.9 Challenge: Modify mammal brain code
Modify the mammal brain code to do the following things:
- Change the axis labels (e.g. “+ ylab(‘y axis’)”)
- Add a title (e.g. “+ ggtitle(‘…’)”)
- Use names(msleep) to see what other variables are in the dataset
- Use summary(msleep) to see whether they are continuous or categorical
- Pick another continuous variable and plot it instead of sleep_total
- Try this with and without logging using the log() command
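As a starting point for the first two items, here is a sketch of how labels and a title are added with “+”. The label and title text are placeholders I made up for illustration; swap in your own wording and variables.

```r
library(ggplot2)
data(msleep)

# Each extra piece is added to the plot with "+"
p_lab <- qplot(y = log(sleep_rem),
               x = log(brainwt),
               data = msleep) +
  xlab("log(brain weight)") +                     # x axis label
  ylab("log(REM sleep)") +                        # y axis label
  ggtitle("REM sleep vs. brain weight in mammals") # plot title

p_lab
```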
In this section we will tackle a typical data analysis problem: determining if two groups, such as organisms or drug treatments, can be considered statistically different from each other. We will use data from a paper titled “Sperm competition and the evolution of precopulatory weapons: Increasing male density promotes sperm competition and reduces selection on arm strength in a chorusing frog” by Buzatto et al. (2015).
The end goal is to compare the size of the arms on female and male frogs. First, though, we will get to know the data by calculating summary statistics and making exploratory graphs. We will then carry out a t-test and grapple with the meaning and interpretation of the output. Finally, we’ll explore how best to plot the output of a t-test.
Section outline:
- Data exploration with summary statistics
- Graphical data exploration with boxplots
- Plotting means and measures of variation and precision
- T-tests
- Plotting the output of a t-test