15.2 Scatterplots: 2 Continuous Variables
In this lab we’ll explore how to make scatterplots using the qplot() function in ggplot2.
15.2.1 R Preliminaries
- We’ll use the qplot() function in the ggplot2 package
- The cowplot package provides nice default settings for ggplot2 plots, IMHO
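The lab doesn't show the commands that load these packages. Here is a minimal sketch, assuming both packages are already installed. As far as I know, cowplot 1.0 and later no longer switch the default ggplot2 theme just by being loaded, so the theme_set() line applies the cowplot look explicitly. Leave that line out if you prefer the plain ggplot2 look.

```r
# install.packages(c("ggplot2", "cowplot"))  # run once if the packages are missing

library(ggplot2)  # provides qplot(), ggplot() and the msleep dataset
library(cowplot)  # provides theme_cowplot()

# Make the cowplot theme the default for every plot drawn after this line
theme_set(theme_cowplot())
```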
15.2.2 Scatterplot of Iris data
- Let’s make a scatterplot, where we plot two continuous, numeric variables against each other
- that is, both the x and y variables are numbers, not categories
I’ve forgotten the names of all the iris variables, so I’ll use the names() function to see what they are:
names(iris)
I’ll plot sepal length against petal length:
qplot(y = Sepal.Length,
x = Petal.Length,
data = iris)
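qplot() still works, but it has been deprecated since ggplot2 3.4.0, so newer versions may print a deprecation warning. For comparison, here is a sketch of the same scatterplot written with the more general ggplot() function. The object name p_iris is just an illustrative choice.

```r
library(ggplot2)

# Map petal length to the x-axis and sepal length to the y-axis,
# then draw one point per flower
p_iris <- ggplot(data = iris,
                 aes(x = Petal.Length, y = Sepal.Length)) +
  geom_point()

p_iris  # printing the object draws the plot
```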
15.2.3 Scatter plot of mammal brain data
Let’s look at another dataset
15.2.3.1 Preliminaries
Get the data from the ggplot2 package
data(msleep)
15.2.3.2 Look at the data
dim(msleep)      # How much data is there?
head(msleep)     # What does the data look like?
summary(msleep)  # Summary of the data
There are a number of “categorical” variables in this dataset:
- genus
- vore = diet type (carnivore, omnivore, etc.)
- order = taxonomic order
- conservation = conservation status (endangered, etc.)
However, they are stored as text (character strings), not as “factor” variables. Factors are better known as categorical or grouping variables, but are called “factors” in R-land.
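To check how each column is actually stored, you can ask R for the class of every column. A quick sketch (col_classes is an illustrative name); the categorical variables listed above come back as "character" rather than "factor":

```r
library(ggplot2)  # msleep ships with ggplot2
data(msleep)

# class() of every column, returned as a named character vector
col_classes <- sapply(msleep, class)
col_classes
```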
We can make these factors using the factor() command
msleep$vore <- factor(msleep$vore)
Now see what happens when you call summary()
summary(msleep)
Do the same for “order”:
msleep$order <- factor(msleep$order)
summary(msleep)
And “conservation”
msleep$conservation <- factor(msleep$conservation)
summary(msleep)
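Converting one column at a time works fine, but when several columns need the same treatment you can do them in one step. A sketch using lapply(), which applies factor() to each listed column. The name cat_vars is illustrative, and this version also converts genus, which the steps above leave as text.

```r
library(ggplot2)
data(msleep)

# Names of the columns to treat as categorical (grouping) variables
cat_vars <- c("genus", "vore", "order", "conservation")

# Apply factor() to each of those columns and write the results back
msleep[cat_vars] <- lapply(msleep[cat_vars], factor)

summary(msleep[cat_vars])
```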
15.2.4 Make a basic scatterplot
qplot(y = sleep_total,
x = brainwt,
data = msleep)
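That looks really, really ugly: most of the points are squashed against the left side of the plot because a few species have much heavier brains than all the others. It will work better if we log-transform the variables. Note that from here on the y variable is REM sleep (sleep_rem) instead of total sleep (sleep_total).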
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep)
Things get logged all the time in stats. We’ll talk more about that later.
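Wrapping variables in log() transforms the data themselves (using the natural log), so the axis tick labels are in log units. An alternative sketch keeps the data unchanged and puts the axes on a base-10 log scale with ggplot2's scale_x_log10() and scale_y_log10(), so the tick labels stay in the original units (kilograms of brain weight, hours of REM sleep). The pattern of points is the same either way, since the two kinds of log differ only by a constant factor.

```r
library(ggplot2)

# Plot the raw values, but draw both axes on a base-10 log scale
p_log <- ggplot(msleep, aes(x = brainwt, y = sleep_rem)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10()

p_log  # rows with missing brainwt or sleep_rem are dropped with a warning
```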
15.2.5 Add color coding to scatterplot
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep,
color = vore)
15.2.6 Add color & shape coding to scatterplot
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep,
color = vore,
shape = vore)
15.2.7 Put different “vores” in separate panels
- Separate panels can be made using the facets argument within qplot()
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep,
color = vore,
shape = vore,
facets = vore ~ .)
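In code written with ggplot() instead of qplot(), the facets argument becomes a separate facet layer. A sketch of the same panel plot using facet_wrap(), which arranges the panels in a wrapping grid instead of one tall column; facet_grid(vore ~ .) would reproduce the single column.

```r
library(ggplot2)

p_facet <- ggplot(msleep, aes(x = log(brainwt), y = log(sleep_rem),
                              color = vore, shape = vore)) +
  geom_point() +
  facet_wrap(~ vore)  # one panel per diet category

p_facet
```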
15.2.8 Add a “trend line” to a scatterplot
- Add geom_smooth() to the end of the qplot() command with a +
- This works best if we remove the color = vore argument, but try leaving it in to see what happens (hint: you get one trend line per “vore”)
qplot(y = log(sleep_rem),
x = log(brainwt),
data = msleep) +
geom_smooth()
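By default, geom_smooth() chooses its smoothing method based on the number of observations; for a dataset as small as msleep it fits a flexible loess curve, drawn with a shaded confidence band. If you want a straight trend line from a linear regression instead, you can set method = "lm". A sketch that modifies the code above (p_trend is an illustrative name):

```r
library(ggplot2)

p_trend <- qplot(y = log(sleep_rem),
                 x = log(brainwt),
                 data = msleep) +
  geom_smooth(method = "lm", formula = y ~ x)  # straight line with a 95% confidence band

p_trend
```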
15.2.9 Challenge: Modify mammal brain code
Modify the mammal brain code to do the following things:
- Change the axis labels (e.g., + ylab("y axis"))
- Add a title (e.g., + ggtitle("…"))
- Use names(msleep) to see what other variables are in the dataset
- Use summary(msleep) to check whether they are continuous or categorical
- Pick another continuous variable and plot it instead of sleep_total or sleep_rem
- Try this with and without logging using the log() function
In this section we will tackle a typical data analysis problem: determining whether two groups, such as organisms or drug treatments, can be considered statistically different from each other. We will use data from a paper titled “Sperm competition and the evolution of precopulatory weapons: Increasing male density promotes sperm competition and reduces selection on arm strength in a chorusing frog” by Buzatto et al. (2015).
The end goal is to compare the size of the arms of female and male frogs. First, though, we will get to know the data by calculating summary statistics and making exploratory graphs. We will then carry out a t-test and grapple with the meaning and interpretation of its output. Finally, we’ll explore how best to plot the results of a t-test.
Section outline:
- Data exploration with summary statistics
- Graphical data exploration with boxplots
- Plotting means and measures of variation and precision
- T-tests
- Plotting the output of a t-test