Chapter 9 Plotting data in dataframes
library(Stat2Data)   # package that contains the SeaIce data
data(SeaIce)         # load the SeaIce dataframe
Most data in R are organized into dataframes. Similar to when we plot data in a spreadsheet, to plot data from a dataframe we need to tell R exactly what we want on the x-axis (horizontal) and y-axis (vertical).
⚠️ Note: For reasons we don’t have to get into, the lynx data were a special case where we didn’t have to define x and y. ⚠️
We can plot the Extent of Arctic sea ice again using the plot() command and a cool convention in R called formula notation. Formula notation uses the tilde symbol, ~ (https://en.wikipedia.org/wiki/Tilde). In math, ~ can have several meanings. In R, it means “relates to,” “versus,” or “depends on.” So we can plot the relation between Year and sea ice Extent as a y-versus-x relation: Extent ~ Year. We also have to include the argument data = SeaIce so R knows where to find Extent and Year.
plot(Extent ~ Year, data = SeaIce) # Note, both words capitalized.
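Formula notation is shorthand for naming the x and y columns yourself. The sketch below is a minimal illustration, assuming R’s built-in longley dataframe stands in for SeaIce so it runs without Stat2Data (swap in Extent, Year, and SeaIce to match this chapter). The formula call and the explicit x/y call should draw the same points; only the default axis labels differ (the second version labels them longley$Year and longley$Employed).

```r
# longley is a built-in dataframe (datasets package): yearly economic data, 1947-1962
str(longley)

# Formula notation: y ~ x, plus data = to say which dataframe holds the columns
plot(Employed ~ Year, data = longley)

# The same points with x and y named explicitly; $ pulls one column out of a dataframe
plot(x = longley$Year, y = longley$Employed)

# model.frame() shows how R reads the formula: the left-hand side (y) comes first
mf <- model.frame(Employed ~ Year, data = longley)
names(mf)   # "Employed" "Year"
```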
9.1 Base-R graphics
When we use the plot command, we’re using Base R graphics. As noted before, there are several ways to make plots in R, and you should be able to spot which one is which when looking at code. We’ll cover some of the key features of Base R graphics now. While different plotting methods have different commands and arguments, they all share a common feature: everything in a plot can be customized, and each element is customized with a command or argument.
9.1.1 Type of points
plot() can draw dots or lines. We make it use lines with the type = "l" argument (note that the l is in quotes, and it is a lowercase L, not the number 1):
plot(Extent ~ Year, type = "l", data = SeaIce) # Note: l in quotes
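type = accepts several other one-letter codes besides "l" (the full list is in ?plot). As a rough sketch, assuming the built-in longley dataframe in place of SeaIce, this draws four of them side by side: "p" for points, "l" for lines, "s" for stair steps, and "h" for vertical, histogram-like lines. par(mfrow = ...) splits the plotting window into a grid, and the old setting is restored at the end.

```r
# Compare several values of type = in a 2 x 2 grid of panels
old_par <- par(mfrow = c(2, 2))   # 2 rows, 2 columns; save the old layout
for (tp in c("p", "l", "s", "h")) {
  plot(Employed ~ Year, data = longley,
       type = tp,                                  # the type being compared
       main = paste0('type = "', tp, '"'))         # panel title shows the code
}
par(old_par)                      # back to one plot per window
```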
As noted before, R doesn’t mind if you split a command across several lines. To keep track of the things I’m doing to the plot, I’ll format things like this:
plot(Extent ~ Year, # relationship
type = "l", # type of plot; Note: l in quotes
data = SeaIce) # data
As we did with the lynx data, we can combine points and lines with type = "b". (Do you recall what “b” stands for?)
plot(Extent ~ Year, # relationship
type = "b", # type of plot; Note: b in quotes
data = SeaIce) # data
We can adjust the color with col = .... Recall that this is just a number, not in quotes.
plot(Extent ~ Year, # relationship
type = "b", # type of plot; Note: b in quotes
col = 2, # color; no quotes
data = SeaIce) # data
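The numbers given to col = are positions in R’s current color palette, which you can print with palette(). Colors can also be given by name, and a name is text, so it does go in quotes. A minimal sketch, assuming the built-in longley dataframe in place of SeaIce:

```r
# Numbers given to col = are positions in the current color palette
palette()                          # the colors that col = 1, 2, 3, ... refer to

plot(Employed ~ Year, data = longley,
     type = "b",
     col = 2)                      # palette()[2]; a number, no quotes

plot(Employed ~ Year, data = longley,
     type = "b",
     col = "darkred")              # a color name is text, so it needs quotes
```

colors() lists every color name R recognizes.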
We can add a main title with main = ...:
plot(Extent ~ Year, # relationship
type = "b", # type of plot; Note: b in quotes
col = 2, # color; no quotes
main = "Arctic Sea Ice Extent", # main title, in quotes
data = SeaIce) # data
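The title does not have to be typed out by hand. As a hedged sketch (again using the built-in longley dataframe as a stand-in for SeaIce), paste0() glues text and numbers together, so the year range can be read from the data, and "\n" inside a title starts a second line.

```r
# Build a two-line title from the data itself
yrs <- range(longley$Year)                       # first and last year
ttl <- paste0("Employment, longley data\n",      # "\n" = new line in the title
              yrs[1], "-", yrs[2])
plot(Employed ~ Year, data = longley,
     type = "b",
     col = 2,
     main = ttl)                                 # main accepts any text value
```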
It’s always good to include your units. Extent and Area are in millions of square kilometers. We can say specifically what we want for the y-axis label using ylab = ...:
plot(Extent ~ Year, # relationship
type = "b", # type of plot; Note: b in quotes
col = 2, # color; no quotes
main = "Arctic Sea Ice Extent", # main title, in quotes
ylab = "Extent (millions of square kilometers)", # y-axis label, in quotes
data = SeaIce) # data
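xlab = ... sets the x-axis label the same way. For units with a superscript, R’s plotmath system lets you pass an expression() instead of a plain string. This is a small sketch using the built-in trees dataframe (girth in inches, volume in cubic feet) purely as an assumption-free example of units; for this chapter the same idea would be ylab = expression("Extent (million km"^2*")").

```r
# Units with a superscript: a plotmath expression() instead of a plain string
vol_lab <- expression("Volume (ft"^3*")")   # drawn as Volume (ft) with a superscript 3
plot(Volume ~ Girth, data = trees,
     xlab = "Girth (inches)",   # x-axis label: a plain string is fine here
     ylab = vol_lab)            # y-axis label with the superscript
```

See ?plotmath for the other symbols (subscripts, Greek letters) that work inside expression().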
We can change the appearance of the line using lty = ..., which stands for “line type”:
plot(Extent ~ Year, # relationship
type = "b", # type of plot; Note: b in quotes
col = 2, # color; no quotes
main = "Arctic Sea Ice Extent", # main title, in quotes
ylab = "Extent (millions of square kilometers)", # y-axis label, in quotes
lty = 2, # line type; not quoted
data = SeaIce) # data
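R has six built-in line types, numbered 1 to 6, and each number also has a name that can be used in quotes (see the lty entry in ?par). This sketch draws one line per type so you can pick one; it uses no dataset, and lwd = (line width) is an extra argument not used elsewhere in this chapter.

```r
# One horizontal line per built-in line type
line_types <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
plot(1, type = "n",                     # "n" draws the axes but no data
     xlim = c(0, 1), ylim = c(0.5, 6.5),
     xlab = "", ylab = "lty value", main = "Line types")
for (i in 1:6) {
  lines(x = c(0, 1), y = c(i, i), lty = i, lwd = 2)     # same as lty = line_types[i]
  text(x = 0.5, y = i, labels = line_types[i], pos = 3) # name just above each line
}
```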
I’m not a fan of the open circles for plotting points; we can change those too using the argument pch = ... (short for “plotting character”).
plot(Extent ~ Year, # relationship
type = "b", # type of plot; Note: b in quotes
col = 2, # color; no quotes
main = "Arctic Sea Ice Extent", # main title, in quotes
ylab = "Extent (millions of square kilometers)", # y-axis label, in quotes
lty = 2, # line type; not quoted
pch = 16, # point type; not quoted
data = SeaIce) # data
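pch = takes a number from 0 to 25. This sketch draws all 26 symbols in a row so you can see which number gives which shape (16 is the filled circle and 17 the filled triangle used above). It uses no dataset; cex = (symbol size) and bg = (fill color, which only symbols 21 to 25 use) are extra arguments introduced here for illustration.

```r
# Draw symbols 0-25 in a row, with the pch number on the x-axis
pch_values <- 0:25
plot(x = pch_values, y = rep(1, length(pch_values)),
     pch = pch_values,           # a different symbol for each point
     cex = 1.5,                  # 1.5 times the default symbol size
     bg = "grey",                # fill color, used only by pch 21-25
     yaxt = "n", ylab = "",      # hide the meaningless y-axis
     xlab = "pch value", main = "Point symbols")
```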
9.1.2 Plotting two columns of data
Often we want to represent two distinct things on our graph. In spreadsheets these are called separate series of data. When making a time series plot like this one, we can add a second column of data using a special command called points(), which works very similarly to plot().
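⚠️ Note: The points() command only works if it is preceded by a call to plot(). points() draws onto a plot that already exists; on its own, R stops with the error “plot.new has not been called yet”. ⚠️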
# Main plot: Extent
plot(Extent ~ Year, # relationship
type = "b", # type of plot; Note: b in quotes
col = 2, # color; no quotes
main = "Arctic Sea Ice Extent", # main title, in quotes
ylab = "Extent (millions of square kilometers)", # y-axis label, in quotes
lty = 2, # line type; not quoted
pch = 16, # point type; not quoted
data = SeaIce) # data
# Second column of data: Area
points(Area ~ Year, data = SeaIce)
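points() has a sibling, lines(), which adds a second series as a connected line instead of dots. Like points(), it accepts a formula with data = and only draws inside the axes that plot() has already set up. A minimal sketch, assuming the built-in longley dataframe (Unemployed and Armed.Forces, both counts of people) in place of Extent and Area:

```r
# Main series: set up the axes and draw Unemployed
plot(Unemployed ~ Year, data = longley, type = "b", pch = 16)

# Second series as a line; lines() works like points() with type = "l"
lines(Armed.Forces ~ Year, data = longley,
      col = 4,    # palette()[4]
      lwd = 2)    # line twice the default width
```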
I can now customize the sea ice Area part of the plot by adding arguments to the points() statement.
# Main plot: Extent
plot(Extent ~ Year, # relationship
type = "b", # type of plot; Note: b in quotes
col = 2, # color; no quotes
main = "Arctic Sea Ice Extent", # main title, in quotes
ylab = "Extent (millions of square kilometers)", # y-axis label, in quotes
lty = 2, # line type; not quoted
pch = 16, # point type; not quoted
data = SeaIce) # data
# Second column of data: Area
points(Area ~ Year,
type = "b",
col = 3,
pch = 17,
data = SeaIce)
❓ There’s a problem with my graph though - can you spot it? It only really becomes apparent when you connect the dots with a line. ❓
9.1.3 Changing the range of a plot axis
Let’s go back to our original plot and forget about all the fancy arguments and adding points() for a little bit. The problem with the last graph we made is that some points are not showing: if you look in the lower right-hand part of the plot, the Area points around 2006-2008 and 2011-2013 are missing because the y-axis stops around 3.5. We can fix this by adding a new argument, ylim = ..., which sets the limits of the y-axis. To do this correctly, we have to introduce a new function: c(), short for “combine”, which joins several values into a single vector. This is actually one of the most common functions in R. Inside c() we need to tell R the lower and upper limits we want for the y-axis. Let’s use 0 and 8, which will be coded as c(0,8).
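Before using c() inside plot(), it can help to see what it produces on its own. This short sketch needs no data: it stores the two limits in a variable (the name y_limits is just an illustrative choice) and pulls them back out with square brackets.

```r
# c() combines values into a vector
y_limits <- c(0, 8)
y_limits           # [1] 0 8
length(y_limits)   # [1] 2  (two values: lower and upper limit)
y_limits[1]        # [1] 0  (the lower limit)
y_limits[2]        # [1] 8  (the upper limit)
# ylim = y_limits would now mean exactly the same as ylim = c(0, 8)
```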
So, to set the y-axis limits we do this:
plot(Extent ~ Year,
ylim = c(0,8), # the c(...) function to set limits
data = SeaIce)
Now we have a lot more space at the bottom. We can add our points() back to see if this works:
plot(Extent ~ Year,
ylim = c(0,8), # the c(...) function to set limits
data = SeaIce)
points(Area ~ Year,
data = SeaIce,
col = 2)
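Instead of guessing numbers like 0 and 8, the limits can be computed from the data with range(), which returns the smallest and largest value as a two-number vector, exactly the shape ylim = expects. This sketch assumes the built-in longley dataframe in place of SeaIce; for this chapter the equivalent would be range(c(SeaIce$Extent, SeaIce$Area)). par("usr") then reports the axis ranges R actually used.

```r
# One long vector holding both series, so the limits fit BOTH of them
both <- c(longley$Unemployed, longley$Armed.Forces)
y_limits <- range(both, na.rm = TRUE)   # c(min, max); na.rm skips missing values

plot(Unemployed ~ Year, data = longley, type = "b",
     ylim = y_limits)                   # limits computed, not typed in
points(Armed.Forces ~ Year, data = longley, type = "b", col = 2)

usr <- par("usr")   # actual plot region: c(x-left, x-right, y-bottom, y-top)
usr[3:4]            # a little wider than y_limits: R pads each end by 4%
```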
⚠️ Note: ylim = ... only goes in plot(), not points(). plot() sets up the axes, and points() only draws onto the axes that already exist. ⚠️
Let’s re-make our fancy plots now with ylim = ... set.
# Main plot: Extent
plot(Extent ~ Year,
type = "b",
col = 2,
main = "Arctic Sea Ice Extent",
ylab = "Extent (millions of square kilometers)",
lty = 2,
pch = 16,
ylim = c(0,8), # <== y-axis limits
data = SeaIce)
# Second column of data: Area
points(Area ~ Year,
type = "b",
col = 3,
pch = 17,
data = SeaIce)
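With two series on one plot, a reader needs a key to tell them apart. legend() adds one; it does not look at the data, so you repeat the same col, lty, and pch settings used for each series. A hedged sketch, assuming the built-in longley dataframe in place of SeaIce (Unemployed playing the role of Extent, Armed.Forces the role of Area):

```r
top <- max(longley$Unemployed, longley$Armed.Forces)   # largest value in either column

# Main series with the fancy settings, and room for both series on the y-axis
plot(Unemployed ~ Year, data = longley,
     type = "b", col = 2, lty = 2, pch = 16,
     ylim = c(0, top),
     ylab = "Number of people",
     main = "longley data: Unemployed and Armed.Forces")
# Second series (lty not set, so it uses the default solid line, lty = 1)
points(Armed.Forces ~ Year, data = longley,
       type = "b", col = 3, pch = 17)

# Key in the empty lower-right corner; settings match the two calls above
legend("bottomright",
       legend = c("Unemployed", "Armed.Forces"),
       col = c(2, 3), lty = c(2, 1), pch = c(16, 17))
```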
9.2 You try it
9.2.1 Fixer-uppers
Fix the code below so it works
plot(Extent , Year,
data = SeaIce)
Fix the code below so it works
plot(Extent ~ Year, # relationship
type = b, # type of plot; Note: b in quotes
col = "2", # color; no quotes
data = SeaIce) # data
9.2.2 Intermediate
Make a plot with Area on the x-axis and Extent on the y-axis.
## Write the code below
9.2.3 Advanced
Based on the code above, make a plot of the SeaIce data where Area appears within the plot() statement, and Extent is in points().
# Write the code below: