Chapter 8 Data in dataframes
An interesting dataset in Stat2Data is SeaIce. Load it with the data() command.
library(Stat2Data)
data(SeaIce)
SeaIce shows 37 years of the area of frozen ice in the Arctic, from 1979 to 2015. The lynx data we worked with previously was in a special format that you’ll probably rarely encounter again. It was nice to use, however, because it required very little code to plot.
SeaIce, by contrast, is a typical R data object in the form of a dataframe. Dataframes are the fundamental units of analysis in R. Most of the data you will load into R and work with in R will be in a dataframe. They have the same basic structure as a spreadsheet, but R keeps them hidden in memory and you have to use commands to explore them.
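To make the spreadsheet analogy concrete, here is a minimal sketch that builds a tiny dataframe by hand. The name ice_small is made up for illustration, and the values are copied from the first three rows of SeaIce as printed later in this chapter.

```r
# ice_small is a hypothetical name; the values copy the first
# three rows of SeaIce shown in this chapter.
ice_small <- data.frame(
  Year   = c(1979, 1980, 1981),
  Extent = c(7.22, 7.86, 7.25),
  Area   = c(4.54, 4.83, 4.38),
  t      = 1:3
)

# Typing the name prints the rows and columns, like a small spreadsheet
ice_small

# str() gives a compact description of each column and its type
str(ice_small)
```

Each argument to data.frame() becomes one column, and all columns must have the same number of entries, which is what gives a dataframe its rectangular, spreadsheet-like shape.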
8.1 Looking at dataframes with View()
To get a spreadsheet-like view of a dataframe you can use the View() command:
View(SeaIce)
This will bring up the data in a spreadsheet-like viewer as a new tab in the script editor, similar to this.
Year | Extent | Area | t |
---|---|---|---|
1979 | 7.22 | 4.54 | 1 |
1980 | 7.86 | 4.83 | 2 |
1981 | 7.25 | 4.38 | 3 |
1982 | 7.45 | 4.38 | 4 |
1983 | 7.54 | 4.64 | 5 |
1984 | 7.11 | 4.04 | 6 |
1985 | 6.93 | 4.18 | 7 |
1986 | 7.55 | 4.67 | 8 |
1987 | 7.51 | 5.61 | 9 |
1988 | 7.53 | 5.32 | 10 |
⚠️ Note: Unlike a spreadsheet, you cannot edit the data when it is called up using View(). ⚠️
Like a spreadsheet, the data are organized in columns and rows. Each column represents a type of information:
- Year: when data were collected
- Extent: the amount of area within the ice-bound region
- Area: total area of ice, minus any non-ice area (land, melted water)
- t: time point, from 1 (1979) to 37 (2015)
You can think of Extent as similar to the size of a country, and Area as the actual amount of land in a country minus any lakes.
Each row represents a different year of data; row 1 is the Extent and Area for 1979, row 2 is the Extent and Area for 1980, and so on.
8.2 Looking at dataframes in the console
Another common way to examine data is simply to type the name of the data in the console and press enter. This prints it out; however, if it’s a large dataframe this may take up a LOT of room. (I’ll just show an excerpt here.)
SeaIce
## Year Extent Area t
## 1 1979 7.22 4.54 1
## 2 1980 7.86 4.83 2
## 3 1981 7.25 4.38 3
## 4 1982 7.45 4.38 4
## 5 1983 7.54 4.64 5
## 6 1984 7.11 4.04 6
## 7 1985 6.93 4.18 7
## 8 1986 7.55 4.67 8
## 9 1987 7.51 5.61 9
## 10 1988 7.53 5.32 10
## 11 1989 7.08 4.83 11
## 12 1990 6.27 4.51 12
## 13 1991 6.59 4.47 13
## 14 1992 7.59 5.38 14
## 15 1993 6.54 4.53 15
8.3 Examining part of a dataframe
Look at the top of the dataframe
head(SeaIce)
## Year Extent Area t
## 1 1979 7.22 4.54 1
## 2 1980 7.86 4.83 2
## 3 1981 7.25 4.38 3
## 4 1982 7.45 4.38 4
## 5 1983 7.54 4.64 5
## 6 1984 7.11 4.04 6
Look at the bottom
tail(SeaIce)
## Year Extent Area t
## 32 2010 4.93 3.29 32
## 33 2011 4.63 3.18 33
## 34 2012 3.63 2.37 34
## 35 2013 5.35 3.75 35
## 36 2014 5.29 3.70 36
## 37 2015 4.68 3.37 37
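By default, head() and tail() return six rows, which is why six rows appear above. Both also accept an n argument to change that number. The sketch below uses a small hand-built dataframe, ice_six (a hypothetical name, with values copied from the first six rows of SeaIce), so that it runs without the Stat2Data package.

```r
# ice_six is a hypothetical stand-in for SeaIce, built from its first six rows
ice_six <- data.frame(
  Year   = 1979:1984,
  Extent = c(7.22, 7.86, 7.25, 7.45, 7.54, 7.11),
  Area   = c(4.54, 4.83, 4.38, 4.38, 4.64, 4.04),
  t      = 1:6
)

# First three rows only
top3 <- head(ice_six, n = 3)
top3

# Last two rows only
bottom2 <- tail(ice_six, n = 2)
bottom2
```

The same calls should work on the full dataset, for example head(SeaIce, n = 3).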
8.4 Get information about dataframes
The following commands give you important information about a dataframe.
is(SeaIce)
dim(SeaIce)
names(SeaIce)
summary(SeaIce)
summary(SeaIce$Extent)
mean(SeaIce$Extent)
min(SeaIce$Extent)
max(SeaIce$Extent)
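As a rough illustration of what some of these commands return, here they are applied to a six-row stand-in for SeaIce. The name ice_six is hypothetical and the values are copied from the first six rows printed earlier, so the numbers below describe only those six years, not the full dataset. The sketch uses class() in place of is() simply to keep the output short.

```r
# ice_six is a hypothetical stand-in for SeaIce, built from its first six rows
ice_six <- data.frame(
  Year   = 1979:1984,
  Extent = c(7.22, 7.86, 7.25, 7.45, 7.54, 7.11),
  Area   = c(4.54, 4.83, 4.38, 4.38, 4.64, 4.04),
  t      = 1:6
)

class(ice_six)        # what kind of object it is: "data.frame"
dim(ice_six)          # number of rows, then number of columns: 6 4
names(ice_six)        # column names: "Year" "Extent" "Area" "t"
summary(ice_six)      # min, quartiles, median, mean, max for every column

# The $ operator pulls out one column so it can be summarized on its own
mean(ice_six$Extent)  # 7.405
min(ice_six$Extent)   # 7.11
max(ice_six$Extent)   # 7.86
```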
8.5 Accessing rows and columns of dataframes
First row
SeaIce[1, ]
## Year Extent Area t
## 1 1979 7.22 4.54 1
First column
SeaIce[ , 1] # with brackets
## [1] 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993
## [16] 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
## [31] 2009 2010 2011 2012 2013 2014 2015
SeaIce$Year # with $
## [1] 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993
## [16] 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
## [31] 2009 2010 2011 2012 2013 2014 2015
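The bracket pattern is always dataframe[row, column], and leaving one side blank means "all of them". Filling in both sides picks out a single cell, and columns can be given by name as well as by number. A minimal sketch, using a hypothetical three-row dataframe ice_small whose values are copied from the first rows of SeaIce:

```r
# ice_small is a hypothetical stand-in for SeaIce, built from its first three rows
ice_small <- data.frame(
  Year   = c(1979, 1980, 1981),
  Extent = c(7.22, 7.86, 7.25),
  Area   = c(4.54, 4.83, 4.38),
  t      = 1:3
)

# A single cell: row 2, column 3 (the Area for 1980)
ice_small[2, 3]         # 4.83

# The same cell, naming the column instead of counting it
ice_small[2, "Area"]    # 4.83

# Several rows and several columns at once
first_two <- ice_small[1:2, c("Year", "Extent")]
first_two

# $ gives a whole column; brackets after it pick one element
ice_small$Extent[2]     # 7.86
```

Using column names rather than numbers tends to be safer, since the code keeps working if the column order changes.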