12.3 Loading data into R the easy way: pre-made data in an R “Package”
- Getting data into R (or SAS, or ArcGIS…) can be a pain!
- R comes with many datasets that are pre-loaded into it
- Many additional statistical techniques, and more datasets, can easily be added to R
- These add-ons are distributed as “packages”
12.3.1 Load data that is already in the “base” distribution of R
Fisher’s iris data comes automatically with R. You can load it into R’s memory using the command “data()”
data(iris) #Load the iris data
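To see what other datasets come with the base distribution, one option is to call data() with a package argument. The snippet below is a minimal sketch, assuming the standard "datasets" package that ships with R; the variable name available is just an illustrative choice.

```r
# List the datasets bundled in the "datasets" package that ships with R
available <- data(package = "datasets")$results

# "Item" holds the dataset names, "Title" a short description
head(available[, c("Item", "Title")])

# Check that iris is among them
"iris" %in% available[, "Item"]
```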
12.3.2 Look at the iris data
We’ll look at the iris data using some commands like ls(), dim(), and names().
You can check that it was loaded using the ls() command (“list”).
ls()
You can get info about the nature of the dataframe using commands like dim()
dim(iris)
This tells us that the iris data is essentially a spreadsheet that has 150 rows and 5 columns.
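If you only want one of the two numbers, nrow() and ncol() return the rows and columns separately. A short sketch (the names n_rows and n_cols are illustrative only):

```r
data(iris)              # make sure iris is loaded

dim(iris)               # rows first, then columns

n_rows <- nrow(iris)    # number of rows only
n_cols <- ncol(iris)    # number of columns only
n_rows
n_cols
```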
We can get the column names with names()
names(iris)
- Note that the first letter of each word is capitalized.
- What are the implications of this?
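One way to explore the capitalization question is to ask R directly whether a given spelling matches a column name. This is only a sketch using the %in% operator; the variable names are made up for illustration.

```r
data(iris)  # make sure iris is loaded

# Exact spelling, including capital letters
exact_match <- "Sepal.Width" %in% names(iris)

# Same letters, but all lower case
lower_match <- "sepal.width" %in% names(iris)

exact_match   # TRUE: the name matches a column
lower_match   # FALSE: R treats upper and lower case as different
```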
The top of the data and the bottom of the data can be checked with head() and tail() commands
head(iris) #top of dataframe
tail(iris) #bottom of dataframe
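By default head() and tail() show 6 rows. Both accept an n argument if you want more or fewer; the sketch below asks for 3 rows (top3 and bottom3 are illustrative names).

```r
data(iris)  # make sure iris is loaded

top3 <- head(iris, n = 3)      # first 3 rows
bottom3 <- tail(iris, n = 3)   # last 3 rows

top3
bottom3
```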
Another common R command is is(), which tells you what something is in R land.
is(iris)
- R might spew a lot of things out at you when you use is()
- usually the 1st item is most important.
- Here, it tells us that the “object” called “iris” in your workspace is 1st and foremost a “data.frame”
- A dataframe is essentially a spreadsheet of data loaded into R.
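If the output of is() feels like too much, a few related base R commands give a narrower answer. A minimal sketch:

```r
data(iris)  # make sure iris is loaded

class(iris)          # just the class of the object
is.data.frame(iris)  # a yes/no check: is this a data.frame?
str(iris)            # compact overview of the structure, column by column
```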
You can get basic info about the data themselves using commands like summary().
summary(iris)
- Which variables are numeric?
- Which variables are categories/groups (aka “factors”)?
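To check your answers to the two questions above, one approach is to ask for the class of every column at once with sapply(). This is a sketch rather than the only way to do it; col_classes is an illustrative name.

```r
data(iris)  # make sure iris is loaded

# Apply class() to each column of the dataframe
col_classes <- sapply(iris, class)
col_classes

# For a factor column, levels() lists the groups
levels(iris$Species)
```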
If you wanted info on just 1 column, you would tell R to isolate that column like this, using a dollar sign ($).
summary(iris$Sepal.Width)
That is, the name of the dataframe, a dollar sign ($), and the name of the column.
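The dollar sign is not the only way to pull out a column. Double square brackets with the column name in quotes do the same job, as this short sketch shows (the variable names are illustrative):

```r
data(iris)  # make sure iris is loaded

by_dollar <- iris$Sepal.Width         # dollar-sign notation
by_bracket <- iris[["Sepal.Width"]]   # double-bracket notation

# Both give the same numeric vector
identical(by_dollar, by_bracket)

summary(by_bracket)
```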
What happens when you don’t capitalize something correctly? Try these intentional mistakes, one at a time:
#all lower case
summary(iris$sepal.width) # this won't work
#just "s" in "sepal" lower case
summary(iris$sepal.Width) #this won't work either
#or what if you capitalize "i" in "Iris"?
summary(Iris$Sepal.Width) #won't work either
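The first two do not actually produce an error message. R is case-sensitive, so no column matches the misspelled name; iris$sepal.width therefore returns NULL, and summary() just reports a Length of 0 with Class and Mode of NULL, which is not very informative. The 3rd one does give an error (“Error in summary(Iris$Sepal.Width) : object ‘Iris’ not found”), and it does make a little sense.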
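To see more directly what R is doing with each of these mistakes, you can inspect the pieces separately. The sketch below uses is.null() for the misspelled columns and tryCatch() to capture the message from the misspelled dataframe name without stopping your script; the variable names are illustrative only.

```r
data(iris)  # make sure iris is loaded

# A misspelled column name silently gives NULL rather than an error
wrong_col <- iris$sepal.width
is.null(wrong_col)

# A misspelled object name is a real error; capture its message as text
err_msg <- tryCatch(
  summary(Iris$Sepal.Width),
  error = function(e) conditionMessage(e)
)
err_msg
```

A silent NULL is easy to miss, so if a summary looks strange it is worth checking the spelling against names(iris).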