7.4 Learning about data in R

When you work with data in R, it lives in a place called the workspace. The workspace is not immediately visible while you work; it sits behind the scenes in what is essentially R’s working memory. We can see what’s on R’s mind using the ls() command:

ls()
## [1] "crabs" "iris"  "x"

The output includes the two datasets that we loaded using the data() command, crabs and iris, along with an object named x.
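As a minimal sketch of the same idea, the snippet below loads one built-in dataset and then checks that its name shows up in the workspace. It uses iris because that dataset ships with R. At the console, ls() with no arguments lists the global environment, which is written out explicitly here as globalenv() so the check behaves the same wherever it is run.

```r
# Load the built-in iris dataset into the workspace.
data(iris)

# ls() returns the object names as a character vector, so we can ask
# whether a particular name is among them.
"iris" %in% ls(globalenv())
## [1] TRUE
```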

We can add new things to the workspace using an R command like this:

my.mean <- mean(c(1,2,2))

Here, “<-” is called the assignment operator. It assigns the output of an R command or R function to an R object in R’s working memory, the workspace.
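To illustrate, here is a short sketch of the assignment operator at work. The object names my.values and my.sd are invented for this example, and they are removed at the end so that they do not clutter the workspace.

```r
# Assignment prints nothing; it only stores the result under a name.
my.values <- c(1, 2, 2)      # hypothetical object name
my.sd <- sd(my.values)       # store the standard deviation of the values

# Assigning to an existing name overwrites the old contents.
my.values <- c(10, 20, 30)
my.values
## [1] 10 20 30

# Tidy up so these example objects do not linger in the workspace.
rm(my.values, my.sd)
```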

We can check again what’s on R’s mind using the ls() command, which stands for “list”:

ls()
## [1] "crabs"   "iris"    "my.mean" "x"

We can see that my.mean has been added. We can see what my.mean contains by typing its name into the console:

my.mean
## [1] 1.666667

We can also learn more about it using the is() command:

is(my.mean)
## [1] "numeric" "vector"

Here we get a bit of R lingo: R tells us “numeric”, which means the object contains numeric data (numbers), and “vector”, which is one of several types of R object.
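A few other base R functions ask similar questions in a more targeted way. The sketch below applies them directly to the result of mean(), so nothing new is added to the workspace. It is meant as an illustration, not as a replacement for is().

```r
# class() reports the main class of an object.
class(mean(c(1, 2, 2)))
## [1] "numeric"

# is.numeric() asks a direct yes/no question.
is.numeric(mean(c(1, 2, 2)))
## [1] TRUE

# A vector can hold many values; this one happens to hold just one.
length(mean(c(1, 2, 2)))
## [1] 1
```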

R objects can be just about anything. We can assign letters to an R object like this:

my.abc <- c("a","b","c")

Note that each letter is surrounded by quotes, and all three of them sit within c(…).

If you call up “my.abc” from the console, you will get back the three letters. Now see what is(my.abc) says:

is(my.abc)
## [1] "character"           "vector"              "data.frameRowLabels"
## [4] "SuperClassMethod"

A lot comes out, but the first entry says “character”, indicating that you have character data: data made up of text.
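The quotes are what make something character data. As a quick sketch, the calls below compare quoted and unquoted values using class() and is.character(), two base R functions not used above:

```r
# Letters in quotes are character data.
class(c("a", "b", "c"))
## [1] "character"
is.character(c("a", "b", "c"))
## [1] TRUE

# Quotes make the difference: "1" is text, while 1 is a number.
class("1")
## [1] "character"
class(1)
## [1] "numeric"
```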

If you type ls() again, what happens?

ls()
## [1] "crabs"   "iris"    "my.abc"  "my.mean" "x"

We now see both of our R objects and the two datasets.

If we call is() on one of the datasets, what do we see?

is(crabs)
## [1] "data.frame" "list"       "oldClass"   "vector"

Several things get printed, but the first one is important: “data.frame”. Dataframes are the fundamental unit of analysis in R. Most of the data you will load into R and work with in R will be in a dataframe.
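The same kind of check works on any dataframe. As a sketch, here are a few base R functions applied to the iris dataset, chosen because it ships with R:

```r
# Make sure iris is loaded into the workspace.
data(iris)

# A direct yes/no check that the object is a dataframe.
is.data.frame(iris)
## [1] TRUE

# dim() gives the number of rows, then the number of columns.
dim(iris)
## [1] 150   5

# names() lists the column names as a character vector.
names(iris)
```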

Another function that tells you about something in the workspace is str(), which stands for structure. It reports the type of variable in each column and shows the first few values of each, similar to head(), but oriented differently.

str(crabs)
## 'data.frame':    200 obs. of  8 variables:
##  $ sp   : Factor w/ 2 levels "B","O": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex  : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
##  $ index: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ FL   : num  8.1 8.8 9.2 9.6 9.8 10.8 11.1 11.6 11.8 11.8 ...
##  $ RW   : num  6.7 7.7 7.8 7.9 8 9 9.9 9.1 9.6 10.5 ...
##  $ CL   : num  16.1 18.1 19 20.1 20.3 23 23.8 24.5 24.2 25.2 ...
##  $ CW   : num  19 20.8 22.4 23.1 23 26.5 27.1 28.4 27.8 29.3 ...
##  $ BD   : num  7 7.4 7.7 8.2 8.2 9.8 9.8 10.4 9.7 10.3 ...

Note that the variables “sp”, which stands for “species”, and “sex” are followed by the word “Factor”. A factor is a variable whose values are, or are summarized as, discrete categories called levels. For the species factor, there are two levels: the “B” species and the “O” species.
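Factors can also be inspected one column at a time. The sketch below uses the Species column of iris rather than crabs, only because iris ships with R; the same functions should work on crabs$sp once crabs is loaded.

```r
# Make sure iris is loaded into the workspace.
data(iris)

# The $ operator pulls a single column out of a dataframe.
class(iris$Species)
## [1] "factor"

# levels() lists the categories; nlevels() counts them.
levels(iris$Species)
## [1] "setosa"     "versicolor" "virginica"
nlevels(iris$Species)
## [1] 3

# table() counts how many rows fall into each level.
table(iris$Species)
```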