Chapter 5 A Brief introduction to R

By: Avril Coghlan.

Adapted, edited and expanded: Nathan Brouwer under the Creative Commons 3.0 Attribution License (CC BY 3.0).

This chapter provides a brief introduction to R. At the end of are links to additional resources for getting started with R.

5.1 Vocabulary

  • scalar
  • vector
  • list
  • class
  • numeric
  • character
  • assignment
  • elements of an object
  • indices
  • attributes of an object
  • argument of a function

5.2 R functions

  • <-
  • [ ]
  • $
  • table()
  • function
  • c()
  • log10()
  • help(), ?
  • help.search()
  • RSiteSearch()
  • mean()
  • return()
  • q()

5.3 Interacting with R

You will type R commands into the RStudio console in order to carry out analyses in R. In the RStudio console you will see the R prompt starting with the symbol “>”. “>” will always be there at the beginning of each new command - don’t try to delete it! Moreover, you never need to type it.

We type the commands needed for a particular task after this prompt. The command is carried out by R after you hit the Return key.

Once you have started R, you can start typing commands into the RStudio console, and the results will be calculated immediately, for example:

2*3
## [1] 6

Note that prior to the output of “6” it shows “[1]”.

Now subtraction:

10-3
## [1] 7

Again, prior to the output of “7” it shows “[1]”.

R can act like a basic calculator that you type commands in to. You can also use it like a more advanced scientific calculator and create variables that store information. All variables created by R are called objects. In R, we assign values to variables using an arrow-looking function <- the assignment operator. For example, we can assign the value 2*3 to the variable x using the command:

x <- 2*3

To view the contents of any R object, just type its name, press enter, and the contents of that R object will be displayed:

x
## [1] 6

5.4 Variables in R

There are several different types of objects in R with fancy math names, including scalars, vectors, matrices (singular: matrix), arrays, dataframes, tables, and lists. The scalar** variable x above is one example of an R object. While a scalar variable such as x has just one element, a vector consists of several elements. The elements in a vector are all of the same type (e.g.. numbers or alphabetic characters), while lists may include elements such as characters as well as numeric quantities. Vectors and dataframes are the most common variables you’ll use. You’ll also encounter matrices often, and lists are ubiquitous in R but beginning users often don’t encounter them because they remain behind the scenes.

5.4.1 Vectors

To create a vector, we can use the c() (combine) function. For example, to create a vector called myvector that has elements with values 8, 6, 9, 10, and 5, we type:

myvector <- c(8, 6, 9, 10, 5) # note: commas between each number!

To see the contents of the variable myvector, we can just type its name and press enter:

myvector
## [1]  8  6  9 10  5

5.4.2 Vector indexing

The [1] is the index of the first element in the vector. We can extract any element of the vector by typing the vector name with the index of that element given in square brackets [...].

For example, to get the value of the 4th element in the vector myvector, we type:

myvector[4]
## [1] 10

5.4.3 Character vectors

Vectors can contain letters, such as those designating nucleic acids

my.seq <- c("A","T","C","G")

They can also contain multi-letter strings:

my.oligos <- c("ATCGC","TTTCGC","CCCGCG","GGGCGC")

5.4.4 Lists

NOTE: below is a discussion of lists in R. This is excellent information, but not necessary if this is your very very first time using R.

In contrast to a vector, a list can contain elements of different types, for example, both numbers and letters. A list can even include other variables such as a vector. The list() function is used to create a list. For example, we could create a list mylist by typing:

mylist <- list(name="Charles Darwin", 
               wife="Emma Darwin", 
               myvector)

We can then print out the contents of the list mylist by typing its name:

mylist
## $name
## [1] "Charles Darwin"
## 
## $wife
## [1] "Emma Darwin"
## 
## [[3]]
## [1]  8  6  9 10  5

The elements in a list are numbered, and can be referred to using indices. We can extract an element of a list by typing the list name with the index of the element given in double square brackets (in contrast to a vector, where we only use single square brackets).

We can extract the second element from mylist by typing:

mylist[[2]]  # note the double square brackets [[...]]
## [1] "Emma Darwin"

As a baby step towards our next task, we can wrap index values as in the c() command like this:

mylist[[c(2)]]  # note the double square brackets [[...]]
## [1] "Emma Darwin"

The number 2 and c(2) mean the same thing.

Now, we can extract the second AND third elements from mylist. First, we put the indices 2 and 3 into a vector c(2,3), then wrap that vector in double square brackets: [c(2,3)]. All together it looks like this.

mylist[c(2,3)] # note the double brackets
## $wife
## [1] "Emma Darwin"
## 
## [[2]]
## [1]  8  6  9 10  5

Elements of lists may also be named, resulting in a named lists. The elements may then be referred to by giving the list name, followed by “$”, followed by the element name. For example, mylist$name is the same as mylist[[1]] and mylist$wife is the same as mylist[[2]]:

mylist$wife
## [1] "Emma Darwin"

We can find out the names of the named elements in a list by using the attributes() function, for example:

attributes(mylist)
## $names
## [1] "name" "wife" ""

When you use the attributes() function to find the named elements of a list variable, the named elements are always listed under a heading “$names”. Therefore, we see that the named elements of the list variable mylist are called “name” and “wife”, and we can retrieve their values by typing mylist$name and mylist$wife, respectively.

5.4.5 Tables

Another type of object that you will encounter in R is a table. The table() function allows you to total up or tabulate the number of times a value occurs within a vector. Tables are typically used on vectors containing character data, such as letters, words, or names, but can work on numeric data data.

5.4.5.1 Tables - The basics

If we made a vector variable “nucleotides” containing the of a DNA molecule, we can use the table() function to produce a table variable that contains the number of bases with each possible nucleotides:

bases <- c("A", "T", "A", "A", "T", "C", "G", "C", "G")

Now make the table

table(bases)
## bases
## A C G T 
## 3 2 2 2

We can store the table variable produced by the function table(), and call the stored table “bases.table”, by typing:

bases.table <- table(bases)

Tables also work on vectors containing numbers. First, a vector of numbers.

numeric.vecter <- c(1,1,1,1,3,4,4,4,4)

Second, a table, showing how many times each number occurs.

table(numeric.vecter)
## numeric.vecter
## 1 3 4 
## 4 1 4

5.4.5.2 Tables - further details

To access elements in a table variable, you need to use double square brackets, just like accessing elements in a list. For example, to access the fourth element in the table bases.table (the number of Ts in the sequence), we type:

bases.table[[4]]  # double brackets!
## [1] 2

Alternatively, you can use the name of the fourth element in the table (“John”) to find the value of that table element:

bases.table[["T"]]
## [1] 2

5.5 Arguments

Functions in R usually require arguments, which are input variables (i.e.. objects) that are passed to them, which they then carry out some operation on. For example, the log10() function is passed a number, and it then calculates the log to the base 10 of that number:

log10(100)
## [1] 2

There’s a more generic function, log(), where we pass it not only a number to take the log of, but also the specific base of the logarithm. To take the log base 10 with the log() function we do this

log(100, base = 10)
## [1] 2

We can also take logs with other bases, such as 2:

log(100, base = 2)
## [1] 6.643856

5.6 Help files with help() and ?

In R, you can get help about a particular function by using the help() function. For example, if you want help about the log10() function, you can type:

help("log10")

When you use the help() function, a box or web page will show up in one of the panes of RStudio with information about the function that you asked for help with. You can also use the ? next to the function like this

?log10

Help files are a mixed bag in R, and it can take some getting used to them. An excellent overview of this is Kieran Healy’s “How to read an R help page.”

5.7 Searching for functions with help.search() and RSiteSearch()

If you are not sure of the name of a function, but think you know part of its name, you can search for the function name using the help.search() and RSiteSearch() functions. The help.search() function searches to see if you already have a function installed (from one of the R packages that you have installed) that may be related to some topic you’re interested in. RSiteSearch() searches all R functions (including those in packages that you haven’t yet installed) for functions related to the topic you are interested in.

For example, if you want to know if there is a function to calculate the standard deviation (SD) of a set of numbers, you can search for the names of all installed functions containing the word “deviation” in their description by typing:

help.search("deviation")

Among the functions that were found, is the function sd() in the stats package (an R package that comes with the base R installation), which is used for calculating the standard deviation.

Now, instead of searching just the packages we’ve have on our computer let’s search all R packages on CRAN. Let’s look for things related to DNA. Note that RSiteSearch() doesn’t provide output within RStudio, but rather opens up your web browser for you to display the results.

RSiteSearch("DNA")

The results of the RSiteSearch() function will be hits to descriptions of R functions, as well as to R mailing list discussions of those functions.

5.8 More on functions

We can perform computations with R using objects such as scalars and vectors. For example, to calculate the average of the values in the vector myvector (i.e.. the average of 8, 6, 9, 10 and 5), we can use the mean() function:

mean(myvector) # note: no " "
## [1] 7.6

We have been using built-in R functions such as mean(), length(), print(), plot(), etc.

5.8.1 Writing your own functions

NOTE: *Writing your own functions is an advanced skills. New users can skip this section.

We can also create our own functions in R to do calculations that you want to carry out very often on different input data sets. For example, we can create a function to calculate the value of 20 plus square of some input number:

myfunction <- function(x) { return(20 + (x*x)) }

This function will calculate the square of a number (x), and then add 20 to that value. The return() statement returns the calculated value. Once you have typed in this function, the function is then available for use. For example, we can use the function for different input numbers (e.g.. 10, 25):

myfunction(10)
## [1] 120

5.9 Quiting R

To quit R either close the program, or type:

q()