Chapter 7 Functions

  • seq()
  • is(), is.vector(), is.matrix()
  • gsub()

7.1 Vectors in R

Variables in R can hold single values, vectors, lists, and other kinds of data. Functions in R carry out operations on variables. For example, the log10() function calculates the base-10 logarithm of a number x, and the mean() function calculates the average of the values in a vector myvector. Here is log10() applied to an object that holds a single number:

# store value in object
x <- 100

# take log base 10 of object
log10(x)
## [1] 2

Note that although mathematically x is a single number (a scalar), R considers it to be a vector of length 1:

is.vector(x)
## [1] TRUE

R has many of these “is” functions, which test what kind of object something is. What is returned when you run is.matrix() on a vector?

is.matrix(x)
## [1] FALSE

Mathematically this is a bit odd, since a vector is often defined as a one-dimensional matrix, i.e., a single column or single row of a matrix. But in R, a vector is a vector, a matrix is a matrix, and there are no separate scalars: a single number is just a vector of length 1.
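A quick way to see that R has no separate scalar type is to ask for the length of x and compare it with a genuine matrix. This is a small sketch; the matrix m is a made-up example that is not used elsewhere in the chapter.

```r
# x holds a single number, but R stores it as a vector of length 1
x <- 100
length(x)        # 1
is.vector(x)     # TRUE

# a genuine matrix: 2 rows and 3 columns filled with 1 to 6 (m is a made-up example)
m <- matrix(1:6, nrow = 2)
is.matrix(m)     # TRUE
is.vector(m)     # FALSE, because m carries a dim attribute
dim(m)           # 2 3
```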

7.2 Math on vectors

Vectors can serve as the input for mathematical operations. When this is done, R performs the operation separately on each element of the vector. This is a distinctive feature of R that can be hard to get used to, even for people with previous programming experience.

Let’s make a vector of numbers:

myvector <- c(30,16,303,99,11,111)

What happens when we multiply myvector by 10?

myvector*10
## [1]  300  160 3030  990  110 1110

R has taken each of the 6 values of myvector, 30 through 111, and multiplied each one by 10, giving us 6 results. That is, what R did was:

## 30*10    # first value of myvector
## 16*10    # second value of myvector
## 303*10   # ....
## 99*10
## 11*10
## 111*10   # last value of myvector
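To make "what R did" concrete, here is a sketch of the same calculation written as an explicit for loop, the way you might write it in a language without vectorized arithmetic. The names result and i are just illustrative; in practice you would simply write myvector*10.

```r
myvector <- c(30, 16, 303, 99, 11, 111)

# start with an empty numeric vector of the same length as myvector
result <- numeric(length(myvector))

# seq_along() gives the positions 1, 2, ..., 6; visit each one and multiply that element by 10
for (i in seq_along(myvector)) {
  result[i] <- myvector[i] * 10
}

result
## [1]  300  160 3030  990  110 1110

# the loop and the vectorized version give identical answers
identical(result, myvector * 10)
## [1] TRUE
```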

The usual rules of arithmetic apply to vectors just as they do to single numbers. Multiplication is commutative, so multiplying myvector by 10 gives the same result whether you put the 10 before or after the vector. That is, myvector*10 is the same as 10*myvector:

myvector*10
## [1]  300  160 3030  990  110 1110
10*myvector
## [1]  300  160 3030  990  110 1110

What happens when you subtract 30 from myvector? Write the code below.

myvector-30
## [1]   0 -14 273  69 -19  81

So, what R did was:

## 30-30    # first value of myvector
## 16-30    # second value of myvector
## 303-30   # ....
## 99-30
## 11-30
## 111-30   # last value of myvector

Again, myvector-30 is a vectorized operation: the subtraction is applied to every element.

You can also square a vector:

myvector^2
## [1]   900   256 91809  9801   121 12321

Which is the same as:

## 30^2    # first value of myvector
## 16^2    # second value of myvector
## 303^2   # ....
## 99^2
## 11^2
## 111^2   # last value of myvector
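Squaring is the same as multiplying the vector by itself, and that multiplication is also done element by element: position 1 with position 1, position 2 with position 2, and so on. A short check, where squared and product are illustrative names:

```r
myvector <- c(30, 16, 303, 99, 11, 111)

# ^2 squares each element
squared <- myvector^2

# multiplying the vector by itself pairs up matching positions
product <- myvector * myvector

identical(squared, product)
## [1] TRUE
```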

You can also take the square root of a vector using the function sqrt():

sqrt(myvector)
## [1]  5.477226  4.000000 17.406895  9.949874  3.316625 10.535654

…and take the natural log (base e) of a vector with log():

log(myvector)
## [1] 3.401197 2.772589 5.713733 4.595120 2.397895 4.709530
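Because log() uses base e by default, it is easy to mix it up with log10(). A minimal sketch showing how the base argument of log() relates to log10(), and that exp() undoes the natural log (ln is an illustrative name):

```r
myvector <- c(30, 16, 303, 99, 11, 111)

# natural log (base e) of each element
ln <- log(myvector)

# base-10 log, two equivalent ways
log(myvector, base = 10)
log10(myvector)

# exp() reverses the natural log, recovering the original values
exp(ln)
```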

…and just about any other mathematical operation. Here we are working with a standalone vector object, but all of these rules also apply to a column of a matrix or a data frame.

This property of R is called vectorization. When you run code like myvector*10 or log(myvector), you are doing a vectorized operation: it's like normal math with a vector-based superpower that gets more done, faster, than working one number at a time.
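Vectorized math works on a column pulled out of a data frame or a matrix because each column is itself a vector. A small sketch; the data frame df, its column names value and value_x10, and the matrix m are made up for illustration.

```r
myvector <- c(30, 16, 303, 99, 11, 111)

# a data frame with one column (df and value are illustrative names)
df <- data.frame(value = myvector)

# multiply the whole column by 10 and store the result as a new column
df$value_x10 <- df$value * 10
df

# a 3 x 2 matrix; R fills it column by column, so 30, 16, 303 go in column 1
m <- matrix(myvector, ncol = 2)

# vectorized math on the first column only
m[, 1] * 10
## [1]  300  160 3030
```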

7.3 Functions on vectors

As we just saw, we can use functions on vectors. Functions like sqrt() and log() return one result per element, but many other functions take the whole vector as input and process all of its numbers into a single output. Call the mean() function on the vector we made called myvector.

mean(myvector)
## [1] 95

Note how we get a single value back: the mean of all the values in the vector. Unlike sqrt() or log(), mean() is not applied to each number separately; it is a summary function designed to work on a set of numbers.
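The mean is the sum of the values divided by how many values there are. Computing that by hand with sum() and length() is a useful check on what mean() returns; total and n are illustrative names.

```r
myvector <- c(30, 16, 303, 99, 11, 111)

# add up all six values: 30 + 16 + 303 + 99 + 11 + 111 = 570
total <- sum(myvector)

# count the values
n <- length(myvector)

total / n
## [1] 95
mean(myvector)
## [1] 95
```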

The function sd() calculates the standard deviation. Apply sd() to myvector:

sd(myvector)
## [1] 110.5061
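sd() computes the sample standard deviation, which divides by n - 1 rather than n. Here is a sketch of the same calculation step by step, to show where 110.5061 comes from; deviations and variance are illustrative names.

```r
myvector <- c(30, 16, 303, 99, 11, 111)

# how far each value is from the mean (a vectorized subtraction)
deviations <- myvector - mean(myvector)

# square the deviations, add them up, and divide by n - 1 (the sample variance)
variance <- sum(deviations^2) / (length(myvector) - 1)

# the standard deviation is the square root of the variance
sqrt(variance)
## [1] 110.5061
```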

7.4 Operations with two vectors

You can also subtract one vector from another vector. This can be a little weird when you first see it. Make another vector with the numbers 5, 10, 15, 20, 25, 30. Call this myvector2:

myvector2 <- c(5, 10, 15, 20, 25, 30)

Now subtract myvector2 from myvector. What happens?

myvector-myvector2
## [1]  25   6 288  79 -14  81
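R pairs the two vectors up by position: the first element of myvector minus the first element of myvector2, the second minus the second, and so on. Lining the vectors up as columns with cbind() makes the pairing visible; the object tab and the column label difference are just names chosen for this sketch.

```r
myvector  <- c(30, 16, 303, 99, 11, 111)
myvector2 <- c(5, 10, 15, 20, 25, 30)

# put the two vectors and their element-wise difference side by side, one row per position
tab <- cbind(myvector, myvector2, difference = myvector - myvector2)
tab
```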

7.5 Subsetting vectors

You can extract an element of a vector by typing the vector name with the index of that element given in square brackets. For example, to get the value of the 3rd element in the vector myvector, we type:

myvector[3]
## [1] 303

Extract the 4th element of the vector:

myvector[4]
## [1] 99

You can extract more than one element by using a vector in the brackets:

First, say I want to extract the 3rd and the 4th elements. I can make a vector with 3 and 4 in it:

nums <- c(3,4)

Then put that vector in the brackets:

myvector[nums]
## [1] 303  99

We can also do it directly like this, skipping the vector-creation step:

myvector[c(3,4)]
## [1] 303  99

In the chunk below extract the 1st and 2nd elements:

myvector[c(1,2)]
## [1] 30 16
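A few more variations follow the same rules: the order of the indices controls the order of the output, a colon sequence such as 2:4 (sequences are covered in the next section) picks out a run of elements, and length() gives the position of the last element. A short sketch:

```r
myvector <- c(30, 16, 303, 99, 11, 111)

# indices are used in the order given, so this returns element 4 and then element 3
myvector[c(4, 3)]
## [1]  99 303

# a run of consecutive elements, 2 through 4
myvector[2:4]
## [1]  16 303  99

# the last element, whatever the vector's length
myvector[length(myvector)]
## [1] 111
```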

7.6 Sequences of numbers

Often we want a vector of numbers in sequential order, that is, a vector with the numbers 1, 2, 3, 4, … or 5, 10, 15, 20, … For sequences that go up by 1, the easiest way to do this is with a colon:

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10

Note that in R, 1:10 is equivalent to c(1:10):

c(1:10)
##  [1]  1  2  3  4  5  6  7  8  9 10

To emphasize that a vector is being created, I will usually use c(1:10).

We can start and end the sequence at any numbers we like:

c(20:30)
##  [1] 20 21 22 23 24 25 26 27 28 29 30

We can also do it in reverse. In the code below put 30 before 20:

c(30:20)
##  [1] 30 29 28 27 26 25 24 23 22 21 20

A useful function in R is seq(), which creates a vector containing a sequence of numbers that runs from one number to another:

seq(1, 10)
##  [1]  1  2  3  4  5  6  7  8  9 10

Using seq() instead of : can make code more readable by making explicit what is going on. More importantly, seq() has an argument by = ... that lets you make a sequence of numbers with any interval between them. For example, to create the sequence of numbers from 1 to 10 in steps of 1 (i.e., 1, 2, 3, 4, … 10), we can type:

seq(1, 10,
    by = 1)
##  [1]  1  2  3  4  5  6  7  8  9 10

We can change the step size by altering the value of the by argument given to seq(). For example, to create a sequence of numbers from 1 to 101 in steps of 20 (i.e., 1, 21, 41, … 101), we can type:

seq(1, 101,
    by = 20)
## [1]   1  21  41  61  81 101
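Because by can be any number, seq() can do things the colon cannot, such as steps smaller than 1 or steps that count down. Two short sketches:

```r
# steps of 0.25 from 0 to 1 (the colon operator only steps by 1)
seq(0, 1, by = 0.25)
## [1] 0.00 0.25 0.50 0.75 1.00

# a negative step counts down from 30 to 20
seq(30, 20, by = -5)
## [1] 30 25 20
```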

7.7 Vectors can hold numeric or character data

The vector we created above holds numeric data, as indicated by class():

class(myvector)
## [1] "numeric"

Vectors can also hold character data, such as the letters of the genetic code:

# vector of character data (this replaces the numeric myvector from above)
myvector <- c("A","T","G")

# how it looks
myvector
## [1] "A" "T" "G"
# check what class of data it holds
class(myvector)
## [1] "character"
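A single vector holds only one type of data. If you combine numbers and characters with c(), R converts everything to character. A quick check, where mixed is an illustrative name:

```r
# combining a number with character strings
mixed <- c(1, "A", "T")

mixed
## [1] "1" "A" "T"

class(mixed)
## [1] "character"
```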

7.8 Regular expressions can modify character data

We can use regular expressions to modify character data. For example, we can use gsub() to change each T to a U:

myvector <- gsub("T", "U", myvector)

Now check it out

myvector
## [1] "A" "U" "G"
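gsub() replaces every match in each string, while the related function sub() replaces only the first match. The pattern is a regular expression, so it can describe a set of characters rather than a single letter. A sketch on a made-up sequence stored in a hypothetical object dna:

```r
# a made-up DNA sequence stored as a single string
dna <- "ATGCTTAG"

# gsub() replaces every T
gsub("T", "U", dna)
## [1] "AUGCUUAG"

# sub() replaces only the first T
sub("T", "U", dna)
## [1] "AUGCTTAG"

# [AG] is a character class matching either A or G; each match becomes R (purine)
gsub("[AG]", "R", dna)
## [1] "RTRCTTRR"
```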

Regular expressions are a deep subject in computing.