Zoom R: the basics

Last updated: January 4, 2023

Help and documentation

For some general documentation on R, you can run:

help.start()

To get help on a function (e.g. sum), you can run:

help(sum)

Depending on your settings, this will open the documentation for sum in a pager or in your browser.
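
A few related helpers can be handy when you do not remember the exact name or arguments of a function. The lines below are only a small sketch: ?sum is shorthand for help(sum), apropos() lists objects whose names match a pattern, and args() shows the arguments of a function without opening its documentation.

```r
# ?sum is shorthand for help(sum); it is left commented out here so that
# the snippet does not open a pager or a browser
# ?sum

# List objects on the search path whose name starts with "sum"
matches <- apropos("^sum")
matches

# Show the arguments of a function without opening its help page
args(sum)
```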

R settings

To make settings persist across sessions, save them in a .Rprofile file, which R runs at startup. You can edit the file directly in any text editor or from within R.

List all options:

options()

Return the value of a particular option:

getOption("help_type")
[1] "html"

Set an option:

options(help_type = "html")
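
Options set with options() only last for the current session. When you change one temporarily, a common pattern is to keep the previous value and restore it afterwards. The sketch below uses the digits option purely as an illustration:

```r
# options() invisibly returns the previous values of the options it changes
old <- options(digits = 4)
getOption("digits")
print(pi)   # printed with 4 significant digits: 3.142

# Restore the previous setting
options(old)
getOption("digits")
```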

Assignment

R accepts the equal sign (=) for assignment, but it is more idiomatic to use the assignment arrow (<-) whenever you bind a name to a value and to reserve the equal sign for everything else (e.g. passing arguments to a function).

a <- 3

Once you have bound a name to a value, you can recall the value with that name:

a  # Note that you do not need to use a print() function in R
[1] 3

You can remove an object from the environment by deleting its name:

rm(a)
a
Error in eval(expr, envir, enclos): object 'a' not found

The garbage collector will take care of deleting the object itself from memory.
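
If you want to check whether a name is currently bound without triggering an error, exists() is one option. A minimal sketch, in which my_value is a hypothetical name used only for illustration:

```r
my_value <- 3                 # bind a (hypothetical) name to a value
before <- exists("my_value")  # TRUE: the name is bound

rm(my_value)                  # delete the name
after <- exists("my_value")   # FALSE: the name is gone

c(before = before, after = after)
```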

Data types and structures

Dimension   Homogeneous     Heterogeneous
1d          Atomic vector   List
2d          Matrix          Data frame
3d          Array

Atomic vectors

With a single element

a <- 2
a
[1] 2
typeof(a)
[1] "double"
str(a)
 num 2
length(a)
[1] 1
dim(a)
NULL

A plain vector has no dim attribute (hence the NULL). This makes vectors different from one-dimensional arrays, which do have a dim attribute: a single number equal to their length (1 for a one-element array).
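
To make the difference concrete, here is a small sketch comparing a plain vector with a one-dimensional array holding the same value (v and arr are names chosen for this example):

```r
v <- 2            # plain atomic vector
arr <- array(2)   # one-dimensional array with the same content

dim(v)            # NULL
dim(arr)          # 1
length(v) == length(arr)   # TRUE: both hold one element
is.array(v)       # FALSE
is.array(arr)     # TRUE
```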

You might have noticed that 2 is a double (double precision floating point number, equivalent of “float” in other languages). In R, this is the default, even if you don’t type 2.0. This prevents the kind of weirdness you can find in, for instance, Python.

In Python:

>>> 2 == 2.0
True
>>> type(2) == type(2.0)
False
>>> type(2)
<class 'int'>
>>> type(2.0)
<class 'float'>

In R:

> 2 == 2.0
[1] TRUE
> typeof(2) == typeof(2.0)
[1] TRUE
> typeof(2)
[1] "double"
> typeof(2.0)
[1] "double"

If you want to define an integer variable, you use:

b <- 2L
b
[1] 2
typeof(b)
[1] "integer"
mode(b)
[1] "numeric"
str(b)
 int 2
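
The distinction matters in arithmetic, because mixing the two types, or using the / operator, gives back doubles. A short illustration:

```r
typeof(2L + 1L)     # "integer": both operands are integers
typeof(2L + 1)      # "double": the integer is promoted
typeof(5L / 2L)     # "double": / always returns a double
5L %/% 2L           # 2 (integer division)
typeof(5L %/% 2L)   # "integer"
is.integer(2)       # FALSE: 2 without the L suffix is a double
```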

There are six atomic vector types:

  • logical
  • integer
  • double
  • character
  • complex
  • raw
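
As a quick check, the sketch below builds one value of each type and asks R for its type (examples and types are names chosen for this illustration):

```r
# One value per atomic type, in the same order as the list above
examples <- list(TRUE, 1L, 1.5, "a", 1 + 2i, as.raw(255))

# Apply typeof() to each value and collect the results in a character vector
types <- vapply(examples, typeof, character(1))
types
```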

With multiple elements

c <- c(2, 4, 1)
c
[1] 2 4 1
typeof(c)
[1] "double"
mode(c)
[1] "numeric"
str(c)
 num [1:3] 2 4 1
d <- c(TRUE, TRUE, NA, FALSE)
d
[1]  TRUE  TRUE    NA FALSE
typeof(d)
[1] "logical"
str(d)
 logi [1:4] TRUE TRUE NA FALSE

NA (“Not Available”) is a logical constant of length one. It is an indicator for a missing value.
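
Missing values propagate through most computations, so it is useful to know how to detect them and how to skip them. A minimal sketch (flags is a name chosen for this example):

```r
flags <- c(TRUE, TRUE, NA, FALSE)

is.na(flags)               # FALSE FALSE TRUE FALSE
sum(flags)                 # NA: the missing value propagates
sum(flags, na.rm = TRUE)   # 2: missing values are dropped first

# NA is logical by default, but there are typed variants
typeof(NA)                 # "logical"
typeof(NA_integer_)        # "integer"
typeof(NA_character_)      # "character"
```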

Vectors are homogeneous, so all elements need to be of the same type.

If you use elements of different types, R will convert some of them to ensure that they become of the same type:

e <- c("This is a string", 3, "test")
e
[1] "This is a string" "3"                "test"            
typeof(e)
[1] "character"
str(e)
 chr [1:3] "This is a string" "3" "test"
f <- c(TRUE, 3, FALSE)
f
[1] 1 3 0
typeof(f)
[1] "double"
str(f)
 num [1:3] 1 3 0
g <- c(2L, 3, 4L)
g
[1] 2 3 4
typeof(g)
[1] "double"
str(g)
 num [1:3] 2 3 4
h <- c("string", TRUE, 2L, 3.1)
h
[1] "string" "TRUE"   "2"      "3.1"   
typeof(h)
[1] "character"
str(h)
 chr [1:4] "string" "TRUE" "2" "3.1"
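
The examples above follow a fixed order: logical is converted to integer, integer to double, and double to character. You can also convert explicitly with the as.*() functions. A short sketch (converted is a name chosen for this example):

```r
# Implicit coercion always goes towards the more general type
typeof(c(TRUE, 2L))    # "integer"
typeof(c(2L, 3.5))     # "double"
typeof(c(3.5, "a"))    # "character"

# Explicit coercion
as.integer("3")        # 3
as.numeric(TRUE)       # 1

# Values that cannot be converted become NA (with a warning, silenced here)
converted <- suppressWarnings(as.numeric(c("1.5", "abc")))
converted              # 1.5 NA
```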

With whole-number endpoints, the binary operator : is equivalent to calling seq() with the same two arguments and generates a regular sequence of integers:

i <- 1:5
i
[1] 1 2 3 4 5
typeof(i)
[1] "integer"
str(i)
 int [1:5] 1 2 3 4 5
identical(2:8, seq(2, 8))
[1] TRUE
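
seq() is more flexible than the : operator, and its relatives seq_len() and seq_along() are safer when a length might be zero. A brief sketch (len is a name chosen for this example):

```r
seq(2, 10, by = 2)           # 2 4 6 8 10
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# 1:len counts down when len is 0, which is rarely what you want
len <- 0
1:len                        # 1 0
seq_len(len)                 # integer(0), i.e. an empty sequence

seq_along(c("a", "b", "c"))  # 1 2 3
```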

Matrices

j <- matrix(1:12, nrow = 3, ncol = 4)
j
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
typeof(j)
[1] "integer"
str(j)
 int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
length(j)
[1] 12
dim(j)
[1] 3 4

The default is byrow = FALSE, which fills the matrix column by column. If you want the matrix to be filled in by row, you need to set this argument to TRUE:

k <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
k
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
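
One way to see how the two fill orders relate: filling by row gives the transpose of filling by column with the dimensions swapped. A small sketch (by_col and by_row are names chosen for this example):

```r
by_col <- matrix(1:12, nrow = 4, ncol = 3)                # filled column by column
by_row <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)  # filled row by row

dim(t(by_col))             # 3 4
all(t(by_col) == by_row)   # TRUE
```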

Arrays

l <- array(as.double(1:24), c(3, 2, 4))
l
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18

, , 4

     [,1] [,2]
[1,]   19   22
[2,]   20   23
[3,]   21   24
typeof(l)
[1] "double"
str(l)
 num [1:3, 1:2, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
length(l)
[1] 24
dim(l)
[1] 3 2 4
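
Under the hood, an array is a vector with a dim attribute, filled with the first index varying fastest. The sketch below (values and arr are names chosen for this example) builds the same array by setting dim directly and shows how a position [i, j, k] maps to a position in the underlying vector:

```r
values <- as.double(1:24)

# Giving a vector a dim attribute turns it into an array
arr <- values
dim(arr) <- c(3, 2, 4)

# Element [i, j, k] sits at position i + (j - 1) * 3 + (k - 1) * 3 * 2
arr[2, 1, 3]                             # 14
arr[2 + (1 - 1) * 3 + (3 - 1) * 3 * 2]   # 14
```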

Lists

m <- list(2, 3)
m
[[1]]
[1] 2

[[2]]
[1] 3
typeof(m)
[1] "list"
str(m)
List of 2
 $ : num 2
 $ : num 3
length(m)
[1] 2
dim(m)
NULL

As with atomic vectors, lists do not have a dim attribute. Lists are in fact a different type of vector.

Lists can be heterogeneous:

n <- list(2L, 3, c(2, 1), FALSE, "string")
n
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 2 1

[[4]]
[1] FALSE

[[5]]
[1] "string"
typeof(n)
[1] "list"
str(n)
List of 5
 $ : int 2
 $ : num 3
 $ : num [1:2] 2 1
 $ : logi FALSE
 $ : chr "string"
length(n)
[1] 5
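
List elements can also be named, which makes them easier to retrieve. A minimal sketch, in which person and its fields are made up for illustration:

```r
# A named list (the names and values are purely illustrative)
person <- list(name = "Ada", age = 36L, scores = c(90, 85))

names(person)            # "name" "age" "scores"
person$name              # "Ada"
person[["scores"]][2]    # 85
```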

Data frames

Data frames contain tabular data. Under the hood, a data frame is a list of vectors of equal length, one per column.

o <- data.frame(
  country = c("Canada", "USA", "Mexico"),
  var = c(2.9, 3.1, 4.5)
)
o
  country var
1  Canada 2.9
2     USA 3.1
3  Mexico 4.5
typeof(o)
[1] "list"
str(o)
'data.frame':   3 obs. of  2 variables:
 $ country: chr  "Canada" "USA" "Mexico"
 $ var    : num  2.9 3.1 4.5
length(o)
[1] 2
dim(o)
[1] 3 2
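
This list structure explains why length() returns the number of columns rather than the number of rows. A short sketch that rebuilds the same data under the name df:

```r
df <- data.frame(
  country = c("Canada", "USA", "Mexico"),
  var = c(2.9, 3.1, 4.5)
)

is.list(df)                # TRUE
class(df)                  # "data.frame"
lengths(df)                # each column has 3 elements
length(df) == ncol(df)     # TRUE: one list element per column
nrow(df)                   # 3
```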

Indexing

Indexing in R starts at 1. Indexing past the end of a vector returns NA rather than raising an error.

a
[1] 2
a[1]
[1] 2
a[2]
[1] NA
c
[1] 2 4 1
c[2]
[1] 4
c[2:4]
[1]  4  1 NA
j
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
j[2, 3]
[1] 8
l
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18

, , 4

     [,1] [,2]
[1,]   19   22
[2,]   20   23
[3,]   21   24
l[2, 1, 3]
[1] 14
n
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 2 1

[[4]]
[1] FALSE

[[5]]
[1] "string"
n[3]
[[1]]
[1] 2 1
typeof(n[3])
[1] "list"
n[3][1]
[[1]]
[1] 2 1
n[[3]]
[1] 2 1
typeof(n[[3]])
[1] "double"
n[[3]][1]
[1] 2
o
  country var
1  Canada 2.9
2     USA 3.1
3  Mexico 4.5
o[1]
  country
1  Canada
2     USA
3  Mexico
typeof(o[1])
[1] "list"
str(o[1])
'data.frame':   3 obs. of  1 variable:
 $ country: chr  "Canada" "USA" "Mexico"
o[[1]]
[1] "Canada" "USA"    "Mexico"
typeof(o[[1]])
[1] "character"
o$country
[1] "Canada" "USA"    "Mexico"
typeof(o$country)
[1] "character"
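
Beyond positive positions, you can also index with negative numbers (to exclude elements), with logical vectors (to filter), and, for data frames, with a row and a column. A brief sketch with self-contained example data (v, df, and large are names chosen here):

```r
v <- c(2, 4, 1)
v[-1]          # 4 1: everything except the first element
v[c(3, 1)]     # 1 2: elements in the requested order
v[v > 1]       # 2 4: elements for which the condition is TRUE

df <- data.frame(
  country = c("Canada", "USA", "Mexico"),
  var = c(2.9, 3.1, 4.5)
)
df[2, "var"]               # 3.1: row 2, column "var"
large <- df[df$var > 3, ]  # rows where var is greater than 3
large
```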

Copy-on-modify

In some languages (e.g. Python), modifying a mutable object that is bound to several names does not make a copy, so the change is visible through all of those names. R makes a copy instead.

Let’s have a look at Python:

>>> a = [1, 2, 3]
>>> b = a
>>> b
[1, 2, 3]
>>> a[0] = 4
>>> a
[4, 2, 3]
>>> b
[4, 2, 3]

Modifying a also modifies b. If you want to keep b unchanged, you need to explicitly make a copy of a.

Now, let’s see what happens in R:

> a <- c(1, 2, 3)
> b <- a
> b
[1] 1 2 3
> a[1] <- 4
> a
[1] 4 2 3
> b
[1] 1 2 3

Here, R creates a new copy in memory when a is modified, so b remains unchanged. This is more intuitive, but it can be more memory intensive.
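
The same behaviour applies to function arguments: a function that modifies its argument works on its own copy, and the caller's object is left untouched. A minimal sketch (add_one is a hypothetical helper written for this example):

```r
# Hypothetical helper: adds 1 to the first element of its argument
add_one <- function(v) {
  v[1] <- v[1] + 1   # modifies the function's own copy
  v
}

x <- c(1, 2, 3)
y <- add_one(x)

x   # 1 2 3: unchanged
y   # 2 2 3
```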

Function definition

compare <- function(x, y) {
  x == y
}

We can now use our function:

compare(2, 3)
[1] FALSE

Note that a function returns the value of the last expression it evaluates, and that this value is printed automatically when you call the function at the console:

test <- function(x, y) {
  x
  y
}
test(2, 3)
[1] 3

If you want to display other values as well, you need to explicitly use the print() function. This only prints them: the function still returns the value of its last expression.

test <- function(x, y) {
  print(x)
  y
}
test(2, 3)
[1] 2
[1] 3
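
Two other features are worth knowing at this stage: return() exits a function early, and arguments can have default values. A small sketch (safe_divide is a hypothetical function written for this example):

```r
# Hypothetical function: divides x by y, with y defaulting to 1
safe_divide <- function(x, y = 1) {
  if (y == 0) {
    return(NA_real_)   # exit early instead of dividing by zero
  }
  x / y                # last expression: this is the returned value
}

safe_divide(6, 3)   # 2
safe_divide(5)      # 5: y takes its default value
safe_divide(1, 0)   # NA
```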

Control flow

Conditionals

test_sign <- function(x) {
  if (x > 0) {
    "x is positive"
  } else if (x < 0) {
    "x is negative"
  } else {
    "x is equal to zero"
  }
}
test_sign(3)
[1] "x is positive"
test_sign(-2)
[1] "x is negative"
test_sign(0)
[1] "x is equal to zero"
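
An if statement expects a single TRUE or FALSE, and recent versions of R raise an error when the condition has more than one element. To test every element of a vector at once, ifelse() is one vectorized alternative. A short sketch (x and signs are names chosen for this example):

```r
x <- c(3, -2, 0)

# ifelse() evaluates the condition element by element
signs <- ifelse(x > 0, "positive", ifelse(x < 0, "negative", "zero"))
signs   # "positive" "negative" "zero"
```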

Loops

for (i in 1:10) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

Notice that here we need to use the print() function: inside a loop, values are not printed automatically.
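
Loops are often used to fill a result element by element. The sketch below (values, squares, idx, and count are names chosen for this example) shows that pattern with seq_along(), the equivalent vectorized expression, and a basic while loop:

```r
values <- c(2, 4, 1)

# Pre-allocate the result, then fill it in a for loop
squares <- numeric(length(values))
for (idx in seq_along(values)) {
  squares[idx] <- values[idx]^2
}
squares      # 4 16 1

# Many operations are vectorized, so the loop is not always needed
values^2     # 4 16 1

# A while loop runs as long as its condition is TRUE
count <- 0
while (count < 3) {
  count <- count + 1
}
count        # 3
```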