Zoom R: the basics
Help and documentation
For some general documentation on R, you can run:
help.start()
To get help on a function (e.g. sum), you can run:
help(sum)
Depending on your settings, this will open the documentation for sum in a pager or in your browser.
R settings
Settings are saved in a .Rprofile file. You can edit the file directly in any text editor or from within R.
List all options:
options()
Return the value of a particular option:
getOption("help_type")
[1] "html"
Set an option:
options(help_type = "html")
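Because options() invisibly returns the previous values of the options it changes, you can save them and restore them later. A minimal sketch; the name old_opts and the choice of the digits option are only illustrative:

```r
# Set a new value and keep the previous one in a list
old_opts <- options(digits = 4)
getOption("digits")
# [1] 4

# Restore the previous setting
options(old_opts)
getOption("digits")
```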
Assignment
R accepts the equal sign (=) for assignment, but it is more idiomatic to use the assignment operator (<-) whenever you bind a name to a value and to reserve the equal sign for everything else (e.g. passing named arguments to a function).
a <- 3
Once you have bound a name to a value, you can recall the value with that name:
a # Note that you do not need to use a print() function in R
[1] 3
You can remove an object from the environment by deleting its name:
rm(a)
a
Error in eval(expr, envir, enclos): object 'a' not found
The garbage collector will take care of deleting the object itself from memory.
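To illustrate why the two signs are kept for different jobs, here is a small sketch of how = and <- behave inside a function call. The names x and y are illustrative, and the example assumes a fresh session in which x has not been defined:

```r
# Inside a call, = only matches an argument name; no variable x is created
mean(x = c(1, 2, 3))
exists("x", inherits = FALSE)  # FALSE, provided x was not defined earlier

# Inside a call, <- creates the variable y and passes its value on
mean(y <- c(1, 2, 3))
y
# [1] 1 2 3
```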
Data types and structures
| Dimension | Homogeneous | Heterogeneous |
|---|---|---|
| 1 d | Atomic vector | List |
| 2 d | Matrix | Data frame |
| 3 d | Array | |
Atomic vectors
With a single element
a <- 2
a
[1] 2
typeof(a)
[1] "double"
str(a)
num 2
length(a)
[1] 1
dim(a)
NULL
The dim attribute of a vector doesn’t exist (hence the NULL). This makes vectors different from one-dimensional arrays, which have a dim attribute of length 1.
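The difference can be checked directly. In this sketch, v and arr are illustrative names holding the same three values, once as a plain vector and once as a one-dimensional array:

```r
v <- c(2, 4, 1)           # atomic vector: no dim attribute
arr <- array(c(2, 4, 1))  # one-dimensional array: dim attribute of length 1

dim(v)
# NULL
dim(arr)
# [1] 3
is.array(v)
# [1] FALSE
is.array(arr)
# [1] TRUE
```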
You might have noticed that 2 is a double (double precision floating point number, the equivalent of “float” in other languages). In R, this is the default, even if you don’t type 2.0. This avoids the situation, found for instance in Python, where 2 and 2.0 compare equal but have different types.
In Python:
>>> 2 == 2.0
True
>>> type(2) == type(2.0)
False
>>> type(2)
<class 'int'>
>>> type(2.0)
<class 'float'>
In R:
> 2 == 2.0
[1] TRUE
> typeof(2) == typeof(2.0)
[1] TRUE
> typeof(2)
[1] "double"
> typeof(2.0)
[1] "double"
If you want to define an integer variable, you use:
b <- 2L
b
[1] 2
typeof(b)
[1] "integer"
mode(b)
[1] "numeric"
str(b)
int 2
There are six atomic vector types:
- logical
- integer
- double
- character
- complex
- raw
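As a quick check of the list above, the following sketch calls typeof() on one value of each type. The sample values and the name types are arbitrary:

```r
# One example value per atomic vector type
examples <- list(TRUE, 2L, 2, "two", 2i, as.raw(2))
types <- sapply(examples, typeof)
types
# [1] "logical"   "integer"   "double"    "character" "complex"   "raw"
```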
With multiple elements
c <- c(2, 4, 1)
c
[1] 2 4 1
typeof(c)
[1] "double"
mode(c)
[1] "numeric"
str(c)
num [1:3] 2 4 1
d <- c(TRUE, TRUE, NA, FALSE)
d
[1] TRUE TRUE NA FALSE
typeof(d)
[1] "logical"
str(d)
logi [1:4] TRUE TRUE NA FALSE
NA (“Not Available”) is a logical constant of length one. It is an indicator for a missing value.
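Missing values propagate through most computations, so they usually have to be detected or excluded explicitly. A short sketch with an illustrative vector called scores:

```r
scores <- c(4, NA, 7)

is.na(scores)              # which elements are missing
# [1] FALSE  TRUE FALSE
sum(scores)                # NA propagates
# [1] NA
sum(scores, na.rm = TRUE)  # drop missing values first
# [1] 11
typeof(scores)             # NA is coerced to the type of the vector
# [1] "double"
```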
Vectors are homogeneous, so all elements need to be of the same type.
If you combine elements of different types, R coerces them to the most flexible type present (logical < integer < double < character) so that they all share the same type:
e <- c("This is a string", 3, "test")
e
[1] "This is a string" "3" "test"
typeof(e)
[1] "character"
str(e)
chr [1:3] "This is a string" "3" "test"
f <- c(TRUE, 3, FALSE)
f
[1] 1 3 0
typeof(f)
[1] "double"
str(f)
num [1:3] 1 3 0
g <- c(2L, 3, 4L)
g
[1] 2 3 4
typeof(g)
[1] "double"
str(g)
num [1:3] 2 3 4
h <- c("string", TRUE, 2L, 3.1)
h
[1] "string" "TRUE" "2" "3.1"
typeof(h)
[1] "character"
str(h)
chr [1:4] "string" "TRUE" "2" "3.1"
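The examples above show implicit coercion. You can also coerce explicitly with the as.*() family of functions. A minimal sketch, with arbitrary input values and illustrative variable names:

```r
ints <- as.integer(c("2", "3"))      # character to integer
ints
# [1] 2 3
chars <- as.character(c(TRUE, FALSE)) # logical to character
chars
# [1] "TRUE"  "FALSE"
flags <- as.logical(c(0, 1, 2))       # double to logical: 0 is FALSE, the rest TRUE
flags
# [1] FALSE  TRUE  TRUE

# Implicit coercion is also what makes it possible to count TRUE values
sum(c(TRUE, TRUE, FALSE))
# [1] 2
```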
The binary operator : generates a regular sequence in steps of 1. With integer-valued endpoints, it returns integers and is equivalent to calling seq() with the same two arguments:
i <- 1:5
i
[1] 1 2 3 4 5
typeof(i)
[1] "integer"
str(i)
int [1:5] 1 2 3 4 5
identical(2:8, seq(2, 8))
[1] TRUE
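For sequences that do not go up in steps of 1, seq() takes additional arguments. A brief sketch with illustrative names and arbitrary endpoints:

```r
evens <- seq(2, 8, by = 2)            # fixed step
evens
# [1] 2 4 6 8
quarters <- seq(0, 1, length.out = 5) # fixed number of elements
quarters
# [1] 0.00 0.25 0.50 0.75 1.00
first_three <- seq_len(3)             # integers from 1 to n
first_three
# [1] 1 2 3
halves <- 1.5:4                       # : also works with non-integer start values
halves
# [1] 1.5 2.5 3.5
```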
Matrices
j <- matrix(1:12, nrow = 3, ncol = 4)
j
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
typeof(j)
[1] "integer"
str(j)
int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
length(j)
[1] 12
dim(j)
[1] 3 4
The default is byrow = FALSE. If you want the matrix to be filled in by row, you need to set this argument to TRUE:
k <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
k
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
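One way to convince yourself of what byrow does: filling a 3 x 4 matrix by row gives the same result as filling a 4 x 3 matrix by column and transposing it with t(). A small sketch; by_col and by_row are illustrative names:

```r
by_col <- matrix(1:12, nrow = 4, ncol = 3)                # filled column by column
by_row <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)  # filled row by row

dim(t(by_col))
# [1] 3 4
all(t(by_col) == by_row)
# [1] TRUE
```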
Arrays
l <- array(as.double(1:24), c(3, 2, 4))
l
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 3
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
, , 4
[,1] [,2]
[1,] 19 22
[2,] 20 23
[3,] 21 24
typeof(l)
[1] "double"
str(l)
num [1:3, 1:2, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
length(l)
[1] 24
dim(l)
[1] 3 2 4
Lists
m <- list(2, 3)
m
[[1]]
[1] 2
[[2]]
[1] 3
typeof(m)
[1] "list"
str(m)
List of 2
$ : num 2
$ : num 3
length(m)
[1] 2
dim(m)
NULL
Like atomic vectors, lists do not have a dim attribute. Lists are in fact a different type of vector.
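The following sketch checks this with the is.*() predicates. The name pair is illustrative:

```r
pair <- list(2, 3)

is.vector(pair)     # a list is a vector...
# [1] TRUE
is.list(pair)
# [1] TRUE
is.atomic(pair)     # ...but not an atomic one
# [1] FALSE
is.atomic(c(2, 3))
# [1] TRUE
```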
Lists can be heterogeneous:
n <- list(2L, 3, c(2, 1), FALSE, "string")
n
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 2 1
[[4]]
[1] FALSE
[[5]]
[1] "string"
typeof(n)
[1] "list"
str(n)
List of 5
$ : int 2
$ : num 3
$ : num [1:2] 2 1
$ : logi FALSE
$ : chr "string"
length(n)
[1] 5
Data frames
Data frames contain tabular data. Under the hood, a data frame is a list of vectors of equal length.
o <- data.frame(
  country = c("Canada", "USA", "Mexico"),
  var = c(2.9, 3.1, 4.5)
)
o
country var
1 Canada 2.9
2 USA 3.1
3 Mexico 4.5
typeof(o)
[1] "list"
str(o)
'data.frame': 3 obs. of 2 variables:
$ country: chr "Canada" "USA" "Mexico"
$ var : num 2.9 3.1 4.5
length(o)
[1] 2
dim(o)
[1] 3 2
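Since a data frame is a list with one element per column, list tools work on it column by column. A minimal sketch that rebuilds the same data under the illustrative name df (stringsAsFactors = FALSE is spelled out so the result does not depend on the R version):

```r
df <- data.frame(
  country = c("Canada", "USA", "Mexico"),
  var = c(2.9, 3.1, 4.5),
  stringsAsFactors = FALSE
)

is.list(df)
# [1] TRUE
names(df)           # one name per column
# [1] "country" "var"
col_types <- sapply(df, typeof)  # apply typeof() to each column
col_types
nrow(df)
# [1] 3
ncol(df)            # same as length(df)
# [1] 2
```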
Indexing
Indexing in R starts at 1.
a
[1] 2
a[1]
[1] 2
a[2]
[1] NA
c
[1] 2 4 1
c[2]
[1] 4
c[2:4]
[1] 4 1 NA
j
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
j[2, 3]
[1] 8
l
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 3
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
, , 4
[,1] [,2]
[1,] 19 22
[2,] 20 23
[3,] 21 24
l[2, 1, 3]
[1] 14
n
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 2 1
[[4]]
[1] FALSE
[[5]]
[1] "string"
n[3]
[[1]]
[1] 2 1
typeof(n[3])
[1] "list"
n[3][1]
[[1]]
[1] 2 1
n[[3]]
[1] 2 1
typeof(n[[3]])
[1] "double"
n[[3]][1]
[1] 2
o
country var
1 Canada 2.9
2 USA 3.1
3 Mexico 4.5
o[1]
country
1 Canada
2 USA
3 Mexico
typeof(o[1])
[1] "list"
str(o[1])
'data.frame': 3 obs. of 1 variable:
$ country: chr "Canada" "USA" "Mexico"
o[[1]]
[1] "Canada" "USA" "Mexico"
typeof(o[[1]])
[1] "character"
o$country
[1] "Canada" "USA" "Mexico"
typeof(o$country)
[1] "character"
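Beyond single positions, R also accepts negative indices (to exclude elements), logical vectors (to filter) and empty positions (to select a whole row or column). A short sketch on freshly created objects; v and mat are illustrative names:

```r
v <- c(2, 4, 1)
v[-1]        # everything except the first element
# [1] 4 1
v[c(1, 3)]   # first and third elements
# [1] 2 1
v[v > 1]     # elements greater than 1
# [1] 2 4

mat <- matrix(1:12, nrow = 3, ncol = 4)
mat[2, ]     # second row
# [1]  2  5  8 11
mat[, 3]     # third column
# [1] 7 8 9
```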
Copy-on-modify
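In some languages (e.g. Python), modifying a mutable object changes it in place, so every name bound to that object sees the change. R instead makes a copy when you modify an object that is bound to more than one name.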
Let’s have a look at Python:
>>> a = [1, 2, 3]
>>> b = a
>>> b
[1, 2, 3]
>>> a[0] = 4
>>> a
[4, 2, 3]
>>> b
[4, 2, 3]
Modifying a also modifies b. If you want to keep b unchanged, you need to explicitly make a copy of a.
Now, let’s see what happens in R:
> a <- c(1, 2, 3)
> b <- a
> b
[1] 1 2 3
> a[1] <- 4
> a
[1] 4 2 3
> b
[1] 1 2 3
Here, a and b initially point to the same object. R creates a new copy in memory only when a is modified, so b remains unchanged. This is more intuitive, but can be more memory intensive.
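The same behaviour applies to function arguments: a function that modifies its argument works on its own copy and leaves the caller's object untouched. A minimal sketch; set_first is a hypothetical helper written for this illustration:

```r
# Hypothetical helper: replaces the first element of a vector
set_first <- function(v, value) {
  v[1] <- value  # modifies the function's local copy only
  v
}

x <- c(1, 2, 3)
y <- set_first(x, 4)
x  # the original is unchanged
# [1] 1 2 3
y
# [1] 4 2 3
```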
Function definition
compare <- function(x, y) {
  x == y
}
We can now use our function:
compare(2, 3)
[1] FALSE
Note that a function returns the value of the last expression it evaluates, and that this value is printed automatically when the function is called at the console:
test <- function(x, y) {
  x
  y
}
test(2, 3)
[1] 3
If you want to display other values, you need to explicitly use the print() function:
test <- function(x, y) {
  print(x)
  y
}
test(2, 3)
[1] 2
[1] 3
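Note that print() only displays a value; it does not change what the function returns. To leave a function early you can use return(), and to hand back several values at once a common approach is to return a list. A sketch with two hypothetical functions, safe_divide and min_max:

```r
# Hypothetical example: return early when dividing by zero
safe_divide <- function(x, y) {
  if (y == 0) {
    return(NA)  # exit the function immediately
  }
  x / y         # otherwise, the last expression is the return value
}

# Hypothetical example: return several values in a named list
min_max <- function(v) {
  list(min = min(v), max = max(v))
}

safe_divide(6, 3)
# [1] 2
safe_divide(1, 0)
# [1] NA
res <- min_max(c(2, 4, 1))
res$max
# [1] 4
```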
Control flow
Conditionals
test_sign <- function(x) {
  if (x > 0) {
    "x is positive"
  } else if (x < 0) {
    "x is negative"
  } else {
    "x is equal to zero"
  }
}
test_sign(3)
[1] "x is positive"
test_sign(-2)
[1] "x is negative"
test_sign(0)
[1] "x is equal to zero"
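The if statement expects a single TRUE or FALSE, so the function above is meant for one value at a time. To classify every element of a vector, one option is the vectorized ifelse() function. A sketch; sign_label is a hypothetical name:

```r
# Hypothetical vectorized variant built on nested ifelse() calls
sign_label <- function(x) {
  ifelse(x > 0, "positive", ifelse(x < 0, "negative", "zero"))
}

labels <- sign_label(c(3, -2, 0))
labels
# [1] "positive" "negative" "zero"
```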
Loops
for (i in 1:10) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
Notice that here we need to use the print() function: inside a loop, values are not printed automatically.
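Two other common loop patterns are iterating over the positions of a vector with seq_along() and repeating a block with while until a condition fails. A short sketch; the names fruits, idx and n are illustrative:

```r
fruits <- c("apple", "pear", "plum")

# Loop over positions rather than values
for (idx in seq_along(fruits)) {
  print(paste(idx, fruits[idx]))
}
# [1] "1 apple"
# [1] "2 pear"
# [1] "3 plum"

# Repeat until the condition is no longer TRUE
n <- 0
while (n < 3) {
  n <- n + 1
}
n
# [1] 3
```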