Zoom R: the basics
Help and documentation
For some general documentation on R, you can run:
help.start()
To get help on a function (e.g. sum), you can run:
help(sum)
Depending on your settings, this will open the documentation for sum in a pager or in your browser.
R settings
Persistent settings are saved in a .Rprofile file, which R runs at startup. You can edit the file directly in any text editor or from within R.
List all options:
options()
Return the value of a particular option:
getOption("help_type")
[1] "html"
Set an option:
options(help_type = "html")
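As a small sketch of how options can be changed temporarily: options() invisibly returns the previous values of whatever it changes, so they can be restored afterwards. The digits option is used here only as an illustration; it is not discussed above.

```r
# options() returns the old values of the options it changes
old <- options(digits = 4)
getOption("digits")   # 4

# Restore the previous setting
options(old)
restored <- getOption("digits")
```

Settings changed this way last only for the current session unless they are also written to .Rprofile.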
Assignment
R can accept the equal sign (=) for assignments, but it is more idiomatic to use the assignment arrow (<-) whenever you bind a name to a value and to use the equal sign everywhere else (e.g. for function arguments).
a <- 3
Once you have bound a name to a value, you can recall the value with that name:
a # Note that you do not need to use a print() function in R
[1] 3
You can remove an object from the environment by deleting its name:
rm(a)
a
Error in eval(expr, envir, enclos): object 'a' not found
The garbage collector will take care of deleting the object itself from memory.
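To check whether a name is currently bound without triggering an error, one option is exists(). This is a minimal sketch; the inherits = FALSE argument restricts the lookup to the current environment.

```r
a <- 3
before <- exists("a", inherits = FALSE)   # TRUE: the name is bound

rm(a)
after <- exists("a", inherits = FALSE)    # FALSE: the name is gone
```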
Data types and structures
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1 d | Atomic vector | List |
2 d | Matrix | Data frame |
3 d | Array | |
Atomic vectors
With a single element
a <- 2
a
[1] 2
typeof(a)
[1] "double"
str(a)
num 2
length(a)
[1] 1
dim(a)
NULL
An atomic vector has no dim attribute (hence the NULL). This makes vectors different from one-dimensional arrays, which have a dim attribute of length 1.
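The difference between a vector and a one-dimensional array can be seen with a short sketch (the names v and arr are only illustrative):

```r
v <- c(2, 4, 1)            # atomic vector
arr <- array(c(2, 4, 1))   # one-dimensional array

dim(v)             # NULL
dim(arr)           # 3
length(dim(arr))   # 1: a single dimension
```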
You might have noticed that 2 is a double (a double precision floating point number, the equivalent of “float” in other languages). In R, this is the default, even if you don’t type 2.0. This avoids the kind of surprise you can run into in, for instance, Python, where 2 and 2.0 have different types.
In Python:
>>> 2 == 2.0
True
>>> type(2) == type(2.0)
False
>>> type(2)
<class 'int'>
>>> type(2.0)
<class 'float'>
In R:
> 2 == 2.0
[1] TRUE
> typeof(2) == typeof(2.0)
[1] TRUE
> typeof(2)
[1] "double"
> typeof(2.0)
[1] "double"
If you want to define an integer variable, add the suffix L to the number:
b <- 2L
b
[1] 2
typeof(b)
[1] "integer"
mode(b)
[1] "numeric"
str(b)
int 2
There are six atomic vector types:
- logical
- integer
- double
- character
- complex
- raw
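As a quick illustration, typeof() reports each of these six types. The values below are arbitrary examples.

```r
# One example value per atomic type
values <- list(TRUE, 2L, 2, "a", 1 + 2i, as.raw(255))

types <- vapply(values, typeof, character(1))
types
# "logical" "integer" "double" "character" "complex" "raw"
```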
With multiple elements
c <- c(2, 4, 1)
c
[1] 2 4 1
typeof(c)
[1] "double"
mode(c)
[1] "numeric"
str(c)
num [1:3] 2 4 1
d <- c(TRUE, TRUE, NA, FALSE)
d
[1] TRUE TRUE NA FALSE
typeof(d)
[1] "logical"
str(d)
logi [1:4] TRUE TRUE NA FALSE
NA (“Not Available”) is a logical constant of length one. It is an indicator for a missing value.
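A short sketch of how missing values behave in practice: they can be detected with is.na(), they propagate through most computations, and many functions accept na.rm = TRUE to ignore them.

```r
x <- c(2, NA, 4)

is.na(x)                # FALSE TRUE FALSE
mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # 3

typeof(NA)              # "logical"
typeof(x)               # "double": NA is coerced to fit the vector
```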
Vectors are homogeneous, so all elements need to be of the same type.
If you combine elements of different types, R will coerce them to a common type:
e <- c("This is a string", 3, "test")
e
[1] "This is a string" "3" "test"
typeof(e)
[1] "character"
str(e)
chr [1:3] "This is a string" "3" "test"
f <- c(TRUE, 3, FALSE)
f
[1] 1 3 0
typeof(f)
[1] "double"
str(f)
num [1:3] 1 3 0
g <- c(2L, 3, 4L)
g
[1] 2 3 4
typeof(g)
[1] "double"
str(g)
num [1:3] 2 3 4
h <- c("string", TRUE, 2L, 3.1)
h
[1] "string" "TRUE" "2" "3.1"
typeof(h)
[1] "character"
str(h)
chr [1:4] "string" "TRUE" "2" "3.1"
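In the examples above, coercion goes from logical to integer to double to character. Coercion can also be requested explicitly with the as.*() family of functions, as in this minimal sketch:

```r
flags <- c(TRUE, FALSE, TRUE)

sum(flags)          # 2: TRUE counts as 1, FALSE as 0
as.integer(flags)   # 1 0 1
as.numeric("3.1")   # 3.1

# A string that is not a number becomes NA (with a warning)
converted <- suppressWarnings(as.numeric("test"))
```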
The binary operator : generates a regular sequence in steps of 1. With integer endpoints, it is equivalent to calling the seq() function with the same two arguments and produces a sequence of integers:
i <- 1:5
i
[1] 1 2 3 4 5
typeof(i)
[1] "integer"
str(i)
int [1:5] 1 2 3 4 5
identical(2:8, seq(2, 8))
[1] TRUE
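seq() is more general than the : operator. The sketch below shows a few of its other forms; note that the type of the result depends on the arguments.

```r
seq(2, 8, by = 2)           # 2 4 6 8
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
seq_len(3)                  # 1 2 3

typeof(2:8)                 # "integer"
typeof(seq(2, 8, by = 2))   # "double"
```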
Matrices
j <- matrix(1:12, nrow = 3, ncol = 4)
j
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
typeof(j)
[1] "integer"
str(j)
int [1:3, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
length(j)
[1] 12
dim(j)
[1] 3 4
The default is byrow = FALSE. If you want the matrix to be filled in by row, you need to set this argument to TRUE:
k <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
k
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
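One way to see what byrow = TRUE does: filling a 3 x 4 matrix by row gives the same result as filling a 4 x 3 matrix by column and transposing it with t(). The names by_col and by_row are only illustrative.

```r
by_col <- matrix(1:12, nrow = 4, ncol = 3)                 # filled column by column
by_row <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)   # filled row by row

same <- identical(by_row, t(by_col))
same   # TRUE
```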
Arrays
l <- array(as.double(1:24), c(3, 2, 4))
l
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 3
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
, , 4
[,1] [,2]
[1,] 19 22
[2,] 20 23
[3,] 21 24
typeof(l)
[1] "double"
str(l)
num [1:3, 1:2, 1:4] 1 2 3 4 5 6 7 8 9 10 ...
length(l)
[1] 24
dim(l)
[1] 3 2 4
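Matrices and arrays are vectors with a dim attribute, so a matrix is simply an array with two dimensions. A minimal sketch, setting dim on a plain vector:

```r
v <- 1:6
dim(v) <- c(2, 3)   # turn the vector into a 2 x 3 matrix

is.matrix(v)   # TRUE
is.array(v)    # TRUE

m2 <- matrix(1:6, nrow = 2)
identical(v, m2)   # TRUE
```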
Lists
m <- list(2, 3)
m
[[1]]
[1] 2
[[2]]
[1] 3
typeof(m)
[1] "list"
str(m)
List of 2
$ : num 2
$ : num 3
length(m)
[1] 2
dim(m)
NULL
As with atomic vectors, lists do not have a dim attribute. Lists are in fact a different type of vector.
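The following sketch shows that a list is a vector without being an atomic vector, and that an empty list of a given length can be created with vector():

```r
m <- list(2, 3)

is.vector(m)   # TRUE
is.list(m)     # TRUE
is.atomic(m)   # FALSE

# A list of two empty (NULL) elements
empty <- vector("list", 2)
length(empty)  # 2
```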
Lists can be heterogeneous:
n <- list(2L, 3, c(2, 1), FALSE, "string")
n
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 2 1
[[4]]
[1] FALSE
[[5]]
[1] "string"
typeof(n)
[1] "list"
str(n)
List of 5
$ : int 2
$ : num 3
$ : num [1:2] 2 1
$ : logi FALSE
$ : chr "string"
length(n)
[1] 5
Data frames
Data frames contain tabular data. Under the hood, a data frame is a list of vectors of equal length, one per column.
o <- data.frame(
  country = c("Canada", "USA", "Mexico"),
  var = c(2.9, 3.1, 4.5)
)
o
country var
1 Canada 2.9
2 USA 3.1
3 Mexico 4.5
typeof(o)
[1] "list"
str(o)
'data.frame': 3 obs. of 2 variables:
$ country: chr "Canada" "USA" "Mexico"
$ var : num 2.9 3.1 4.5
length(o)
[1] 2
dim(o)
[1] 3 2
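Because a data frame is a list of columns, list functions work on it column by column. This is a minimal sketch; stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version.

```r
o <- data.frame(
  country = c("Canada", "USA", "Mexico"),
  var = c(2.9, 3.1, 4.5),
  stringsAsFactors = FALSE
)

is.list(o)                       # TRUE
col_types <- sapply(o, typeof)   # type of each column
col_types

nrow(o)   # 3
ncol(o)   # 2, the same as length(o)
```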
Indexing
Indexing in R starts at 1.
a
[1] 2
a[1]
[1] 2
a[2]
[1] NA
c
[1] 2 4 1
c[2]
[1] 4
c[2:4]
[1] 4 1 NA
j
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
j[2, 3]
[1] 8
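Leaving an index empty selects everything along that dimension, which gives whole rows or columns. A short sketch on the same matrix:

```r
j <- matrix(1:12, nrow = 3, ncol = 4)

j[2, ]   # second row: 2 5 8 11
j[, 3]   # third column: 7 8 9

# drop = FALSE keeps the result as a matrix instead of a vector
row2 <- j[2, , drop = FALSE]
dim(row2)   # 1 4
```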
l
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
, , 3
[,1] [,2]
[1,] 13 16
[2,] 14 17
[3,] 15 18
, , 4
[,1] [,2]
[1,] 19 22
[2,] 20 23
[3,] 21 24
l[2, 1, 3]
[1] 14
n
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 2 1
[[4]]
[1] FALSE
[[5]]
[1] "string"
n[3]
[[1]]
[1] 2 1
typeof(n[3])
[1] "list"
n[3][1]
[[1]]
[1] 2 1
n[[3]]
[1] 2 1
typeof(n[[3]])
[1] "double"
n[[3]][1]
[1] 2
o
country var
1 Canada 2.9
2 USA 3.1
3 Mexico 4.5
o[1]
country
1 Canada
2 USA
3 Mexico
typeof(o[1])
[1] "list"
str(o[1])
'data.frame': 3 obs. of 1 variable:
$ country: chr "Canada" "USA" "Mexico"
o[[1]]
[1] "Canada" "USA" "Mexico"
typeof(o[[1]])
[1] "character"
o$country
[1] "Canada" "USA" "Mexico"
typeof(o$country)
[1] "character"
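The single and double bracket operators also differ when the index is out of bounds. In the sketch below, tryCatch() is used only to capture the error so that the example runs to completion.

```r
v <- c(2, 4, 1)
n <- list(2L, 3, c(2, 1))

v[5]   # NA
n[5]   # a list containing NULL

# [[ ]] raises an error instead of returning a placeholder
result <- tryCatch(n[[5]], error = function(e) "error caught")
result   # "error caught"
```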
Copy-on-modify
When two names are bound to the same object, some languages (e.g. Python) modify that shared object in place. R instead makes a copy when you modify the object through one of the names.
Let’s have a look at Python:
>>> a = [1, 2, 3]
>>> b = a
>>> b
[1, 2, 3]
>>> a[0] = 4
>>> a
[4, 2, 3]
>>> b
[4, 2, 3]
Modifying a also modifies b. If you want to keep b unchanged, you need to explicitly make a copy of a.
Now, let’s see what happens in R:
> a <- c(1, 2, 3)
> b <- a
> b
[1] 1 2 3
> a[1] <- 4
> a
[1] 4 2 3
> b
[1] 1 2 3
Here, the default is to create a new copy in memory when a is modified, so that b remains unchanged. This is more intuitive, but it can be more memory intensive.
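The same behaviour applies to function arguments: a function that modifies its argument works on its own copy. A minimal sketch, where add_one_to_first is a hypothetical helper:

```r
# Hypothetical helper: modifies its argument and returns it
add_one_to_first <- function(v) {
  v[1] <- v[1] + 1   # only the local copy changes
  v
}

x <- c(1, 2, 3)
y <- add_one_to_first(x)

x   # 1 2 3: the original is untouched
y   # 2 2 3
```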
Function definition
compare <- function(x, y) {
  x == y
}
We can now use our function:
compare(2, 3)
[1] FALSE
Note that a function returns the value of its last evaluated expression, which is then printed automatically at the console:
test <- function(x, y) {
  x
  y
}
test(2, 3)
[1] 3
If you want to display other values, you need to explicitly use the print() function:
test <- function(x, y) {
  print(x)
  y
}
test(2, 3)
[1] 2
[1] 3
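To leave a function before its last statement, you can call return() explicitly. This is a minimal sketch; safe_divide is a hypothetical function used only for illustration.

```r
# Hypothetical example: return early when dividing by zero
safe_divide <- function(x, y) {
  if (y == 0) {
    return(NA)   # early exit
  }
  x / y          # value of the last expression is returned
}

safe_divide(6, 3)   # 2
safe_divide(1, 0)   # NA
```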
Control flow
Conditionals
test_sign <- function(x) {
  if (x > 0) {
    "x is positive"
  } else if (x < 0) {
    "x is negative"
  } else {
    "x is equal to zero"
  }
}
test_sign(3)
[1] "x is positive"
test_sign(-2)
[1] "x is negative"
test_sign(0)
[1] "x is equal to zero"
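if expects a condition of length one, so a function built on it handles one value at a time. To apply it to several values, one option is sapply(); for simple cases, the vectorised ifelse() works directly on a vector. The function describe_sign below is a hypothetical variant of the one above.

```r
describe_sign <- function(x) {
  if (x > 0) "positive" else if (x < 0) "negative" else "zero"
}

values <- c(3, -2, 0)

# Apply the function to each element in turn
signs <- sapply(values, describe_sign)
signs    # "positive" "negative" "zero"

# Vectorised alternative for a two-way choice
labels <- ifelse(values > 0, "positive", "not positive")
labels   # "positive" "not positive" "not positive"
```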
Loops
for (i in 1:10) {
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
Notice that here we need to use the print() function: inside a loop, values are not printed automatically.
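When looping over the positions of a vector, seq_along() is a safer choice than 1:length(x), because it yields an empty sequence for an empty vector (whereas 1:0 counts down from 1 to 0). A minimal sketch:

```r
x <- c(2, 4, 1)
total <- 0
for (i in seq_along(x)) {
  total <- total + x[i]
}
total   # 7

# With an empty vector, the loop body never runs
empty <- numeric(0)
iterations <- 0
for (i in seq_along(empty)) {
  iterations <- iterations + 1
}
iterations   # 0
```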