Suppose you have a dataframe df with three columns, A, B, and C, as follows:
df <- data.frame(
  A = c(1, 2, 3, 4),
  B = c(5, 6, 7, 8),
  C = c(9, 10, 11, 12)
)
df
  A B  C
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
Now, you want to insert a shorter vector D into the df dataframe:
df$D <- c(13, 14)
What will be the D column of df after the operation?
Please select the correct option.
This exercise should help answer this question: ‘In what type of situations would “recycling” be useful?’
First, let’s set up the data frame a:
a <- data.frame(n = 1:4)
dim(a)
[1] 4 1
a
  n
1 1
2 2
3 3
4 4
If the following WebR chunk is working properly, you should see an editor window below the Run code tab displaying this line of R code: (a <- data.frame(n = 1:4)).
The R command a$rowNum1 <- NA would insert a new column named rowNum1 into the data frame a, filled with NA values.
a$rowNum1 <- c(1, 2)
a
  n rowNum1
1 1       1
2 2       2
3 3       1
4 4       2
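The output above shows the two-element vector being recycled to fill four rows. As a minimal sketch of when that recycling applies, the example below assigns a length-2 vector and then a length-3 vector to a four-row data frame. The column names ok and bad are hypothetical, and tryCatch is used only so the script keeps running when the second assignment fails.

```r
a <- data.frame(n = 1:4)

# Length 2 divides the 4 rows evenly, so the values are recycled
a$ok <- c(1, 2)

# Length 3 does not divide 4, so the assignment raises an error;
# tryCatch captures the message instead of stopping the script
msg <- tryCatch({
  a$bad <- c(1, 2, 3)
  "no error"
}, error = function(e) conditionMessage(e))

print(a)
print(msg)
```

Column assignment in a data frame recycles only when the number of rows is a whole multiple of the vector length.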
Suppose you have two vectors in R:
Vector A: c(1, 2, 3)
Vector B: c(4, 5)
If you perform the operation A + B, what will be the result of vector recycling?
Please select the correct option.
What vector could you add to this vector so the sum is the vector (1, 2, 1, 2)?
rep(1, 4)
[1] 1 1 1 1
r1 <- rep(1, times = 4)
r2 <- rep(c(0, 1), times = 2)
r1
[1] 1 1 1 1
r2
[1] 0 1 0 1
r1 + r2
[1] 1 2 1 2
a$rowNum6 <- r1 + r2
a
  n rowNum1 rowNum6
1 1       1       1
2 2       2       2
3 3       1       1
4 4       2       2
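Here r2 was built explicitly with rep. As a small sketch, the same sum can be obtained by letting R recycle a two-element vector during the addition. The name s is introduced here only for illustration.

```r
r1 <- rep(1, times = 4)

# c(0, 1) is shorter than r1, so R recycles it to 0, 1, 0, 1
s <- r1 + c(0, 1)
print(s)
```

This is one situation where recycling saves typing: the shorter operand does not need to be expanded by hand.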
for loops
Loops allow you to repeat actions on each item from a vector of items.
Here is an example for loop, iterating through the values of i from 1 to 3:
for (i in 1:3) {
print(paste("i =",i))
}
[1] "i = 1"
[1] "i = 2"
[1] "i = 3"
This does the same thing as this repetitive code:
i.vector <- c(1, 2, 3)
i <- i.vector[1]
print(paste("i =",i))
[1] "i = 1"
i <- i.vector[2]
print(paste("i =",i))
[1] "i = 2"
i <- i.vector[3]
print(paste("i =",i))
[1] "i = 3"
If you know in advance how many iterations you’ll need, use a for loop.
As i increments from 1 to nrow(a), think about how we could map that sequence (e.g. 1, 2, 3, 4) to the desired sequence 1, 2, 1, 2.
An if statement might be useful here.
# Set value that we want to iterate 1, 2, 1, 2, ...
j <- 1
# Initialize rowNum2 to all missing values
a$rowNum2 <- NA
# Start the for loop, looping over the number of rows in a
for (i in c(1:nrow(a))) {
  # Assign value j to row i
  a$rowNum2[i] <- j
  # Increment j
  j <- j + 1
  # If j is greater than 2, set it back to 1
  if (j > 2) {
    j <- 1
  }
}
a
  n rowNum1 rowNum6 rowNum2
1 1       1       1       1
2 2       2       2       2
3 3       1       1       1
4 4       2       2       2
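The loop above tracks a separate counter j and resets it with an if statement. As an alternative sketch (not the exercise's own solution), the same mapping from i to 1, 2, 1, 2 can be computed arithmetically from i alone. The data frame is rebuilt here so the example is self-contained.

```r
a <- data.frame(n = 1:4)
a$rowNum2 <- NA

for (i in seq_len(nrow(a))) {
  # (i - 1) %% 2 alternates 0, 1, 0, 1; adding 1 gives 1, 2, 1, 2
  a$rowNum2[i] <- ((i - 1) %% 2) + 1
}

print(a$rowNum2)
```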
while loops
Here’s an example while loop:
i <- 1
while (i < 4) {
  print(paste("i =",i))
  i <- i + 1
}
[1] "i = 1"
[1] "i = 2"
[1] "i = 3"
If you know the loop exit criterion but not how many iterations you’ll need, use a while loop or a repeat loop.
while: tests the condition at the start of the loop
The modulo operator %% might be helpful here.
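As a brief illustration of the hint, the sketch below applies %% to the indices 1 through 4. The variable names remainder and is_odd are introduced here only for illustration.

```r
i <- 1:4

# Remainder after dividing each index by 2: 1 for odd, 0 for even
remainder <- i %% 2
print(remainder)

# Comparing with 1 turns the remainder into an explicit odd/even test
is_odd <- remainder == 1
print(is_odd)
```

Because R treats a nonzero number as TRUE in an if condition, the remainder can also be used directly as the test.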
a$rowNum3 <- NA
i <- 1 #set index
while (i <= nrow(a)) { #set conditions for while loop
  if ((i %% 2)) { #if statement for when "i" is odd
    a$rowNum3[i] <- 1
  } else { #else statement for when "i" is even
    a$rowNum3[i] <- 2
  }
  i <- i + 1 #counter for "i", increments by 1 with each loop iteration
}
a
  n rowNum1 rowNum6 rowNum2 rowNum3
1 1       1       1       1       1
2 2       2       2       2       2
3 3       1       1       1       1
4 4       2       2       2       2
Or we can do this more concisely using the ifelse function:
a$rowNum3 <- NA
i <- 1 #set index
while (i <= nrow(a)) { #set conditions for while loop
  a$rowNum3[i] <- ifelse(i %% 2, 1, 2)
  i <- i + 1 #counter for "i", increments by 1 with each loop iteration
}
a
  n rowNum1 rowNum6 rowNum2 rowNum3
1 1       1       1       1       1
2 2       2       2       2       2
3 3       1       1       1       1
4 4       2       2       2       2
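Since ifelse works on whole vectors, the loop is not strictly required. The following is a sketch of a loop-free variant, offered as an alternative rather than as the exercise's intended answer. The data frame is rebuilt so the example stands alone.

```r
a <- data.frame(n = 1:4)

# Row indices 1..nrow(a)
i <- seq_len(nrow(a))

# ifelse evaluates the odd/even test for every index at once
a$rowNum3 <- ifelse(i %% 2 == 1, 1, 2)
print(a)
```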
repeat loops
Here’s an example repeat loop:
i <- 1
repeat {
  print(paste("i =",i))
  i <- i + 1
  if (i > 3) break
}
[1] "i = 1"
[1] "i = 2"
[1] "i = 3"
If you know the loop exit criterion but not how many iterations you’ll need, use a while loop or a repeat loop.
repeat: tests the condition at the end of the loop
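To make the difference in test placement concrete, here is a small sketch that starts both loops with i already past the exit criterion. The counters while_runs and repeat_runs are hypothetical names used only to record how many times each body executes.

```r
# while tests first: with i starting at 5, the body never runs
i <- 5
while_runs <- 0
while (i < 4) {
  while_runs <- while_runs + 1
  i <- i + 1
}

# repeat with break at the end: the body runs once before the test
i <- 5
repeat_runs <- 0
repeat {
  repeat_runs <- repeat_runs + 1
  i <- i + 1
  if (i > 3) break
}

print(c(while_runs = while_runs, repeat_runs = repeat_runs))
```

A while loop may run zero times, whereas a repeat loop whose break sits at the end of the body always runs at least once.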
a$rowNum4 <- NA
i <- 1 #set index
repeat {
  if ((i %% 2)) { #if statement for when "i" is odd
    a$rowNum4[i] <- 1
  } else { #else statement for when "i" is even
    a$rowNum4[i] <- 2
  }
  i <- i + 1 #counter for "i", increments by 1 with each loop iteration
  if (i > nrow(a)) {
    break
  }
}
a
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4
1 1       1       1       1       1       1
2 2       2       2       2       2       2
3 3       1       1       1       1       1
4 4       2       2       2       2       2
rep function
# This will only work correctly if nrow(a) is even
a$rowNum5 <- rep(c(1, 2), nrow(a)/2)
a
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4 rowNum5
1 1       1       1       1       1       1       1
2 2       2       2       2       2       2       2
3 3       1       1       1       1       1       1
4 4       2       2       2       2       2       2
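The comment in the code above notes that dividing nrow(a) by 2 only works for an even number of rows. One possible way around that restriction, sketched here with a hypothetical five-row data frame, is the length.out argument of rep.

```r
# Five rows, so nrow(a) / 2 would not be a whole number
a <- data.frame(n = 1:5)

# length.out repeats c(1, 2) and truncates to exactly nrow(a) values
a$rowNum5 <- rep(c(1, 2), length.out = nrow(a))
print(a$rowNum5)
```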
# All even rows
a[a$rowNum1==2,]
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4 rowNum5
2 2       2       2       2       2       2       2
4 4       2       2       2       2       2       2
# All odd rows
a[a$rowNum1==1,]
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4 rowNum5
1 1       1       1       1       1       1       1
3 3       1       1       1       1       1       1
# List rows 3 and 4
a[c(3,4),]
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4 rowNum5
3 3       1       1       1       1       1       1
4 4       2       2       2       2       2       2
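The subsets above rely on a helper column. As a further sketch of where recycling can be useful, a short logical index is itself recycled across the rows, so odd and even rows can be selected without any helper column. The names odd_rows and even_rows are introduced here for illustration.

```r
a <- data.frame(n = 1:4, rowNum1 = rep(c(1, 2), times = 2))

# c(TRUE, FALSE) is recycled to TRUE, FALSE, TRUE, FALSE over the rows
odd_rows <- a[c(TRUE, FALSE), ]

# c(FALSE, TRUE) is recycled to FALSE, TRUE, FALSE, TRUE
even_rows <- a[c(FALSE, TRUE), ]

print(odd_rows)
print(even_rows)
```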
Learning objective: Learn how to alter the options of an R command to achieve your goals.
This exercise should help answer this question: “When reading a file, will missing data be automatically represented as NA values, or does that need to be coded/manually curated?”
The tab-delimited file in testdata.txt contains the following data:
1 1 1
2 2 2
3 NA 99
4 4 4
Your collaborator who gave you these data informed you that in this file 99 stands for a missing value, as does NA.
However, if we use the read.table command with its default options to read this in, we fail to accomplish the desired task, because 99 is not read as a missing value:
infile <- "data/testdata.txt"
# Adjust the read.table options to read the file correctly as desired.
b <- read.table(infile)
b
  V1 V2 V3
1  1  1  1
2  2  2  2
3  3 NA 99
4  4  4  4
str(b)
'data.frame': 4 obs. of 3 variables:
$ V1: int 1 2 3 4
$ V2: int 1 2 NA 4
$ V3: int 1 2 99 4
Read the help page for the read.table command.
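Besides opening the help page with ?read.table, the default option values can be inspected from code. The sketch below uses formals for this. The variable name defaults is introduced only for illustration.

```r
# formals() returns the argument list of a function, including defaults
defaults <- formals(read.table)

# By default read.table assumes no header line ...
print(defaults$header)

# ... and treats only the string "NA" as a missing value
print(defaults$na.strings)
```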
To read this in properly, we have to let read.table know that there is no header and which values should be mapped to the missing value NA:
b <- read.table(infile, header = FALSE, na.strings = c("NA", "99"))
b
  V1 V2 V3
1  1  1  1
2  2  2  2
3  3 NA NA
4  4  4  4
str(b)
'data.frame': 4 obs. of 3 variables:
$ V1: int 1 2 3 4
$ V2: int 1 2 NA 4
$ V3: int 1 2 NA 4
summary(b)
       V1             V2              V3
 Min.   :1.00   Min.   :1.000   Min.   :1.000
 1st Qu.:1.75   1st Qu.:1.500   1st Qu.:1.500
 Median :2.50   Median :2.000   Median :2.000
 Mean   :2.50   Mean   :2.333   Mean   :2.333
 3rd Qu.:3.25   3rd Qu.:3.000   3rd Qu.:3.000
 Max.   :4.00   Max.   :4.000   Max.   :4.000
                NA's   :1       NA's   :1
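One thing to keep in mind is that na.strings applies to every column, so a genuine value of 99 in another column would also become NA. As a hedged alternative sketch, the file can be read with the defaults and the code 99 recoded in a single column afterwards. A temporary file with the same four lines stands in for data/testdata.txt here, and the choice of V3 as the only column using 99 as a missing-value code is an assumption.

```r
# Write a stand-in copy of the tab-delimited data to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("1\t1\t1", "2\t2\t2", "3\tNA\t99", "4\t4\t4"), tmp)

# Read with the default na.strings, so only "NA" becomes missing
b <- read.table(tmp, header = FALSE, sep = "\t")

# Recode 99 as missing in V3 only (assumed to be the affected column)
b$V3[b$V3 %in% 99] <- NA

# Count missing values per column
print(colSums(is.na(b)))
unlink(tmp)
```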