8  R Basics Group Exercise

8.1 Question: Recycling in a dataframe

Suppose you have a dataframe df with three columns, A, B, and C, as follows:

df <- data.frame(
A = c(1, 2, 3, 4),
B = c(5, 6, 7, 8),
C = c(9, 10, 11, 12)
)
df
  A B  C
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12

Now, you want to insert a shorter vector D into the df dataframe:

df$D <- c(13,14)

What will be the D column of df after the operation?

8.2 Exercise 1: recycling

This exercise should help answer this question: ‘In what type of situations would “recycling” be useful?’

First, let’s set up the data frame a:

a <- data.frame(n = 1:4)
dim(a)
[1] 4 1
a
  n
1 1
2 2
3 3
4 4
Task

Use recycling to add a column named rowNum1 to the data frame a that contains a 1 in odd rows and a 2 in even rows.

Tip

The R command

a$rowNum1 <- NA

would add a new column to the data frame a, with every row set to NA.

a$rowNum1 <- c(1,2)
a
  n rowNum1
1 1       1
2 2       2
3 3       1
4 4       2
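Recycling into a data frame column only works when the number of rows is a multiple of the vector's length. The sketch below uses a hypothetical three-row data frame `tmp` (not part of the exercise) to show both sides: a length-1 value, as in `a$rowNum1 <- NA` from the tip, is recycled to every row, while a length-2 vector cannot be spread evenly over 3 rows, so R signals an error instead of filling the column.

```r
# Hypothetical 3-row data frame, used only for this illustration
tmp <- data.frame(n = 1:3)

# A length-1 value is recycled to every row
tmp$allNA <- NA
print(tmp$allNA)

# 3 is not a multiple of 2, so this assignment fails instead of recycling
result <- tryCatch({
  tmp$pair <- c(1, 2)
  "assigned"
}, error = function(e) conditionMessage(e))
print(result)
```

The error message reports how many rows the replacement has and how many the data frame has, which is a quick way to spot a length mismatch.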

8.3 Question: Vector addition and recycling

Suppose you have two vectors in R:

Vector A: c(1, 2, 3)
Vector B: c(4, 5)

If you perform the operation A + B, what will be the result of vector recycling?

8.4 Exercise 2: vector addition

Task

Use vector addition to construct a vector of length 4 that contains a 1 in odd positions and a 2 in even positions. Then insert this vector into the data frame a as a column named rowNum6.

Tip

What vector could you add to rep(1, 4) so that the sum is the vector (1, 2, 1, 2)?

rep(1, 4)
[1] 1 1 1 1
r1 <- rep(1, times = 4)
r2 <- rep(c(0,1), times = 2)
r1
[1] 1 1 1 1
r2
[1] 0 1 0 1
r1 + r2
[1] 1 2 1 2
a$rowNum6 <- r1 + r2
a
  n rowNum1 rowNum6
1 1       1       1
2 2       2       2
3 3       1       1
4 4       2       2
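Because arithmetic on vectors also recycles, the helper vector does not have to be written out at full length. A minimal sketch: adding the length-2 vector `c(0, 1)` to a length-4 vector recycles it to `0, 1, 0, 1`, and adding a single number recycles that number to every position.

```r
r1 <- rep(1, times = 4)

# c(0, 1) is recycled to c(0, 1, 0, 1) before the addition
v <- r1 + c(0, 1)
print(v)   # 1 2 1 2

# A single number is recycled to every position
w <- 1 + c(0, 1, 0, 1)
print(w)   # 1 2 1 2
```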

8.5 Exercise 3: for loops

Loops allow you to repeat actions on each item from a vector of items.

Here is an example for loop, iterating through the values of i from 1 to 3:

for (i in 1:3) {
  print(paste("i =",i))
}
[1] "i = 1"
[1] "i = 2"
[1] "i = 3"

This does the same thing as this repetitive code:

i.vector <- c(1,2,3)
i <- i.vector[1]
print(paste("i =",i))
[1] "i = 1"
i <- i.vector[2]
print(paste("i =",i))
[1] "i = 2"
i <- i.vector[3]
print(paste("i =",i))
[1] "i = 3"
Note

If you know in advance how many iterations you’ll need, use a for loop.
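A common pitfall when the number of iterations comes from the data: `1:n` counts down when `n` is 0, producing `c(1, 0)`, so a loop over `1:nrow(a)` would run twice on an empty data frame. `seq_len(n)` returns an empty sequence in that case, so the loop body is skipped. A minimal sketch:

```r
n <- 0

# 1:n with n = 0 is c(1, 0), so this loop runs twice
count_colon <- 0
for (i in 1:n) count_colon <- count_colon + 1

# seq_len(0) is empty, so this loop never runs
count_seq <- 0
for (i in seq_len(n)) count_seq <- count_seq + 1

print(c(count_colon, count_seq))   # 2 0
```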

Task

Use a for loop to add a column named rowNum2 to the data frame a that contains a 1 in odd rows and a 2 in even rows.

Tip

Think about how, as i increments from 1 to nrow(a), you could map that sequence (e.g. 1, 2, 3, 4) to the desired sequence 1, 2, 1, 2.

An if statement might be useful here.
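One way to do this mapping without an `if` statement uses the modulo operator `%%` (remainder after division), which also appears in the next exercise: `(i - 1) %% 2 + 1` sends 1, 2, 3, 4 to 1, 2, 1, 2. A quick check:

```r
i <- 1:4
# (i - 1) %% 2 alternates 0, 1, 0, 1; adding 1 gives 1, 2, 1, 2
mapped <- (i - 1) %% 2 + 1
print(mapped)   # 1 2 1 2
```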

# j cycles through the values we want: 1, 2, 1, 2, ...
j <- 1
# Initialize rowNum2 to all missing values
a$rowNum2 <- NA
# Start the for loop, looping over the row numbers of a
for (i in seq_len(nrow(a))) {
   # Assign value j to row i
   a$rowNum2[i] <- j
   # Increment j
   j <- j + 1
   # If j is greater than 2, set it back to 1
   if (j > 2) {
     j <- 1
   }
}
a
  n rowNum1 rowNum6 rowNum2
1 1       1       1       1
2 2       2       2       2
3 3       1       1       1
4 4       2       2       2

8.6 Exercise 4: while loops

Here’s an example while loop:

i <- 1
while (i < 4) {
  print(paste("i =",i))
  i <- i + 1
}
[1] "i = 1"
[1] "i = 2"
[1] "i = 3"
Note

If you know the exit criterion for the loop but not how many iterations it will take, use a while loop or a repeat loop.

while: tests the condition at the start of each iteration
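Because `while` checks its condition before each pass, the body may run zero times. In this sketch the condition is already false on entry, so the counter never changes:

```r
i <- 5
iterations <- 0
# i < 4 is FALSE from the start, so the body is skipped entirely
while (i < 4) {
  iterations <- iterations + 1
  i <- i + 1
}
print(iterations)   # 0
```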

Task

Use a while loop to add a column named rowNum3 to the data frame a that contains a 1 in odd rows and a 2 in even rows.

Tip

The modulo operator %% might be helpful here.
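`%%` returns the remainder after division, so `i %% 2` is 1 for odd `i` and 0 for even `i`. Where R expects a logical value, as in an `if` condition or `ifelse()`, a nonzero number counts as `TRUE` and 0 as `FALSE`, so `i %% 2` on its own can serve as an odd-number test; comparing with `== 1` makes that explicit.

```r
i <- 1:6
remainders <- i %% 2
print(remainders)        # 1 0 1 0 1 0
print(remainders == 1)   # TRUE FALSE TRUE FALSE TRUE FALSE
```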

a$rowNum3 <- NA
i <- 1 # set index
while (i <= nrow(a)) { # keep looping while i is a valid row number
  if (i %% 2 == 1) { # i is odd
    a$rowNum3[i] <- 1
  } else {           # i is even
    a$rowNum3[i] <- 2
  }
  i <- i + 1 # increment i by 1 with each loop iteration
}
a
  n rowNum1 rowNum6 rowNum2 rowNum3
1 1       1       1       1       1
2 2       2       2       2       2
3 3       1       1       1       1
4 4       2       2       2       2

Or we can do this more concisely using the ifelse() function:

a$rowNum3 <- NA
i <- 1 #set index
while(i <= nrow(a)){ #set conditions for while loop
  a$rowNum3[i] <- ifelse(i %% 2, 1, 2)
  i <- i + 1 #counter for "i", increments by 1 with each loop iteration
} 
a
  n rowNum1 rowNum6 rowNum2 rowNum3
1 1       1       1       1       1
2 2       2       2       2       2
3 3       1       1       1       1
4 4       2       2       2       2
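Since `ifelse()` and `%%` both work element-wise on whole vectors, the same column could also be filled without any loop. A sketch on a hypothetical data frame `a_demo` (standing in for `a`, so the block runs on its own without overwriting your work):

```r
# Hypothetical stand-in for the exercise's data frame a
a_demo <- data.frame(n = 1:4)

# Row positions 1..nrow(a_demo); odd positions get 1, even positions get 2
rows <- seq_len(nrow(a_demo))
a_demo$rowNum3 <- ifelse(rows %% 2 == 1, 1, 2)
print(a_demo$rowNum3)   # 1 2 1 2
```

The loop versions are still worth practicing, but for simple element-by-element rules like this one the vectorized form is shorter and usually faster.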

8.7 Exercise 5: repeat loops

Here’s an example repeat loop:

i <- 1
repeat {
  print(paste("i =",i))
  i <- i + 1
  if (i > 3) break
}
[1] "i = 1"
[1] "i = 2"
[1] "i = 3"
Note

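If you know the exit criterion for the loop but not how many iterations it will take, use a while loop or a repeat loop.

repeat: has no built-in condition; the loop runs until break is executed, so the exit test can go anywhere in the body (here, at the end of the loop)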
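With the `break` test placed at the end of the body, as in the example above, a `repeat` loop always runs at least once, even when the exit criterion is already met on entry. Compare this sketch with the same starting value in a `while` loop, which would not run at all:

```r
i <- 5
iterations <- 0
repeat {
  iterations <- iterations + 1
  i <- i + 1
  # Exit test comes after the body, so the body has already run once
  if (i > 3) break
}
print(iterations)   # 1
```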

Task

Use a repeat loop to add a column named rowNum4 to the data frame a that contains a 1 in odd rows and a 2 in even rows.

a$rowNum4 <- NA
i <- 1 # set index
repeat {
  if (i %% 2 == 1) { # i is odd
    a$rowNum4[i] <- 1
  } else {           # i is even
    a$rowNum4[i] <- 2
  }
  i <- i + 1 # increment i by 1 with each loop iteration
  # Exit once every row has been filled
  if (i > nrow(a)) {
    break
  }
}
a
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4
1 1       1       1       1       1       1
2 2       2       2       2       2       2
3 3       1       1       1       1       1
4 4       2       2       2       2       2

8.8 Exercise 6: using the rep function

Task

Use the rep function to add a column named rowNum5 to the data frame a that contains a 1 in odd rows and a 2 in even rows.

# This will only work correctly if nrow(a) is even
a$rowNum5 <- rep(c(1,2), nrow(a)/2)
a
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4 rowNum5
1 1       1       1       1       1       1       1
2 2       2       2       2       2       2       2
3 3       1       1       1       1       1       1
4 4       2       2       2       2       2       2
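The solution above relies on `nrow(a)` being even, because `rep()` truncates a non-integer `times` value, so an odd row count would leave the vector one element short. Passing `length.out` instead tells `rep()` to recycle the pattern until it reaches exactly that length. A sketch with a hypothetical 5-row data frame `a5`:

```r
# Hypothetical data frame with an odd number of rows
a5 <- data.frame(n = 1:5)

# length.out recycles c(1, 2) until the result has exactly nrow(a5) elements
a5$rowNum5 <- rep(c(1, 2), length.out = nrow(a5))
print(a5$rowNum5)   # 1 2 1 2 1
```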

8.9 Exercise 7

Tasks

Task 1: List all even rows of the data frame a.

Task 2: List rows 3 and 4 of the data frame a.

# All even rows
a[a$rowNum1==2,]
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4 rowNum5
2 2       2       2       2       2       2       2
4 4       2       2       2       2       2       2
# All odd rows
a[a$rowNum1==1,]
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4 rowNum5
1 1       1       1       1       1       1       1
3 3       1       1       1       1       1       1
# List rows 3 and 4
a[c(3,4),]
  n rowNum1 rowNum6 rowNum2 rowNum3 rowNum4 rowNum5
3 3       1       1       1       1       1       1
4 4       2       2       2       2       2       2
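The even-row solution above selects rows by the values in `rowNum1`, which happen to mark the even rows. To select by position regardless of column contents, `seq()` can generate the even row indices directly; alternatively, the length-2 logical index `c(FALSE, TRUE)` is recycled across all rows, which is recycling again. A sketch on a hypothetical data frame `a_demo` standing in for `a`:

```r
# Hypothetical stand-in for the exercise's data frame a
a_demo <- data.frame(n = 1:4)

# Positions 2, 4, ... up to nrow(a_demo); drop = FALSE keeps a data frame
even_by_seq <- a_demo[seq(2, nrow(a_demo), by = 2), , drop = FALSE]

# c(FALSE, TRUE) is recycled to FALSE, TRUE, FALSE, TRUE
even_by_recycling <- a_demo[c(FALSE, TRUE), , drop = FALSE]

print(even_by_seq)
print(even_by_recycling)
```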

8.10 Exercise 8

Note

Learning objective: Learn how to alter the options of an R command to achieve your goals.

This exercise should help answer this question: “When reading a file, will missing data be automatically represented as NA values, or does that need to be coded/manually curated?”

The tab-delimited file testdata.txt contains the following data:

1       1       1
2       2       2
3       NA      99
4       4       4

Your collaborator, who gave you these data, informed you that in this file 99 stands for a missing value, as does NA.

However, if we read this file in using the read.table command with its default options, 99 is not read as a missing value, so we fail to accomplish the desired task:

infile <- "data/testdata.txt"
# Adjust the read.table options to read the file correctly as desired.
b <- read.table(infile)
b
  V1 V2 V3
1  1  1  1
2  2  2  2
3  3 NA 99
4  4  4  4
str(b)
'data.frame':   4 obs. of  3 variables:
 $ V1: int  1 2 3 4
 $ V2: int  1 2 NA 4
 $ V3: int  1 2 99 4
Task

Use the read.table command to read this file in while automatically setting both “NA” and 99 to NA. Do this by adjusting the options of the read.table command.

Tip

Read the help page for the read.table command (?read.table).

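To read this in properly, we tell read.table which values should be mapped to the missing value NA, using the na.strings option. Setting header = FALSE states explicitly that the file has no header row (this is already the default for read.table):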

b <- read.table(infile, header = FALSE, na.strings = c("NA","99"))
b
  V1 V2 V3
1  1  1  1
2  2  2  2
3  3 NA NA
4  4  4  4
str(b)
'data.frame':   4 obs. of  3 variables:
 $ V1: int  1 2 3 4
 $ V2: int  1 2 NA 4
 $ V3: int  1 2 NA 4
summary(b)
       V1             V2              V3       
 Min.   :1.00   Min.   :1.000   Min.   :1.000  
 1st Qu.:1.75   1st Qu.:1.500   1st Qu.:1.500  
 Median :2.50   Median :2.000   Median :2.000  
 Mean   :2.50   Mean   :2.333   Mean   :2.333  
 3rd Qu.:3.25   3rd Qu.:3.000   3rd Qu.:3.000  
 Max.   :4.00   Max.   :4.000   Max.   :4.000  
                NA's   :1       NA's   :1
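To experiment with `na.strings` without the data file, `read.table()` also accepts the file contents directly through its `text` argument. A sketch reproducing the four lines of testdata.txt shown above (the string `contents` is a stand-in for the file; the default `sep = ""` splits on any white space, so tabs work):

```r
# Same contents as testdata.txt, with tabs between fields
contents <- "1\t1\t1\n2\t2\t2\n3\tNA\t99\n4\t4\t4"

# Default options: only the string "NA" becomes missing; 99 stays a number
b_default <- read.table(text = contents)

# Both "NA" and "99" are treated as missing values
b_na <- read.table(text = contents, header = FALSE, na.strings = c("NA", "99"))

print(colSums(is.na(b_default)))
print(colSums(is.na(b_na)))
```

Note that `na.strings` applies to every column, so a genuine 99 anywhere in the file would also be turned into NA.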