14  Loops in R, Part II

14.1 Acknowledgment/License

The original source for this chapter was the web site https://datacarpentry.org/semester-biology/, which was built using the underlying code at https://github.com/datacarpentry/semester-biology and is used under the Attribution 4.0 International (CC BY 4.0) license, https://creativecommons.org/licenses/by/4.0/.

The material presented here has been modified from the original source.

Accordingly this chapter is made available under the same license terms.

14.2 Source code

If you’d like to work within RStudio using the source code of this chapter, you can obtain it from here.

14.3 Looping with functions

  • It is common to combine loops with functions by calling one or more functions as a step in our loop
  • For example, let’s take the non-vectorized version of our est_mass function that returns an estimated mass if the volume > 5 and NA if it’s not.
  • We can’t pass the vector to the function and get back a vector of results because of the if statements
  • So let’s loop over the values
  • First we’ll create an empty vector to store the results
  • And then loop by index, calling the function for each value of volumes
  • This is the for loop equivalent of an mapply statement
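
The steps above can be sketched as follows. This is a minimal illustration, not the chapter's original code: the coefficients inside est_mass() and the example volumes are assumptions, since the function body is not shown here.

```r
# Non-vectorized function: `if` tests a single value at a time.
# The coefficients 2.65 and 0.9 are assumed for illustration.
est_mass <- function(volume) {
  if (volume > 5) {
    mass <- 2.65 * volume^0.9
  } else {
    mass <- NA
  }
  return(mass)
}

volumes <- c(1.6, 3, 8)  # example values (assumed)

# Empty vector to hold one result per volume
masses <- vector(mode = "numeric", length = length(volumes))

# Loop by index, calling the function on one value at a time
for (i in 1:length(volumes)) {
  masses[i] <- est_mass(volumes[i])
}
masses

# The same calculation written with mapply
mapply(est_mass, volumes)
```

The first two volumes are not greater than 5, so their entries in masses are NA; only the third gets an estimate.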
Do Size Estimates By Name Loop.

If dinosaur_lengths.csv is not already in your working directory, download a copy of the data on dinosaur lengths with species names. Load it into R.

Write a function mass_from_length() that uses the equation mass <- a * length^b to estimate the size of a dinosaur from its length. This function should take two arguments, length and species. For each of the following inputs for species, use the given values of a and b for the calculation:

  • For Stegosauria: a = 10.95 and b = 2.64 (Seebacher 2001).
  • For Theropoda: a = 0.73 and b = 3.63 (Seebacher 2001).
  • For Sauropoda: a = 214.44 and b = 1.46 (Seebacher 2001).
  • For any other value of species: a = 25.37 and b = 2.49.
  1. Use this function and a for loop to calculate the estimated mass for each dinosaur, store the masses in a vector, and after all of the calculations are complete show the first few items in the vector using head().
  2. Add the results in the vector back to the original data frame. Show the first few rows of the data frame using head().
  3. Calculate the mean mass for each species using dplyr.

14.4 Looping over files

  • Repeat same actions on many similar files
  • Let’s download some simulated satellite collar data
  • Now we need to get the names of each of the files we want to loop over
  • We do this using list.files()
  • If we run it without arguments it will give us the names of all files in the directory
  • But we just want the data files so we’ll add the optional pattern argument to only get the files that start with "locations-"
  • Once we have this list we can loop over it to count the number of observations in each file
  • First create an empty vector to store those counts
  • Then write our loop
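
A minimal sketch of these steps is shown below. To keep it runnable without a download, it writes a few small made-up files to a temporary directory as a stand-in for the collar data; the directory name, file names, and file contents are all assumptions.

```r
# Stand-in data: three small made-up CSV files plus one unrelated file,
# written to a temporary directory (names and contents are assumptions).
data_dir <- file.path(tempdir(), "locations-demo")
dir.create(data_dir, showWarnings = FALSE)
n_rows <- c(a = 3, b = 5, c = 2)
for (id in names(n_rows)) {
  df <- data.frame(lat = seq_len(n_rows[[id]]), long = seq_len(n_rows[[id]]))
  write.csv(df, file.path(data_dir, paste0("locations-", id, ".csv")),
            row.names = FALSE)
}
write.csv(data.frame(x = 1), file.path(data_dir, "notes.csv"), row.names = FALSE)

# Without a pattern, list.files() returns every file in the directory
list.files(data_dir)

# pattern is a regular expression; "^" anchors it to the start of the name
data_files <- list.files(data_dir, pattern = "^locations-", full.names = TRUE)

# Empty vector to store one count per file
results <- vector(mode = "integer", length = length(data_files))

# Read each file in turn and record its number of rows
for (i in 1:length(data_files)) {
  data <- read.csv(data_files[i])
  results[i] <- nrow(data)
}
results
```

Here list.files() is given an explicit directory; called with no path it looks in the working directory instead.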
Do Task 1 of Multiple-file Analysis.

Exercise uses different collar data

You have satellite collars on a number of different individuals and want to be able to quickly look at all of their recent movements at once. The data is posted daily to a zip file that contains one csv file for each individual: data/individual_collar_data.zip

Start your solution by:

  • If individual_collar_data.zip is not already in your working directory, download the zip file using download.file()
  • Unzip it using unzip()
  • Obtain a list of all of the files with file names matching the pattern "collar-data-.*.txt" (using list.files())
  1. Use a loop to load each of these files into R and make a line plot (using geom_path()) for each file with long on the x axis and lat on the y axis. Graphs, like other types of output, won’t display inside a loop unless you explicitly display them, so you need to put your ggplot() command inside a print() statement.

Include the name of the file in the graph as the graph title using labs().

14.5 Storing loop results in a data frame

  • We often want to calculate multiple pieces of information in a loop, which makes it useful to store results in things other than vectors
  • We can store them in a data frame instead by creating an empty data frame and storing the results in the ith row of the appropriate column
  • Associate the file name with the count
  • Also store the minimum latitude
  • Start by creating an empty data frame
  • Use the data.frame function
  • Provide one argument for each column
  • “Column Name” = “an empty vector of the correct type”
  • Now let’s modify our loop from last time
  • Instead of storing count in results[i] we need to first specify the count column using the $: results$count[i]
  • We also want to store the filename, which is data_files[i]
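
The approach described above can be sketched as follows. The example files are made up and written to a temporary directory so the code runs on its own; the column names file_name, count, and min_lat are illustrative choices.

```r
# Stand-in data: two small made-up files (names and values are assumptions)
data_dir <- file.path(tempdir(), "locations-df-demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(lat = c(26.1, 25.4, 27.0), long = c(-35.2, -34.8, -36.1)),
          file.path(data_dir, "locations-a.csv"), row.names = FALSE)
write.csv(data.frame(lat = c(24.9, 25.7), long = c(-33.0, -33.5)),
          file.path(data_dir, "locations-b.csv"), row.names = FALSE)
data_files <- list.files(data_dir, pattern = "^locations-", full.names = TRUE)

# Empty data frame: one argument per column, each an empty vector
# of the correct type with one slot per file
results <- data.frame(file_name = character(length(data_files)),
                      count = integer(length(data_files)),
                      min_lat = numeric(length(data_files)),
                      stringsAsFactors = FALSE)

# Fill in the ith row of each column as we loop over the files
for (i in 1:length(data_files)) {
  data <- read.csv(data_files[i])
  # basename() drops the temporary directory from the stored name
  results$file_name[i] <- basename(data_files[i])
  results$count[i] <- nrow(data)
  results$min_lat[i] <- min(data$lat)
}
results
```

Each row of results now ties a file name to the values calculated from that file.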
Do Task 2 of Multiple-file Analysis.

Exercise uses different collar data

  1. Add code to the loop to calculate the minimum and maximum latitude in the file, and store these values, along with the name of the file, in a data frame. Show the data frame as output.

If you’re interested in seeing another application of for loops, check out the code below used to simulate the data for this exercise using for loops.

# Ten individuals labelled A1, B2, ..., J10
individuals = paste(c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'), c(1:10), sep = "")
for (individual in individuals) {
    # Simulate 24 hourly positions as a random walk from a random starting point
    lat = vector("numeric", 24)
    long = vector("numeric", 24)
    lat[1] = rnorm(1, mean = 26, sd = 2)
    long[1] = rnorm(1, mean = -35, sd = 3)
    for (i in 2:24) {
        # Each position is the previous one plus a random step
        lat[i] = lat[i - 1] + rnorm(1, mean = 0, sd = 1)
        long[i] = long[i - 1] + rnorm(1, mean = 0, sd = 1)
    }
    # One timestamp per hour across a single day
    times = seq(from=as.POSIXct("2016-02-26 00:00", tz="UTC"),
                to=as.POSIXct("2016-02-26 23:00", tz="UTC"),
                by="hour")
    df = data.frame(date = "2016-02-26",
                    collar = individual,
                    time = times,
                    lat = lat,
                    long = long)
    # Write one file per individual
    write.csv(df, paste("collar-data-", individual, "-2016-02-26.txt", sep = ""))
}
# Make sure the output directory exists, then bundle the files into one zip archive
dir.create("data", showWarnings = FALSE)
zip("data/individual_collar_data.zip", list.files(pattern = "collar-data-[A-Z][0-9]+-.*"))

14.6 Subsetting Data

  • Loops can subset in ways that are difficult with things like group_by
  • Look at some data on trees from the National Ecological Observatory Network
  • Look at a north-south gradient in number of trees
  • Need to know number of trees in each band of y values
  • Start by defining the size of the window we want to use
    • Use the grid lines which are 2.5 m
  • Then figure out the edges for each window
  • But we don’t want to go all the way to the far edge
  • Set up an empty data frame to store the output
  • Loop over the window edges and subset the data occurring within each window
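
A minimal sketch of this windowed count is shown below. It uses simulated tree positions in place of the NEON data so that it runs on its own; the column names easting and northing, the 10 m plot size, and the edge vector names are assumptions, and base R subsetting stands in for a dplyr filter.

```r
# Simulated tree positions on a 10 m x 10 m plot (made-up stand-in data)
set.seed(1)
trees <- data.frame(easting = runif(200, 0, 10),
                    northing = runif(200, 0, 10))

# Window size matches the 2.5 m grid lines
window_size <- 2.5

# Lower edge of each band of y values; stop one window short of the far
# edge so the last window does not extend past the plot
south_edges <- seq(0, 10 - window_size, by = window_size)
north_edges <- south_edges + window_size

# Data frame to store one count per window
output <- data.frame(south_edge = south_edges,
                     north_edge = north_edges,
                     count = integer(length(south_edges)))

# Subset the trees that fall inside each window and count them
for (i in 1:length(south_edges)) {
  in_window <- trees[trees$northing >= south_edges[i] &
                     trees$northing < north_edges[i], ]
  output$count[i] <- nrow(in_window)
}
output
```

Using >= on the lower edge and < on the upper edge means a tree sitting exactly on a boundary is counted in only one window.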

14.7 Nested Loops

  • Sometimes we need to loop over multiple things in a coordinated fashion
  • Pass a window over some spatial data
  • Look at the full spatial pattern, not just the gradient along one axis
  • Basic nested loops work by putting one loop inside another one
  • Loop over x and y coordinates to create boxes
  • Need top and bottom edges
  • Redefine our storage
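
The idea can be sketched as follows, first with a bare nested loop and then with a two-dimensional window count. The tree positions are simulated, and the column names, plot size, and the choice of a matrix for storage are assumptions made for illustration.

```r
# Basic nested loop: the inner loop runs in full for each outer value
for (i in 1:3) {
  for (j in 1:2) {
    print(paste("i =", i, "j =", j))
  }
}

# Simulated tree positions on a 10 m x 10 m plot (made-up stand-in data)
set.seed(1)
trees <- data.frame(easting = runif(200, 0, 10),
                    northing = runif(200, 0, 10))

window_size <- 2.5
# Left and right edges of each box
west_edges <- seq(0, 10 - window_size, by = window_size)
east_edges <- west_edges + window_size
# Bottom and top edges of each box
south_edges <- seq(0, 10 - window_size, by = window_size)
north_edges <- south_edges + window_size

# Storage now needs two dimensions: one row per y band, one column per x band
counts <- matrix(0L, nrow = length(south_edges), ncol = length(west_edges))

for (i in 1:length(south_edges)) {
  for (j in 1:length(west_edges)) {
    in_box <- trees$northing >= south_edges[i] & trees$northing < north_edges[i] &
              trees$easting >= west_edges[j] & trees$easting < east_edges[j]
    counts[i, j] <- sum(in_box)
  }
}
counts
```

The outer loop picks a band of y values and the inner loop walks across the x values within it, so every box is visited exactly once.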

14.8 Sequence along

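  • seq_along(volumes) generates a vector of integers from 1 to length(volumes), which makes it a safer way to write the loop index than 1:length(volumes) when the vector might be empty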
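
A short sketch of the difference, using an example volumes vector (the values are assumed):

```r
volumes <- c(1.6, 3, 8)

# Same indices as 1:length(volumes)
seq_along(volumes)

# Loop by index using seq_along()
doubled <- vector(mode = "numeric", length = length(volumes))
for (i in seq_along(volumes)) {
  doubled[i] <- volumes[i] * 2
}
doubled

# With an empty vector the two forms differ:
empty <- numeric(0)
1:length(empty)   # counts down from 1 to 0, so a loop would run twice
seq_along(empty)  # zero-length, so a loop would not run at all
```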