14 Loops in R, Part II
14.1 Acknowledgment/License
The original source for this chapter was from the web site
https://datacarpentry.org/semester-biology/
which was built using this underlying code
https://github.com/datacarpentry/semester-biology
and is used under the
Attribution 4.0 International (CC BY 4.0)
license https://creativecommons.org/licenses/by/4.0/.
The material presented here has been modified from the original source.
Accordingly this chapter is made available under the same license terms.
14.2 Source code
If you’d like to work within RStudio using the source code of this chapter, you can obtain it from here.
14.3 Looping with functions
- It is common to combine loops with functions by calling one or more functions as a step in our loop
- For example, let’s take the non-vectorized version of our est_mass function that returns an estimated mass if the volume > 5 and NA if it’s not
- We can’t pass the vector to the function and get back a vector of results because of the if statements (the condition in an if statement must be a single TRUE or FALSE)
- So let’s loop over the values
- First we’ll create an empty vector to store the results
- And then loop by index, calling the function for each value of volumes
- This is the for loop equivalent of an mapply statement
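A minimal sketch of the steps above. The body of est_mass() is not shown in this chapter, so the coefficients 2.65 and 0.9 below are hypothetical placeholders; only the if/else structure and the loop matter here.

```r
# Non-vectorized mass estimate: the `if` only looks at a single value.
# The coefficients 2.65 and 0.9 are hypothetical placeholders.
est_mass <- function(volume) {
  if (volume > 5) {
    mass <- 2.65 * volume^0.9
  } else {
    mass <- NA
  }
  return(mass)
}

volumes <- c(1.6, 3, 8)

# Pre-allocate an empty numeric vector with one slot per volume
masses <- vector(mode = "numeric", length = length(volumes))

# Loop by index, calling the function on one value at a time
for (i in 1:length(volumes)) {
  masses[i] <- est_mass(volumes[i])
}
masses
```

Calling est_mass(volumes) directly is an error in R 4.2.0 and later ("the condition has length > 1"); older versions only warned and silently used the first element, which is why looping by index is needed.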
If dinosaur_lengths.csv is not already in your working directory, download a copy of the data on dinosaur lengths with species names. Load it into R.

Write a function mass_from_length() that uses the equation mass <- a * length^b to estimate the size of a dinosaur from its length. This function should take two arguments, length and species. For each of the following inputs for species, use the given values of a and b for the calculation:
- For Stegosauria: a = 10.95 and b = 2.64 (Seebacher 2001).
- For Theropoda: a = 0.73 and b = 3.63 (Seebacher 2001).
- For Sauropoda: a = 214.44 and b = 1.46 (Seebacher 2001).
- For any other value of species: a = 25.37 and b = 2.49.
- Use this function and a for loop to calculate the estimated mass for each dinosaur, store the masses in a vector, and after all of the calculations are complete show the first few items in the vector using head().
- Add the results in the vector back to the original data frame. Show the first few rows of the data frame using head().
- Calculate the mean mass for each species using dplyr.
14.4 Looping over files
- Repeat the same actions on many similar files
- Let’s download some simulated satellite collar data
- Now we need to get the names of each of the files we want to loop over
- We do this using list.files()
- If we run it without arguments it will give us the names of all files in the working directory
- But we just want the data files, so we’ll add the optional pattern argument to only get the files whose names start with "locations-"
- Once we have this list we can loop over it to count the number of observations in each file
- First create an empty vector to store those counts
- Then write our loop
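A self-contained sketch of this pattern. Since the collar data files are not reproduced here, it first writes three small hypothetical locations-*.txt files into a temporary directory; the file names, the lat/long columns, and the use of tempdir() are assumptions for illustration only.

```r
# Work in a temporary directory so the example does not touch real data
data_dir <- file.path(tempdir(), "collar_example")
dir.create(data_dir, showWarnings = FALSE)

# Hypothetical data files with 3, 5, and 2 observations
n_obs <- c(a = 3, b = 5, c = 2)
for (id in names(n_obs)) {
  locs <- data.frame(lat = rnorm(n_obs[[id]]), long = rnorm(n_obs[[id]]))
  write.csv(locs, file.path(data_dir, paste0("locations-", id, ".txt")),
            row.names = FALSE)
}
# A file that should NOT be picked up by the pattern
writeLines("not data", file.path(data_dir, "notes.txt"))

# `pattern` is a regular expression; "^" anchors it to the start of the name
data_files <- list.files(data_dir, pattern = "^locations-", full.names = TRUE)

# Empty vector to store one count per file
counts <- vector(mode = "numeric", length = length(data_files))

for (i in 1:length(data_files)) {
  data <- read.csv(data_files[i])
  counts[i] <- nrow(data)
}
counts
```

Because pattern is a regular expression rather than a plain prefix, "locations-" on its own would also match a name like "old-locations-a.txt"; the "^" restricts matches to names that begin with it.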
Note: the exercise uses different collar data than the example above.
You have satellite collars on a number of different individuals and want to be able to quickly look at all of their recent movements at once. The data is posted daily to a zip file that contains one csv file for each individual: data/individual_collar_data.zip
Start your solution by:
- If individual_collar_data.zip is not already in your working directory, download the zip file using download.file()
- Unzip it using unzip()
- Obtain a list of all of the files with file names matching the pattern "collar-data-.*.txt" (using list.files())
- Use a loop to load each of these files into R and make a line plot (using geom_path()) for each file with long on the x axis and lat on the y axis. Graphs, like other types of output, won’t display inside a loop unless you explicitly display them, so you need to put your ggplot() command inside a print() statement.

Include the name of the file in the graph as the graph title using labs().
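The point about output inside loops applies to any automatically printed value, not just graphs. A minimal base-R sketch (no ggplot2 needed, so this is not a solution to the exercise) that uses capture.output() to record what actually gets printed:

```r
# At the console, a value on its own is printed automatically,
# but inside a loop body it is evaluated silently.
silent <- capture.output(for (i in 1:3) i)

# Wrapping it in print() displays it on every pass through the loop
shown <- capture.output(for (i in 1:3) print(i))

length(silent)  # nothing was printed
shown           # one line of output per iteration
```

The same rule is why a ggplot() object has to be passed to print() inside the loop.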
14.5 Storing loop results in a data frame
- We often want to calculate multiple pieces of information in a loop, making it useful to store results in things other than vectors
- We can store them in a data frame instead by creating an empty data frame and storing the results in the ith row of the appropriate column
- Associate the file name with the count
- Also store the minimum latitude
- Start by creating an empty data frame
- Use the data.frame function
- Provide one argument for each column
- "Column Name" = "an empty vector of the correct type", with one element per file (a data frame with zero rows cannot be filled in with results$count[i])
- Now let’s modify our loop from last time
- Instead of storing count in results[i] we need to first specify the count column using the $: results$count[i]
- We also want to store the filename, which is data_files[i]
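A minimal sketch of storing loop results in a data frame. To stay self-contained it uses a hypothetical named list of small data frames in place of the collar files; the column names file_name, count, and min_lat follow the bullets above, and the latitude values are made up.

```r
# Hypothetical stand-ins for the data files: a named list of data frames
collar_data <- list(
  "locations-a.txt" = data.frame(lat = c(26.1, 25.4, 27.0)),
  "locations-b.txt" = data.frame(lat = c(24.8, 26.3))
)
data_files <- names(collar_data)
n_files <- length(data_files)

# One argument per column; each vector is pre-sized to one row per file
results <- data.frame(file_name = character(n_files),
                      count = integer(n_files),
                      min_lat = numeric(n_files),
                      stringsAsFactors = FALSE)

for (i in 1:n_files) {
  data <- collar_data[[data_files[i]]]
  # Select the column with $, then the row with [i]
  results$file_name[i] <- data_files[i]
  results$count[i] <- nrow(data)
  results$min_lat[i] <- min(data$lat)
}
results
```

Pre-sizing the columns matters: on a zero-row data frame, results$count[1] <- 3 fails with "replacement has 1 row, data has 0". stringsAsFactors = FALSE is already the default from R 4.0.0 on and is included only so older versions keep file_name as character.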
Note: the exercise uses different collar data than the example above.
- Add code to the loop to calculate the minimum and maximum latitude in the file, and store these values, along with the name of the file, in a data frame. Show the data frame as output.
If you’re interested in seeing another application of for loops, check out the code below, which was used to simulate the data for this exercise.

individuals = paste(c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'), c(1:10), sep = "")
for (individual in individuals) {
  lat = vector("numeric", 24)
  long = vector("numeric", 24)
  lat[1] = rnorm(1, mean = 26, sd = 2)
  long[1] = rnorm(1, mean = -35, sd = 3)
  for (i in 2:24) {
    lat[i] = lat[i - 1] + rnorm(1, mean = 0, sd = 1)
    long[i] = long[i - 1] + rnorm(1, mean = 0, sd = 1)
  }
  times = seq(from=as.POSIXct("2016-02-26 00:00", tz="UTC"),
              to=as.POSIXct("2016-02-26 23:00", tz="UTC"),
              by="hour")
  df = data.frame(date = "2016-02-26",
                  collar = individual,
                  time = times,
                  lat = lat,
                  long = long)
  write.csv(df, paste("collar-data-", individual, "-2016-02-26.txt", sep = ""))
}
zip("data/individual_collar_data.zip", list.files(pattern = "collar-data-[A-Z][0-9]+-.*"))
14.6 Subsetting Data
- Loops can subset in ways that are difficult with things like group_by
- Look at some data on trees from the National Ecological Observatory Network
- Look at a north-south gradient in the number of trees
- Need to know the number of trees in each band of y values
- Start by defining the size of the window we want to use
- Use the grid lines, which are 2.5 m apart
- Then figure out the edges for each window
- But we don’t want to go all the way to the far edge: the last edge should be one window width short of it, so that the final window ends at the edge of the plot
- Set up an empty data frame to store the output
- Loop over the left edges and subset the data occurring within each window
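A minimal sketch of the windowing loop. The NEON tree data are not reproduced here, so it simulates 100 hypothetical trees with y positions between 0 and 20 m, and it subsets with base R indexing rather than dplyr; the plot size and the column names are assumptions.

```r
set.seed(1)
# Hypothetical stand-in for the NEON tree data: y positions in metres
trees <- data.frame(y = runif(100, min = 0, max = 20))

window_size <- 2.5   # width of each band, matching the 2.5 m grid lines
plot_max <- 20       # far edge of the (hypothetical) plot

# Lower edges of each window; stop one window short of the far edge
left_edges <- seq(0, plot_max - window_size, by = window_size)

# Pre-sized storage: one row per window
output <- data.frame(edge = left_edges,
                     num_trees = integer(length(left_edges)))

for (i in 1:length(left_edges)) {
  edge <- left_edges[i]
  # Trees whose y falls in [edge, edge + window_size)
  in_window <- trees[trees$y >= edge & trees$y < edge + window_size, , drop = FALSE]
  output$num_trees[i] <- nrow(in_window)
}
output
```

Using >= on the lower edge and < on the upper edge puts each tree in exactly one window, so the counts add up to the total number of trees in the plot.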
14.7 Nested Loops
- Sometimes we need to loop over multiple things in a coordinated fashion
- For example, to pass a window over some spatial data
- Look at the full spatial pattern, not just a gradient along one axis
- Basic nested loops work by putting one loop inside another one
- Loop over x and y coordinates to create boxes
- Need top and bottom edges for each box, in addition to the left and right edges
- Redefine our storage
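A minimal sketch of a nested loop that passes a square window over spatial data. As in the previous section, the tree positions are hypothetical (100 trees simulated in a 20 m by 20 m plot) and the 5 m box size is an assumption; the outer loop steps through left edges and the inner loop through bottom edges.

```r
set.seed(42)
# Hypothetical stand-in for the tree data: x and y positions in metres
trees <- data.frame(x = runif(100, min = 0, max = 20),
                    y = runif(100, min = 0, max = 20))

window_size <- 5
plot_max <- 20
left_edges <- seq(0, plot_max - window_size, by = window_size)
bottom_edges <- seq(0, plot_max - window_size, by = window_size)

# Redefined storage: one row per box, filled using a separate row counter
n_boxes <- length(left_edges) * length(bottom_edges)
output <- data.frame(left = numeric(n_boxes),
                     bottom = numeric(n_boxes),
                     num_trees = integer(n_boxes))

box_row <- 1
for (left in left_edges) {
  for (bottom in bottom_edges) {
    # Box spans [left, left + window_size) by [bottom, bottom + window_size)
    in_box <- trees$x >= left & trees$x < left + window_size &
              trees$y >= bottom & trees$y < bottom + window_size
    output$left[box_row] <- left
    output$bottom[box_row] <- bottom
    output$num_trees[box_row] <- sum(in_box)
    box_row <- box_row + 1
  }
}
head(output)
```

Because the loops run over edge values rather than indices, a separate counter (box_row) tracks which row of the storage data frame to fill.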
14.8 Sequence along
seq_along(volumes) generates a vector of numbers from 1 to length(volumes)
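A short comparison of seq_along() with 1:length(). Both give the same indices for a non-empty vector, but they differ for an empty one, which is the usual reason to prefer seq_along() when writing loops by index.

```r
volumes <- c(1.6, 3, 8)
seq_along(volumes)   # same indices as 1:length(volumes)

# The two differ for an empty vector:
empty <- numeric(0)
1:length(empty)      # counts down from 1 to 0, so a loop would run twice
seq_along(empty)     # zero-length, so a loop runs zero times

# Count how many times each style of loop body would execute
n_colon <- 0
for (i in 1:length(empty)) n_colon <- n_colon + 1
n_seq <- 0
for (i in seq_along(empty)) n_seq <- n_seq + 1
c(n_colon, n_seq)
```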