15  Functions

15.1 Acknowledgment/License

The original source for this chapter was the web site https://datacarpentry.org/semester-biology/, which was built using the underlying code at https://github.com/datacarpentry/semester-biology and is used under the Attribution 4.0 International (CC BY 4.0) license, https://creativecommons.org/licenses/by/4.0/.

The material presented here has been modified from the original source.

Accordingly, this chapter is made available under the same license terms.

15.2 Source code

If you’d like to work within RStudio using the source code of this chapter, you can obtain it from here.

15.3 Understandable and reusable code

  • Write code in understandable chunks.
  • Write reusable code.

15.4 Understandable chunks

  • The human brain can only hold a limited number of things in memory
  • Write programs that don’t require remembering all of the details at once
  • Treat functions as a single conceptual chunk.

15.5 Reuse

  • Want to do the same thing repeatedly?
    • Inefficient & error prone to copy code
    • If it occurs in more than one place, it will eventually be wrong somewhere.
  • Functions are written to be reusable.

15.6 Function basics

function_name <- function(inputs) {
  output_value <- do_something(inputs)
  return(output_value)
}
  • The braces indicate that the lines of code are a group that gets run together
  • Pressing run anywhere in this group runs all the lines in that group
  • A function runs all of the lines of code in the braces, using the arguments provided, and then returns the output
  • Creating a function doesn’t run it.
  • Call the function with some arguments.
  • Store the output to use it later in the program
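As a concrete instance of the template above, here is a minimal sketch of a function that computes the volume of a shrub from its dimensions. The function name calc_shrub_vol is an assumption made for illustration; the argument names and values match the ones used in the execution walk-through in this section.

```r
# Illustrative function (the name calc_shrub_vol is an assumption)
calc_shrub_vol <- function(length, width, height) {
  area <- length * width
  volume <- area * height
  return(volume)
}

# Creating the function above did not run it; calling it does
calc_shrub_vol(0.8, 1.6, 2.0)

# Store the output to use it later in the program
shrub_vol <- calc_shrub_vol(0.8, 1.6, 2.0)
```

Running the definition only creates the function object. Nothing is calculated until the function is called with arguments.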
Do Writing Functions

Edit the following function to replace the ________ with variable names for the input and output.

Use the function to calculate how many grams there are in 3.75 pounds.

  • Treat functions like a black box
    • Draw a box on board showing inputs->function->outputs
    • The only things the function knows about are the inputs we pass it
    • The only thing the program knows about the function is the output it produces
Do Function Execution
  • Walk through function execution (using debugger)
    • Call function
    • Assign 0.8 to length, 1.6 to width, and 2.0 to height inside function
    • Calculate the area and assign it to area
    • Calculate volume and assign it to volume
    • Send volume back as output
    • Store it in shrub_vol
  • Treat functions like a black box.
    • Can’t access a variable that was created in a function
      • > volume
      • Error: object 'volume' not found
    • Or an argument by name
      • > width
      • Error: object 'width' not found
    • ‘Global’ variables can influence a function, but should not.
      • Very confusing and error prone to use a variable that isn’t passed in as an argument
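The black box behavior can be checked directly. The sketch below assumes a fresh R session and uses illustrative names (calc_shrub_vol, calc_vol_with_global, default_height) that are not part of the original material. The second half shows why relying on a global variable is error prone: the same call gives a different answer after the global changes.

```r
# Illustrative function (the name calc_shrub_vol is an assumption)
calc_shrub_vol <- function(length, width, height) {
  area <- length * width
  volume <- area * height
  return(volume)
}

shrub_vol <- calc_shrub_vol(0.8, 1.6, 2.0)

# 'volume' only existed while the function was running, so it is not
# defined at the top level (FALSE in a fresh session)
volume_is_visible <- exists("volume", inherits = FALSE)

# Anti-pattern: this hypothetical function silently depends on a global
default_height <- 2.0
calc_vol_with_global <- function(length, width) {
  length * width * default_height  # not passed in as an argument
}

vol_before <- calc_vol_with_global(0.8, 1.6)
default_height <- 1.0  # changing the global changes the result
vol_after <- calc_vol_with_global(0.8, 1.6)
```

Passing height as an argument instead makes every input to the calculation visible at the call site.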
Do Use and Modify.

The length of an organism is typically strongly correlated with its body mass. This is useful because it allows us to estimate the mass of an organism even if we only know its length. This relationship generally takes the form:

mass = a * length^b

Where the parameters a and b vary among groups. This allometric approach is regularly used to estimate the mass of dinosaurs since we cannot weigh something that is only preserved as bones.

The following function estimates the mass of an organism in kg based on its length in meters for a particular set of parameter values, those for Theropoda (where a has been estimated as 0.73 and b has been estimated as 3.63; Seebacher 2001).
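A minimal sketch of such a function, written directly from the formula and the Theropoda parameter values given above. The function name is an assumption made for illustration.

```r
# Hypothetical name; a = 0.73 and b = 3.63 are the Theropoda values
# quoted in the text (Seebacher 2001)
get_mass_from_length_theropoda <- function(length) {
  mass <- 0.73 * length ^ 3.63
  return(mass)
}
```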

  1. Use this function to print out the mass of a Theropoda that is 16 m long based on its reassembled skeleton.
  2. Create a new version of this function called get_mass_from_length() that takes length, a and b as arguments and uses the following code to estimate the mass: mass <- a * length ^ b.

Use this function to estimate the mass of a Sauropoda (a = 214.44, b = 1.46) that is 26 m long.

15.7 Default arguments

  • Defaults can be set for common inputs.
  • For example, many of our shrubs are the same height so for those shrubs we only measure the length and width.
  • So we want a default value for the height for cases where we don’t measure it
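A default is set by giving the argument a value in the function definition. This is a minimal sketch, assuming the illustrative function name calc_shrub_vol and an assumed default height of 1; neither is specified in the text above.

```r
# height = 1 is an assumed default, chosen only for illustration
calc_shrub_vol <- function(length, width, height = 1) {
  area <- length * width
  volume <- area * height
  return(volume)
}

# Height not measured: the default is used
vol_default <- calc_shrub_vol(0.8, 1.6)

# Height measured: the default is overridden, by position or by name
vol_measured <- calc_shrub_vol(0.8, 1.6, 2.0)
vol_named <- calc_shrub_vol(0.8, 1.6, height = 2.0)
```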
Do Default Arguments.

This is a follow up to the Use and Modify exercise above.

Allowing a and b to be passed as arguments to get_mass_from_length() made the function more flexible, but for some types of dinosaurs we don’t have specific values of a and b and so we have to use general values that can be applied to a number of different species.

Rewrite your get_mass_from_length() function from Use and Modify so that its arguments have default values of a = 39.9 and b = 2.6 (the average values from Seebacher 2001).

  1. Use this function to estimate the mass of a Sauropoda (a = 214.44, b = 1.46) that is 22 m long (by setting a and b when calling the function).
  2. Use this function to estimate the mass of a dinosaur from an unknown taxonomic group that is 16 m long. Only pass the function length, not a and b, so that the default values are used.

Discuss why passing a and b in as arguments is more useful than having them fixed.

15.8 Named vs unnamed arguments

  • When to use or not use argument names: a call can name every argument, or pass the values by position

  • You can always use names
    • Value gets assigned to variable of that name
  • If not using names then order determines naming
    • First value is length, second value is width, third value is height
    • If order is hard to remember use names
  • In many cases there are a lot of optional arguments
    • Convention to always name optional argument
  • So, in our case, the most common approach would be to pass length and width by position and name only the optional height
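The calling styles described above can be compared side by side. This sketch assumes the illustrative function calc_shrub_vol with an assumed default of height = 1; all four calls produce the same result.

```r
# Illustrative function; the name and the default value are assumptions
calc_shrub_vol <- function(length, width, height = 1) {
  area <- length * width
  volume <- area * height
  return(volume)
}

# Every argument named: each value is assigned to the variable of that name
all_named <- calc_shrub_vol(length = 0.8, width = 1.6, height = 2.0)

# No names: order determines which value goes to which argument
by_position <- calc_shrub_vol(0.8, 1.6, 2.0)

# With names, the order no longer matters
reordered <- calc_shrub_vol(height = 2.0, length = 0.8, width = 1.6)

# Common convention: required arguments by position, optional one by name
common_style <- calc_shrub_vol(0.8, 1.6, height = 2.0)
```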

15.9 Combining Functions

  • Each function should be a single conceptual chunk of code

  • Functions can be combined to do larger tasks in two ways

  • Calling multiple functions in a row

  • We can also use pipes with our own functions
  • The output from the first function becomes the first argument for the second function
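Both ways of combining functions are sketched below. The function names are assumptions made for illustration, and the coefficients in est_shrub_mass are placeholders rather than measured values. The example uses the native pipe |>, which needs R 4.1 or later; the %>% pipe from the magrittr and dplyr packages would be used the same way.

```r
# Illustrative functions; names and coefficients are assumptions
calc_shrub_vol <- function(length, width, height = 1) {
  area <- length * width
  volume <- area * height
  return(volume)
}

est_shrub_mass <- function(volume) {
  mass <- 2.65 * volume^0.9  # placeholder coefficients
  return(mass)
}

# Calling multiple functions in a row, storing the intermediate result
shrub_volume <- calc_shrub_vol(0.8, 1.6, 2.0)
shrub_mass <- est_shrub_mass(shrub_volume)

# The same steps with a pipe: the output of calc_shrub_vol() becomes
# the first argument of est_shrub_mass()
shrub_mass_piped <- calc_shrub_vol(0.8, 1.6, 2.0) |>
  est_shrub_mass()
```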
Do Combining Functions.

This is a follow up to the Default Argument exercise above.

Measuring things using the metric system is the standard approach for scientists, but when communicating your results more broadly it may be useful to use different units (at least in some countries). Write a function called convert_kg_to_pounds that converts kilograms into pounds (pounds = 2.205 * kg).

Use that function and your get_mass_from_length() function from Default Arguments to estimate the weight, in pounds, of a 12 m long Stegosaurus with a = 10.95 and b = 2.64 (The estimated a and b values for Stegosauria from Seebacher 2001).

  • We can nest functions
  • But be careful with this because it can make code difficult to read

  • Don’t nest more than two functions

  • Can also call functions from inside other functions

  • Allows organizing function calls into logical groups

  • We don’t need to pass the function name into the function
  • That’s the one violation of the black box rule
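Nesting and calling functions from inside another function can be sketched as follows. All function names are assumptions made for illustration, and the coefficients in est_shrub_mass are placeholders. Note that est_shrub_mass_dim uses calc_shrub_vol and est_shrub_mass without receiving them as arguments.

```r
# Illustrative functions; names and coefficients are assumptions
calc_shrub_vol <- function(length, width, height = 1) {
  area <- length * width
  volume <- area * height
  return(volume)
}

est_shrub_mass <- function(volume) {
  mass <- 2.65 * volume^0.9  # placeholder coefficients
  return(mass)
}

# Nesting: the inner call runs first and its output feeds the outer call
shrub_mass_nested <- est_shrub_mass(calc_shrub_vol(0.8, 1.6, 2.0))

# Calling functions from inside another function groups the steps
est_shrub_mass_dim <- function(length, width, height) {
  volume <- calc_shrub_vol(length, width, height)
  mass <- est_shrub_mass(volume)
  return(mass)
}

shrub_mass_grouped <- est_shrub_mass_dim(0.8, 1.6, 2.0)
```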

15.10 Using dplyr & ggplot in functions

  • There is an extra step we need to take when working with functions from dplyr and ggplot that work with “data variables”, i.e., names of columns that are not in quotes
  • These functions use tidy evaluation, a special type of non-standard evaluation
  • This basically means they do fancy things under the surface to make them easier to work with
  • But it means they don’t work if we just pass things to functions in the most natural way
  • To fix this we have to tell our code which inputs/arguments are this special type of data variable
  • We do this by “embracing” them in double braces
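A minimal sketch of embracing, assuming the dplyr package is installed. The helper name mean_of_column and the small shrubs data frame are made up for illustration. The column argument is a data variable, so it is wrapped in double braces where it is used inside summarize().

```r
library(dplyr)

# Hypothetical helper: 'column' is a data variable, so it is embraced
mean_of_column <- function(df, column) {
  df |>
    summarize(mean_value = mean({{ column }}))
}

# Made-up example data
shrubs <- data.frame(length = c(0.8, 1.2), width = c(1.6, 2.0))

# Column names are passed without quotes, as in a direct dplyr call
mean_length <- mean_of_column(shrubs, length)
mean_width <- mean_of_column(shrubs, width)
```

Without the double braces, summarize() would look for a column literally named column rather than the one passed in.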

15.11 Code design with functions

  • Functions let us break code up into logical chunks that can be understood in isolation
  • Write functions at the top of your code then call them at the bottom
  • The functions hold the details
  • The function calls show you the outline of the code execution
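# Function definitions: each one holds the details of a single step.
# do_stuff(), do_dplyr_stuff(), and do_ggplot_stuff() are placeholders
# for the real work.
clean_data <- function(data){
  do_stuff(data)
}

process_data <- function(cleaned_data){
  do_dplyr_stuff(cleaned_data)
}

make_graph <- function(processed_data){
  do_ggplot_stuff(processed_data)
}

# Function calls: these show the outline of the code execution.
raw_data <- read.csv('mydata.csv')
cleaned_data <- clean_data(raw_data)
processed_data <- process_data(cleaned_data)
make_graph(processed_data)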

15.12 Documentation & Comments

  • Documentation
    • How to use code
    • Use Roxygen comments for functions
  • Comments
    • Why & how code works
    • Only if the code is confusing to read
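The difference between the two can be sketched on a single function. The Roxygen block (lines starting with #') documents how to use the function, while the ordinary comment inside explains a choice in the code. The function name and the default value of 1 are assumptions made for illustration.

```r
#' Calculate the volume of a shrub
#'
#' @param length Length of the shrub.
#' @param width Width of the shrub.
#' @param height Height of the shrub. Defaults to 1 (an assumed value).
#' @return The volume, computed as length * width * height.
calc_shrub_vol <- function(length, width, height = 1) {
  # Compute the base area first so each step can be checked on its own
  area <- length * width
  volume <- area * height
  return(volume)
}
```

The #' lines are plain comments to R itself, so the function runs whether or not the roxygen2 package is installed.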

15.13 Working with functions in RStudio

  • It is possible to find and jump between functions

  • Click on list of functions at bottom of editor and select

  • Can be helpful to clearly see what is a function

  • Can have RStudio highlight them

  • Global Options -> Code -> Display -> Highlight R function calls