Functions & Automation


Welcome to Week 6!

Welcome to week 6 of Earth Analytics! This week you will learn about efficient coding practices. Specifically, you will learn how to use functions to make your code:

  • Easier to read / simpler
  • More modular
  • More efficient

You will also learn about the DRY principle of programming: Don’t Repeat Yourself.
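Here is a small illustration of DRY in R. The rainfall values are made up, and inches_to_mm() is a hypothetical helper name chosen only for this example. In the first version, the conversion by 25.4 is typed out once per vector. In the DRY version, it is written once, inside a function.

```r
# Repetitive: the conversion factor is typed out for every vector, so any
# change (or typo) has to be fixed in several places.
rain_a_mm <- c(0.1, 0.2, 0.0) * 25.4
rain_b_mm <- c(0.3, 0.1, 0.5) * 25.4

# DRY: the conversion lives in one (hypothetical) function that every call reuses.
inches_to_mm <- function(x_in) {
  # 1 inch = 25.4 mm
  x_in * 25.4
}

rain_a_mm_dry <- inches_to_mm(c(0.1, 0.2, 0.0))
rain_b_mm_dry <- inches_to_mm(c(0.3, 0.1, 0.5))
```

If the conversion ever needs to change, you edit the DRY version in one place instead of once per vector.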

This week is still under development and the content won’t be done until Friday!

| Time | Topic | Speaker |
|---|---|---|
| 9:30 - 9:45 | Questions / Review | |
| 9:45 - 10:15 | Write efficient, expressive code: Don’t Repeat Yourself (DRY) | |
| 10:15 - 11:00 | Write functions in R | |
| 11:00 - 12:20 | Loops & functions to automate workflows | |

1a. Readings

There are no new readings for this week.

2. Complete the assignment below (5 points)

Homework Submission

Produce an R Markdown Report

Create a new R Markdown document and name it lastName-firstInitial-week6.Rmd. Within your .Rmd document, include your answers to the questions and the code described below. When you are done with your report, use knitr to convert it to html format. Submit both the .Rmd file and the .html (or .pdf) file to D2L. Be sure to name your files as instructed above!

Use knitr Code Chunk Arguments

For this week’s assignment, please do not hide your code. We will grade the assignment based on how you use functions and for loops to complete it.

Answer the Following Questions in Your Report

  1. Define the acronym DRY. What does DRY mean?
  2. When you document a function, what elements should you include?
  3. Provide an example of a function name that is expressive vs one that is not expressive.
  4. Explain the key difference between a variable that you create when programming line by line compared to a variable that is created within a function. Use the example below to help you answer the question OR use code to answer the question.
# this code below should help you answer question 4.
my_variable <- 1 + 2

# and a variable created within a function
the_answer <- function(num1, num2) {
  # calculate sum
  my_variable2 <- num1 + num2
  return(my_variable2)
}

# given the code above, will the code below run or return an error? Why?
my_variable2

Code Assignment

PART ONE:

For this week’s assignment, write some code that does the following.

  1. Write a for loop that breaks up the file: "data/week-02/precipitation/805325-precip-daily-2003-2013.csv" into yearly .csv files.
  2. Each .csv file should be saved in data/week-06/.
  3. Each .csv file should contain a month column with the month specified as a numeric value between 1-12.

# load the packages used below
library(dplyr)

# read in the data - but be sure to address na values when you read it in here!
boulder_precip <- read.csv("path-here-dont-forget-na-arguments")

# fix the date using dplyr pipes


# define min and max year

# build your loop
for (the_year in min_yr:max_yr) {
  # use pipes to filter the data by year


  # export a .csv file with the year in the name to your data/week-06/ dir
}

HINT: If you followed along in class, then you have already written this code! This lesson will help you complete this task.
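Here is a base R sketch of the year-splitting loop. It is a minimal sketch under stated assumptions, not the course solution. It uses base R subsetting in place of dplyr pipes and a few made-up rows instead of the Boulder file. It writes to a temporary directory rather than data/week-06/. It also assumes a file name pattern of precip-<year>.csv.

```r
# Made-up hourly precipitation rows standing in for the Boulder file.
# DATE is text, as read.csv() would return it before the date is fixed.
precip <- data.frame(
  DATE = c("2003-01-01 01:00:00", "2003-02-02 19:00:00",
           "2004-07-15 10:00:00", "2005-12-31 23:00:00"),
  HPCP = c(0.0, 0.2, 0.1, NA),
  stringsAsFactors = FALSE
)

# Convert the text to date-times, then add numeric year and month (1-12) columns.
precip$DATE <- as.POSIXct(precip$DATE, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
precip$year <- as.integer(format(precip$DATE, "%Y"))
precip$month <- as.integer(format(precip$DATE, "%m"))

# Write to a temporary directory instead of data/week-06/ for this example.
out_dir <- file.path(tempdir(), "week-06")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Define the min and max year.
min_yr <- min(precip$year)
max_yr <- max(precip$year)

for (the_year in min_yr:max_yr) {
  # Keep only the rows for the current year.
  year_data <- precip[precip$year == the_year, ]
  # Skip years with no rows so that no empty files are written.
  if (nrow(year_data) == 0) next
  # Assumed naming pattern: precip-<year>.csv
  out_file <- file.path(out_dir, paste0("precip-", the_year, ".csv"))
  write.csv(year_data, out_file, row.names = FALSE, na = "NA")
}
```

The numeric month column added here is the one that the monthly summary later groups by.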

PART TWO:

For each of the .csv files that you created above:

  1. Create a new .csv file that contains the total monthly precipitation in mm.
  2. Save that file as data/week-06/outputs/precip_mm/precip-year.csv, where year is the year of the data (e.g., precip-2003.csv). Note that you will need to create new directories to save the file to this path.

An example of what the final data should look like is below:

# open data
precip_2003 <- read.csv("data/week-06/outputs/precip_mm/precip-2003.csv")
head(precip_2003, n = 6)
##   X.1 X     STATION    STATION_NAME ELEVATION LATITUDE LONGITUDE
## 1   1 1 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 2   2 2 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 3   3 3 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 4   4 4 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 5   5 5 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 6   6 6 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
##                  DATE HPCP Measurement.Flag Quality.Flag month precip_mm
## 1 2003-01-01 01:00:00  0.0                g         1000     1      0.00
## 2 2003-02-01 01:00:00  0.0                g         1000     2      0.00
## 3 2003-02-02 19:00:00  0.2                          1000     2      5.08
## 4 2003-02-02 22:00:00  0.1                          1000     2      2.54
## 5 2003-02-03 02:00:00  0.1                          1000     2      2.54
## 6 2003-02-05 02:00:00  0.1                          1000     2      2.54

Use functions to complete this task as follows:

  1. Create a function called check_create_dir() that takes the path of a directory you want to make, checks whether that directory already exists, and creates it if it doesn’t.
  2. Create a function called in_to_mm() that converts values in inches to mm (you did this in an earlier lesson so you may already have this function in your code).
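Here is one possible shape for these two functions, using base R only. It is a minimal sketch rather than the required solution. dir.exists() and dir.create(recursive = TRUE) handle the directory check, and the factor of 25.4 mm per inch handles the conversion. The comment header above each function lists what it does, its inputs, and its outputs.

```r
# check_create_dir: make sure a directory exists.
# Input:  dir_path, a single character string giving a directory path.
# Output: none (returns NULL invisibly); called for its side effect of
#         creating the directory, plus any missing parent directories,
#         only if it does not already exist.
check_create_dir <- function(dir_path) {
  if (!dir.exists(dir_path)) {
    dir.create(dir_path, recursive = TRUE)
  }
  invisible(NULL)
}

# in_to_mm: convert precipitation from inches to millimeters.
# Input:  precip_in, a numeric vector of values in inches (NA allowed).
# Output: a numeric vector of the same length in millimeters;
#         NA values stay NA.
in_to_mm <- function(precip_in) {
  precip_mm <- precip_in * 25.4
  return(precip_mm)
}
```

Calling check_create_dir() a second time on the same path does nothing, so it is safe to run at the top of a script.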

NOTES:

  • Be sure to consider NA values in your data (e.g., 999.99) when you read in your data!
  • Make sure you address NA values when you run the sum() function.
  • When you write the .csv, make sure that you address NA values!
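You can try all three notes on a tiny CSV held in a string. The values are made up for illustration and do not come from the Boulder file.

- read.csv() accepts the data through its text argument, and na.strings converts the 999.99 code to NA.
- sum(na.rm = TRUE) skips the missing value.
- write.csv() writes missing values using its na argument.

The output path under tempdir() is an assumption chosen so that the example does not write into your project.

```r
# A small made-up CSV in which 999.99 is the "no data" code.
csv_text <- "DATE,HPCP
2003-02-02 19:00:00,0.2
2003-02-02 22:00:00,999.99
2003-02-03 02:00:00,0.1"

# 1. Read: tell read.csv() which strings mean NA (keep "NA" in the list too).
precip <- read.csv(text = csv_text, na.strings = c("999.99", "NA"))

# 2. Sum: without na.rm = TRUE, a single NA makes the total NA.
total_with_na <- sum(precip$HPCP)
total_in <- sum(precip$HPCP, na.rm = TRUE)

# 3. Write: the na argument controls how missing values appear in the file.
out_file <- file.path(tempdir(), "precip-na-example.csv")
write.csv(precip, out_file, row.names = FALSE, na = "NA")
```

Setting row.names = FALSE also stops write.csv() from adding an unnamed index column. That column is what shows up as X (and X.1 after a second round trip) when the file is read back in.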

library(dplyr)
library(readr)      # write_csv()
library(lubridate)  # month()

check_create_dir <- function(dir_path) {
  # document your function here

  # include the code required to check for the directory and then create it here
  # because this function is just creating a directory, you don't need to return anything!

}

in_to_mm <- function(precip_in) {
  # document your function here

  # include the code required to convert inches to mm here

  return(precip_mm)
}

# create an object with the directory name
new_dir <- "data/week-06/outputs/precip_mm/"
# check to see if the directory exists - make it if it doesn't
check_create_dir(new_dir)

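# list the yearly .csv files created in part one
all_precip_files <- list.files("data/week-06", pattern = "\\.csv$", full.names = TRUE)

# loop through each yearly .csv file
for (file in all_precip_files) {
  # read in the csv - make sure na.strings covers every NA code in your files
  the_data <- read.csv(file, header = TRUE, na.strings = c("NA", "999.99")) %>%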
    mutate(precip_mm = in_to_mm(HPCP)) # add a column with precip in mm and a column with just the month using the month() function
    # group the data by month

    # summarise using the sum function - be sure you address na values when you sum! we discussed this during week 1

  # write output to a new .csv file
  write_csv(the_data, file = paste0(new_dir, basename(file)), na = "NA")
}
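The "group by month, then sum" step inside the loop can be illustrated on a few made-up rows. This sketch uses base R's aggregate() in place of dplyr's group_by() and summarise(), but the idea is the same: split the rows by month and sum precip_mm within each group. Passing na.rm = TRUE through to sum() keeps a single NA from turning a whole monthly total into NA.

```r
# Made-up hourly records for one year, already converted to mm.
year_data <- data.frame(
  month = c(1, 2, 2, 2, 3),
  precip_mm = c(0.00, 5.08, 2.54, NA, 2.54)
)

# Total precipitation per month. na.action = na.pass keeps rows with NA so
# that sum(), via na.rm = TRUE, is what decides how NA values are handled.
monthly_total <- aggregate(precip_mm ~ month, data = year_data,
                           FUN = sum, na.rm = TRUE, na.action = na.pass)
```

The result is a data frame with one row per month (columns month and precip_mm), which can be passed straight to write.csv() or write_csv().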

Bonus Opportunity - 1 point

Use the lapply() function (instead of a for loop) to

  1. process all of the precipitation data files that you created for each year,
  2. add a new column to each file containing anything that you’d like, and
  3. write the output to a new .csv file in a new directory.

You can choose to reuse the code from your homework assignment, but implement it with lapply() instead of a for loop.
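Here is a base R sketch of the lapply() pattern. It assumes two small, made-up yearly files in temporary directories, and process_file() is a hypothetical helper name. The added source_file column is an arbitrary choice. lapply() calls the helper once per file path, passes out_dir through as an extra named argument, and collects the returned output paths in a list.

```r
# Two small made-up yearly files in a temporary input directory.
in_dir <- file.path(tempdir(), "lapply-in")
out_dir <- file.path(tempdir(), "lapply-out")
dir.create(in_dir, showWarnings = FALSE, recursive = TRUE)
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
write.csv(data.frame(month = 1:2, HPCP = c(0.1, NA)),
          file.path(in_dir, "precip-2003.csv"), row.names = FALSE)
write.csv(data.frame(month = 1:2, HPCP = c(0.2, 0.3)),
          file.path(in_dir, "precip-2004.csv"), row.names = FALSE)

# Hypothetical helper: read one file, add a column, write it to out_dir,
# and return the path of the file it wrote.
process_file <- function(file, out_dir) {
  the_data <- read.csv(file)
  # the new column (the source file name) is an arbitrary choice
  the_data$source_file <- basename(file)
  out_file <- file.path(out_dir, basename(file))
  write.csv(the_data, out_file, row.names = FALSE, na = "NA")
  out_file
}

all_files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
# lapply() calls process_file(file, out_dir = out_dir) once per path.
written <- lapply(all_files, process_file, out_dir = out_dir)
```

Because one function handles each file, the same helper could also be called from a for loop. lapply() simply removes the loop bookkeeping.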

Homework Due: Monday, October 16, 2017 @ 8 AM.

Submit your report in both .Rmd and .html format to the D2L dropbox.

Grade Rubric

R Markdown Report Structure & Code: 10%

| Criterion | Full Credit | No Credit |
|---|---|---|
| html / pdf and .Rmd submitted | | |
| Code is written using “clean” code practices following the Hadley Wickham style guide | | |
| Code chunk contains code and runs | | |
| All required R packages are listed at the top of the document in a code chunk | | |
| Code chunk arguments are used to hide warnings & messages | | |
| All code is visible in the knitted document | | |
| Lines of code are broken up at commas to make the code more readable | | |

Report Questions: 30%

| Criterion | Full Credit | No Credit |
|---|---|---|
| Define the acronym DRY. What does DRY mean? | | |
| When you document a function, what documentation elements should you include? | | |
| Provide an example of a function name that is expressive vs. one that is not expressive | | |
| Explain the key difference between a variable that you create when programming line by line and a variable that is created within a function | | |

Code is Worth 60% of the Assignment Grade This Week

For Loop 1 & General for Loop 2

Write a loop that takes the file “data/week-02/precipitation/805325-precip-daily-2003-2013.csv” and creates an individual .csv file for each year’s worth of data.

| Criterion | Full Credit | No Credit |
|---|---|---|
| Code produces an individual .csv file for each year’s worth of data | | |
| After the code runs, .csv files are saved in the data/week-06/ directory | | |
| .csv files are named correctly, including the year of data that each file contains | | |
| NA values are handled properly: when the data are read in, when they are exported to .csv files, and in the monthly summary calculation | | |
| .csv files contain the correct data (for the year specified) | | |

Specific for Loop 2

Loop through the files that you created in part one and summarize the data by total monthly precipitation in mm. Create a new .csv file for each year.

| Criterion | Full Credit | No Credit |
|---|---|---|
| in_to_mm() function is used to convert precipitation from inches to mm | | |
| check_create_dir() function is used to check for and create a directory if one doesn’t exist | | |
| Data in individual yearly .csv files are summarized by month | | |
| .csv files are saved in the data/week-06/outputs/precip_mm/ directory | | |
| All functions are documented with what the function does, its inputs and outputs, and the structure of those inputs and outputs | | |
