# Functions & Automation

## Welcome to Week 6!

Welcome to week 6 of Earth Analytics! This week you will learn about efficient coding practices. Specifically, you will learn how to use functions to make your code:

• Easier to read / simpler
• More modular
• More efficient

You will also learn about the DRY (Don’t Repeat Yourself) principle of programming.
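As a quick illustration of DRY, the sketch below converts three temperatures from Fahrenheit to Celsius, first by retyping the same formula and then with a small function. The function name `fahrenheit_to_celsius()` and the values are hypothetical, chosen only for this example.

```r
# Repetitive: the same formula is typed three times, so a typo in any
# one line would silently produce a wrong value
temp_1_c <- (50 - 32) * 5 / 9
temp_2_c <- (68 - 32) * 5 / 9
temp_3_c <- (86 - 32) * 5 / 9

# DRY: write the formula once inside a (hypothetical) function and reuse it
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# R arithmetic is vectorized, so one call converts every value
temps_c <- fahrenheit_to_celsius(c(50, 68, 86))
print(temps_c)
# [1] 10 20 30
```

If the formula ever needs to change, you now change it in one place instead of three.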

This week is still under development and the content won’t be done until Friday!

| Time | Topic | Speaker |
|---|---|---|
| 9:30 - 9:45 | Questions / Review | |
| 9:45 - 10:15 | Write efficient, expressive code: Don’t Repeat Yourself (DRY) | |
| 10:15 - 11:00 | Write functions in R | |
| 11:00 - 12:20 | Loops & functions to automate workflows | |

There are no new readings for this week.

## Homework Submission

### Produce an R Markdown Report

Create a new R Markdown document. Name it: lastName-firstInitial-week6.Rmd. Within your .Rmd document, include your answers to the questions and the code described below. When you are done with your report, use knitr to convert it to html format. Submit both the .Rmd file and the .html (or .pdf) file to D2L. Be sure to name your files as instructed above!

#### Use knitr Code Chunk Arguments

For this week’s assignment, please do not hide your code. We will grade the assignment based on how you use functions and for loops to complete it.
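One way to keep your code visible while still hiding warnings and messages is to set global chunk options in the first code chunk of your .Rmd file. The sketch below uses knitr's `opts_chunk$set()`; the particular combination of options is a suggestion, not a course requirement.

```r
# Set default options for every code chunk in the report:
# echo = TRUE keeps code visible in the knitted document (required this week),
# while warning = FALSE and message = FALSE hide warnings and package messages
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

You can also set the same options on an individual chunk in its header, for example `{r load-packages, warning = FALSE, message = FALSE}`.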

1. Define the acronym DRY. What does DRY mean?
2. When you document a function, what elements should you include?
3. Provide an example of a function name that is expressive vs one that is not expressive.
4. Explain the key difference between a variable that you create when programming line by line compared to a variable that is created within a function. Use the example below to help you answer the question OR use code to answer the question.
# this code below should help you answer question 4.
my_variable <- 1 + 2

# and a variable created within a function
# calculate sum (the function name add_numbers is illustrative)
add_numbers <- function(num1, num2) {
  my_variable2 <- num1 + num2
  return(my_variable2)
}

# given the code above, will the code below run or return an error? Why?
my_variable2


#### Code Assignment

PART ONE:

For this week’s assignment, write some code that does the following.

1. Write a for loop that breaks up the file: "data/week-02/precipitation/805325-precip-daily-2003-2013.csv" into yearly .csv files.
2. Each .csv file should be saved in data/week-06/.
3. Each .csv file should contain a month column with the month specified as a numeric value between 1-12.

# read in the data - but be sure to address NA values when you read it in here!

# fix the date using dplyr pipes

# define min and max year

for (the_year in min_yr:max_yr) {
  # use pipes to filter the data by year

  # export a .csv file with the year in the name to your data/week-06/ dir
}
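To see the loop pattern from the template in isolation, here is a minimal sketch on a small made-up data frame. It uses base R subsetting instead of dplyr pipes so it runs without extra packages, and it writes to a temporary directory rather than data/week-06/; the column values and the `precip-YEAR.csv` file names are assumptions for illustration only.

```r
# Hypothetical toy data standing in for the precipitation file
precip <- data.frame(
  DATE = as.POSIXct(c("2003-01-01 01:00:00", "2003-02-02 19:00:00",
                      "2004-03-05 02:00:00", "2004-07-10 14:00:00"),
                    tz = "UTC"),
  HPCP = c(0.0, 0.2, 0.1, 0.3)
)

# Add numeric year and month (1-12) columns derived from the date
precip$year <- as.numeric(format(precip$DATE, "%Y"))
precip$month <- as.numeric(format(precip$DATE, "%m"))

# Write to a temporary directory so the example does not touch your data/ folder
out_dir <- tempdir()

for (the_year in min(precip$year):max(precip$year)) {
  # keep only the rows for the current year
  year_data <- precip[precip$year == the_year, ]
  # build a file name that includes the year, e.g. precip-2003.csv
  out_file <- file.path(out_dir, paste0("precip-", the_year, ".csv"))
  # row.names = FALSE avoids an extra unnamed index column in the output
  write.csv(year_data, out_file, row.names = FALSE)
}

list.files(out_dir, pattern = "^precip-")
```

Note that in this sketch a year inside the range with no rows would still produce a file containing only a header.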


PART TWO:

For each of the .csv files that you created above:

1. Create a new .csv file that contains the total monthly precipitation in mm.
2. Save each file as data/week-06/outputs/precip_mm/precip-year.csv, where year is the year of data that the file contains. Note that you will need to create new directories to save files to this path.

An example of what the data should look like after you add the month and precip_mm columns (before you summarize by month) is below:

# open data
##   X.1 X     STATION    STATION_NAME ELEVATION LATITUDE LONGITUDE
## 1   1 1 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 2   2 2 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 3   3 3 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 4   4 4 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 5   5 5 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
## 6   6 6 COOP:050843 BOULDER 2 CO US      1650    40.03    -105.3
##                  DATE HPCP Measurement.Flag Quality.Flag month precip_mm
## 1 2003-01-01 01:00:00  0.0                g         1000     1      0.00
## 2 2003-02-01 01:00:00  0.0                g         1000     2      0.00
## 3 2003-02-02 19:00:00  0.2                          1000     2      5.08
## 4 2003-02-02 22:00:00  0.1                          1000     2      2.54
## 5 2003-02-03 02:00:00  0.1                          1000     2      2.54
## 6 2003-02-05 02:00:00  0.1                          1000     2      2.54


Use functions to complete this task as follows:

1. Create a function called check_create_dir() that takes the path to a directory that you want to make, checks whether it exists, and creates it if it doesn’t.
2. Create a function called in_to_mm() that converts values in inches to mm (you did this in an earlier lesson so you may already have this function in your code).

NOTES:

• Be sure to account for NA values in your data (e.g., 999.99) when you read the data in!
• Make sure you address NA values when you run the sum() function.
• When you write the .csv file, make sure that you address NA values!
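The three notes above can be sketched with base R on a tiny made-up file. The 999.99 missing-value flag comes from the notes; the file contents and column names are hypothetical. If you use readr instead, the equivalent arguments are `na` in `read_csv()` and `write_csv()`.

```r
# Create a tiny hypothetical file in which 999.99 flags a missing value
csv_file <- tempfile(fileext = ".csv")
writeLines(c("month,HPCP", "1,0.1", "1,999.99", "2,0.2", "2,0.3"), csv_file)

# 1. Reading: tell read.csv() which strings represent missing values
precip <- read.csv(csv_file, na.strings = "999.99")

# 2. Summing: a single NA makes the whole total NA unless na.rm = TRUE
sum(precip$HPCP)                # returns NA
sum(precip$HPCP, na.rm = TRUE)  # skips the missing value

# Monthly totals that ignore missing values
monthly_total <- tapply(precip$HPCP, precip$month, sum, na.rm = TRUE)
monthly_df <- data.frame(month = as.numeric(names(monthly_total)),
                         total_precip_in = as.numeric(monthly_total))

# 3. Writing: state explicitly how missing values appear in the output file
out_file <- tempfile(fileext = ".csv")
write.csv(monthly_df, out_file, na = "NA", row.names = FALSE)
```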

check_create_dir <- function(dir_path) {

  # include the code required to check for the directory and then create it here
  # because this function is just creating a directory, you don't need to return anything!

}

in_to_mm <- function(precip_in) {

  # include the code required to convert inches to mm here

  return(precip_mm)
}

# load the packages used below
library(readr)
library(dplyr)
library(lubridate)

# create an object with the directory name
new_dir <- "data/week-06/outputs/precip_mm/"
# check to see if the directory exists - make it if it doesn't
check_create_dir(new_dir)

# get a list of the yearly .csv files that you created in part one
all_precip_files <- list.files("data/week-06", pattern = "\\.csv$", full.names = TRUE)

# loop through each file
for (file in all_precip_files) {
  # read in the csv - be sure to fill in the na argument - I didn't do that below
  the_data <- read_csv(file) %>%
    # add a column with precip in mm and a column with just the month using the month() function
    mutate(precip_mm = in_to_mm(HPCP))
  # group the data by month

  # summarise using the sum function - be sure you address NA values when you sum! we discussed this during week 1

  # write output to a new .csv file
  write_csv(the_data, paste0(new_dir, basename(file)))
}


#### Bonus Opportunity - 1 point

Use the lapply() function (instead of a for loop) to:

1. process all of the precipitation data files that you created for each year,
2. add a new column containing anything that you’d like, and
3. write the output to a new .csv file in a new directory.

You can choose to reuse the code that you wrote for the homework assignment, implemented with lapply() instead of a for loop, if you want.
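A minimal sketch of the lapply() pattern, assuming two small made-up yearly files in a temporary directory. The helper name `process_file()`, the directory names, and the added `source_file` column are all hypothetical; the point is that lapply() calls one function on every element of a list of file paths and passes extra arguments (here `out_dir`) through to that function.

```r
# Create two small hypothetical yearly files in a temporary directory
in_dir <- file.path(tempdir(), "yearly")
out_dir <- file.path(tempdir(), "bonus-output")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(month = c(1, 2), HPCP = c(0.1, 0.2)),
          file.path(in_dir, "precip-2003.csv"), row.names = FALSE)
write.csv(data.frame(month = c(1, 2), HPCP = c(0.3, 0.4)),
          file.path(in_dir, "precip-2004.csv"), row.names = FALSE)

# Process one file: read it, add a column, write the result to out_dir
process_file <- function(file, out_dir) {
  the_data <- read.csv(file)
  # the new column can hold anything; here it records the source file name
  the_data$source_file <- basename(file)
  out_file <- file.path(out_dir, basename(file))
  write.csv(the_data, out_file, row.names = FALSE)
  # return the path that was written
  out_file
}

# lapply() replaces the for loop; out_dir is passed through to process_file()
all_files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
written <- lapply(all_files, process_file, out_dir = out_dir)
```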

## Homework Due: Monday, October 16, 2017 @ 8 AM

Submit your report in both .Rmd and .html format to the D2L dropbox.

#### R Markdown Report Structure & Code: 10%

| Criterion | Full Credit | No Credit |
|---|---|---|
| .html / .pdf and .Rmd files submitted | | |
| Code is written using “clean” code practices following the Hadley Wickham style guide | | |
| Code chunk contains code and runs | | |
| All required R packages are listed at the top of the document in a code chunk | | |
| Code chunk arguments are used to hide warnings & messages | | |
| All code is visible in the knitted document | | |
| Lines of code are broken up at commas to make the code more readable | | |

#### Report Questions: 30%

| Criterion | Full Credit | No Credit |
|---|---|---|
| Define the acronym DRY. What does DRY mean? | | |
| When you document a function, what documentation elements should you include? | | |
| Provide an example of a function name that is expressive vs. one that is not expressive | | |
| Explain the key difference between a variable that you create when programming line by line compared to a variable that is created within a function | | |

### Code is Worth 60% of the Assignment Grade This Week

#### For Loop 1 & General for Loop 2

Write a loop that takes the file “data/week-02/precipitation/805325-precip-daily-2003-2013.csv” and creates an individual .csv file for each year’s worth of data.

| Criterion | Full Credit | No Credit |
|---|---|---|
| Code produces an individual .csv file for each year’s worth of data | | |
| .csv files are saved in the data/week-06/ directory | | |
| .csv files are named correctly, including the year of data that the file contains | | |
| NA values are handled properly in the code: when the data are read in, when they are exported to .csv files, and in the monthly summary calculation | | |
| .csv files created contain the correct data (for the year specified) | | |

#### Specific for Loop 2

Loop through the files that you created in part one and summarize the data as total monthly precipitation in mm. Create a new .csv file for each year.

| Criterion | Full Credit | No Credit |
|---|---|---|
| in_to_mm() function is used to convert precipitation from inches to mm | | |
| check_create_dir() function is used to check for and create a directory if one doesn’t exist | | |
| Data in individual yearly .csv files are summarized by month | | |
| .csv files are saved in the data/week-06/outputs/precip_mm/ directory | | |
| All functions are documented with what the function does, inputs, outputs and structure of inputs and outputs | | |
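One possible way to meet the documentation criterion is a comment block above each function that states what it does, its inputs, its outputs, and the structure (type and length) of each. The function `fahrenheit_to_celsius()` below is hypothetical and the layout is a suggestion, not a required template.

```r
# fahrenheit_to_celsius
#
# Purpose: convert temperature values from degrees Fahrenheit to degrees Celsius.
#
# Input:
#   temp_f - numeric vector of temperatures in degrees Fahrenheit
#            (a single value or many values)
#
# Output:
#   numeric vector, the same length as temp_f, of temperatures in degrees
#   Celsius; NA inputs stay NA
fahrenheit_to_celsius <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

fahrenheit_to_celsius(c(32, 212, NA))
# [1]   0 100  NA
```

If you later turn your functions into a package, the roxygen2 style (`#'` comments with tags such as `@param` and `@return`) records the same information in a form that can generate help pages.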
