# Intro to R & work with time series data

## Welcome to Week 2!

Welcome to week 2 of Earth Analytics! This week we will learn how to work with data in R and RStudio, with a focus on time series data. To work with time series data, you need to know how to handle date and time fields and missing data. It is also helpful to know how to subset the data by date.

The data that we use this week are collected by sensor networks managed by U.S. agencies. We will use USGS stream gage network data and NOAA / National Weather Service precipitation data. All of the data were collected in Boulder, Colorado around the time of the 2013 floods.

Read the assignment below carefully. Use the class and homework lessons to help you complete the assignment.

## Class schedule

| Time | Topic | Speaker |
|:---|:---|:---|
| 9:30 - 9:45 AM | Review RStudio / R Markdown / questions | Leah |
| 9:45 - 10:45 | R coding session - Intro to Scientific programming with R | Leah |
| 10:45 - 11:00 | Break | |
| 11:00 - 12:20 | R coding session continued | Leah |

## Important - Data Organization

Before you begin this lesson, be sure that you’ve downloaded the dataset above. You will need to unzip the zip file. When you do this, be sure that your directory looks like the image below: note that all of the data are within the week_02 directory. The data are not nested within another directory. You may have to copy and paste your files into the correct directory to make this look right.

### Why data organization matters

It is important that your data are organized as specified in the lessons because:

2. It will be easier for you to follow along in class if your directory is the same as the instructor's.
3. It is good practice to learn how to organize your files in a way that makes it easier for your future self to find and work with your data!

### 2. Videos

Watch the following videos:

### 3. Install QGIS & review homework lessons

Install QGIS. Use the install QGIS homework lesson as a guide if needed. Then review all of the homework lessons - they will help you complete the submission below.

## Homework (5 points): due Friday Sept 15 @ 8pm

#### 1. Create an R Markdown document

Create a new R Markdown document. Name it: yourLastName-yourFirstName-week02.Rmd

#### 2. Add the text that you wrote last week about the flood events

Add the text that you wrote for the first homework assignment to the top of your report. Then think about where in that text the plots below might fit best to better describe the events that occurred during the 2013 floods.

Add the plots described below to your R Markdown file. IMPORTANT: Please add a figure caption to each plot that describes the contents of the plot.

Add the code to produce the following 4 plots in your R Markdown document, using the homework lessons as a guide to walk you through. Use the pipes syntax that we learned in class to subset and summarize the data as required.

Use the data/week_02/precipitation/805325-precip-dailysum-2003-2013.csv file to create:

• PLOT 1: a plot of precipitation from 2003 to 2013 using the ggplot() function.
• PLOT 2: a plot that shows precipitation SUBSETTED from Aug 15 - Oct 15 2013 using the ggplot() function.

Use the data/week_02/discharge/06730200-discharge-daily-1986-2013.csv file to create:

• PLOT 3: a plot of stream discharge from 1986 to 2013 using the ggplot() function.
• PLOT 4: a plot that shows stream discharge SUBSETTED from Aug 15 - Oct 15 2013 using the ggplot() function.
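
The subsetting step for plots 2 and 4 might look roughly like the sketch below. It uses a small made-up data frame rather than the real CSV, and the column names (`datetime`, `disValue`) are assumptions, so check `names()` on your own data before adapting it.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for the discharge data.
# Column names (datetime, disValue) are assumptions, not taken from the file.
discharge <- data.frame(
  datetime = as.Date(c("2013-08-01", "2013-08-15", "2013-09-12",
                       "2013-10-15", "2013-11-01")),
  disValue = c(1, 2, 3, 4, 5)
)

# Keep only rows from Aug 15 through Oct 15, 2013 (inclusive)
discharge_flood <- discharge %>%
  filter(datetime >= as.Date("2013-08-15") &
           datetime <= as.Date("2013-10-15"))

# Plot the subset; the choice of geom is up to you
flood_plot <- ggplot(discharge_flood, aes(x = datetime, y = disValue)) +
  geom_point()
```

Comparing a `Date` column against `as.Date()` values keeps both sides of the comparison in the same class.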

### Label plots appropriately

Be sure that each plot has:

1. A figure caption that describes the contents of the plot
2. X and Y axis labels that include appropriate units
3. A carefully composed title that describes the contents of the plot
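
One way to add the title and axis labels is the `labs()` function, sketched below with made-up values. The column names and the units in the y axis label are assumptions; use the units documented for your data. In R Markdown, the figure caption itself is usually set with the `fig.cap` code chunk option rather than inside the plot code.

```r
library(ggplot2)

# Made-up values; column names (DATE, DAILY_PRECIP) are assumptions
precip <- data.frame(
  DATE = as.Date(c("2013-09-10", "2013-09-11", "2013-09-12")),
  DAILY_PRECIP = c(1, 2, 3)
)

# Add a title plus x and y axis labels with labs()
labeled_plot <- ggplot(precip, aes(x = DATE, y = DAILY_PRECIP)) +
  geom_point() +
  labs(
    title = "Daily precipitation (toy data)",
    x = "Date",
    y = "Precipitation (inches, assumed units)"
  )
```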

Below each plot, describe and interpret what the plot shows. Describe how the data demonstrate an impact and / or a driver of the 2013 flood event.

### Write clean code

Be sure that your code follows the style guidelines outlined in the write clean code lessons.

Be sure to:

• Label each plot clearly. This includes a title, x and y axis labels
• Write clean code. This includes comments that document / describe the steps you take in your code and clean syntax following Hadley Wickham’s style guide.
• Convert date fields as appropriate.
• Clean no data values as appropriate.
• Show all of your code in the output .html file.
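
A minimal sketch of the date conversion and no-data cleaning steps, using only base R. The header names and the `999.99` no-data code are assumptions made for illustration; check the metadata for the value your file actually uses.

```r
# Made-up rows standing in for a CSV file.
# Header names and the 999.99 no-data code are assumptions.
csv_text <- "DATE,DAILY_PRECIP
2013-09-11,1.5
2013-09-12,999.99
2013-09-13,0.3"

# na.strings converts the no-data code to NA while the file is read
precip <- read.csv(text = csv_text, na.strings = "999.99",
                   stringsAsFactors = FALSE)

# Convert the character column to the Date class
precip$DATE <- as.Date(precip$DATE, format = "%Y-%m-%d")

class(precip$DATE)
sum(is.na(precip$DAILY_PRECIP))
```

With a real file you would pass the file path to `read.csv()` instead of `text =`.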

#### 4. Graduate students: add a 5th plot to your .Rmd file

In addition to the plots above, add a plot of precipitation that spans from 1948 - 2013 using the 805333-precip-daily-1948-2013.csv file. For your plot, be sure to:

1. Subset the data temporally: Jan 1 2013 - Oct 15 2013
2. Summarize the data: plot DAILY total (sum) precipitation

Use the bonus lesson to guide you through creating this plot.
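
The subset and summarize steps can be sketched with dplyr pipes as follows. This assumes the records are sub-daily with a date-time column; the data are made up and the column names (`DATE`, `HPCP`) are assumptions.

```r
library(dplyr)

# Made-up sub-daily records; column names (DATE, HPCP) are assumptions
precip_hourly <- data.frame(
  DATE = as.POSIXct(c("2013-09-11 01:00", "2013-09-11 14:00",
                      "2013-09-12 03:00", "2013-11-01 05:00"), tz = "UTC"),
  HPCP = c(0.1, 0.2, 0.4, 0.3)
)

daily_total <- precip_hourly %>%
  mutate(day = as.Date(DATE)) %>%              # drop the time of day
  filter(day >= as.Date("2013-01-01") &
           day <= as.Date("2013-10-15")) %>%   # temporal subset
  group_by(day) %>%
  summarise(total_precip = sum(HPCP, na.rm = TRUE))  # daily total
```

Converting the date-time to a `Date` before filtering means both sides of each comparison share a class.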

Bonus opportunity 1 (1 point): Generate and add to your report the plot of precipitation for 1948 - 2013 described above (required for all graduate students).

Then, receive a bonus point for:

1. Identifying an anomaly or change in the data that you can clearly see when you plot it
2. Suggesting how to address that anomaly in R to make a more uniform looking plot

Bonus opportunity 2 (1 point)

• Create an interactive plot with a slider (range selector) using dygraphs
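
A rough sketch of an interactive plot with a range selector is shown below. It assumes the `dygraphs` and `xts` packages are installed and uses made-up values in place of the real precipitation data.

```r
library(dygraphs)
library(xts)

# Made-up daily values indexed by date
dates <- as.Date("2013-09-01") + 0:9
precip_xts <- xts(c(0, 0, 1, 2, 0, 3, 5, 8, 2, 0), order.by = dates)

# Build the interactive plot and add the range selector (slider)
interactive_plot <- dygraph(precip_xts,
                            main = "Daily precipitation (toy data)",
                            ylab = "Precipitation") %>%
  dyRangeSelector()
```

Printing `interactive_plot` in a code chunk embeds the widget in the knitted .html output.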

## Final submission

When you are happy with your report, convert your R Markdown file into an .html report using knitr. Submit your final report to the D2L drop box in both .html and .Rmd formats.

## Homework plots



## Bonus plots

### Report content - text writeup: 30%

| Element | 5 points | 3 points | 0 points |
|:---|:---|:---|:---|
| PDF and RMD submitted | Both files are submitted | Only one of the 2 files is submitted | NA |
| Summary text is provided for each plot | Summary text is provided for all of the plots in the report. | Summary text is missing for 1-2 plots in the report. | Summary text is not included for 3 or more plots. |
| Grammar & spelling are accurate throughout the report | No visible grammar or spelling issues in the report | 2-4 grammar and spelling issues in the report | More than 4 spelling / grammar issues in the report |
| File is named with last name-first initial week 2 | File naming is as required | NA | File is not named properly |
| Report contains all 4 plots described in the assignment. | All plots are included in the report | 1 plot is missing | More than 1 plot is missing |
| 2-3 paragraphs exist at the top of the report that summarize the conditions and the events that took place in 2013 to cause a flood that had significant impacts. | Summary text is included at the top of the report. | | There is no introductory, summary text included in the report |
| Introductory text at the top of the document clearly describes the conditions and events that took place in 2013 that yielded the significant flood event. | The summary text adequately describes the drivers including the weather system, rainfall and discharge as it relates to the erosion / deposition that occurred. | NA | This information is not included in the report. |
| Introductory text at the top of the document is thoughtful and well written. | It is well written. | NA | Introductory text is not well written. |

### Report content - code format: 20%

| Element | 5 points | 3 points | 0 points |
|:---|:---|:---|:---|
| Code is written using “clean” code practices following the Hadley Wickham style guide | Spaces are placed after all # comment tags; variable and function names do not use periods. | Clean coding is used in some of the code but spaces or variable names are incorrect 2-4 times | Clean coding is not implemented consistently throughout the report. |
| YAML contains a title, author and date | Author, title and date are in YAML | One element is missing from the YAML | 2 or more elements are missing from the YAML |
| Code chunk contains code and runs | All code runs in the document | There are 1-2 errors in the code in the document that make it not run | There are more than 3 code errors in the document |

### Report plots: 50%

PLOT 1: a plot of precipitation from 2003 to 2013 using ggplot().

| 5 points | 3 points | 0 points |
|:---|:---|:---|
| Plot is labeled with a title, x and y axis label. | Plot is missing 1 or 2 labels. | No labels were added to the plot. |
| Plot is coded using the ggplot() function. | NA | Plot is not coded using the ggplot() function. |
| Date on the x axis is formatted as a date class. | NA | Dates are not properly formatted. |
| No data values have been removed | NA | No data values have not been removed |
| Code to create the plot is clearly documented with comments in the html / pdf knitr output. | NA | Code is not documented with comments. |
| Plot is described and interpreted in the text of the report with reference made to how the data demonstrate an impact or driver of the flood event. | NA | Plot is not interpreted in the text. |

PLOT 2: a plot that shows precipitation SUBSETTED from Aug 15 - Oct 15 2013.

| 5 points | 3 points | 0 points |
|:---|:---|:---|
| Plot is labeled with a title, x and y axis label. | Plot is missing 1 or 2 labels. | No labels were added to the plot. |
| Plot is coded using the ggplot() function. | NA | Plot is not coded using the ggplot() function. |
| Date on the x axis is formatted as a date class. | NA | Dates are not properly formatted. |
| No data values have been removed | NA | No data values have not been removed |
| Code to create the plot is clearly documented with comments in the html / pdf knitr output. | NA | Code is not documented with comments. |
| Plot is described and interpreted in the text of the report with reference made to how the data demonstrate an impact or driver of the flood event. | NA | Plot is not interpreted in the text. |

PLOT 3: a plot of stream discharge from 1986 to 2013 using ggplot().

| 5 points | 3 points | 0 points |
|:---|:---|:---|
| Plot is labeled with a title, x and y axis label. | Plot is missing 1 or 2 labels. | No labels were added to the plot. |
| Plot is coded using the ggplot() function. | NA | Plot is not coded using the ggplot() function. |
| Date on the x axis is formatted as a date class. | NA | Dates are not properly formatted. |
| No data values have been removed | NA | No data values have not been removed |
| Code to create the plot is clearly documented with comments in the html / pdf knitr output. | NA | Code is not documented with comments. |
| Plot is described and interpreted in the text of the report with reference made to how the data demonstrate an impact or driver of the flood event. | NA | Plot is not interpreted in the text. |

PLOT 4: a plot that shows stream discharge SUBSETTED from Aug 15 - Oct 15 2013

| 5 points | 3 points | 0 points |
|:---|:---|:---|
| Plot is labeled with a title, x and y axis label. | Plot is missing 1 or 2 labels. | No labels were added to the plot. |
| Plot is coded using the ggplot() function. | NA | Plot is not coded using the ggplot() function. |
| Date on the x axis is formatted as a date class. | NA | Dates are not properly formatted. |
| No data values have been removed | NA | No data values have not been removed |
| Code to create the plot is clearly documented with comments in the html / pdf knitr output. | NA | Code is not documented with comments. |
| Plot is described and interpreted in the text of the report with reference made to how the data demonstrate an impact or driver of the flood event. | NA | Plot is not interpreted in the text. |

PLOT 5 (GRAD STUDENTS ONLY, bonus points for undergrads): a plot of precipitation that spans from 1948 - 2013

| 5 points | 3 points | 0 points |
|:---|:---|:---|
| Plot is labeled with a title, x and y axis label. | Plot is missing 1 or 2 labels. | No labels were added to the plot. |
| Plot is coded using the ggplot() function. | NA | Plot is not coded using the ggplot() function. |
| Date on the x axis is formatted as a date class. | NA | Dates are not properly formatted. |
| No data values have been removed | NA | No data values have not been removed |
| Code to create the plot is clearly documented with comments in the html / pdf knitr output. | NA | Code is not documented with comments. |
| Plot is described and interpreted in the text of the report with reference made to how the data demonstrate an impact or driver of the flood event. | NA | Plot is not interpreted in the text. |

#### Plot aesthetics

• PLOT 1: a plot of precipitation from 2003 to 2013 using ggplot().
• PLOT 2: a plot that shows precipitation SUBSETTED from Aug 15 - Oct 15 2013 using ggplot().
• PLOT 3: a plot of stream discharge from 1986 to 2013 using ggplot().
• PLOT 4: a plot that shows stream discharge SUBSETTED from Aug 15 - Oct 15 2013 using ggplot().
• PLOT 5: (GRAD STUDENTS ONLY, bonus points for undergrads): a plot of precipitation that spans from 1948 - 2013

We will review each of the plots listed above for various aesthetics as follows:

| Full credit | No credit |
|:---|:---|
| Plot is labeled with a title, x and y axis label. | |
| Plot is coded using the ggplot() function. (please don’t use qplot()) | |
| Date on the x axis is formatted as a date class for all plots. | Dates are not properly formatted. |
| Missing data values have been cleaned / replaced with NA | Missing values have not been cleaned |
| Code to create the plot is clearly documented with comments in the html / pdf knitr output. | Code isn’t commented |
| Plot is described and interpreted in the text of the report with reference made to how the data demonstrate an impact or driver of the flood event. | Plot is not discussed and interpreted in the text. |

#### Dplyr plot subsetting

Plots 2 and 4 should be temporally subsetted to the dates listed above.

| Full credit | No credit |
|:---|:---|
| Plot 2 is temporally subsetted using dplyr pipes to Aug 15 - Oct 15 2013 | |
| Plot 4 is temporally subsetted using dplyr pipes to Aug 15 - Oct 15 2013 | |


#### Grading bonus points (2 points potential)

• 1 point: Identify and fix the anomaly in the precipitation 805333-precip-daily-1948-2013.csv
• 1 point: Create an interactive plot using dygraphs in your output html file
