Lesson 2. Extract Raster Values Using Vector Boundaries in R


Learning Objectives

After completing this tutorial, you will be able to:

  • Use the extract() function to extract raster values using a vector extent or set of extents
  • Create a scatter plot with a one-to-one line in R
  • Understand the concept of uncertainty as it’s associated with remote sensing data

What You Need

You will need a computer with internet access to complete this lesson and the data for week 4 of the course.

Download Week 4 Data (~500 MB)

# load libraries
library(raster)
library(rgdal)
library(rgeos)
library(ggplot2)
library(dplyr)

options(stringsAsFactors = FALSE)

# set working directory
# setwd("path-here/earth-analytics")

Import Canopy Height Model

First, we will import a canopy height model created by the NEON project. In the previous lessons / weeks we learned how to make a canopy height model by subtracting the digital elevation model (DEM) from the digital surface model (DSM).

# import canopy height model (CHM).
SJER_chm <- raster("data/week-04/california/SJER/2013/lidar/SJER_lidarCHM.tif")
SJER_chm
## class       : RasterLayer 
## dimensions  : 5059, 4296, 21733464  (nrow, ncol, ncell)
## resolution  : 1, 1  (x, y)
## extent      : 254571, 258867, 4107303, 4112362  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
## data source : /Users/lewa8222/Dropbox/earth-analytics/data/week-04/california/SJER/2013/lidar/SJER_lidarCHM.tif 
## names       : SJER_lidarCHM 
## values      : 0, 45.88  (min, max)

# view distribution of pixel values
hist(SJER_chm,
     main = "Histogram of Canopy Height\n NEON SJER Field Site",
     col = "springgreen",
     xlab = "Height (m)")

Histogram of CHM values

There are a lot of values in our CHM that == 0. Let’s set those to NA and plot again.


# set values of 0 to NA as these are not trees
SJER_chm[SJER_chm == 0] <- NA

# plot the modified data
hist(SJER_chm,
     main = "Histogram of Canopy Height\n pixels==0 set to NA",
     col = "springgreen",
     xlab = "Height (m)")

histogram of chm values

Part 2. Does Our CHM Data Compare to Field Measured Tree Heights?

We now have a canopy height model for our study area in California. However, how do the height values extracted from the CHM compare to our laboriously collected, field measured canopy height data? To figure this out, we will use in situ collected tree height data, measured within circular plots across our study area. We will compare the maximum measured tree height value to the maximum lidar derived height value for each circular plot using regression.

For this activity, we will use the a csv (comma separate value) file, located in SJER/2013/insitu/veg_structure/D17_2013_SJER_vegStr.csv.

# import plot centroids
SJER_plots <- readOGR("data/week-04/california/SJER/vector_data/SJER_plot_centroids.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "data/week-04/california/SJER/vector_data/SJER_plot_centroids.shp", layer: "SJER_plot_centroids"
## with 18 features
## It has 5 fields

# Overlay the centroid points and the stem locations on the CHM plot
plot(SJER_chm,
     main = "SJER Plot Locations",
     col = gray.colors(100, start = .3, end = .9))

# pch 0 = square
plot(SJER_plots,
     pch = 16,
     cex = 2,
     col = 2,
     add = TRUE)

canopy height model / plot locations plot

Extract CMH Data Within 20m Radius of Each Plot Centroid

Next, we will create a boundary region (called a buffer) representing the spatial extent of each plot (where trees were measured). We will then extract all CHM pixels that fall within the plot boundary to use to estimate tree height for that plot.

There are a few ways to go about this task. If our plots are circular, then we can use the extract() function.

buffer circular
The extract function in R allows you to specify a circular buffer radius around an x,y point location. Values for all pixels in the specified raster that fall within the circular buffer are extracted. In this case, we will tell R to extract the maximum value of all pixels using the fun=max command. Source: Colin Williams, NEON

Extract Plot Data Using Circle: 20m Radius Plots

# Insitu sampling took place within 40m x 40m square plots, so we use a 20m radius.
# Note that below will return a data.frame containing the max height
# calculated from all pixels in the buffer for each plot
SJER_height <- raster::extract(SJER_chm,
                    SJER_plots,
                    buffer = 20, # specify a 20 m radius
                    fun = mean, # extract the MEAN value from each plot
                    sp = TRUE, # create spatial object
                    stringsAsFactors = FALSE)

# view structure of the spatial data frame attribute table
head(SJER_height@data)
##    Plot_ID  Point northing easting plot_type SJER_lidarCHM
## 1 SJER1068 center  4111568  255852     trees        11.544
## 2  SJER112 center  4111299  257407     trees        10.356
## 3  SJER116 center  4110820  256839     grass         7.512
## 4  SJER117 center  4108752  256177     trees         7.675
## 5  SJER120 center  4110476  255968     grass         4.591
## 6  SJER128 center  4111389  257079     trees         8.979
# note that this is a spatial points data frame
class(SJER_height)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

Looking at our new spatial points data frame, we can see a field named SJER_lidarCHM. This is the column name of the mean height for each plot location. Let’s rename that to be something more meaningful.

# raname
colnames(SJER_height@data)[6]
## [1] "SJER_lidarCHM"

# rename the column
colnames(SJER_height@data)[6] <- "lidar_mean_ht"
head(SJER_height@data)
##    Plot_ID  Point northing easting plot_type lidar_mean_ht
## 1 SJER1068 center  4111568  255852     trees        11.544
## 2  SJER112 center  4111299  257407     trees        10.356
## 3  SJER116 center  4110820  256839     grass         7.512
## 4  SJER117 center  4108752  256177     trees         7.675
## 5  SJER120 center  4110476  255968     grass         4.591
## 6  SJER128 center  4111389  257079     trees         8.979

Explore the Data Distribution

If you want to explore the data distribution of pixel height values in each plot, you could remove the fun call to max and generate a list. cent_ovrList <- extract(chm,centroid_sp,buffer = 20). It’s good to look at the distribution of values we’ve extracted for each plot. Then you could generate a histogram for each plot hist(cent_ovrList[[2]]). If we wanted, we could loop through several plots and create histograms using a for loop.

# cent_ovrList <- extract(chm,centroid_sp,buffer = 20)
# create histograms for the first 5 plots of data
# for (i in 1:5) {
#  hist(cent_ovrList[[i]], main=(paste("plot",i)))
#  }

Derive Square Plot Boundaries and CHM Values Around a Point

For how to extract square plots using a plot centroid value, check out the extracting square shapes activity .

Image showing the buffer area for a plot.
If you had square shaped plots, the code in the link above would extract pixel values within a square shaped buffer. Source: Colin Williams, NEON

Plot by Height

# plot canopy height model
plot(SJER_chm,
     main = "Vegetation Plots \nSymbol size by Average Tree Height",
     legend = FALSE)

# add plot location sized by tree height
plot(SJER_height,
     pch = 19,
     cex = (SJER_height$lidar_mean_ht)/10, # size symbols according to tree height attribute normalized by 10
     add = TRUE)

# place legend outside of the plot
par(xpd = TRUE)
legend(SJER_chm@extent@xmax + 250,
       SJER_chm@extent@ymax,
       legend = "plot location \nsized by \ntree height",
       pch = 19,
       bty = 'n')

Plots sized by vegetation height

Leave a Comment