Lesson 2. Plot histograms of raster values in R

Learning objectives

After completing this tutorial, you will be able to:

  • Open raster data in R
  • Create a histogram of raster values in R
  • Use a histogram to learn about the values stored in a raster

What you need

You will need a computer with internet access to complete this lesson.

If you have not already downloaded the week 3 data, please do so now. Download week 3 data (~250 MB)

In the last lesson, we discussed 3 key attributes of a raster dataset:

  1. Spatial resolution
  2. Spatial extent
  3. Coordinate reference system

In this lesson, we will learn how to use histograms to better understand the distribution of our data.

Open raster data in R

To work with raster data in R, we can use the raster and rgdal packages. Recall that the raster() function imports a raster file into R as a raster object.

# load libraries
library(raster)
library(rgdal)

# make sure your working directory is set to wherever your 'earth-analytics' dir is
# setwd("earth-analytics-dir-path-here")

# open raster data
lidar_dem <- raster(x = "data/week_03/BLDR_LeeHill/pre-flood/lidar/pre_DTM.tif")

# plot raster data
plot(lidar_dem,
     main = "Digital Elevation Model - Pre 2013 Flood")

Plot of the pre-flood lidar digital elevation model
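
After importing a raster, it can help to check the three attributes from the last lesson directly on the object. The sketch below builds a small synthetic raster so that it runs without the lesson files. The object name demo_dem, its dimensions, extent, CRS string and values are all made-up assumptions, not properties of the Lee Hill data. The same three calls should work on lidar_dem.

```r
library(raster)

# Hypothetical 10 x 10 raster; the extent and CRS are illustrative assumptions
demo_dem <- raster(nrows = 10, ncols = 10,
                   xmn = 0, xmx = 10, ymn = 0, ymx = 10,
                   crs = "+proj=utm +zone=13 +datum=WGS84 +units=m")

# fill the cells with made-up elevation values (meters)
set.seed(1)
values(demo_dem) <- runif(ncell(demo_dem), min = 1600, max = 2100)

res(demo_dem)     # spatial resolution: cell size in x and y
extent(demo_dem)  # spatial extent: xmin, xmax, ymin, ymax
crs(demo_dem)     # coordinate reference system
```

The units of the resolution are the units of the CRS, so it is worth reading the two together.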

Raster histograms - distribution of elevation values

The histogram below represents the distribution of pixel elevation values in our data. This plot is useful to:

  1. Identify outlier data values
  2. Assess the min and max values in our data
  3. Explore the general distribution of elevation values in the data (e.g. is the area generally flat or hilly, high elevation or low elevation?)

Notice that we are using the xlab and ylab arguments in our plot to label our plot axes.

# plot histogram
hist(lidar_dem,
     main = "Distribution of surface elevation values",
     xlab = "Elevation (meters)", ylab = "Frequency",
     col = "springgreen")

histogram of DEM elevation values
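
The minimum and maximum that we read off a histogram can also be checked numerically. Below is a minimal sketch using a synthetic raster (demo_dem and its values are made up for illustration, not taken from the lesson data). The same calls should apply to lidar_dem.

```r
library(raster)

# Hypothetical raster of made-up elevations (meters), not the lesson data
set.seed(1)
demo_dem <- raster(nrows = 10, ncols = 10,
                   xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(demo_dem) <- runif(ncell(demo_dem), min = 1600, max = 2100)

# smallest and largest cell values in the raster
minValue(demo_dem)
maxValue(demo_dem)

# a summary statistic computed across all cells
cellStats(demo_dem, stat = "mean")
```

If the numeric minimum or maximum falls far outside the bulk of the histogram, that is a hint that the data may contain outlier values.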

What does a histogram tell us?

A histogram shows us how the data are distributed. Each bar (bin) in the plot represents the number of pixels, or frequency, whose values fall within the range covered by that bin.

We can use the breaks argument to ask for fewer or more bins in our histogram. Note that when breaks is a single number, R treats it as a suggestion and chooses round break points, so the histogram may not contain exactly the number of bins that you requested.

# plot histogram
hist(lidar_dem,
     breaks = 3,
     main = "Distribution of surface elevation values with breaks",
     xlab = "Elevation (meters)", ylab = "Frequency",
     col = "springgreen")

histogram of DEM elevation values
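
To see why a single number passed to breaks is only a suggestion, we can compare the number of bins requested with the number R actually creates. This sketch uses base R's hist() on a plain numeric vector of simulated values (elev is a hypothetical stand-in for pixel values). It assumes that the raster method passes breaks on to the same base R machinery.

```r
# Hypothetical stand-in for pixel values: simulated elevations (meters)
set.seed(42)
elev <- runif(1000, min = 1600, max = 2100)

# request different numbers of bins; plot = FALSE skips the drawing step
requested <- c(3, 7, 20)
actual <- sapply(requested, function(n) {
  length(hist(elev, breaks = n, plot = FALSE)$counts)
})

# compare what we asked for with what R produced
data.frame(requested, actual)

# base R derives the break points from pretty(), which prefers round numbers
hist(elev, breaks = 7, plot = FALSE)$breaks
pretty(range(elev), n = 7, min.n = 1)
```

When the exact bin edges matter, it is more reliable to pass a vector of break points than a single number.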

Alternatively, we can provide the exact break points that we want R to use when it bins the data.

breaks = c(1600, 1800, 2000, 2100)

In this case, R will count the number of pixels that occur within each value range as follows:

  • bin 1: number of pixels with values between 1600 and 1800
  • bin 2: number of pixels with values between 1800 and 2000
  • bin 3: number of pixels with values between 2000 and 2100

# plot histogram
hist(lidar_dem,
     main = "Distribution of surface elevation values",
     breaks = c(1600, 1800, 2000, 2100),
     xlab = "Elevation (meters)", ylab = "Frequency",
     col = "wheat3")

histogram of DEM elevation values
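
The counts behind a histogram with explicit break points can be inspected without drawing anything. This is a minimal sketch on simulated values (elev and custom_breaks are hypothetical names, and the values are not the lesson data), using base R's hist() with plot = FALSE and cross-checking the result with cut() and table().

```r
# Hypothetical stand-in for pixel values: simulated elevations (meters)
set.seed(42)
elev <- runif(1000, min = 1600, max = 2100)

custom_breaks <- c(1600, 1800, 2000, 2100)

# plot = FALSE returns the histogram object without drawing it
h <- hist(elev, breaks = custom_breaks, plot = FALSE)

# number of values that fall in each bin
h$counts

# the same counts computed by assigning each value to a bin
bins <- cut(elev, breaks = custom_breaks, include.lowest = TRUE)
table(bins)

# FALSE here: the bins do not all have the same width
h$equidist
```

Two details are worth keeping in mind. The break points must span the full range of the data, otherwise base R's hist() stops with an error. Also, when the bins have unequal widths, as they do here, base R's hist() draws densities rather than raw counts by default, so the y-axis of such a plot may not show frequencies.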

In-class challenge - import DSM

  • Import the file: data/week_03/BLDR_LeeHill/pre-flood/lidar/pre_DSM_hill.tif

  • Plot the data and a histogram of the data. What do the elevations in the DSM represent? Are they different from the DTM? Discuss this with your neighbor.

  • What is the CRS and spatial resolution for this dataset? What units is the spatial resolution in?

DSM histogram and plot