# Uncertainty and metadata - Earth analytics course module

Welcome to the first lesson in the Uncertainty and metadata module. In this module, we will discuss the concept of uncertainty as it relates to both remote sensing and other data. We will also explore some metadata to learn how to understand more about our data.

# Lesson 1. Lidar remote sensing data - Understand uncertainty / error associated with height metrics extracted from lidar raster data in R

## Learning objectives

After completing this tutorial, you will be able to:

• List at least 3 sources of uncertainty / error associated with remote sensing data
• Interpret a scatter plot that compares remote sensing values with field measured values to determine how “well” the two metrics compare
• Describe 1-3 ways to better understand sources of error associated with a comparison between remote sensing values and field measured values

## What you need

You will need a computer with internet access to complete this lesson and the data for week 5 of the course.

## Understanding uncertainty and error

It is important to consider error and uncertainty when presenting scientific results. Most measurements that we make - be they from instruments or humans - have uncertainty associated with them. We will discuss what that means, below.

## Uncertainty

Uncertainty: Uncertainty quantifies the range of values within which the value of the measure falls, at a specified level of confidence. The uncertainty quantitatively indicates the “quality” of your measurement. It answers the question: “how well does the result represent the value of the quantity being measured?”

### Tree height measurement example

For example, let’s pretend that we measured the height of a tree 10 times. Each time, our tree height measurement may be slightly different. Why? Maybe each time we visually determined the top of the tree to be in a slightly different place. Or maybe there was wind that day that caused the tree to shift as we measured it, yielding a slightly different height each time. What other reasons can you think of that might impact tree height measurements?

## What is the true value?

So you may be wondering, what is the true height of our tree? In the case of a tree in a forest, it’s very difficult to determine the true height. So we accept that there will be some variation in our measurements, and we measure the tree over and over again until we understand the range of heights that we are likely to get when we measure the tree.

    # create data frame containing made up tree heights
    tree_heights <- data.frame(heights = c(10, 10.1, 9.9, 9.5, 9.7, 9.8,
                                           9.6, 10.5, 10.7, 10.3, 10.6))

    # what is the average tree height?
    mean(tree_heights$heights)
    ## [1] 10.06364

    # what is the standard deviation of measurements?
    sd(tree_heights$heights)
    ## [1] 0.4129715

    boxplot(tree_heights$heights,
            main = "Distribution of tree height measurements (m)",
            ylab = "Height (m)", col = "springgreen")

In the example above, our mean tree height value is towards the center of our distribution of measured heights. We might expect that the sample mean of our observations provides a reasonable estimate of the true value. The variation among our measured values may also provide some information about the precision (or lack thereof) of the measurement process.

Read more about the basics of a box plot.

    # view distribution of tree height values
    hist(tree_heights$heights, breaks = c(9, 9.6, 10.4, 11),
         main = "Distribution of measured tree height values",
         xlab = "Height (m)", col = "purple")
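One common way to put a number on this uncertainty is a confidence interval around the sample mean. The sketch below is a minimal illustration, assuming the measurements are independent and roughly normally distributed. The 95% level and the variable names are choices made for this example, not part of the lesson's original code.

```r
# same made up tree heights as in the example above
heights <- c(10, 10.1, 9.9, 9.5, 9.7, 9.8, 9.6, 10.5, 10.7, 10.3, 10.6)

n <- length(heights)
mean_height <- mean(heights)

# standard error of the mean: spread of the measurements scaled by sample size
std_error <- sd(heights) / sqrt(n)

# critical value of the t distribution for a two-sided 95% interval
conf_level <- 0.95
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)

# range within which we expect the mean height to fall, at the chosen confidence
ci <- c(lower = mean_height - t_crit * std_error,
        upper = mean_height + t_crit * std_error)
print(round(ci, 2))
```

A narrower interval suggests a more repeatable measurement process. It says nothing about bias: if every measurement is shifted in the same direction, the interval can be narrow and still miss the true height.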


## Measurement accuracy

Measurement accuracy is a concept that relates to whether there is bias in measurements, i.e. whether the expected value of our observations is close to the true value. For low accuracy measurements, we may collect many observations, and the mean of those observations may not provide a good measure of the truth (e.g., the height of the tree). For high accuracy measurements, the mean of many observations would provide a good measure of the true value. This is different from precision, which typically refers to the variation among observations. Accuracy and precision are not always tightly coupled. It is possible to have measurements that are very precise but inaccurate, very imprecise but accurate, etc.
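The difference between accuracy and precision can be explored with simulated data. The sketch below assumes a hypothetical true tree height of 10 m and draws normally distributed measurements for four invented scenarios. The bias and spread values are made up purely for illustration.

```r
set.seed(42)  # make the simulated measurements reproducible

true_height <- 10  # hypothetical "true" tree height in meters
n_obs <- 1000

# bias shifts the expected value (accuracy);
# sd controls the variation among observations (precision)
scenarios <- data.frame(
  name = c("acc_prec", "acc_imprec", "inacc_prec", "inacc_imprec"),
  bias = c(0, 0, 1.5, 1.5),
  sd   = c(0.1, 1, 0.1, 1)
)

# simulate n_obs measurements for each scenario
measurements <- lapply(seq_len(nrow(scenarios)), function(i) {
  rnorm(n_obs, mean = true_height + scenarios$bias[i], sd = scenarios$sd[i])
})
names(measurements) <- scenarios$name

# mean error reflects accuracy; standard deviation reflects precision
summary_stats <- data.frame(
  scenario   = scenarios$name,
  mean_error = sapply(measurements, mean) - true_height,
  spread     = sapply(measurements, sd),
  row.names  = NULL
)
print(summary_stats)

boxplot(measurements,
        main = "Simulated accuracy vs precision",
        ylab = "Measured height (m)")
abline(h = true_height, lty = 2)  # dashed line marks the hypothetical true height
```

In the two inaccurate scenarios, the mean stays roughly 1.5 m away from the true value no matter how many observations are collected. In the imprecise scenarios, individual observations scatter widely even when their mean is close to the truth.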

## Systematic vs random error

Systematic error: a systematic error is one that tends to shift all measurements in a systematic way. This means that the mean value of a set of measurements is consistently displaced or varied in a predictable way, leading to inaccurate observations. Causes of systematic errors may be known or unknown but should always be corrected for when present. For instance, no instrument can ever be calibrated perfectly, so when a group of measurements systematically differ from the value of a standard reference specimen, an adjustment in the values should be made. Systematic error can be corrected for only when the “true value” (such as the value assigned to a calibration or reference specimen) is known.

Example: Remote sensing instruments need to be calibrated. For example a laser in a lidar system may be tested in a lab to ensure that the distribution of output light energy is consistent every time the laser “fires”.

Random error: a random error is a component of the total error that, over the course of a number of measurements, varies in an unpredictable way. It is not possible to correct for random error. Random errors can occur for a variety of reasons such as:

• Lack of equipment sensitivity. An instrument may not be able to respond to or indicate a change in some quantity that is too small or the observer may not be able to discern the change.
• Noise in the measurement. Noise is extraneous disturbances that are unpredictable or random and cannot be completely accounted for.
• Imprecise definition. It is difficult to exactly define the dimensions of an object. For example, it is difficult to determine the ends of a crack when measuring its length. Two people will likely pick two different starting and ending points.

Example: random error may be introduced when we measure tree heights as discussed above.
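The contrast between the two error types can be sketched with a simulated instrument. The example below assumes a hypothetical device that reads 0.3 m too high (systematic error) with random noise of 0.2 m standard deviation. Those numbers, the reference object, and the `measure()` helper are all invented for illustration.

```r
set.seed(1)  # reproducible simulation

# hypothetical instrument: constant offset plus random noise (invented values)
offset_true <- 0.3
noise_sd <- 0.2
measure <- function(true_value, n) {
  true_value + offset_true + rnorm(n, mean = 0, sd = noise_sd)
}

# 1. measure a reference object whose height is known (a calibration standard)
reference_height <- 5
reference_readings <- measure(reference_height, n = 200)
estimated_offset <- mean(reference_readings) - reference_height

# 2. measure a tree, then subtract the estimated systematic offset
tree_true <- 10
tree_readings <- measure(tree_true, n = 200)
corrected <- tree_readings - estimated_offset

# bias before and after calibration
raw_bias <- mean(tree_readings) - tree_true
corrected_bias <- mean(corrected) - tree_true

# the random component is untouched by the correction
spread_before <- sd(tree_readings)
spread_after <- sd(corrected)

print(c(raw_bias = raw_bias, corrected_bias = corrected_bias,
        spread_before = spread_before, spread_after = spread_after))
```

Subtracting the offset moves the mean close to the true value, but the spread of the individual readings is unchanged. Calibration addresses systematic error only; the random error remains in every single measurement.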

## Using lidar to estimate tree height

We use lidar data to estimate tree height because it is an efficient way to measure large areas of trees (forests) quantitatively. However, we can process the lidar data in many different ways to estimate height. Which method most closely represents the actual heights of the trees on the ground?

## Study site location

To answer the question above, let’s look at some data from a study site in California: the San Joaquin Experimental Range (SJER) field site. You can see the field site location on the map below.

## Study area plots

At this study site, we have lidar data: specifically, a canopy height model (CHM) that was processed by NEON (National Ecological Observatory Network). We also have some “ground truth” data, that is, measured tree height values collected at a set of field site plots by technicians at NEON. We will call these measured values in situ measurements.

A map of our study plots is below, overlaid on top of the canopy height model.


### Compare lidar derived height to in situ measurements

We can compare maximum tree height values at each plot to the maximum pixel value in our CHM for each plot. To do this, we define the geographic boundary of our plot using a polygon - in the case below we use a circle as the boundary. We then extract the raster cell values for each circle and calculate the max value for all of the pixels that fall within the plot area.
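The extraction step can be sketched in base R on a small synthetic raster. This is an illustration only: the matrix values, plot centers, 20 m radius, and the `plot_max_height()` helper are all hypothetical. With real raster data, a function such as `extract()` from the raster package (with its `buffer` and `fun` arguments) does the equivalent work.

```r
set.seed(10)  # reproducible synthetic data

# synthetic 100 x 100 canopy height model with 1 m pixels (invented values)
chm <- matrix(runif(100 * 100, min = 0, max = 15), nrow = 100, ncol = 100)
chm[50, 50] <- 30  # plant one unusually tall "tree" so the result is easy to check

# return the maximum pixel value within a circle of the given radius;
# pixel centers are placed at (column - 0.5, row - 0.5) in meters
plot_max_height <- function(chm, center_x, center_y, radius) {
  x <- col(chm) - 0.5
  y <- row(chm) - 0.5
  inside <- (x - center_x)^2 + (y - center_y)^2 <= radius^2
  max(chm[inside], na.rm = TRUE)
}

# hypothetical plot centers: plot A contains the tall tree, plot B does not
plots <- data.frame(plot_id = c("A", "B"),
                    x = c(49.5, 10),
                    y = c(49.5, 10))

plots$lidar_max <- mapply(
  function(px, py) plot_max_height(chm, px, py, radius = 20),
  plots$x, plots$y
)
print(plots)
```

The choice of radius matters: a plot boundary that is too large can pull in a tall tree that was never measured in the field, while one that is too small can miss the tallest measured tree.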

Then, we calculate the max height of our measured plot tree height data.

Finally we compare the two using a scatter plot to see how closely the data relate. Do they follow a 1:1 line? Do the data diverge from a 1:1 relationship?
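The remaining steps can be sketched as follows, using base R graphics and made-up numbers. None of the values below are the NEON measurements, and the column names (`plotid`, `stemheight`, `Plot_ID`, `lidar_max`) are assumptions chosen for this example.

```r
# made-up stem-level field measurements (NOT the NEON data)
stems <- data.frame(
  plotid = c("P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3",
             "P4", "P4", "P5", "P5"),
  stemheight = c(7.2, 8.5, 6.1, 12.1, 9.8, 15.3, 11.0, 14.2,
                 18.0, 16.5, 21.4, 19.9)
)

# maximum measured tree height in each plot
insitu_max <- aggregate(stemheight ~ plotid, data = stems, FUN = max)
names(insitu_max)[2] <- "insitu_max"

# made-up maximum CHM value extracted for each plot
lidar <- data.frame(Plot_ID = c("P1", "P2", "P3", "P4", "P5"),
                    lidar_max = c(9.0, 11.5, 16.8, 16.9, 22.5))

# join the two tables on their plot identifiers
heights <- merge(lidar, insitu_max, by.x = "Plot_ID", by.y = "plotid")

# scatter plot with a dashed 1:1 line for reference
axis_range <- range(c(heights$insitu_max, heights$lidar_max))
plot(heights$insitu_max, heights$lidar_max,
     xlim = axis_range, ylim = axis_range, pch = 19,
     xlab = "In situ max height (m)", ylab = "Lidar max height (m)",
     main = "Lidar vs in situ max tree height (made-up data)")
abline(a = 0, b = 1, lty = 2)

# a fitted line summarizes how far the relationship departs from 1:1
fit <- lm(lidar_max ~ insitu_max, data = heights)
abline(fit, col = "blue")
slope <- coef(fit)[["insitu_max"]]
r_squared <- summary(fit)$r.squared
```

Points above the dashed line are plots where lidar gives a taller maximum than the field measurement; points below are plots where lidar gives a shorter one. A slope near 1 together with a high R-squared suggests the two methods agree closely.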

### How different are the data?
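One way to see how different the two sets of heights are is to compute the difference for each plot and summarize it. The sketch below uses made-up plot values (not the NEON data) and base R graphics rather than the plotting library used elsewhere in the lesson. The `ht_diff` column name and the summary statistics are choices made for this example.

```r
# made-up plot-level maximum heights in meters (NOT the NEON data)
heights <- data.frame(
  Plot_ID    = c("P1", "P2", "P3", "P4", "P5"),
  insitu_max = c(8.5, 12.1, 15.3, 18.0, 21.4),
  lidar_max  = c(9.0, 11.5, 16.8, 16.9, 22.5)
)

# positive values: lidar is taller than the field measurement
heights$ht_diff <- heights$lidar_max - heights$insitu_max

# mean difference indicates overall bias; RMSE indicates typical error size
mean_diff <- mean(heights$ht_diff)
rmse <- sqrt(mean(heights$ht_diff^2))
print(c(mean_diff = mean_diff, rmse = rmse))

barplot(heights$ht_diff, names.arg = heights$Plot_ID,
        main = "Lidar minus in situ max height (made-up data)",
        xlab = "Plot", ylab = "Height difference (m)", col = "springgreen")
abline(h = 0)
```

If most bars point the same way, that hints at a systematic difference between the two methods. Bars scattered on both sides of zero are more consistent with random error.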


## View interactive scatterplot

View scatterplot plotly

## View interactive difference barplot

View scatterplot differences

• Code to add imagery to QGIS via the Python console: `qgis.utils.iface.addRasterLayer("http://server.arcgisonline.com/arcgis/rest/services/ESRI_Imagery_World_2D/MapServer?f=json&pretty=true", "raster")`