def cached(func_key, override=False):
"""
A decorator to cache function results
Parameters
==========
key: str
File basename used to save pickled results
override: bool
When True, re-compute even if the results are already stored
"""
def compute_and_cache_decorator(compute_function):
"""
Wrap the caching function
Parameters
==========
compute_function: function
The function to run and cache results
"""
def compute_and_cache(*args, **kwargs):
"""
Perform a computation and cache, or load cached result.
Parameters
==========
args
Positional arguments for the compute function
kwargs
Keyword arguments for the compute function
"""
# Add an identifier from the particular function call
if 'cache_key' in kwargs:
key = '_'.join((func_key, kwargs['cache_key']))
else:
key = func_key
if 'project' in kwargs:
project = kwargs['project']
else:
project = et.Project()
path = project.project_dir / 'jars' / f'{key}.pickle'
# Check if the cache exists already or override caching
if not os.path.exists(path) or override:
# Make jars directory if needed
os.makedirs(os.path.dirname(path), exist_ok=True)
# Run the compute function as the user did
result = compute_function(*args, **kwargs)
# Pickle the object
with open(path, 'wb') as file:
pickle.dump(result, file)
else:
# Unpickle the object
with open(path, 'rb') as file:
result = pickle.load(file)
return result
return compute_and_cache
return compute_and_cache_decoratorLand cover classification at the Mississippi Delta
In this notebook, you will use a k-means unsupervised clustering algorithm to group pixels by similar spectral signatures. k-means is an exploratory method for finding patterns in data. Because it is unsupervised, you don’t need any training data for the model. You also can’t measure how well it “performs” because the clusters will not correspond to any particular land cover class. However, we expect at least some of the clusters to be identifiable as different types of land cover.
You will use the harmonized Sentinal/Landsat multispectral dataset. You can access the data with an Earthdata account and the earthaccess library from NSIDC:
STEP 1: SET UP
- Import all libraries you will need for this analysis
- Configure GDAL parameters to help avoid connection errors:
python os.environ["GDAL_HTTP_MAX_RETRY"] = "5" os.environ["GDAL_HTTP_RETRY_DELAY"] = "1"
Below you can find code for a caching decorator which you can use in your code. To use the decorator:
@cached(key, override)
def do_something(*args, **kwargs):
...
return item_to_cacheThis decorator will pickle the results of running the do_something() function, and only run the code if the results don’t already exist. To override the caching, for example temporarily after making changes to your code, set override=True. Note that to use the caching decorator, you must write your own function to perform each task!
STEP 2: STUDY SITE
For this analysis, you will use a watershed from the Water Boundary Dataset, HU12 watersheds (WBDHU12.shp).
- Download the Water Boundary Dataset for region 8 (Mississippi)
- Select watershed 080902030506
- Generate a site map of the watershed
Try to use the caching decorator
Downloading from https://prd-tnm.s3.amazonaws.com/StagedProducts/Hydrography/WBD/HU2/Shape/WBD_08_HU2_Shape.zip
Extracted output to /home/runner/earth-analytics/data/WBD_08_HU2_Shape
We chose this watershed because it covers parts of New Orleans an is near the Mississippi Delta. Deltas are boundary areas between the land and the ocean, and as a result tend to contain a rich variety of different land cover and land use types.
Write a 2-3 sentence site description (with citations) of this area that helps to put your analysis in context.
YOUR SITE DESCRIPTION HERE
STEP 3: MULTISPECTRAL DATA
Search for data
- Log in to the
earthaccessservice using your Earthdata credentials:python earthaccess.login(persist=True) - Modify the following sample code to search for granules of the HLSL30 product overlapping the watershed boundary from May to October 2023 (there should be 76 granules):
python results = earthaccess.search_data( short_name="...", cloud_hosted=True, bounding_box=tuple(gdf.total_bounds), temporal=("...", "..."), )
# Log in to earthaccess
# Search for HLS tilesCompile information about each granule
I recommend building a GeoDataFrame, as this will allow you to plot the granules you are downloading and make sure they line up with your shapefile. You could also use a DataFrame, dictionary, or a custom object to store this information.
- For each search result:
- Get the following information (HINT: look at the [‘umm’] values for each search result):
- granule id (UR)
- datetime
- geometry (HINT: check out the shapely.geometry.Polygon class to convert points to a Polygon)
- Open the granule files. I recomment opening one granule at a time, e.g. with (
earthaccess.open([result]). - For each file (band), get the following information:
- file handler returned from
earthaccess.open() - tile id
- band number
- file handler returned from
- Get the following information (HINT: look at the [‘umm’] values for each search result):
- Compile all the information you collected into a GeoDataFrame
# Loop through each granule
# Get granule information
# Get URL
# Build metadata DataFrame rows
# Concatenate metadata DataFrameOpen, crop, and mask data
This will be the most resource-intensive step. I recommend caching your results using the cached decorator or by writing your own caching code. I also recommend testing this step with one or two dates before running the full computation.
This code should include at least one function including a numpy-style docstring. A good place to start would be a function for opening a single masked raster, applying the appropriate scale parameter, and cropping.
- For each granule:
Open the Fmask band, crop, and compute a quality mask for the granule. You can use the following code as a starting point, making sure that
mask_bitscontains the quality bits you want to consider: ```python # Expand into a new dimension of binary bits bits = ( np.unpackbits(da.astype(np.uint8), bitorder=‘little’) .reshape(da.shape + (-1,)) )# Select the required bits and check if any are flagged mask = np.prod(bits[…, mask_bits]==0, axis=-1) ```
For each band that starts with ‘B’:
- Open the band, crop, and apply the scale factor
- Name the DataArray after the band using the
.nameattribute - Apply the cloud mask using the
.where()method - Store the DataArray in your data structure (e.g. adding a GeoDataFrame column with the DataArray in it. Note that you will need to remove the rows for unused bands)
# Loop through each image
# Open granule cloud cover
# Compute cloud mask
# Loop through each spectral band
# Open, crop, and mask the band
# Add the DataArray to the metadata DataFrame row
# Reassemble the metadata DataFrameProcessing granule T15RYN 2024-06-07 16:31:11.509000+00:00
Processing granule T15RYP 2024-06-07 16:31:11.509000+00:00
Processing granule T16RBT 2024-06-07 16:31:11.509000+00:00
Processing granule T16RBU 2024-06-07 16:31:11.509000+00:00
Processing granule T15RYN 2024-06-15 16:31:19.154000+00:00
Processing granule T15RYP 2024-06-15 16:31:19.154000+00:00
Processing granule T16RBT 2024-06-15 16:31:19.154000+00:00
Processing granule T16RBU 2024-06-15 16:31:19.154000+00:00
Processing granule T15RYN 2024-06-23 16:31:21.277000+00:00
Processing granule T15RYP 2024-06-23 16:31:21.277000+00:00
Processing granule T16RBT 2024-06-23 16:31:21.277000+00:00
Processing granule T16RBU 2024-06-23 16:31:21.277000+00:00
Processing granule T15RYN 2024-07-01 16:31:17.338000+00:00
Processing granule T15RYP 2024-07-01 16:31:17.338000+00:00
Processing granule T16RBT 2024-07-01 16:31:17.338000+00:00
Processing granule T16RBU 2024-07-01 16:31:17.338000+00:00
Processing granule T15RYN 2024-07-09 16:31:29.187000+00:00
Processing granule T15RYP 2024-07-09 16:31:29.187000+00:00
Processing granule T16RBT 2024-07-09 16:31:29.187000+00:00
Processing granule T16RBU 2024-07-09 16:31:29.187000+00:00
Processing granule T15RYN 2024-07-17 16:31:33.271000+00:00
Processing granule T15RYP 2024-07-17 16:31:33.271000+00:00
Processing granule T16RBT 2024-07-17 16:31:33.271000+00:00
Processing granule T16RBU 2024-07-17 16:31:33.271000+00:00
Processing granule T15RYN 2024-07-25 16:31:43.265000+00:00
Processing granule T15RYP 2024-07-25 16:31:43.265000+00:00
Processing granule T16RBT 2024-07-25 16:31:43.265000+00:00
Processing granule T16RBU 2024-07-25 16:31:43.265000+00:00
Processing granule T15RYN 2024-08-02 16:31:36.413000+00:00
Processing granule T15RYP 2024-08-02 16:31:36.413000+00:00
Processing granule T16RBT 2024-08-02 16:31:36.413000+00:00
Processing granule T16RBU 2024-08-02 16:31:36.413000+00:00
Processing granule T15RYN 2024-08-10 16:31:42.335000+00:00
Processing granule T15RYP 2024-08-10 16:31:42.335000+00:00
Processing granule T16RBT 2024-08-10 16:31:42.335000+00:00
Processing granule T16RBU 2024-08-10 16:31:42.335000+00:00
Processing granule T15RYN 2024-08-18 16:31:46.256000+00:00
Processing granule T15RYP 2024-08-18 16:31:46.256000+00:00
Processing granule T16RBT 2024-08-18 16:31:46.256000+00:00
Processing granule T16RBU 2024-08-18 16:31:46.256000+00:00
Processing granule T15RYN 2024-08-26 16:31:51.172000+00:00
Processing granule T15RYP 2024-08-26 16:31:51.172000+00:00
Processing granule T16RBT 2024-08-26 16:31:51.172000+00:00
Processing granule T16RBU 2024-08-26 16:31:51.172000+00:00
Merge and Composite Data
You will notice for this watershed that: 1. The raster data for each date are spread across 4 granules 2. Any given image is incomplete because of clouds
For each band:
For each date:
- Merge all 4 granules
- Mask any negative values created by interpolating from the nodata value of -9999 (
rioxarrayshould account for this, but doesn’t appear to when merging. If you leave these values in they will create problems down the line)
Concatenate the merged DataArrays along a new date dimension
Take the mean in the date dimension to create a composite image that fills cloud gaps
Add the band as a dimension, and give the DataArray a name
Concatenate along the band dimension
# Merge and composite and image for each band
# Merge granules for each date
# Mask negative values
# Composite images across dates<xarray.DataArray 'reflectance' (band: 10, y: 556, x: 624)> Size: 14MB
array([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
...
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]],
shape=(10, 556, 624), dtype=float32)
Coordinates:
* band (band) int64 80B 1 2 3 4 5 6 7 9 10 11
* y (y) float64 4kB 3.304e+06 3.304e+06 ... 3.287e+06 3.287e+06
* x (x) float64 5kB 7.926e+05 7.926e+05 ... 8.112e+05 8.113e+05
spatial_ref int64 8B 0
Attributes: (12/35)
ACCODE: Lasrc; Lasrc
add_offset: 0.0
arop_ave_xshift(meters): 0, 0
arop_ave_yshift(meters): 0, 0
arop_ncp: 0, 0
arop_rmse(meters): 0, 0
... ...
TIRS_SSM_MODEL: PRELIMINARY; PRELIMINARY
TIRS_SSM_POSITION_STATUS: ESTIMATED; ESTIMATED
ULX: 699960
ULY: 3300000
USGS_SOFTWARE: LPGS_16.4.0
AREA_OR_POINT: AreaSTEP 4: K-MEANS
Cluster your data by spectral signature using the k-means algorithm.
- Convert your DataArray into a tidy DataFrame of reflectance values (hint: check out the
.to_dataframe()and.unstack()methods) - Filter out all rows with no data (all 0s or any N/A values)
- Fit a k-means model. You can experiment with the number of groups to find what works best.
# Convert spectral DataArray to a tidy DataFrame
# Running the fit and predict functions at the same time.
# We can do this since we don't have target data.
# Add the predicted values back to the model DataFrame| band | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 | clusters | |
|---|---|---|---|---|---|---|---|---|---|---|
| y | x | |||||||||
| 3.303783e+06 | 810148.062907 | 0.01560 | 0.0225 | 0.0409 | 0.0366 | 0.0478 | 0.0281 | 0.0181 | 0.0006 | 5 |
| 810178.062907 | 0.01895 | 0.0256 | 0.0396 | 0.0413 | 0.0426 | 0.0284 | 0.0220 | 0.0006 | 5 | |
| 810208.062907 | 0.01915 | 0.0246 | 0.0387 | 0.0377 | 0.0384 | 0.0273 | 0.0236 | 0.0007 | 5 | |
| 810238.062907 | 0.02040 | 0.0247 | 0.0440 | 0.0445 | 0.0629 | 0.0418 | 0.0266 | 0.0007 | 0 | |
| 810268.062907 | 0.01815 | 0.0245 | 0.0437 | 0.0444 | 0.0618 | 0.0397 | 0.0259 | 0.0008 | 0 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3.287163e+06 | 793768.062907 | 0.02650 | 0.0345 | 0.0548 | 0.0427 | 0.0218 | 0.0098 | 0.0074 | 0.0007 | 5 |
| 793798.062907 | 0.02790 | 0.0351 | 0.0549 | 0.0439 | 0.0221 | 0.0104 | 0.0076 | 0.0008 | 5 | |
| 793828.062907 | 0.02580 | 0.0331 | 0.0534 | 0.0419 | 0.0194 | 0.0080 | 0.0059 | 0.0009 | 5 | |
| 793858.062907 | 0.02570 | 0.0326 | 0.0521 | 0.0402 | 0.0182 | 0.0064 | 0.0046 | 0.0007 | 5 | |
| 793888.062907 | 0.02550 | 0.0340 | 0.0541 | 0.0423 | 0.0199 | 0.0083 | 0.0060 | 0.0007 | 5 |
317917 rows × 9 columns
STEP 5: PLOT
Create a plot that shows the k-means clusters next to an RGB image of the area. You may need to brighten your RGB image by multiplying it by 10. The code for reshaping and plotting the clusters is provided for you below, but you will have to create the RGB plot yourself!
So, what is .sortby(['x', 'y']) doing for us? Try the code without it and find out.
# Plot the k-means clusters
(
rgb_plot
+
model_df.clusters.to_xarray().sortby(['x', 'y']).hvplot(
cmap="Colorblind", aspect='equal')
)Don’t forget to interpret your plot!
YOUR PLOT HEADLINE AND DESCRIPTION HERE