Urban Greenspace and Asthma Prevalence
Get Started with Big Data Pipelines
- Access larger-than-memory data in chunks
- Compute fragmentation statistics
- Compare urban greenspace distribution to health outcomes
Vegetation has the potential to provide many ecosystem services in urban areas, such as cleaner air and water and flood mitigation. However, results are mixed on the relationship between simple measurements of vegetation cover (such as average NDVI, a measure of vegetation health) and human health. We do, however, find relationships between human health and landscape metrics that attempt to quantify the connectivity and structure of greenspace. These types of metrics include mean patch size, edge density, and fragmentation.
Check out this study by Tsai et al. (2019) on the relationship between edge density and life expectancy in Baltimore, MD. The authors also discuss the influence of scale (e.g. the resolution of the imagery) on these relationships, which is important for this case study.
In this notebook, you will write code to calculate patch, edge, and fragmentation statistics about urban greenspace in Chicago. These statistics should reflect the connectivity and spread of urban greenspace, which are important for ecosystem function and access. You will then use a linear model to identify statistically significant correlations between the distribution of greenspace and health data compiled by the US Centers for Disease Control and Prevention (CDC).
Working with larger-than-memory (big) data
For this project, we are going to split up the greenspace (NDVI) data by census tract, because this matches the human health data from the CDC. If we were interested in the average NDVI at this spatial scale, we could easily use a source of multispectral data like Landsat (30m) or even MODIS (250m) without a noticeable impact on our results. However, because we need to know more about the structure of greenspace within each tract, we need higher resolution data. For that, we will access the National Agricultural Imagery Program (NAIP) data, which is collected for the continental US every few years at 1m resolution. That’s enough to see individual trees and cars! The main purpose of the NAIP data is, as the name suggests, agriculture. However, it’s also a great source for urban areas, where a lot is happening at a very small scale.
The NAIP data for the City of Chicago takes up about 20GB of space. This amount of data is likely to crash your kernel if you try to load it all in at once. It would also be inconvenient to store on your hard drive so that you can load it in a bit at a time for analysis. Even if you are using a computer that could handle this amount of data, imagine analyzing the entire United States over multiple years!
To help with this problem, you will use cloud-based tools to calculate your statistics instead of downloading rasters to your computer or container. You can crop the data entirely in the cloud, thereby saving on your memory and internet connection, using rioxarray.
Check your work with testing!
This notebook does not have pre-built tests. You will need to write your own test code to make sure everything is working the way that you want. For many operations, this will be as simple as creating a plot to check that all your data lines up spatially the way you were expecting, or printing values as you go. However, if you don’t test as you go, you are likely to end up with intractable problems with the final code.
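For example, here are a couple of lightweight sanity checks you might sprinkle through the notebook as you go. The variable names are hypothetical placeholders; adapt the assertions to your own objects:

```python
import geopandas as gpd
from shapely.geometry import box

# Stand-in data: two square "tracts" and the CRS of a hypothetical raster
tracts = gpd.GeoDataFrame(
    {"tract": [1, 2]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
raster_crs = "EPSG:4326"

# Check 1: vector and raster data share a CRS before clipping
assert tracts.crs == raster_crs, "Reproject before clipping!"

# Check 2: a merge or filter did not silently drop all your rows
assert len(tracts) > 0, "Empty GeoDataFrame -- check your filters"
print("All checks passed")
```

Plotting is an equally good check; assertions just have the advantage of failing loudly if something drifts later.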
STEP 1: Set up your analysis
As always, before you get started:
- Import necessary packages
- Create reproducible file paths for your project file structure.
- To use cloud-optimized GeoTiffs, we recommend some settings to make sure your code does not get stopped by a momentary connection lapse – see the code cell below.
# Import libraries
# Prevent GDAL from quitting due to momentary disruptions
os.environ["GDAL_HTTP_MAX_RETRY"] = "5"
os.environ["GDAL_HTTP_RETRY_DELAY"] = "1"
import os
import pathlib
import time
import warnings
import geopandas as gpd
import geoviews as gv
import holoviews as hv
import hvplot.pandas
import hvplot.xarray
import numpy as np
import pandas as pd
import pystac_client
import rioxarray as rxr
import rioxarray.merge as rxrmerge
import shapely
import xarray as xr
from cartopy import crs as ccrs
from scipy.ndimage import convolve, label
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, train_test_split
from tqdm.notebook import tqdm
data_dir = os.path.join(
    pathlib.Path.home(),
    'earth-analytics',
    'data',
    'chicago-greenspace')
os.makedirs(data_dir, exist_ok=True)
warnings.simplefilter('ignore')
# Prevent GDAL from quitting due to momentary disruptions
os.environ["GDAL_HTTP_MAX_RETRY"] = "5"
os.environ["GDAL_HTTP_RETRY_DELAY"] = "1"
STEP 2: Create a site map
We’ll be using the Centers for Disease Control and Prevention (CDC) Places dataset for human health data to compare with vegetation. CDC Places also provides some modified census tracts, clipped to the city boundary, to go along with the health data. We’ll start by downloading the matching geographic data, and then select the City of Chicago.
- Download the Census tract Shapefile that goes along with CDC Places
- Use a row filter to select only the census tracts in Chicago
- Use a conditional statement to cache your download. There is no need to cache the full dataset – stick with your pared down version containing only Chicago.
# Set up the census tract path
# Download the census tracts (only once)
# Load in the census tract data
# Site plot -- Census tracts with satellite imagery in the background
# Set up the census tract path
tract_dir = os.path.join(data_dir, 'chicago-tract')
os.makedirs(tract_dir, exist_ok=True)
tract_path = os.path.join(tract_dir, 'chicago-tract.shp')
# Download the census tracts (only once)
if not os.path.exists(tract_path):
    tract_url = ('https://data.cdc.gov/download/x7zy-2xmx/application%2Fzip')
    tract_gdf = gpd.read_file(tract_url)
    chi_tract_gdf = tract_gdf[tract_gdf.PlaceName=='Chicago']
    chi_tract_gdf.to_file(tract_path, index=False)
# Load in the census tract data
chi_tract_gdf = gpd.read_file(tract_path)
# Site plot -- Census tracts with satellite imagery in the background
(
    chi_tract_gdf
    .to_crs(ccrs.Mercator())
    .hvplot(
        line_color='orange', fill_color=None,
        crs=ccrs.Mercator(), tiles='EsriImagery',
        frame_width=600)
)
What do you notice about the City of Chicago from the coarse satellite image? Is green space evenly distributed? What can you learn about Chicago from websites, scientific papers, or other resources that might help explain what you see in the site map?
Download census tracts and select your urban area
You can obtain URLs for the U.S. Census Tract shapefiles from the TIGER service. You’ll notice that these URLs use the state FIPS code, which you can look up online or obtain by installing and using the us package.
- Download the Census tract Shapefile for the state of Illinois (IL).
- Use a conditional statement to cache the download
- Use a spatial join to select only the Census tracts that lie at least partially within the City of Chicago boundary.
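The spatial-join step might be sketched like this, with tiny synthetic geometries standing in for the TIGER tracts (`il_tracts_gdf`) and the city boundary (`chicago_gdf`) you will have downloaded; both names are placeholders:

```python
import geopandas as gpd
from shapely.geometry import box

# Stand-ins: three state "tracts" and a city boundary polygon
il_tracts_gdf = gpd.GeoDataFrame(
    {"tract": ["A", "B", "C"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1), box(10, 10, 11, 11)],
    crs="EPSG:4326",
)
chicago_gdf = gpd.GeoDataFrame(
    geometry=[box(0.5, 0.5, 2.5, 1.5)], crs="EPSG:4326")

# Keep tracts that lie at least partially within the city boundary
chi_tracts = gpd.sjoin(
    il_tracts_gdf, chicago_gdf, how="inner", predicate="intersects")
print(chi_tracts.tract.tolist())
```

Here tracts A and B overlap the boundary and are kept, while C is dropped. Make sure both GeoDataFrames share a CRS before the join.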
STEP 3 - Access Asthma and Urban Greenspaces Data
Access human health data
The U.S. Centers for Disease Control and Prevention (CDC) provides a number of health variables through their Places Dataset that might be correlated with urban greenspace. For this assignment, start with adult asthma. Try to limit the data as much as possible for download. Selecting the state and county is one way to do this.
- You can access Places data with an API, but as with many APIs it is easier to test out your search before building a URL. Navigate to the Places Census Tract Data Portal and search for the data you want.
- The data portal will make an API call for you, but there is a simpler way to form an API call that is easier to read and modify. Check out the Socrata documentation to see how. You can also find some limited examples and a list of available parameters for this API in the CDC Places SODA Consumer API Documentation.
- Once you have formed your query, you may notice that you have exactly 1000 rows. The Places SODA API limits you to 1000 records in a download. Either narrow your search or check out the &$limit= parameter to increase the number of rows downloaded. You can find more information on the Paging page of the SODA API documentation.
- You should also clean up this data by renaming the 'data_value' column to something descriptive, and possibly selecting a subset of columns.
# Set up a path for the asthma data
# Download asthma data (only once)
# Load in asthma data
# Preview asthma data
# Set up a path for the asthma data
cdc_path = os.path.join(data_dir, 'asthma.csv')
# Download asthma data (only once)
if not os.path.exists(cdc_path):
    cdc_url = (
        "https://data.cdc.gov/resource/cwsq-ngmh.csv"
        "?year=2023"
        "&stateabbr=IL"
        "&countyname=Cook"
        "&measureid=CASTHMA"
        "&$limit=1500"
    )
    cdc_df = (
        pd.read_csv(cdc_url)
        .rename(columns={
            'data_value': 'asthma',
            'low_confidence_limit': 'asthma_ci_low',
            'high_confidence_limit': 'asthma_ci_high',
            'locationname': 'tract'})
        [[
            'year', 'tract',
            'asthma', 'asthma_ci_low', 'asthma_ci_high', 'data_value_unit',
            'totalpopulation', 'totalpop18plus'
        ]]
    )
    cdc_df.to_csv(cdc_path, index=False)
# Load in asthma data
cdc_df = pd.read_csv(cdc_path)
# Preview asthma data
cdc_df
| | year | tract | asthma | asthma_ci_low | asthma_ci_high | data_value_unit | totalpopulation | totalpop18plus |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023 | 17031071700 | 9.1 | 8.1 | 10.1 | % | 1660 | 1325 |
| 1 | 2023 | 17031120300 | 8.8 | 7.8 | 9.8 | % | 6920 | 5212 |
| 2 | 2023 | 17031010501 | 9.9 | 8.9 | 11.0 | % | 4206 | 3762 |
| 3 | 2023 | 17031031000 | 8.9 | 7.9 | 9.9 | % | 3868 | 3439 |
| 4 | 2023 | 17031062400 | 9.2 | 8.2 | 10.3 | % | 1673 | 1389 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1323 | 2023 | 17031840400 | 7.9 | 7.1 | 8.8 | % | 3369 | 2662 |
| 1324 | 2023 | 17031837300 | 13.7 | 12.3 | 15.0 | % | 2489 | 1835 |
| 1325 | 2023 | 17031840000 | 8.4 | 7.5 | 9.3 | % | 3001 | 2428 |
| 1326 | 2023 | 17031840700 | 8.3 | 7.3 | 9.2 | % | 3900 | 2885 |
| 1327 | 2023 | 17031838700 | 14.9 | 13.3 | 16.6 | % | 4132 | 2874 |
1328 rows × 8 columns
Join health data with census tract boundaries
- Join the census tract GeoDataFrame with the asthma prevalence DataFrame using the .merge() method.
- You will need to change the data type of one of the merge columns to match, e.g. using .astype('int64')
- There are a few census tracts in Chicago that do not have data. You should be able to confirm that they are not listed through the interactive Places Data Portal. However, if you have large chunks of the city missing, it may mean that you need to expand the record limit for your download.
# Change tract identifier datatype for merging
# Merge census data with geometry
# Plot asthma data as choropleth
# Change tract identifier datatype for merging
chi_tract_gdf.tract2010 = chi_tract_gdf.tract2010.astype('int64')
# Merge census data with geometry
tract_cdc_gdf = (
    chi_tract_gdf
    .merge(cdc_df, left_on='tract2010', right_on='tract', how='inner')
)
# Plot asthma data as choropleth
(
    gv.tile_sources.EsriImagery
    *
    gv.Polygons(
        tract_cdc_gdf.to_crs(ccrs.Mercator()),
        vdims=['asthma', 'tract2010'],
        crs=ccrs.Mercator()
    ).opts(color='asthma', colorbar=True, tools=['hover'])
).opts(width=600, height=600, xaxis=None, yaxis=None)
Write a description and citation for the asthma prevalence data. Do you notice anything about the spatial distribution of asthma in Chicago? From your research on the city, what might be some potential causes of any patterns you see?
Get Data URLs
NAIP data are freely available through the Microsoft Planetary Computer SpatioTemporal Access Catalog (STAC).
Get started with STAC by accessing the planetary computer catalog with the following code:
e84_catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1"
)
# Connect to the planetary computer catalog
# Connect to the planetary computer catalog
e84_catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1"
)
e84_catalog.title
'Microsoft Planetary Computer STAC API'
Using a loop, for each Census Tract:
Use the following sample code to search for data, replacing the placeholder names with descriptive values:
search = e84_catalog.search(
    collections=["naip"],
    intersects=shapely.to_geojson(tract_geometry),
    datetime=f"{year}"
)
Access the URL of each matching scene using scene.assets['image'].href
Accumulate the URLs in a pd.DataFrame or dict for later.
Occasionally you may find that the STAC service is momentarily unavailable. You should include code that will retry the request up to 5 times when you get a pystac_client.exceptions.APIError.
As always – DO NOT try to write this loop all at once! Stick with one step at a time, making sure to test your work. You also probably want to add a break into your loop to stop the loop after a single iteration. This will help prevent long waits during debugging.
# Convert geometry to lat/lon for STAC
# Define a path to save NDVI stats
# Check for existing data - do not access duplicate tracts
# Loop through each census tract
# Check if statistics are already downloaded for this tract
# Repeat up to 5 times in case of a momentary disruption
# Try accessing the STAC
# Search for tiles
# Build dataframe with tracts and tile urls
# Try again in case of an APIError
# Convert geometry to lat/lon for STAC
tract_latlon_gdf = tract_cdc_gdf.to_crs(4326)
# Define a path to save NDVI stats
ndvi_stats_path = os.path.join(data_dir, 'chicago-ndvi-stats.csv')
# Check for existing data - do not access duplicate tracts
downloaded_tracts = []
if os.path.exists(ndvi_stats_path):
    ndvi_stats_df = pd.read_csv(ndvi_stats_path)
    downloaded_tracts = ndvi_stats_df.tract.values
else:
    print('No census tracts downloaded so far')

# Loop through each census tract
scene_dfs = []
for _, tract_values in tqdm(tract_latlon_gdf.iterrows()):
    tract = tract_values.tract2010
    # Check if statistics are already downloaded for this tract
    if not (tract in downloaded_tracts):
        # Retry up to 5 times in case of a momentary disruption
        i = 0
        retry_limit = 5
        while i < retry_limit:
            # Try accessing the STAC
            try:
                # Search for tiles
                naip_search = e84_catalog.search(
                    collections=["naip"],
                    intersects=shapely.to_geojson(tract_values.geometry),
                    datetime="2021"
                )
                # Build dataframe with tracts and tile urls
                scene_dfs.append(pd.DataFrame(dict(
                    tract=tract,
                    date=[pd.to_datetime(scene.datetime).date()
                          for scene in naip_search.items()],
                    rgbir_href=[
                        scene.assets['image'].href
                        for scene in naip_search.items()],
                )))
                break
            # Try again in case of an APIError
            except pystac_client.exceptions.APIError:
                print(
                    f'Could not connect with STAC server. '
                    f'Retrying tract {tract}...')
                time.sleep(2)
                i += 1
                continue

# Concatenate the url dataframes
if scene_dfs:
    scene_df = pd.concat(scene_dfs).reset_index(drop=True)
else:
    scene_df = None

# Preview the URL DataFrame
scene_df
No census tracts downloaded so far
| | tract | date | rgbir_href |
|---|---|---|---|
| 0 | 17031010100 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1 | 17031010201 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 2 | 17031010201 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 3 | 17031010202 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 4 | 17031010300 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| ... | ... | ... | ... |
| 1252 | 17031807900 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1253 | 17031807900 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1254 | 17031807900 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1255 | 17031808100 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
| 1256 | 17031808100 | 2021-09-08 | https://naipeuwest.blob.core.windows.net/naip/... |
1257 rows × 3 columns
Compute NDVI Statistics
Next, calculate some metrics to get at different aspects of the distribution of greenspace in each census tract. These types of statistics are called fragmentation statistics, and can be implemented with the scipy package. Some examples of fragmentation statistics are:
- Percentage vegetation - The percentage of pixels that exceed a vegetation threshold (.12 is common with Landsat)
- Mean patch size - The average size of patches, or contiguous areas exceeding the vegetation threshold. Patches can be identified with the label function from scipy.ndimage
- Edge density - The proportion of edge pixels among vegetated pixels. Edges can be identified by convolving the image with a kernel designed to detect pixels that are different from their surroundings.
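As a quick illustration of the patch statistics above, here is `scipy.ndimage.label` applied to a tiny hand-made vegetation mask (the real mask comes from thresholded NDVI later in the notebook):

```python
import numpy as np
from scipy.ndimage import label

# A 4x4 binary "vegetation mask" with two separate patches
veg_mask = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
])

# label() numbers each contiguous patch (4-connectivity by default)
labeled_patches, num_patches = label(veg_mask)
# bincount counts pixels per label; [1:] drops the background (label 0)
patch_sizes = np.bincount(labeled_patches.ravel())[1:]
print(num_patches, patch_sizes.tolist())  # -> 2 [3, 4]
```

So here the mean patch size would be (3 + 4) / 2 = 3.5 pixels, exactly the quantity the solution code computes per census tract.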
If you are familiar with differential equations, you may recognize this kernel: convolving with it approximates the Laplacian operator, which measures how much each pixel differs from its surroundings.
For the purposes of calculating edge density, convolution means that we take each possible 3x3 chunk of our image and multiply it element-wise by the kernel, summing the result into the center pixel’s position:
\[ \text{Kernel} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix} \]
The result is a matrix the same size as the original, minus the outermost edge. If the center pixel is the same as the surroundings, its value in the final matrix will be \(-8 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 0\). If it is higher than the surroundings, the result will be negative, and if it is lower than the surroundings, the result will be positive. As such, the edge pixels of our patches will be negative.
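Here is that kernel applied to a tiny binary mask with `scipy.ndimage.convolve`, confirming the sign behavior described above:

```python
import numpy as np
from scipy.ndimage import convolve

# A small mask with a single 3x3 vegetation patch in the middle
veg_mask = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
])
kernel = np.array([
    [1, 1, 1],
    [1, -8, 1],
    [1, 1, 1]])

# mode='constant' pads the border with zeros
edges = convolve(veg_mask, kernel, mode='constant')
print(edges[2, 2])  # interior pixel: -8 + 8 neighbors = 0
print(edges[1, 1])  # patch corner: only 3 vegetated neighbors, so -5
```

Vegetated pixels on a patch edge come out negative, interior pixels come out zero, and bare pixels adjacent to a patch come out positive, which is why the solution counts `edges != 0`.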
- Select a single row from the census tract GeoDataFrame, e.g. using .loc[[0]], then select all the rows from your URL DataFrame that match that census tract.
- For each URL: crop, clip, and compute NDVI for that census tract, then merge the results
- Set a threshold to get a binary mask of vegetation
- Use the sample code to compute the fragmentation statistics. Feel free to add any other statistics you think are relevant, but make sure to include fraction vegetation, mean patch size, and edge density. If you are not sure what any line of code is doing, make a plot or print something to find out! You can also ask ChatGPT or the class.
# Skip this step if data are already downloaded
# Get an example tract
# Loop through all images for tract
# Open vsi connection to data
# Crop data to the bounding box of the census tract
# Clip data to the boundary of the census tract
# Compute NDVI
# Accumulate result
# Merge data
# Mask vegetation
# Calculate mean patch size
labeled_patches, num_patches = label(veg_mask)
patch_sizes = np.bincount(labeled_patches.ravel())[1:]
mean_patch_size = patch_sizes.mean()
# Calculate edge density
kernel = np.array([
[1, 1, 1],
[1, -8, 1],
[1, 1, 1]])
edges = convolve(veg_mask, kernel, mode='constant')
edge_density = np.sum(edges != 0) / veg_mask.size
# Skip this step if data are already downloaded
if scene_df is not None:
    # Get an example tract
    tract = chi_tract_gdf.loc[0].tract2010
    ex_scene_gdf = scene_df[scene_df.tract==tract]
    # Loop through all images for tract
    tile_das = []
    for _, href_s in ex_scene_gdf.iterrows():
        # Open vsi connection to data
        tile_da = rxr.open_rasterio(
            href_s.rgbir_href, masked=True).squeeze()
        # Crop data to the bounding box of the census tract
        boundary = (
            tract_cdc_gdf
            .set_index('tract2010')
            .loc[[tract]]
            .to_crs(tile_da.rio.crs)
            .geometry
        )
        crop_da = tile_da.rio.clip_box(
            *boundary.envelope.total_bounds,
            auto_expand=True)
        # Clip data to the boundary of the census tract
        clip_da = crop_da.rio.clip(boundary, all_touched=True)
        # Compute NDVI
        ndvi_da = (
            (clip_da.sel(band=4) - clip_da.sel(band=1))
            / (clip_da.sel(band=4) + clip_da.sel(band=1))
        )
        # Accumulate result
        tile_das.append(ndvi_da)

    # Merge data
    scene_da = rxrmerge.merge_arrays(tile_das)
    # Mask vegetation
    veg_mask = (scene_da > .3)
    # Calculate mean patch size
    labeled_patches, num_patches = label(veg_mask)
    # Count patch pixels, ignoring background at patch 0
    patch_sizes = np.bincount(labeled_patches.ravel())[1:]
    mean_patch_size = patch_sizes.mean()
    # Calculate edge density
    kernel = np.array([
        [1, 1, 1],
        [1, -8, 1],
        [1, 1, 1]])
    edges = convolve(veg_mask, kernel, mode='constant')
    edge_density = np.sum(edges != 0) / veg_mask.size
Repeat for all tracts
Using a loop, for each Census Tract:
- Using a loop, for each data URL:
  - Use rioxarray to open up a connection to the STAC asset, just like you would a file on your computer
  - Crop and then clip your data to the census tract boundary. HINT: check out the .clip_box parameter auto_expand and the .clip parameter all_touched to make sure you don’t end up with an empty array
  - Compute NDVI for the tract
- Merge the NDVI rasters
- Compute:
  - the total number of pixels within the tract
  - the fraction of pixels with an NDVI greater than .12 within the tract (and any other statistics you would like to look at)
- Accumulate the statistics in a file for later
Using a conditional statement, ensure that you do not run this computation if you have already saved values. You do not want to run this step many times, or have to restart from scratch! There are many approaches to this, but we actually recommend implementing your caching in the previous cell when you generate your dataframe of URLs, since that step can take a few minutes as well. However, the important thing to cache is the computation.
# Skip this step if data are already downloaded
# Loop through the census tracts with URLs
# Open all images for tract
# Open vsi connection to data
# Clip data
# Compute NDVI
# Accumulate result
# Merge data
# Mask vegetation
# Calculate statistics and save data to file
# Calculate mean patch size
# Calculate edge density
# Add a row to the statistics file for this tract
# Re-load results from file
# Skip this step if data are already downloaded
if scene_df is not None:
    ndvi_dfs = []
    # Loop through the census tracts with URLs
    for tract, tract_date_gdf in tqdm(scene_df.groupby('tract')):
        # Open all images for tract
        tile_das = []
        for _, href_s in tract_date_gdf.iterrows():
            # Open vsi connection to data
            tile_da = rxr.open_rasterio(
                href_s.rgbir_href, masked=True).squeeze()
            # Clip data
            boundary = (
                tract_cdc_gdf
                .set_index('tract2010')
                .loc[[tract]]
                .to_crs(tile_da.rio.crs)
                .geometry
            )
            crop_da = tile_da.rio.clip_box(
                *boundary.envelope.total_bounds,
                auto_expand=True)
            clip_da = crop_da.rio.clip(boundary, all_touched=True)
            # Compute NDVI
            ndvi_da = (
                (clip_da.sel(band=4) - clip_da.sel(band=1))
                / (clip_da.sel(band=4) + clip_da.sel(band=1))
            )
            # Accumulate result
            tile_das.append(ndvi_da)

        # Merge data
        scene_da = rxrmerge.merge_arrays(tile_das)
        # Mask vegetation
        veg_mask = (scene_da > .3)
        # Calculate statistics and save data to file
        total_pixels = scene_da.notnull().sum()
        veg_pixels = veg_mask.sum()
        # Calculate mean patch size
        labeled_patches, num_patches = label(veg_mask)
        # Count patch pixels, ignoring background at patch 0
        patch_sizes = np.bincount(labeled_patches.ravel())[1:]
        mean_patch_size = patch_sizes.mean()
        # Calculate edge density
        kernel = np.array([
            [1, 1, 1],
            [1, -8, 1],
            [1, 1, 1]])
        edges = convolve(veg_mask, kernel, mode='constant')
        edge_density = np.sum(edges != 0) / veg_mask.size
        # Add a row to the statistics file for this tract
        pd.DataFrame(dict(
            tract=[tract],
            total_pixels=[int(total_pixels)],
            frac_veg=[float(veg_pixels/total_pixels)],
            mean_patch_size=[mean_patch_size],
            edge_density=[edge_density]
        )).to_csv(
            ndvi_stats_path,
            mode='a',
            index=False,
            header=(not os.path.exists(ndvi_stats_path))
        )

# Re-load results from file
ndvi_stats_df = pd.read_csv(ndvi_stats_path)
ndvi_stats_df
| | tract | total_pixels | frac_veg | mean_patch_size | edge_density |
|---|---|---|---|---|---|
| 0 | 17031010100 | 1059109 | 0.178500 | 55.326602 | 0.118459 |
| 1 | 17031010201 | 1533140 | 0.213616 | 57.537421 | 0.161389 |
| 2 | 17031010202 | 978523 | 0.186242 | 63.256508 | 0.123770 |
| 3 | 17031010300 | 1308371 | 0.191591 | 57.061689 | 0.126558 |
| 4 | 17031010400 | 1516950 | 0.198593 | 53.019183 | 0.079399 |
| ... | ... | ... | ... | ... | ... |
| 783 | 17031843500 | 5647565 | 0.075255 | 9.732756 | 0.104605 |
| 784 | 17031843600 | 1142931 | 0.054342 | 9.176862 | 0.101092 |
| 785 | 17031843700 | 6025574 | 0.027667 | 7.482496 | 0.047655 |
| 786 | 17031843800 | 3638977 | 0.093921 | 24.170934 | 0.105051 |
| 787 | 17031843900 | 4522021 | 0.199102 | 23.988090 | 0.124304 |
788 rows × 5 columns
STEP 4 - Explore your data with plots
Choropleth plots
Before running any statistical models on your data, you should check that your download worked. You should see differences in both asthma prevalence and your greenspace statistics across the City.
Create a plot that contains:
- 2 side-by-side choropleth plots
- Asthma prevalence on one and a greenspace statistic (e.g. edge density) on the other
- Make sure to include a title and labeled color bars
# Merge census data with geometry
# Plot choropleths with vegetation statistics
# Merge census data with geometry
chi_ndvi_cdc_gdf = (
    tract_cdc_gdf
    .merge(
        ndvi_stats_df,
        left_on='tract2010', right_on='tract', how='inner')
)
# Plot choropleths with vegetation statistics
def plot_choropleth(gdf, **opts):
    """Generate a choropleth with the given color column"""
    return gv.Polygons(
        gdf.to_crs(ccrs.Mercator()),
        crs=ccrs.Mercator()
    ).opts(xaxis=None, yaxis=None, colorbar=True, **opts)

(
    plot_choropleth(
        chi_ndvi_cdc_gdf, color='asthma', cmap='viridis')
    +
    plot_choropleth(chi_ndvi_cdc_gdf, color='edge_density', cmap='Greens')
)