Lesson 3. How to Download MACA2 Climate Data Using Python
Open MACA v2 Climate data Programmatically using Open Source Python and Xarray
In this lesson, you will learn how to work with Climate Data Sets (MACA v2 for the Continental United States - CONUS) stored in netcdf 4 format using open source Python.
Learning Objectives
After completing this chapter, you will be able to:
- Download different types of MACA v2 climate data in netcdf 4 format
- Open and process netcdf4 data using xarray
Get Started Downloading MACA v2 Climate Data in Python
To begin, load the libraries below.
# Import packages
import numpy as np
import netCDF4
import matplotlib.pyplot as plt
import xarray as xr
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import seaborn as sns
# Plotting options
sns.set(font_scale=1.3)
sns.set_style("white")
Get Started With Downloading Data
The data you will use in this lesson are “Monthly aggregation of downscaled daily meteorological data of Monthly Precipitation Amount from College of Global Change and Earth System Science, Beijing Normal University”. In short, the data contain monthly summaries of several meteorological variables, such as precipitation, air temperature, and more. The data are derived from a climate model that predicts future trends in these variables over time.
Below, you will create and assign three Python variables that allow you to programmatically select which data you wish to download in this notebook. This workflow could then be converted into an automated workflow that accesses and slices MACA v2 data for an analysis.
The variables include:
Select a Climate Model
The Python variable model can be set to any integer between 0 and 19. Each value selects one of the 20 climate models that are available for MACA v2 data. The model represents how (the methods used) the climate data were created. You can learn more about each model by clicking here.
# Models to choose from
model_name = ('bcc-csm1-1',
'bcc-csm1-1-m',
'BNU-ESM',
'CanESM2',
'CCSM4',
'CNRM-CM5',
'CSIRO-Mk3-6-0',
'GFDL-ESM2G',
'GFDL-ESM2M',
'HadGEM2-CC365',
'HadGEM2-ES365',
'inmcm4',
'IPSL-CM5A-MR',
'IPSL-CM5A-LR',
'IPSL-CM5B-LR',
'MIROC5',
'MIROC-ESM',
'MIROC-ESM-CHEM',
'MRI-CGCM3',
'NorESM1-M')
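Because model is an index into this tuple, it can be easier to look the number up from the model's name than to count positions by hand. The snippet below is a minimal sketch of that idea. It repeats the model_name tuple from above so that it runs on its own.

```python
# Model names copied from the tuple above, in the same order
model_name = ('bcc-csm1-1', 'bcc-csm1-1-m', 'BNU-ESM', 'CanESM2', 'CCSM4',
              'CNRM-CM5', 'CSIRO-Mk3-6-0', 'GFDL-ESM2G', 'GFDL-ESM2M',
              'HadGEM2-CC365', 'HadGEM2-ES365', 'inmcm4', 'IPSL-CM5A-MR',
              'IPSL-CM5A-LR', 'IPSL-CM5B-LR', 'MIROC5', 'MIROC-ESM',
              'MIROC-ESM-CHEM', 'MRI-CGCM3', 'NorESM1-M')

# tuple.index() returns the position of the first matching entry
# and raises a ValueError if the name is not in the tuple
model = model_name.index('BNU-ESM')
print(model, model_name[model])  # prints: 2 BNU-ESM
```

Looking the index up by name also guards against typos, because a misspelled model name fails immediately instead of silently selecting a different model.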
Climate Data Variables
The Python variable var selects the variable in the dataset you want to work with. There are 9 options, and they are listed in both short and long name versions below. You can assign var to any integer between 0 and 8, where 0 is the first option in the list and 8 is the last. For example, if var = 0, you are selecting tasmax, or maximum temperature.
# These are the variable options for the met data
variable_name = ('tasmax',
'tasmin',
'rhsmax',
'rhsmin',
'pr',
'rsds',
'uas',
'vas',
'huss')
# These are var options in long form
var_long_name = ('air_temperature',
'air_temperature',
'relative_humidity',
'relative_humidity',
'precipitation',
'surface_downwelling_shortwave_flux_in_air',
'eastward_wind',
'northward_wind',
'specific_humidity')
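The two tuples above are parallel: the short name at position var pairs with the long name at the same position. One way to make that pairing explicit is a dictionary. This is a sketch only, and short_to_long is a name introduced here for illustration.

```python
# Short and long names copied from the tuples above, in the same order
variable_name = ('tasmax', 'tasmin', 'rhsmax', 'rhsmin', 'pr',
                 'rsds', 'uas', 'vas', 'huss')
var_long_name = ('air_temperature', 'air_temperature',
                 'relative_humidity', 'relative_humidity',
                 'precipitation',
                 'surface_downwelling_shortwave_flux_in_air',
                 'eastward_wind', 'northward_wind', 'specific_humidity')

# Pair each short name with the long name at the same position
short_to_long = dict(zip(variable_name, var_long_name))

var = 0
print(variable_name[var], "->", short_to_long[variable_name[var]])
# prints: tasmax -> air_temperature
```

The short name is what appears in the file name, while the long name (air_temperature for tasmax) is the name of the data variable inside the opened dataset, as the dataset output later in this lesson shows.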
Climate Data Scenarios
The Python variable scenario selects which climate scenario you want to use:
- 0 is the historical scenario, which covers 1950-2005. These values come from each climate model's historical run, downscaled in the same way as the future scenarios. They are not direct observations.
- 1 is the rcp45 scenario, which is described as an intermediate climate scenario.
- 2 is the rcp85 scenario, which is a worst case (strongest emissions) climate scenario.
Data Tip: You can learn more about the various variables and scenario options by going to the toolbox and clicking on the small yellow question mark next to “variable” or “scenario”. Note that the scenario options are only available when you try to download future predicted data.
Select Data Download Options
Below you first define the options that you can use to download your data.
# This is the base url required to download data from the thredds server.
dir_path = 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/'
# These are the variable options for the met data
variable_name = ('tasmax',
'tasmin',
'rhsmax',
'rhsmin',
'pr',
'rsds',
'uas',
'vas',
'huss')
# These are var options in long form
var_long_name = ('air_temperature',
'air_temperature',
'relative_humidity',
'relative_humidity',
'precipitation',
'surface_downwelling_shortwave_flux_in_air',
'eastward_wind',
'northward_wind',
'specific_humidity')
# Models to choose from
model_name = ('bcc-csm1-1',
'bcc-csm1-1-m',
'BNU-ESM',
'CanESM2',
'CCSM4',
'CNRM-CM5',
'CSIRO-Mk3-6-0',
'GFDL-ESM2G',
'GFDL-ESM2M',
'HadGEM2-CC365',
'HadGEM2-ES365',
'inmcm4',
'IPSL-CM5A-MR',
'IPSL-CM5A-LR',
'IPSL-CM5B-LR',
'MIROC5',
'MIROC-ESM',
'MIROC-ESM-CHEM',
'MRI-CGCM3',
'NorESM1-M')
# Scenarios
scenario_type = ('historical', 'rcp45', 'rcp85')
# Year start and ends (historical vs projected)
year_start = ('1950', '2006', '2006')
year_end = ('2005', '2099', '2099')
run_num = [1] * 20
run_num[4] = 6 # setting CCSM4 with run 6
domain = 'CONUS'
Next, select the options that you want to use for your data download.
# Model options between 0-19
model = 2
# Options 0-8 will work for var. Var maps to the variable name below
var = 0
# Options range from 0-2
scenario = 2
try:
    print("Great! You have selected: \n \u2705 Variable: {} \n \u2705 Model: {}, "
          "\n \u2705 Scenario: {}".format(variable_name[var],
                                          model_name[model],
                                          scenario_type[scenario]))
except IndexError as e:
    # Each option has its own valid range, so report all three
    raise IndexError("Oops, it looks like you selected a value that is "
                     "not within the allowed range (model: 0-19, var: 0-8, "
                     "scenario: 0-2). Please look closely at your "
                     "selected values.") from e
Great! You have selected:
✅ Variable: tasmax
✅ Model: BNU-ESM,
✅ Scenario: rcp85
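Catching IndexError has a gap: a negative number such as -1 is a valid Python index, so it would silently select the last option instead of raising an error. The sketch below checks each choice against the length of its own tuple. The helper name check_choice is hypothetical and is not part of the lesson's code.

```python
def check_choice(name, value, options):
    """Return the selected option, or raise a ValueError naming the valid range."""
    if not isinstance(value, int) or not 0 <= value < len(options):
        raise ValueError(
            f"{name} must be an integer between 0 and {len(options) - 1}, "
            f"got {value!r}")
    return options[value]


scenario_type = ('historical', 'rcp45', 'rcp85')

print(check_choice("scenario", 2, scenario_type))  # prints: rcp85

try:
    check_choice("scenario", 3, scenario_type)
except ValueError as err:
    print(err)  # prints: scenario must be an integer between 0 and 2, got 3
```

The same helper could be called once each for model, var, and scenario, so that the error message always reports the range that applies to the value that failed.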
Finally, use the scenario variable to select the time period associated with the options selected above.
try:
    time = year_start[scenario] + '_' + year_end[scenario]
    print("\u2705 Your selected time period is:", time)
except IndexError as e:
    raise IndexError("Oops, it looks like you selected a scenario value that "
                     "is not within the range of values which is 0-2") from e
✅ Your selected time period is: 2006_2099
Below you create a path to the correct MACA data using the Python variables created above. A file name containing both agg_macav2metdata_ and _monthly.nc represents monthly data. You will use the monthly data for this lesson rather than the daily data because the file is smaller to download.
Data Access Tip
Monthly vs. Daily Data
The example below creates a path to the non aggregated monthly CONUS (Continental United States) data. However, you can also access the daily or aggregated data using a similar approach.
- Here is a slightly dated but good example of accessing MACA v2 data using Python. The demo also shows you how to access data for specific locations rather than needing to download the entire file.
# This code creates a path to the monthly MACA v2 data
file_name = ('agg_macav2metdata_' +
str(variable_name[var]) +
'_' +
str(model_name[model]) +
'_r' +
str(run_num[model])+'i1p1_' +
str(scenario_type[scenario]) +
'_' +
time + '_' +
domain + '_monthly.nc')
print("\u2705 You are accessing:\n", file_name, "\n data in netcdf format")
✅ You are accessing:
agg_macav2metdata_tasmax_BNU-ESM_r1i1p1_rcp85_2006_2099_CONUS_monthly.nc
data in netcdf format
full_file_path = dir_path + file_name
print("The full path to your data is: \n", full_file_path)
The full path to your data is:
http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_macav2metdata_tasmax_BNU-ESM_r1i1p1_rcp85_2006_2099_CONUS_monthly.nc
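The path-building steps above can also be wrapped in a single function that takes names rather than index numbers. This is a sketch under the assumptions already stated in this lesson: the same base URL, the same file name pattern, and run 6 for CCSM4 with run 1 for every other model. The names build_maca_url, DIR_PATH, and YEARS are introduced here for illustration.

```python
# Base URL of the thredds server, as used above
DIR_PATH = 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/'

# Scenario -> (first year, last year), taken from year_start and year_end above
YEARS = {'historical': ('1950', '2005'),
         'rcp45': ('2006', '2099'),
         'rcp85': ('2006', '2099')}


def build_maca_url(variable, model, scenario, domain='CONUS'):
    """Hypothetical helper: build the URL of a monthly MACA v2 file."""
    # CCSM4 uses run 6; every other model uses run 1 (as in run_num above)
    run = 6 if model == 'CCSM4' else 1
    start, end = YEARS[scenario]
    file_name = (f"agg_macav2metdata_{variable}_{model}_r{run}i1p1_"
                 f"{scenario}_{start}_{end}_{domain}_monthly.nc")
    return DIR_PATH + file_name


print(build_maca_url('tasmax', 'BNU-ESM', 'rcp85'))
```

For the options selected in this lesson, the function returns the same path as the one printed above. It only assembles a string, so it does not confirm that the file actually exists on the server.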
Open Your Data
Below you open your data with xarray. The open data code is wrapped in a try/except block to ensure that it fails gracefully if the data can’t be accessed. Remember that when you are opening data here, you are hitting a server online. Thus you need internet access to run the code below.
# Open the data from the thredds server
try:
    max_temp_xr = xr.open_dataset(full_file_path)
except OSError as oe:
    print("Oops, it looks like the file that you are trying to connect to, "
          "{}, doesn't exist. Try to revisit your model options to ensure "
          "the data exist on the server. ".format(full_file_path))
# View your temperature data
max_temp_xr
<xarray.Dataset>
Dimensions:          (lat: 585, crs: 1, lon: 1386, time: 1128)
Coordinates:
  * lat              (lat) float64 25.06 25.1 25.15 25.19 ... 49.31 49.35 49.4
  * crs              (crs) int32 1
  * lon              (lon) float64 235.2 235.3 235.3 235.4 ... 292.9 292.9 292.9
  * time             (time) object 2006-01-15 00:00:00 ... 2099-12-15 00:00:00
Data variables:
    air_temperature  (time, lat, lon) float32 ...
Attributes: (12/46)
    description:           Multivariate Adaptive Constructed Analog...
    id:                    MACAv2-METDATA
    naming_authority:      edu.uidaho.reacch
    Metadata_Conventions:  Unidata Dataset Discovery v1.0
    Metadata_Link:
    cdm_data_type:         FLOAT
    ...                    ...
    contributor_role:      Postdoctoral Fellow
    publisher_name:        REACCH
    publisher_email:       reacch@uidaho.edu
    publisher_url:         http://www.reacchpna.org/
    license:               Creative Commons CC0 1.0 Universal Dedic...
    coordinate_system:     WGS84,EPSG:4326
Dimensions:
- lat: 585
- crs: 1
- lon: 1386
- time: 1128
Coordinates:
- lat (lat) float64 25.06 25.1 25.15 ... 49.35 49.4
    long_name: latitude
    standard_name: latitude
    units: degrees_north
    axis: Y
    description: Latitude of the center of the grid cell
    array([25.063078, 25.104744, 25.14641 , ..., 49.312691, 49.354359, 49.396023])
- crs (crs) int32 1
    grid_mapping_name: latitude_longitude
    longitude_of_prime_meridian: 0.0
    semi_major_axis: 6378137.0
    inverse_flattening: 298.257223563
    array([1], dtype=int32)
- lon (lon) float64 235.2 235.3 235.3 ... 292.9 292.9
    units: degrees_east
    axis: X
    description: Longitude of the center of the grid cell
    long_name: longitude
    standard_name: longitude
    array([235.227844, 235.269501, 235.311157, ..., 292.851929, 292.893585, 292.935242])
- time (time) object 2006-01-15 00:00:00 ... 2099-12-...
    description: days since 1900-01-01
    array([cftime.DatetimeNoLeap(2006, 1, 15, 0, 0, 0, 0, has_year_zero=True),
           cftime.DatetimeNoLeap(2006, 2, 15, 0, 0, 0, 0, has_year_zero=True),
           cftime.DatetimeNoLeap(2006, 3, 15, 0, 0, 0, 0, has_year_zero=True),
           ...,
           cftime.DatetimeNoLeap(2099, 10, 15, 0, 0, 0, 0, has_year_zero=True),
           cftime.DatetimeNoLeap(2099, 11, 15, 0, 0, 0, 0, has_year_zero=True),
           cftime.DatetimeNoLeap(2099, 12, 15, 0, 0, 0, 0, has_year_zero=True)], dtype=object)
Data variables:
- air_temperature (time, lat, lon) float32 ...
    long_name: Monthly Average of Daily Maximum Near-Surface Air Temperature
    units: K
    grid_mapping: crs
    standard_name: air_temperature
    height: 2 m
    cell_methods: time: maximum(interval: 24 hours);mean over days
    _ChunkSizes: [ 10 44 107]
    [914593680 values with dtype=float32]
Attributes:
- description: Multivariate Adaptive Constructed Analogs (MACA) method, version 2.3,Dec 2013.
- id: MACAv2-METDATA
- naming_authority: edu.uidaho.reacch
- Metadata_Conventions: Unidata Dataset Discovery v1.0
- Metadata_Link:
- cdm_data_type: FLOAT
- title: Monthly aggregation of downscaled daily meteorological data of Monthly Average of Daily Maximum Near-Surface Air Temperature from College of Global Change and Earth System Science, Beijing Normal University (BNU-ESM) using the run r1i1p1 of the rcp85 scenario.
- summary: This archive contains monthly downscaled meteorological and hydrological projections for the Conterminous United States at 1/24-deg resolution. These monthly values are obtained by aggregating the daily values obtained from the downscaling using the Multivariate Adaptive Constructed Analogs (MACA, Abatzoglou, 2012) statistical downscaling method with the METDATA (Abatzoglou,2013) training dataset. The downscaled meteorological variables are maximum/minimum temperature(tasmax/tasmin), maximum/minimum relative humidity (rhsmax/rhsmin),precipitation amount(pr), downward shortwave solar radiation(rsds), eastward wind(uas), northward wind(vas), and specific humidity(huss). The downscaling is based on the 365-day model outputs from different global climate models (GCMs) from Phase 5 of the Coupled Model Inter-comparison Project (CMIP3) utlizing the historical (1950-2005) and future RCP4.5/8.5(2006-2099) scenarios.
- keywords: monthly, precipitation, maximum temperature, minimum temperature, downward shortwave solar radiation, specific humidity, wind velocity, CMIP5, Gridded Meteorological Data
- keywords_vocabulary:
- standard_name_vocabulary: CF-1.0
- history: No revisions.
- comment:
- geospatial_bounds: POLYGON((-124.7722 25.0631,-124.7722 49.3960, -67.0648 49.3960,-67.0648, 25.0631, -124.7722,25.0631))
- geospatial_lat_min: 25.0631
- geospatial_lat_max: 49.3960
- geospatial_lon_min: -124.7722
- geospatial_lon_max: -67.0648
- geospatial_lat_units: decimal degrees north
- geospatial_lon_units: decimal degrees east
- geospatial_lat_resolution: 0.0417
- geospatial_lon_resolution: 0.0417
- geospatial_vertical_min: 0.0
- geospatial_vertical_max: 0.0
- geospatial_vertical_resolution: 0.0
- geospatial_vertical_positive: up
- time_coverage_start: 2091-01-01T00:0
- time_coverage_end: 2095-12-31T00:00
- time_coverage_duration: P5Y
- time_coverage_resolution: P1M
- date_created: 2014-05-15
- date_modified: 2014-05-15
- date_issued: 2014-05-15
- creator_name: John Abatzoglou
- creator_url: http://maca.northwestknowledge.net
- creator_email: jabatzoglou@uidaho.edu
- institution: University of Idaho
- processing_level: GRID
- project:
- contributor_name: Katherine C. Hegewisch
- contributor_role: Postdoctoral Fellow
- publisher_name: REACCH
- publisher_email: reacch@uidaho.edu
- publisher_url: http://www.reacchpna.org/
- license: Creative Commons CC0 1.0 Universal Dedication(http://creativecommons.org/publicdomain/zero/1.0/legalcode)
- coordinate_system: WGS84,EPSG:4326
Subset Your Data
Currently, the dataset you have is too big to work with. You can fix this by subsetting the data. There are two ways you can subset the data: spatially, and temporally.
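To put a number on "too big", you can estimate the size of the full array from the dimensions shown in the dataset output above. The sketch below is plain arithmetic and does not touch the server. The sizes 1128, 585, and 1386 are copied from that output, and float32 values take 4 bytes each.

```python
# Dimension sizes taken from the dataset output above
n_time, n_lat, n_lon = 1128, 585, 1386
bytes_per_value = 4  # float32

n_values = n_time * n_lat * n_lon
size_gib = n_values * bytes_per_value / 1024**3

print(f"{n_values:,} values, about {size_gib:.1f} GiB if fully loaded")
# prints: 914,593,680 values, about 3.4 GiB if fully loaded
```

xarray has not downloaded these values yet, since it only reads them when they are needed. Subsetting before you plot or compute keeps the download to the small piece you actually use.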
To spatially subset the data, you will only look at data from one point in the xarray Dataset. Below, assign a new number for latitude and longitude to pick a new point. The data’s latitude values range from about 25 to 50, and the data’s longitude values range from about 235 to 292 (degrees east on a 0 to 360 scale). Try to pick new values within those ranges.
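Longitudes in this dataset are stored in degrees east from 0 to 360, while many maps and GPS tools report western longitudes as negative numbers. If your point of interest is in that second form, you would need to convert it first. The function below is a minimal sketch. The name to_0_360 and the example longitude are made up for illustration.

```python
def to_0_360(lon_deg):
    """Convert a longitude in degrees east from the -180..180 range to 0..360."""
    # Python's % operator returns a non-negative result for a positive divisor
    return lon_deg % 360


print(to_0_360(-105.25))  # prints: 254.75
print(to_0_360(270))      # prints: 270 (already in range, unchanged)
```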
To temporally subset the data, you can pick a start date and end date to trim the data to. Below, assign new values for the dates at which the data should start and end. Make sure the values you assign stay in the quotes provided. The format should be 'yyyy-mm'. Keep in mind that the years covered by your data depend on which scenario you chose above, so pick dates that fall within that scenario's date range.
Scenario Number | Date Range |
---|---|
0 | 1950-2005 |
1 | 2006-2099 |
2 | 2006-2099 |
# Select the latitude, longitude, and timeframe to subset the data to
# Ensure your latitude value is between 25 and 50, and your longitude value is between 235 and 292
# latitude = 35
# longitude = 270
start_date = '2008-01'
end_date = '2012-09'
# Select a lat / lon location that you wish to use to extract the data
latitude = max_temp_xr.lat.values[300]
longitude = max_temp_xr.lon.values[150]
print("You selected the following x,y location:", longitude, latitude)
You selected the following x,y location: 241.4777374267578 37.5628776550293
# Slice one lat/lon data point
temp_single_point = max_temp_xr["air_temperature"].sel(
lat=latitude,
lon=longitude)
temp_single_point
<xarray.DataArray 'air_temperature' (time: 1128)>
array([282.93192, 285.54318, 291.04315, ..., 301.6674 , 290.809 , 288.78992], dtype=float32)
Coordinates:
    lat      float64 37.56
    lon      float64 241.5
  * time     (time) object 2006-01-15 00:00:00 ... 2099-12-15 00:00:00
Attributes:
    long_name:      Monthly Average of Daily Maximum Near-Surface Air Tempera...
    units:          K
    grid_mapping:   crs
    standard_name:  air_temperature
    height:         2 m
    cell_methods:   time: maximum(interval: 24 hours);mean over days
    _ChunkSizes:    [ 10 44 107]
Dimensions:
- time: 1128
Values:
- 282.9 285.5 291.0 290.9 292.2 298.0 ... 308.8 306.6 301.7 290.8 288.8
    array([282.93192, 285.54318, 291.04315, ..., 301.6674 , 290.809 , 288.78992], dtype=float32)
Coordinates:
- lat () float64 37.56
    long_name: latitude
    standard_name: latitude
    units: degrees_north
    axis: Y
    description: Latitude of the center of the grid cell
    array(37.56287766)
- lon () float64 241.5
    units: degrees_east
    axis: X
    description: Longitude of the center of the grid cell
    long_name: longitude
    standard_name: longitude
    array(241.47773743)
- time (time) object 2006-01-15 00:00:00 ... 2099-12-...
    description: days since 1900-01-01
    array([cftime.DatetimeNoLeap(2006, 1, 15, 0, 0, 0, 0, has_year_zero=True),
           cftime.DatetimeNoLeap(2006, 2, 15, 0, 0, 0, 0, has_year_zero=True),
           cftime.DatetimeNoLeap(2006, 3, 15, 0, 0, 0, 0, has_year_zero=True),
           ...,
           cftime.DatetimeNoLeap(2099, 10, 15, 0, 0, 0, 0, has_year_zero=True),
           cftime.DatetimeNoLeap(2099, 11, 15, 0, 0, 0, 0, has_year_zero=True),
           cftime.DatetimeNoLeap(2099, 12, 15, 0, 0, 0, 0, has_year_zero=True)], dtype=object)
Attributes:
- long_name: Monthly Average of Daily Maximum Near-Surface Air Temperature
- units: K
- grid_mapping: crs
- standard_name: air_temperature
- height: 2 m
- cell_methods: time: maximum(interval: 24 hours);mean over days
- _ChunkSizes: [ 10 44 107]
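The single-point slice above still contains all 1128 months, and the start_date and end_date values defined earlier are not applied anywhere in this lesson's code. The sketch below shows one way the two subsetting steps could be combined. To stay runnable without internet access it uses a small synthetic array with made-up values and ordinary pandas dates, so it stands in for max_temp_xr["air_temperature"] rather than reproducing it. The real data use cftime no-leap dates; xarray documents the same string-based slicing for those, but that has not been checked against the MACA server here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the MACA array: 120 months on a 3 x 3 grid (made-up values)
time = pd.date_range("2006-01-01", periods=120, freq="MS")
lat = np.array([35.0, 37.5, 40.0])
lon = np.array([240.0, 241.5, 243.0])
temp = xr.DataArray(
    np.full((time.size, lat.size, lon.size), 290.0, dtype="float32"),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air_temperature",
)

start_date = "2008-01"
end_date = "2012-09"

# Step 1: pick the grid cell closest to an arbitrary latitude and longitude
point = temp.sel(lat=37.6, lon=241.4, method="nearest")

# Step 2: trim the time series; both end points of the slice are included
subset = point.sel(time=slice(start_date, end_date))

print(subset.sizes["time"], subset.lat.item(), subset.lon.item())
# prints: 57 37.5 241.5
```

The two calls to .sel() are kept separate because method="nearest" applies to single values and cannot be combined with a slice in the same call. Using method="nearest" also means you can type any latitude and longitude within the data's range, not only values that match a grid cell center exactly.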
Below you quickly plot the data. You will learn more about working with these data (and creating nicer plots) in the following lessons.
# Quick plot of the data
temp_single_point.plot.line()
plt.show()
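The values on this plot are in Kelvin, as the units attribute (K) in the output above shows. If you prefer Celsius or Fahrenheit, the conversion is simple arithmetic. The sketch below uses plain Python functions with names chosen here for illustration, applied to the first value from the output above.

```python
def kelvin_to_celsius(temp_k):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return temp_k - 273.15


def kelvin_to_fahrenheit(temp_k):
    """Convert a temperature from Kelvin to degrees Fahrenheit."""
    return (temp_k - 273.15) * 9 / 5 + 32


# First monthly value from the single-point output above
first_value_k = 282.93192

print(round(kelvin_to_celsius(first_value_k), 2),
      round(kelvin_to_fahrenheit(first_value_k), 2))
# prints: 9.78 49.61
```

The same arithmetic works on a whole xarray DataArray, for example temp_single_point - 273.15. If you convert the array this way, update its units attribute by hand so that plot labels do not still say K.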