Get Climate Data Online

Climate change is impacting the way people live around the world

Learning Goals:
  • Analyze temperature data over time
  • Parse date information so that it is represented as a datetime type
  • Use operators to convert to different units
  • Resample time-series data to different frequencies

Before we get started, let’s define some parameters. You can use these if you want to change how the workflow runs from the top:

id = 'shortcourse'
project_dirname = 'climate-karachi'
ncei_filename = 'ncei-climate-karachi.csv'
location = 'Karachi, Pakistan'
station_id = 'PKM00041780'
start_date = '1942-10-01'
end_date = '2024-09-30'
data_type = 'TAVG'

There are more Earth Observation data online than any one person could ever look at

NASA’s Earth Observing System Data and Information System (EOSDIS) alone manages over 9 petabytes (PB) of data. 1 PB is roughly 100 times the entire Library of Congress (a good approximation of all the books available in the US). It’s all available to you once you learn how to download what you want.

Here we’re using the NOAA National Centers for Environmental Information (NCEI) Access Data Service application programming interface (API) to request data from their web servers. We will be using data collected as part of the Global Historical Climatology Network daily (GHCNd) program, which NOAA distributes through its Climate Data Online library.

For this example we’re requesting daily summary data in Karachi, Pakistan (station ID PKM00041780).

  1. Research the Global Historical Climatology Network - Daily data source.
  2. In the cell below, write a 2-3 sentence description of the data source.
  3. Include a citation of the data (HINT: See the ‘Data Citation’ tab on the GHCNd overview page).

Your description should include:

  • who takes the data
  • where the data were taken
  • what the maximum temperature units are
  • how the data are collected

Access NCEI GHCNd Data from the internet using its API 🖥️ 📡 🖥️

The cell below contains the URL for the data you will use in this part of the notebook. We created this URL by generating what is called an API endpoint using the NCEI API documentation.

What’s an API?

An application programming interface (API) is a way for two or more computer programs or components to communicate with each other. It is a type of software interface, offering a service to other pieces of software (Wikipedia).

First things first: you will need to import the pandas library to work with tabular data. The earthpy library, which helps with data management, is imported further down, in the solution for saving your data:

# Import required packages
See our solution!
# Import required packages
import pandas as pd

The cell below contains the URL you will use to download climate data. There are two things to notice about the URL code:

  1. It is surrounded by quotes – that means Python will interpret it as a string, or text, type, which makes sense for a URL.
  2. The URL is too long to display as one line on most screens. We’ve put parentheses around it so that we can easily split it into multiple lines by writing two strings – one on each line.
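The second point can be seen in a small sketch. Python joins adjacent string literals inside parentheses into a single string, and f-strings can be mixed in to fill in values. The address `www.example.com` and the variable names here are placeholders for illustration only, not part of the NCEI API.

```python
# A placeholder value to substitute into the URL (hypothetical example)
units = 'standard'

# Adjacent string literals inside parentheses are joined into one string,
# so a long URL can be written one piece per line
example_url = (
    'https://www.example.com/data'
    '?dataset=daily-summaries'
    f'&units={units}'
)

# The result is a single string with no spaces or line breaks added
print(example_url)
```

Note that there are no commas between the pieces. Adding commas would create a tuple of separate strings instead of one URL.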
Try It: Format your URL for readability
  1. Pick an expressive variable name for the URL.
  2. Reformat the URL so that it adheres to the 79-character PEP-8 line limit, and so that it is easy to read. If you are using GitHub Codespaces, you should see two vertical lines in each cell – don’t let your code go past the second line.
  3. Replace ‘DATATYPE’, ‘STATION’, and the start and end dates ‘YYYY-MM-DD’, with the values for the data you want to download.
stuff23 = ('https://www.ncei.noaa.gov/access/services/da'
'ta/v1?dataset=daily-summaries&dataTypes=DATATYPE&stations=STATION&startDate=YYYY-MM-DD&endDate=YYYY-MM-DD&units=standard')
stuff23
See our solution!
ncei_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    f'&dataTypes={data_type}'
    f'&stations={station_id}'
    f'&startDate={start_date}'
    f'&endDate={end_date}'
    '&units=standard'
)
ncei_url
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TAVG&stations=PKM00041780&startDate=1942-10-01&endDate=2024-09-30&units=standard'
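As an alternative sketch, the part of the URL after the `?` (the query string) can be assembled from a dictionary with the standard library's `urllib.parse` module, and split back apart to check what is being requested. The names `base_url`, `query_params`, `alt_ncei_url`, and `parsed_params` are made up for this example; the parameter values are copied from the cells above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The endpoint is everything before the question mark
base_url = 'https://www.ncei.noaa.gov/access/services/data/v1'

# Each key/value pair becomes one key=value item in the query string
query_params = {
    'dataset': 'daily-summaries',
    'dataTypes': 'TAVG',
    'stations': 'PKM00041780',
    'startDate': '1942-10-01',
    'endDate': '2024-09-30',
    'units': 'standard',
}

# urlencode joins the pairs with & and escapes any special characters
alt_ncei_url = f'{base_url}?{urlencode(query_params)}'
print(alt_ncei_url)

# Going the other way: split a URL and read its parameters back
# (parse_qs returns a list of values for each parameter)
parsed_params = parse_qs(urlsplit(alt_ncei_url).query)
print(parsed_params['stations'])
```

This produces the same URL as the f-string version. The dictionary form can be easier to edit when you want to change one parameter, such as the station or the date range.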

Get NCEI data using the API

Try It
  1. Replace url with the name of your URL
  2. Run the code to download and check your data
# Download the climate data
climate_df = pd.read_csv(
    url,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN']
)

# Check that the download worked
climate_df.head()
See our solution!
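# Download the climate data
# Retry a few times in case the server is temporarily unavailable
for attempt in range(10):
    try:
        climate_df = pd.read_csv(
            ncei_url,
            index_col='DATE',
            parse_dates=True,
            na_values=['NaN'])
        break
    except Exception as error:
        # Remember the error and try again
        last_error = error
else:
    # All attempts failed, so stop with an informative message
    raise RuntimeError('Could not download the climate data') from last_error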

# Check that the download worked
climate_df.head()
                STATION  TAVG
DATE
1942-10-01  PKM00041780    81
1942-10-02  PKM00041780    81
1942-10-03  PKM00041780    84
1942-10-04  PKM00041780    84
1942-10-05  PKM00041780    84
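To see what `index_col='DATE'` and `parse_dates=True` do without touching the network, here is a minimal sketch that reads the five rows shown above from an in-memory CSV. It then previews two of the learning goals: converting units with operators and resampling. Two assumptions are made here: that `units=standard` returns degrees Fahrenheit (the values in the 80s for Karachi in October suggest so), and that the CSV columns appear in the order written below. The column name `TAVG_C` is made up for this example.

```python
import io

import pandas as pd

# Five rows copied from the output above, standing in for the download
# (the column order in the real file is an assumption)
sample_csv = io.StringIO(
    'STATION,DATE,TAVG\n'
    'PKM00041780,1942-10-01,81\n'
    'PKM00041780,1942-10-02,81\n'
    'PKM00041780,1942-10-03,84\n'
    'PKM00041780,1942-10-04,84\n'
    'PKM00041780,1942-10-05,84\n'
)

# Same arguments as the download cell, applied to the in-memory sample
sample_df = pd.read_csv(
    sample_csv,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN'])

# The index holds datetime values rather than plain text
print(type(sample_df.index))

# Convert to Celsius with arithmetic operators
# (assumes the downloaded values are in degrees Fahrenheit)
sample_df['TAVG_C'] = (sample_df['TAVG'] - 32) * 5 / 9

# Resample the daily values to monthly means ('MS' = month start)
monthly_df = sample_df[['TAVG', 'TAVG_C']].resample('MS').mean()
print(monthly_df)
```

Resampling only works because the index is a datetime type, which is why parsing the dates while reading the file matters.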

Save climate data to your computer

Try It
  1. Replace filename with the name of the file you want to save your data in. Your data file should end up in the same folder as your notebook.
  2. (optional) You can also construct a reproducible file path using the pathlib or os libraries and use that, or use earthpy to make a data directory based on your system settings.
  3. Run the code to save your data
Warning

For this activity it’s fine, but as a general rule you don’t want to upload data files to a GitHub repository! GitHub blocks files larger than 100 MB, and a large file that is already committed can keep you from pushing until it is removed from the repository history.

# Save the climate data
climate_df.to_csv('filename')
See our solution!
# Save the climate data to a project data folder
import earthpy
project = earthpy.Project(dirname=project_dirname)
climate_df.to_csv(
    project.project_dir / ncei_filename)
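If you would rather not use earthpy, the optional pathlib approach mentioned in the steps above could look something like the sketch below. The helper `build_data_path` is a hypothetical function written for this example, not part of any library. It is demonstrated with a temporary folder so that nothing permanent is created on your computer.

```python
import tempfile
from pathlib import Path


def build_data_path(base_dir, project_dirname, filename):
    """Create base_dir/project_dirname if needed and return the file path."""
    project_dir = Path(base_dir) / project_dirname
    # parents=True creates missing parent folders;
    # exist_ok=True avoids an error if the folder is already there
    project_dir.mkdir(parents=True, exist_ok=True)
    return project_dir / filename


# Demonstrate with a throwaway folder that is deleted afterwards
with tempfile.TemporaryDirectory() as base_dir:
    climate_path = build_data_path(
        base_dir, 'climate-karachi', 'ncei-climate-karachi.csv')
    # The project folder exists while the temporary folder does
    folder_created = climate_path.parent.is_dir()
    print(climate_path.name, folder_created)
```

In your own work you might pass a folder you control, such as `Path.home() / 'data'`, as `base_dir` and then call `climate_df.to_csv(climate_path)`. Building the path with `/` keeps it working on Windows, macOS, and Linux.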
Reflect and Respond

What question do you want to answer with climate data? The options are limitless! To get started, you could think about:

  • How is climate change happening in your home town?
  • How is climate change different at different latitudes?
  • Do heat waves affect urban areas more?

Pick a new location and/or measurement to plot 🌏 📈

Recreate the workflow you just did for a place that interests you OR with a different measurement. You will need to make your own new Markdown and Code cells below this one, or create a new notebook.

Your analysis should include:

  1. A researched (with citations or links) site description, including why you chose the site
  2. A researched (with citations or links) data description, including a data citation
  3. A researched (with citations or links) methods overview
  4. Some kind of visual evidence (plot, chart, diagram) for your results
  5. A headline and description for the visual evidence that interprets your analysis and puts it in context

You should also delete the instructions before posting a portfolio page.

BONUS: Create a shareable Markdown of your work

Below is some code that you can run that will save a Markdown file of your work that is easily shareable and can be uploaded to GitHub Pages. You can use it as a starting point for writing your portfolio post!

%%capture
%%bash
jupyter nbconvert *.ipynb --to markdown