Lesson 6. Handle missing spatial attribute data: GIS in Python
Learning Objectives
- Work with data sets that have missing data.
- Replace missing data values.
This lesson covers how to identify, replace, and remove missing values in attribute data using **geopandas**.
```python
import os
import numpy as np
import pandas as pd
import geopandas as gpd
import earthpy as et

# Set working dir & get data
data = et.data.get_data('spatial-vector-lidar')
os.chdir(os.path.join(et.io.HOME, 'earth-analytics'))

# Import roads shapefile
sjer_roads_path = os.path.join("data", "spatial-vector-lidar", "california",
                               "madera-county-roads", "tl_2013_06039_roads.shp")
sjer_roads = gpd.read_file(sjer_roads_path)
type(sjer_roads)
```

```
geopandas.geodataframe.GeoDataFrame
```
Explore Data Values
There are several ways to use pandas to explore your data and determine if you have any missing values.
- To find the number of missing values per column in a DataFrame, you can run `dfname.isnull().sum()`.
- To look at the unique values for a specific column of a DataFrame, you can run `dfname['column'].unique()`.
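To see these two calls in isolation, here is a minimal sketch on a small, made-up DataFrame; the column names `road_type` and `length_km` and their values are hypothetical stand-ins, not taken from the roads data. It also shows that `.isnull()` counts both `None` and `np.nan` as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one column uses None, the other np.nan, for missing values
toy = pd.DataFrame({
    "road_type": pd.Series(["M", None, "S", "M"], dtype=object),
    "length_km": [1.2, np.nan, 0.4, 2.5],
})

# Count missing values per column; None and np.nan are both treated as null
missing_counts = toy.isnull().sum()
print(missing_counts)

# List the distinct values in one column; the missing entry appears among them
road_types = toy["road_type"].unique()
print(road_types)
```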
```python
sjer_roads.isnull().sum()
```

```
LINEARID       0
FULLNAME    5149
RTTYP       5149
MTFCC          0
geometry       0
dtype: int64
```
Based on this output, the `FULLNAME` and `RTTYP` columns each contain 5149 missing values (`NaN` or `None` type objects), while `LINEARID`, `MTFCC`, and `geometry` have none. Double check the unique values in the road type (`RTTYP`) column.
```python
# View data type
print(type(sjer_roads['RTTYP']))
# View the unique road type values in the data
print(sjer_roads['RTTYP'].unique())
```

```
<class 'pandas.core.series.Series'>
['M' None 'S' 'C']
```
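Another way to inspect a column, offered here as an optional sketch rather than part of the lesson's workflow, is `Series.value_counts(dropna=False)`. It lists each distinct value with how often it occurs and keeps the missing values as their own row instead of dropping them. The small Series below is made-up example data standing in for `sjer_roads['RTTYP']`.

```python
import pandas as pd

# Hypothetical stand-in for a road type column that contains missing values
rttyp = pd.Series(["M", None, "S", "C", "M", None], dtype=object)

# dropna=False keeps missing values as their own row in the counts
counts = rttyp.value_counts(dropna=False)
print(counts)

# Number of missing entries, for comparison with .isnull().sum()
n_missing = rttyp.isnull().sum()
print(n_missing)
```

On the roads data, the equivalent call would be `sjer_roads['RTTYP'].value_counts(dropna=False)`.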
Replacing Values
If the value you want to replace is a `NaN` or `None` type, you can use `dfname.loc[dfname['column'].isnull(), 'column'] = 'newvalue'`. Or you can use the pandas `.fillna()` method, which takes the value that you want to use in place of the missing values.
Hmmmm, there's a road type that is missing (`None`) rather than holding an actual value. It would be helpful to fix this before doing more analysis or mapping with this dataset.

There are several ways to deal with this issue. One is to use the `.fillna()` method to replace all instances of `None` in the attribute data with some new value. In this case, you will use `'Unknown'`.
```python
# Replace missing (None/NaN) road types with "Unknown"
sjer_roads["RTTYP"] = sjer_roads["RTTYP"].fillna("Unknown")
print(sjer_roads['RTTYP'].unique())
```

```
['M' 'Unknown' 'S' 'C']
```
Alternatively, you can use the `.isnull()` method together with `.loc` to select all attribute cells with a null value and set those to `'Unknown'`.
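Selecting the null cells with `.isnull()` and overwriting them through `.loc` can be sketched as follows. This minimal example uses a small, made-up DataFrame; the names `roads` and `road_type` are hypothetical, and on the real data you would use `sjer_roads` and `'RTTYP'` instead.

```python
import pandas as pd

# Hypothetical toy data standing in for the roads attribute table
roads = pd.DataFrame({"road_type": pd.Series(["M", None, "S", "C", None], dtype=object)})

# Boolean mask: True where the road type is missing (None or NaN)
missing = roads["road_type"].isnull()

# Select only the missing cells in that column and overwrite them
roads.loc[missing, "road_type"] = "Unknown"

print(roads["road_type"].unique())  # prints ['M' 'Unknown' 'S' 'C']
```

The result matches what `.fillna("Unknown")` produces; the `.loc` form is more verbose but lets you build the mask from any condition, not only missing values.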
If the value you want to change is not `NaN` or a `None` type, then you will have to specify the original value that you want to change, for example with the pandas `.replace()` method.
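For example, if missing road types had been stored as empty strings (`''`) rather than `None`, `.isnull()` and `.fillna()` would not catch them, because an empty string is a real value. A minimal sketch using `Series.replace()` on made-up data (the empty-string values are hypothetical, not from the roads file):

```python
import pandas as pd

# Hypothetical column where missing road types were recorded as empty strings
road_type = pd.Series(["M", "", "S", "C", ""], dtype=object)

# Empty strings are not null, so isnull() finds nothing to fill
print(road_type.isnull().sum())

# Name the exact original value to change and the value to change it to
cleaned = road_type.replace("", "Unknown")
print(cleaned.unique())
```

`.replace()` also accepts a dictionary such as `{"": "Unknown"}` when several original values need different replacements.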
```python
sjer_roads.head()
```

| | LINEARID | FULLNAME | RTTYP | MTFCC | geometry |
|---|---|---|---|---|---|
| 0 | 110454239066 | N 14th St | M | S1400 | LINESTRING (-120.27227 37.11615, -120.27244 37... |
| 1 | 110454239052 | N 11th St | M | S1400 | LINESTRING (-120.26788 37.11667, -120.26807 37... |
| 2 | 110454239056 | N 12th St | M | S1400 | LINESTRING (-120.27053 37.11749, -120.27045 37... |
| 3 | 110454239047 | N 10th St | M | S1400 | LINESTRING (-120.26703 37.11735, -120.26721 37... |
| 4 | 110454243091 | N Westberry Blvd | M | S1400 | LINESTRING (-120.10122 36.96524, -120.10123 36... |
Removing Values
In some specific instances you will want to remove `NaN` values from your DataFrame. To do this, you can use the pandas `.dropna()` method. Note that by default this method removes every row of the DataFrame that has a `NaN` value in any of its columns.
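The difference between the default behavior and restricting the check to particular columns can be sketched on a small, made-up table; the column names and values here are hypothetical. The `subset` parameter limits which columns `.dropna()` inspects.

```python
import pandas as pd

# Hypothetical toy attribute table with gaps in two different columns
roads = pd.DataFrame({
    "road_id": [1, 2, 3, 4],
    "full_name": pd.Series(["Road A", None, "Road C", None], dtype=object),
    "road_type": pd.Series(["M", "S", None, None], dtype=object),
})

# Default behavior: drop every row with a missing value in ANY column
complete_rows = roads.dropna()

# subset= limits the check to the listed columns only
typed_rows = roads.dropna(subset=["road_type"])

print(len(roads), len(complete_rows), len(typed_rows))  # prints 4 1 2
```

Because `.dropna()` returns a new DataFrame by default, `roads` itself is unchanged unless you assign the result back.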
Optional Challenge: Import & Plot Roads Shapefile
Import the madera-county-roads layer (`california/madera-county-roads/tl_2013_06039_roads.shp`) and plot the roads.

Next, try to overlay the plot locations (`california/SJER/vector_data/SJER_plot_centroids.shp`) and `sjer_crop` (`california/SJER/vector_data/SJER_crop.shp`) on top of the SJER crop extent. What happens?
- Check the CRS of both layers. What do you notice?