import os
import numpy as np
import matplotlib.pyplot as plt
import rioxarray as rioxr
import geopandas as gpd
from shapely.geometry import Polygon
21 rioxarray
In this lesson we will introduce rioxarray
, a Python extension for xarray
to manipulate xarray.DataArray
s as rasters. The name rioxarray
stands for raster input/output + xarray. We will use the rioxarray
’s rio
accessor to obtain raster information from an xarray.DataArray
and do some raster manipulations (calculate NDVI).
21.1 Data
The raster files we will use today come from the US National Agriculture Imagery Program (NAIP). NAIP images are are high-resolution aerial images with four spectral bands: Red, Green, Blue and Near-infrared (NIR). The raster’s we’ll use today are from 2020.
For this lesson, I did some pre-processing of the data to separate the RGB bands from the NIR band and clipped a scene. The data was accessed and pre-processed at Microsoft’s Planetary Computer NAIP data repository.
21.2 Import .tif
Let’s start by loading the libraries we’ll use:
There are multiple ways of opening a ‘.tif’ file using xarray
or rioxarray
. Using the rioxarray.open_rasterio()
function to open the ‘.tif’ file is a simple way to make sure all our geospatial data gets loaded correctly:
# load NIR tif file
= os.path.join(os.getcwd(),'data','naip','nir.tif')
nir_fp = rioxr.open_rasterio(nir_fp)
nir nir
<xarray.DataArray (band: 1, y: 3208, x: 2419)> [7760152 values with dtype=uint8] Coordinates: * band (band) int64 1 * x (x) float64 2.512e+05 2.512e+05 ... 2.527e+05 2.527e+05 * y (y) float64 3.813e+06 3.813e+06 ... 3.811e+06 3.811e+06 spatial_ref int64 0 Attributes: AREA_OR_POINT: Area scale_factor: 1.0 add_offset: 0.0
21.3 xr.DataArray
exploration
First, let’s verify the raster we loaded is an xarray.DataArray
:
type(nir)
xarray.core.dataarray.DataArray
Notice we see all the components of an xarray.DataArray
: its dimensions (band, y, x), the coordiantes for each dimension, and some attributes. We can also directly access some of these attribues:
# print shape and data type
print('shape: ', nir.shape)
print('data type: ', nir.dtype, '\n')
shape: (1, 3208, 2419)
data type: uint8
Using the .values
attribute we can get a quick view at the values at the corners in our data array.
print(type(nir.values))
nir.values
<class 'numpy.ndarray'>
array([[[167, 164, 161, ..., 147, 152, 151],
[170, 170, 168, ..., 151, 149, 154],
[176, 177, 177, ..., 151, 151, 151],
...,
[ 94, 88, 101, ..., 83, 88, 79],
[108, 95, 103, ..., 92, 91, 75],
[ 94, 90, 104, ..., 87, 88, 82]]], dtype=uint8)
We can also plot our data:
nir.plot()
<matplotlib.collections.QuadMesh at 0x10a80ce90>
Notice the coordinates on the x and y axes. This map shows the light captured in the near-infrared spectrum by a sensor on a plane. Can you guess where this? If you guessed Santa Barbara downtown, you guessed right!
21.4 rio
accessor
An accessor in Python let’s us access a different set of properties of an object. In our case, we use the .rio
accessor for xarray.DataArray
s to access its raster properties. For example, its number of bands, height, width, spatial bounding box, and CRS:
# check geospatial attributes
print('# bands: ', nir.rio.count)
print('height: ', nir.rio.height)
print('width: ', nir.rio.width, '\n')
print('spatial bounding box: ')
print(nir.rio.bounds(), '\n')
print('CRS: ', nir.rio.crs)
# bands: 1
height: 3208
width: 2419
spatial bounding box:
(251218.8, 3811027.2, 252670.19999999998, 3812952.0)
CRS: EPSG:26911
We have used accessors before, for example the .str
and .dt
accessors in pandas
.
21.5 Multi-band raster
Let’s now import the RGB raster:
# open RGB raster
= os.path.join(os.getcwd(),'data','naip','rgb.tif')
rgb_fp = rioxr.open_rasterio(rgb_fp)
rgb rgb
<xarray.DataArray (band: 3, y: 3208, x: 2419)> [23280456 values with dtype=uint8] Coordinates: * band (band) int64 1 2 3 * x (x) float64 2.512e+05 2.512e+05 ... 2.527e+05 2.527e+05 * y (y) float64 3.813e+06 3.813e+06 ... 3.811e+06 3.811e+06 spatial_ref int64 0 Attributes: AREA_OR_POINT: Area scale_factor: 1.0 add_offset: 0.0
Notice this raster has three bands, instead of one. This makes sense because we know these bands correspond tothe Red, Green and Blue bands of the image. We can also check this information by looking directly at the attributes:
print('rgb shape: ', rgb.shape)
print('rgb # bands: ', rgb.rio.count)
rgb shape: (3, 3208, 2419)
rgb # bands: 3
Let’s check the geospatial data too:
# check geospatial attributes
print('shape: ', rgb.shape)
print('data type: ', rgb.dtype)
print('# bands: ', rgb.rio.count)
print('CRS: ', rgb.rio.crs)
# check if the CRSs of the rasters match
print( rgb.rio.crs == nir.rio.crs)
shape: (3, 3208, 2419)
data type: uint8
# bands: 3
CRS: EPSG:26911
True
On the last line we checked the nir and rgb rasters have the same ESPG:26911 CRS. This is a projected CRS.
Finally, let’s plot this raster. Since it has three bands, we can plot it as an image using the .plot.imshow()
method, which will interpret the three bands of the object as RGB.
# parameters for plotting rasters
= 6 # height in in of plot height
size = rgb.rio.width / rgb.rio.height # ratio of widht/height
aspect
# plot three bands as RGB image
=size, aspect=aspect) rgb.plot.imshow(size
<matplotlib.image.AxesImage at 0x145989f10>
21.6 Box for clipping
Our area of interest (aoi) for this lesson is a smaller region that includes only a few blocks around the NCEAS building. An easy way to obtain coordinates for such a region:
- go to https://geojson.io/ website,
- zoom in until you find the NCEAS building in Santa Barbara, it might help to change to ‘Satellite Streets’ view on the bottom left corner,
- click on the rectangle icon on the right-side toolbar and draw a small region around the NCEAS buildng,
- the geoJSON code representing this area will appear in the code box,
- one option is two copy-paste this geoJSON into an empty text file and save such file with the .json extension, we could then read it in using
geopandas
- instead, we will create our region of index by just copy-pasting the list of points and storing it as a variable:
# vertices of our aoi box
= [[-119.70608227128903, 34.426300194372274],
points -119.70608227128903, 34.42041139020533],
[-119.6967885126002, 34.42041139020533],
[-119.6967885126002, 34.426300194372274],
[-119.70608227128903, 34.426300194372274]] [
We can then create a new geopandas.GeoDataFrame
:
# create geodataframe with aoi
= gpd.GeoDataFrame(geometry=[Polygon(points)],
aoi ='epsg:4326')
crs aoi
geometry | |
---|---|
0 | POLYGON ((-119.70608 34.42630, -119.70608 34.4... |
Let’s break this down a bit:
- first, we use the
shapely
’sPolygon()
function to create a polygon from ourpoints
list. - in
[Polygon(points)]
we put this polygon inside a list so we can form the geometry column of our newgpd.GeoDataFrame
- we know all the geoJSON files have CRS equal to EPSG:4326/WGS 84, so we set the the CRS of our new
gpd.GeoDataFrame
to this.
21.7 Clip raster
Remember: if two geospatial sets will interact they need to be in the same CRS.
In our case, the aoi gpd.GeoDataFrame
does not have the same CRS as the rasters:
# check CRss
print('aoi CRS: ', aoi.crs)
print('nir CRS: ', nir.rio.crs)
print('rgb CRS: ', rgb.rio.crs)
aoi CRS: epsg:4326
nir CRS: EPSG:26911
rgb CRS: EPSG:26911
So let’s reproject:
# reproject aoi to rgb crs
= aoi.to_crs(rgb.rio.crs)
aoi print('matched crs?', aoi.crs == rgb.rio.crs)
aoi.crs
matched crs? True
<Projected CRS: EPSG:26911>
Name: NAD83 / UTM zone 11N
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: UTM zone 11N
- method: Transverse Mercator
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
And plot them together:
# plot aoi outline and RGB raster together
= plt.subplots()
fig, ax *aspect)) # reuse size and aspect
fig.set_size_inches((size, size=ax)
rgb.plot.imshow(ax=ax, alpha=0.6) aoi.plot(ax
<Axes: title={'center': 'spatial_ref = 0'}, xlabel='x', ylabel='y'>
To clip the raster using the aoi polygon we use the .rio.clip_box()
method:
# clip rasters to aoi
= rgb.rio.clip_box(*aoi.total_bounds)
rgb_small = nir.rio.clip_box(*aoi.total_bounds) nir_small
Notice a few things: - we had to use the .rio
accessor to access the clip_box()
method - similarly to the shapely.box()
function we’ve used previously, .rio.clip_box()
usual parameters are minx, miny, maxx, maxy. We are using the *
asterisk as an unpacking operator to get these from the list aoi.total_bounds
.
Let’s check our clipped data:
# check shape updates
print('original shape: ', rgb.shape)
print('reduced shape: ', rgb_small.shape)
# plot with correct sizes
= 4
size = rgb_small.rio.width/ rgb_small.rio.height
aspect =size, aspect=aspect) rgb_small.plot.imshow(size
original shape: (3, 3208, 2419)
reduced shape: (3, 1128, 1454)
<matplotlib.image.AxesImage at 0x145c332d0>
# check shape updates
print('original shape: ', nir.shape)
print('reduced shape: ', nir_small.shape)
nir_small.plot()
original shape: (1, 3208, 2419)
reduced shape: (1, 1128, 1454)
<matplotlib.collections.QuadMesh at 0x145d43890>
21.8 Compute NDVI
We often want to combine values of and perform calculations on rasters to create a new output raster. In our case, we are interested in computing the Normalized Difference Vegetation Index (NDVI) over our area of interest. The NDVI is an index commonly used to check if an area has live green vegetation or not.
According to the Earth Observing System > The results of the NDVI calculation range from -1 to 1. Negative values correspond to areas with water surfaces, manmade structures, rocks, clouds, snow; bare soil usually falls within 0.1-0.2 range; and plants will always have positive values between 0.2 and 1. Healthy, dense vegetation canopy should be above 0.5, and sparse vegetation will most likely fall within 0.2 to 0.5.
The NDVI is calculated using the NIR and red bands. The formula is
\(NDVI = \frac{NIR - Red}{NIR + Red}.\)
First, we need to select the red band:
= rgb_small.sel(band=1)
red red
<xarray.DataArray (y: 1128, x: 1454)> [1640112 values with dtype=uint8] Coordinates: band int64 1 * x (x) float64 2.513e+05 2.513e+05 ... 2.522e+05 2.522e+05 * y (y) float64 3.813e+06 3.813e+06 ... 3.812e+06 3.812e+06 spatial_ref int64 0 Attributes: AREA_OR_POINT: Area scale_factor: 1.0 add_offset: 0.0
To be able to perform the calculation successfully, we will need to udpate the data type of our rasters:
= red.astype('int16')
red16 = nir_small.astype('int16')
nir16 print('RED: original dtype:', rgb_small.dtype, '.... converted dtype:', red16.dtype)
print('NIR: original dtype:', nir.dtype, '.... converted dtype:', nir16.dtype)
RED: original dtype: uint8 .... converted dtype: int16
NIR: original dtype: uint8 .... converted dtype: int16
We can perform raster calculations using the same arithmetic we use for np.array
s (because, underneath it all, they are). So our NDVI calculation is as follows:
# calculate and plot NDVI
= (nir16 - red16)/(nir16+red16)
ndvi ndvi.plot()
<matplotlib.collections.QuadMesh at 0x145e5f290>
Remember that plants will always have positive NDVI values between 0.2 and 1. Can you spot the Courthouse?
The uint8
(8-bit unsigned integer) is a very small data type that only holds integers from 0 up to 255. In particular, calculations don’t return what what we are used to when working with intgers (they’re done module 256):
150) + np.uint8(150) np.uint8(
/var/folders/gm/chd1kps96_g7xdxyfw150wm80000gp/T/ipykernel_84845/1890984988.py:1: RuntimeWarning: overflow encountered in scalar add
np.uint8(150) + np.uint8(150)
44
In the NDVI formula we have to add NIR + Red. If both NIR and Red are very close to 255, when we add them, the calculation overflows the uint8
data type and we don’t get the expected results:
= (nir - red)/(nir + red)
x x.plot()
<matplotlib.collections.QuadMesh at 0x145f49d10>
This is why we need to manually convert both rasters into int16
, which will be big enough to hold all the numbers that appear in the calculations.
Notice too, that when we performed the NDVI calculation we did not get any warning, although we were overflowing the computation at every cell of our array. This is can be an example of failing silently, where we don’t get any warnings about the errors in our computation. That’s why it’s so important to double-check our results!