Calculating NDVI directly from xarray DataArrays
Apply (NIR - Red) / (NIR + Red) with label-based band selection and safe division, and the result retains full spatial coordinate metadata:
import xarray as xr
import numpy as np
def ndvi(data: xr.DataArray | xr.Dataset) -> xr.DataArray:
nir = (data["nir"] if isinstance(data, xr.Dataset) else data.sel(band="nir")).astype("float32")
red = (data["red"] if isinstance(data, xr.Dataset) else data.sel(band="red")).astype("float32")
denom = nir + red
result = (nir - red) / denom.where(denom != 0)
return result.clip(-1, 1)
That four-liner is the core. The rest of this page explains why each step is necessary and shows production-ready variants.
Why this case arises in remote sensing workflows
NDVI is the most widely used spectral index in land-cover monitoring, phenology tracking, and agricultural yield modelling. But computing it correctly from raw satellite data involves several failure modes that do not exist in simple NumPy array math: unsigned-integer underflow, division by zero at scene borders, coordinate misalignment between separately-downloaded band files, and loss of the CRS and geotransform during array arithmetic.
This page belongs to the Band Math Operations with Xarray workflow. For the broader data-loading and coordinate management context, see Core Raster Fundamentals & STAC Mapping.
Environment & setup
| Package | Minimum version | Role on this page |
|---|---|---|
xarray |
2023.6 | Labeled n-D arrays, .sel(), .where(), .clip() |
numpy |
1.24 | np.nan sentinel value |
rioxarray |
0.15 | Spatial extension: .rio.to_raster(), CRS propagation |
dask[complete] |
2023.9 | Lazy computation for large scenes (optional but recommended) |
pip install "xarray>=2023.6" "numpy>=1.24" "rioxarray>=0.15" "dask[complete]>=2023.9"
How xarray NDVI calculation works — data-flow diagram
The SVG below shows the transformation path from a raw multi-band DataArray to a masked, CF-annotated NDVI output.
Complete working example
import xarray as xr
import numpy as np
import rioxarray # noqa: F401 — activates the .rio accessor on DataArrays
def calculate_ndvi(data: xr.DataArray | xr.Dataset) -> xr.DataArray:
"""
Compute NDVI from a labeled xarray object.
Parameters
----------
data : xr.DataArray with a 'band' coordinate containing 'nir' and 'red',
OR an xr.Dataset with variables named 'nir' and 'red'.
Returns
-------
xr.DataArray : float32 NDVI, range [-1, 1], nodata pixels set to NaN,
spatial coordinates and CRS inherited from input.
"""
# --- Band extraction (structure-agnostic) ---
if isinstance(data, xr.Dataset):
# Dataset: bands are separate variables; direct key access preserves attrs
nir = data["nir"]
red = data["red"]
else:
# DataArray: bands stacked on a 'band' coordinate with string labels.
# .sel() is label-based — immune to band-order differences between sensors.
nir = data.sel(band="nir")
red = data.sel(band="red")
# --- float32 cast (MUST precede arithmetic on uint16 inputs) ---
# Sentinel-2 L2A and Landsat Collection 2 SR are uint16.
# uint16 subtraction wraps: 50 - 100 → 65486, not -50.
# float32 gives ~7 significant figures — ample for NDVI's [-1, 1] range.
nir = nir.astype("float32")
red = red.astype("float32")
# --- Core formula with safe division ---
denominator = nir + red
# Dividing by NaN propagates NaN rather than raising ZeroDivisionError
ndvi = (nir - red) / denominator.where(denominator != 0)
# --- Enforce physical bounds ---
# Floating-point drift or miscalibrated pixels can produce values outside [-1, 1].
# .clip() is cheap and prevents downstream colormap / histogram anomalies.
ndvi = ndvi.clip(-1.0, 1.0)
# --- CF-compliant metadata ---
# Tools like rioxarray, Panoply, and MetPy read these keys automatically
# for unit-aware plotting and auto-scaling.
ndvi.attrs.update({
"long_name": "Normalized Difference Vegetation Index",
"standard_name": "normalized_difference_vegetation_index",
"units": "1", # dimensionless — CF convention for pure ratios
"formula": "(NIR - Red) / (NIR + Red)",
"valid_range": [-1.0, 1.0],
})
return ndvi
# --- Minimal usage example ---
if __name__ == "__main__":
import rasterio
# Open a multi-band GeoTIFF where band order is: 1=Red, 2=NIR (e.g. Landsat 8 B4, B5)
with rasterio.open("landsat_scene.tif") as src:
da = xr.open_rasterio(src).assign_coords(
band=["red", "nir"] # label the positional bands
)
ndvi_result = calculate_ndvi(da)
print(ndvi_result)
# Output: <xarray.DataArray (y: ..., x: ...)>
# Coordinates: x, y, spatial_ref
# Write to cloud-optimized GeoTIFF
ndvi_result.rio.to_raster("ndvi_output.tif", driver="GTiff",
compress="deflate", dtype="float32")
Variant patterns
Variant 1 — Dask-enabled large-scene processing
For multi-temporal stacks or continental extents, open the data with Dask chunking before calling calculate_ndvi. The function is fully lazy: all operations compose into a single task graph that only materialises when you write to disk.
import xarray as xr
import rioxarray # noqa: F401
# stackstac or odc-stac produce chunked DataArrays directly from STAC items.
# chunk=(1, 1024, 1024) keeps each task ~4 MB in float32 — a good baseline.
da = xr.open_dataset(
"sentinel2_stack.zarr",
engine="zarr",
chunks={"time": 1, "y": 1024, "x": 1024},
)["nir_red_stack"]
# calculate_ndvi returns a lazy graph — no RAM spike yet
ndvi_lazy = calculate_ndvi(da)
# Write directly to Zarr; Dask executes chunk-by-chunk
ndvi_lazy.to_zarr("ndvi_timeseries.zarr", mode="w")
Avoid chunking along the band dimension when you plan to compute multiple indices (EVI, NDWI, SAVI) from the same stack — it fragments the task graph and increases scheduler overhead.
Variant 2 — Separate single-band GeoTIFF files
Sentinel-2 data downloaded from the Copernicus Hub arrives as one GeoTIFF per band. Use xr.align to reconcile any minor extent differences before applying the formula.
import xarray as xr
import rioxarray # noqa: F401
nir = xr.open_dataarray("T32TNT_20240601_B08.tif", engine="rasterio").squeeze("band")
red = xr.open_dataarray("T32TNT_20240601_B04.tif", engine="rasterio").squeeze("band")
# align() reindexes both arrays to a common coordinate grid.
# join="inner" clips to the shared extent; "outer" pads with NaN.
nir_aligned, red_aligned = xr.align(nir, red, join="inner")
nir_f = nir_aligned.astype("float32")
red_f = red_aligned.astype("float32")
denom = nir_f + red_f
ndvi = ((nir_f - red_f) / denom.where(denom != 0)).clip(-1, 1)
ndvi.rio.to_raster("ndvi_sentinel2.tif", driver="GTiff", compress="deflate")
Variant 3 — Nodata-aware masking from raster metadata
Raw COG files encode a nodata value (often 0 or 65535). Apply an explicit nodata mask before the NDVI formula so fill pixels do not produce spurious low-NDVI values that contaminate aggregations.
import xarray as xr
import rioxarray # noqa: F401
import numpy as np
da = xr.open_dataarray("s2_b4_b8.tif", engine="rasterio")
# .rio.nodata reads the nodata value recorded in the GeoTIFF metadata
nodata_val = da.rio.nodata # e.g. 0 for Sentinel-2 L2A
# Replace nodata with NaN before any arithmetic
da = da.where(da != nodata_val).astype("float32")
nir = da.sel(band=1).rename({"band": None}) # single-band squeeze
red = da.sel(band=0).rename({"band": None})
denom = nir + red
ndvi = ((nir - red) / denom.where(denom != 0)).clip(-1, 1)
Common errors
FloatingPointError: invalid value encountered in true_divide
This appears when numpy’s floating-point error handling is set to raise (e.g. via np.seterr(invalid='raise')). Fix: either reset to default with np.seterr(invalid='ignore') before the calculation, or use the .where(denominator != 0) pattern shown above to pre-mask zeros.
ValueError: cannot select a dimension that does not exist: 'band'
Your DataArray uses a different dimension name — common variants are 'bands', 'wavelength', or a positional integer index. Print data.dims to inspect, then pass the correct name to .sel(), or rename the dimension first: data = data.rename({"bands": "band"}).
xr.MergeError: cannot broadcast arrays when combining separate band files
The NIR and Red GeoTIFFs have slightly different extents or pixel grids (common with Sentinel-2 10 m vs. 20 m bands). Fix: resample to a common grid using rioxarray’s .rio.reproject_match(reference_da) before calling the formula. See Handling Pixel Resolution and Scaling for a full reprojection workflow.
Related
- Band Math Operations with Xarray — parent page covering the full band-math workflow including EVI, NDWI, and multi-temporal stacking
- Core Raster Fundamentals & STAC Mapping — coordinate systems, COG structure, and STAC ingestion that feed this pipeline
- Handling Pixel Resolution and Scaling — reproject-match two rasters to a shared grid before band arithmetic
- Spectral Index Calculation Pipelines — extend the NDVI pattern to EVI, SAVI, NDWI, and multi-index batch runs
- Querying STAC Catalogs Programmatically — how to fetch the band URIs that feed into
xr.open_dataarray()