How to Clip Rasters to Irregular Polygon Boundaries in Python
Use rasterio.mask.mask() with crop=True and all_touched=True. This reads only the intersecting raster window, burns a precise boolean mask from the vector geometry, and writes a georeferenced GeoTIFF that follows the polygon edge exactly — no rectangular padding, no missing edge pixels.
from rasterio.mask import mask
with rasterio.open("sentinel2_band4.tif") as src:
out_image, out_transform = mask(src, [polygon_geom], crop=True, all_touched=True, nodata=-9999, filled=True)
Why Irregular Boundaries Are a Distinct Problem
Rectangular bounding-box crops are trivial with rasterio.windows.from_bounds. Irregular polygons — watershed divides, national park perimeters, floodplain extents, agricultural field parcels — require a second step: rasterizing the polygon edge to a pixel mask so that cells partially covered by the boundary are handled deliberately rather than silently dropped.
This case arises constantly in remote sensing workflows. Sentinel-2 tiles are 100 × 100 km squares; study areas rarely are. Before any spectral index calculation or cloud and shadow masking can proceed, the scene must be clipped to the exact analysis domain. The clipping step lives inside the broader Automated Image Clipping and Cropping workflow, which is itself one preprocessing stage within Satellite Processing Workflows & Index Pipelines.
The diagram below shows what happens at the polygon edge under the two rasterization rules all_touched=False (default) and all_touched=True.
Environment and Setup
Only the packages this page’s code actually uses:
| Package | Minimum version | Why required |
|---|---|---|
rasterio |
1.3.0 | rasterio.mask.mask() and windowed I/O |
geopandas |
0.14.0 | Reading vector files; make_valid() geometry repair |
shapely |
2.0.0 | Geometry objects accepted by rasterio.mask |
numpy |
1.24.0 | Array inspection after masking |
Install with conda-forge on Windows and macOS to avoid GDAL compilation issues:
conda install -c conda-forge rasterio geopandas shapely numpy
# or on Linux with system GDAL already present:
pip install rasterio>=1.3.0 geopandas>=0.14.0 shapely>=2.0.0 numpy>=1.24.0
GDAL 3.4+ is required for the all_touched rasterization path and for reliable PROJ transformations. Check with python -c "import rasterio; print(rasterio.__gdal_version__)".
Complete Working Example
The function below handles single- and multi-band rasters, validates coordinate systems, repairs malformed geometries, and writes a compressed GeoTIFF with correct spatial metadata.
import rasterio
from rasterio.mask import mask
import geopandas as gpd
import numpy as np
from pathlib import Path
def clip_raster_to_polygon(
raster_path: str,
polygon_path: str,
output_path: str,
all_touched: bool = True,
nodata_fallback: float = -9999.0,
) -> Path:
"""
Clip a raster to an irregular polygon boundary.
Parameters
----------
raster_path : Path to the source GeoTIFF (single or multi-band).
polygon_path : Path to the vector file (GeoJSON, GeoPackage, Shapefile).
output_path : Destination GeoTIFF path.
all_touched : Include pixels touched by the polygon edge (True recommended
for irregular boundaries).
nodata_fallback : Fill value used when the source raster defines no nodata.
Returns
-------
Path to the written output file.
"""
# --- 1. Load vector and repair any invalid geometries ---
gdf = gpd.read_file(polygon_path)
gdf = gdf[~gdf.geometry.is_empty & gdf.geometry.notna()] # drop nulls
gdf["geometry"] = gdf.geometry.make_valid() # fix self-intersections
# Dissolve multiple features into one unified boundary
if len(gdf) > 1:
gdf = gdf.dissolve()
with rasterio.open(raster_path) as src:
# --- 2. Strict CRS check — rasterio.mask does NOT auto-reproject ---
if gdf.crs is None:
raise ValueError("Vector file has no CRS. Assign one before clipping.")
if not src.crs.equals(gdf.crs):
# Reproject vector to raster CRS rather than the reverse;
# reprojecting a small polygon is cheaper than reprojecting a full scene.
gdf = gdf.to_crs(src.crs)
polygon_geom = gdf.geometry.values[0]
# Use the source nodata when present; fall back to the caller-supplied value.
fill_nodata = src.nodata if src.nodata is not None else nodata_fallback
# --- 3. Mask and crop in a single rasterio pass ---
out_image, out_transform = mask(
src,
[polygon_geom], # must be an iterable of geometry objects
crop=True, # trim array to polygon bounding box
all_touched=all_touched, # include edge-intersected pixels
nodata=fill_nodata,
filled=True, # return dense array, not a masked array
)
# --- 4. Build updated output profile ---
out_meta = src.meta.copy()
out_meta.update({
"driver": "GTiff",
"height": out_image.shape[1], # rows after crop
"width": out_image.shape[2], # cols after crop
"transform": out_transform, # new affine — origin shifted to bbox corner
"count": out_image.shape[0], # band count unchanged
"nodata": fill_nodata,
"compress": "deflate", # lossless; drop if downstream tools baulk
})
# --- 5. Write output ---
with rasterio.open(output_path, "w", **out_meta) as dest:
dest.write(out_image)
return Path(output_path)
Validation after clipping
Always re-open the output and assert invariants before passing it to downstream steps such as spectral index calculation or advanced resampling:
def validate_clip_output(output_path: str, source_gdf: gpd.GeoDataFrame) -> None:
with rasterio.open(output_path) as check:
# CRS must match the (possibly reprojected) vector
assert check.crs.equals(source_gdf.crs), (
f"CRS drift: output {check.crs} != vector {source_gdf.crs}"
)
# No dimension should be zero — indicates a non-overlapping geometry
assert check.width > 0 and check.height > 0, "Output has zero-size dimension"
# Quick nodata sanity: the array should not be entirely nodata
with rasterio.open(output_path) as src2:
arr = src2.read(1)
valid_pixels = np.sum(arr != check.nodata)
assert valid_pixels > 0, "All output pixels are nodata — check CRS and polygon overlap"
print(f"OK: {check.width}x{check.height} px, CRS={check.crs.to_epsg()}, "
f"nodata={check.nodata}, valid_pixels={valid_pixels:,}")
Variant Patterns
Variant 1 — Reproject the polygon on the fly (no pre-aligned inputs)
When inputs come from heterogeneous sources (e.g., a field-collected GeoJSON in EPSG:4326 and a Sentinel-2 scene in UTM), reprojecting inside the function is safer than expecting callers to pre-align. The clip_raster_to_polygon function above already handles this in step 2. If you want an explicit standalone helper for CRS mismatch — the root cause described in Fixing EPSG Mismatches in rasterio.open — use:
import rasterio
import geopandas as gpd
from shapely.geometry import mapping
def get_polygon_in_raster_crs(raster_path: str, polygon_path: str) -> list:
"""Return polygon geometry in the raster's CRS as a list suitable for rasterio.mask."""
gdf = gpd.read_file(polygon_path)
with rasterio.open(raster_path) as src:
raster_crs = src.crs
if not gdf.crs.equals(raster_crs):
gdf = gdf.to_crs(raster_crs)
gdf["geometry"] = gdf.geometry.make_valid()
return [mapping(geom) for geom in gdf.geometry]
Pass the returned list directly as the shapes argument to rasterio.mask.mask().
Variant 2 — Clip a multi-polygon (e.g., a protected-area network with holes)
Real administrative boundaries often contain multiple disjoint polygons or polygons with interior holes (donut geometries). Pass all geometries as a list; rasterio.mask unions them internally during rasterization:
import rasterio
from rasterio.mask import mask
import geopandas as gpd
def clip_to_multipolygon(
raster_path: str, polygon_path: str, output_path: str
) -> None:
gdf = gpd.read_file(polygon_path)
with rasterio.open(raster_path) as src:
if not gdf.crs.equals(src.crs):
gdf = gdf.to_crs(src.crs)
# Keep each polygon separate in the list — rasterio takes the union
shapes = [geom for geom in gdf.geometry if geom is not None]
fill_nodata = src.nodata if src.nodata is not None else -9999
out_image, out_transform = mask(
src, shapes, crop=True, all_touched=True,
nodata=fill_nodata, filled=True
)
out_meta = src.meta.copy()
out_meta.update({
"height": out_image.shape[1],
"width": out_image.shape[2],
"transform": out_transform,
"nodata": fill_nodata,
})
with rasterio.open(output_path, "w", **out_meta) as dest:
dest.write(out_image)
Variant 3 — Memory-safe windowed clip for large scenes
When working with sub-metre drone orthomosaics or uncompressed raw-DN GeoTIFFs that are several gigabytes, avoid loading the full bounding box into RAM before masking. Use rasterio.windows to compute only the window you need, then apply the mask to the windowed read:
import rasterio
from rasterio.mask import mask
from rasterio.windows import from_bounds
import geopandas as gpd
def clip_large_raster(raster_path: str, polygon_path: str, output_path: str) -> None:
gdf = gpd.read_file(polygon_path)
with rasterio.open(raster_path) as src:
if not gdf.crs.equals(src.crs):
gdf = gdf.to_crs(src.crs)
# Compute the window that covers the polygon bounding box
minx, miny, maxx, maxy = gdf.total_bounds
window = from_bounds(minx, miny, maxx, maxy, transform=src.transform)
windowed_transform = src.window_transform(window)
# Read only that window — avoids loading the whole scene
windowed_data = src.read(window=window)
fill_nodata = src.nodata if src.nodata is not None else -9999
# Now mask the windowed sub-array
# We need a temporary in-memory dataset for rasterio.mask to consume
import tempfile, os
from rasterio.io import MemoryFile
sub_meta = src.meta.copy()
sub_meta.update({
"height": windowed_data.shape[1],
"width": windowed_data.shape[2],
"transform": windowed_transform,
})
with MemoryFile() as memfile:
with memfile.open(**sub_meta) as mem_ds:
mem_ds.write(windowed_data)
with memfile.open() as mem_ds:
shapes = list(gdf.geometry)
out_image, out_transform = mask(
mem_ds, shapes, crop=True, all_touched=True,
nodata=fill_nodata, filled=True
)
out_meta.update({
"height": out_image.shape[1],
"width": out_image.shape[2],
"transform": out_transform,
"nodata": fill_nodata,
"compress": "deflate",
})
with rasterio.open(output_path, "w", **out_meta) as dest:
dest.write(out_image)
Common Errors
ValueError: Input shapes do not overlap raster
The polygon bounding box does not intersect the raster extent. Confirm both are in the same CRS (print(src.crs, gdf.crs)) and that gdf.total_bounds falls within src.bounds. The most frequent cause is forgetting to call gdf.to_crs(src.crs) before masking.
rasterio.errors.WindowError: Window is out of bounds
This surfaces in the windowed variant when the computed window extends beyond the raster edges. Clamp the window with window.intersection(src.window(src.bounds)) before reading.
Output is entirely nodata even though CRS looks correct
Usually caused by a polygon dissolved to an empty geometry after make_valid() strips away degenerate rings. Print gdf.geometry.is_empty to identify the bad row, then inspect the source geometry with a desktop GIS tool. Alternatively, a sub-pixel polygon (feature smaller than one raster cell) produces zero included pixels under all_touched=False; switching to all_touched=True resolves it.
Related
- Automated Image Clipping and Cropping — the parent workflow covering CRS validation, batch orchestration, and memory-efficient masking patterns.
- Satellite Processing Workflows & Index Pipelines — the full preprocessing-to-index pipeline context this clipping step feeds into.
- Spectral Index Calculation Pipelines — compute NDVI, EVI, and NBR on the clipped arrays produced by this workflow.
- Fixing EPSG Mismatches in rasterio.open — diagnose and repair CRS mismatches that cause silent empty outputs when clipping.
- Choosing the Right Resampling Method for Sentinel-2 — apply after clipping when pixel grids need to be standardised across heterogeneous sensor inputs.