How to Clip Rasters to Irregular Polygon Boundaries in Python

Use rasterio.mask.mask() with crop=True and all_touched=True. This reads only the intersecting raster window, burns a precise boolean mask from the vector geometry, and writes a georeferenced GeoTIFF that follows the polygon edge exactly — no rectangular padding, no missing edge pixels.

from rasterio.mask import mask
with rasterio.open("sentinel2_band4.tif") as src:
    out_image, out_transform = mask(src, [polygon_geom], crop=True, all_touched=True, nodata=-9999, filled=True)

Why Irregular Boundaries Are a Distinct Problem

Rectangular bounding-box crops are trivial with rasterio.windows.from_bounds. Irregular polygons — watershed divides, national park perimeters, floodplain extents, agricultural field parcels — require a second step: rasterizing the polygon edge to a pixel mask so that cells partially covered by the boundary are handled deliberately rather than silently dropped.

This case arises constantly in remote sensing workflows. Sentinel-2 tiles are 100 × 100 km squares; study areas rarely are. Before any spectral index calculation or cloud and shadow masking can proceed, the scene must be clipped to the exact analysis domain. The clipping step lives inside the broader Automated Image Clipping and Cropping workflow, which is itself one preprocessing stage within Satellite Processing Workflows & Index Pipelines.

The diagram below shows what happens at the polygon edge under the two rasterization rules all_touched=False (default) and all_touched=True.

all_touched rasterization: default vs enabled Left panel shows a jagged polygon boundary with only centre-included pixels shaded, leaving gaps at the edge. Right panel shows the same boundary with all intersected pixels shaded, closing the gaps. all_touched=False (default) all_touched=True gap filled included pixel polygon edge

Environment and Setup

Only the packages this page’s code actually uses:

Package Minimum version Why required
rasterio 1.3.0 rasterio.mask.mask() and windowed I/O
geopandas 0.14.0 Reading vector files; make_valid() geometry repair
shapely 2.0.0 Geometry objects accepted by rasterio.mask
numpy 1.24.0 Array inspection after masking

Install with conda-forge on Windows and macOS to avoid GDAL compilation issues:

conda install -c conda-forge rasterio geopandas shapely numpy
# or on Linux with system GDAL already present:
pip install rasterio>=1.3.0 geopandas>=0.14.0 shapely>=2.0.0 numpy>=1.24.0

GDAL 3.4+ is required for the all_touched rasterization path and for reliable PROJ transformations. Check with python -c "import rasterio; print(rasterio.__gdal_version__)".

Complete Working Example

The function below handles single- and multi-band rasters, validates coordinate systems, repairs malformed geometries, and writes a compressed GeoTIFF with correct spatial metadata.

import rasterio
from rasterio.mask import mask
import geopandas as gpd
import numpy as np
from pathlib import Path


def clip_raster_to_polygon(
    raster_path: str,
    polygon_path: str,
    output_path: str,
    all_touched: bool = True,
    nodata_fallback: float = -9999.0,
) -> Path:
    """
    Clip a raster to an irregular polygon boundary.

    Parameters
    ----------
    raster_path      : Path to the source GeoTIFF (single or multi-band).
    polygon_path     : Path to the vector file (GeoJSON, GeoPackage, Shapefile).
    output_path      : Destination GeoTIFF path.
    all_touched      : Include pixels touched by the polygon edge (True recommended
                       for irregular boundaries).
    nodata_fallback  : Fill value used when the source raster defines no nodata.

    Returns
    -------
    Path to the written output file.
    """
    # --- 1. Load vector and repair any invalid geometries ---
    gdf = gpd.read_file(polygon_path)
    gdf = gdf[~gdf.geometry.is_empty & gdf.geometry.notna()]  # drop nulls
    gdf["geometry"] = gdf.geometry.make_valid()               # fix self-intersections

    # Dissolve multiple features into one unified boundary
    if len(gdf) > 1:
        gdf = gdf.dissolve()

    with rasterio.open(raster_path) as src:
        # --- 2. Strict CRS check — rasterio.mask does NOT auto-reproject ---
        if gdf.crs is None:
            raise ValueError("Vector file has no CRS. Assign one before clipping.")

        if not src.crs.equals(gdf.crs):
            # Reproject vector to raster CRS rather than the reverse;
            # reprojecting a small polygon is cheaper than reprojecting a full scene.
            gdf = gdf.to_crs(src.crs)

        polygon_geom = gdf.geometry.values[0]

        # Use the source nodata when present; fall back to the caller-supplied value.
        fill_nodata = src.nodata if src.nodata is not None else nodata_fallback

        # --- 3. Mask and crop in a single rasterio pass ---
        out_image, out_transform = mask(
            src,
            [polygon_geom],     # must be an iterable of geometry objects
            crop=True,          # trim array to polygon bounding box
            all_touched=all_touched,  # include edge-intersected pixels
            nodata=fill_nodata,
            filled=True,        # return dense array, not a masked array
        )

        # --- 4. Build updated output profile ---
        out_meta = src.meta.copy()
        out_meta.update({
            "driver": "GTiff",
            "height": out_image.shape[1],    # rows after crop
            "width": out_image.shape[2],     # cols after crop
            "transform": out_transform,      # new affine — origin shifted to bbox corner
            "count": out_image.shape[0],     # band count unchanged
            "nodata": fill_nodata,
            "compress": "deflate",           # lossless; drop if downstream tools baulk
        })

        # --- 5. Write output ---
        with rasterio.open(output_path, "w", **out_meta) as dest:
            dest.write(out_image)

    return Path(output_path)

Validation after clipping

Always re-open the output and assert invariants before passing it to downstream steps such as spectral index calculation or advanced resampling:

def validate_clip_output(output_path: str, source_gdf: gpd.GeoDataFrame) -> None:
    with rasterio.open(output_path) as check:
        # CRS must match the (possibly reprojected) vector
        assert check.crs.equals(source_gdf.crs), (
            f"CRS drift: output {check.crs} != vector {source_gdf.crs}"
        )
        # No dimension should be zero — indicates a non-overlapping geometry
        assert check.width > 0 and check.height > 0, "Output has zero-size dimension"
        # Quick nodata sanity: the array should not be entirely nodata
        with rasterio.open(output_path) as src2:
            arr = src2.read(1)
        valid_pixels = np.sum(arr != check.nodata)
        assert valid_pixels > 0, "All output pixels are nodata — check CRS and polygon overlap"
        print(f"OK: {check.width}x{check.height} px, CRS={check.crs.to_epsg()}, "
              f"nodata={check.nodata}, valid_pixels={valid_pixels:,}")

Variant Patterns

Variant 1 — Reproject the polygon on the fly (no pre-aligned inputs)

When inputs come from heterogeneous sources (e.g., a field-collected GeoJSON in EPSG:4326 and a Sentinel-2 scene in UTM), reprojecting inside the function is safer than expecting callers to pre-align. The clip_raster_to_polygon function above already handles this in step 2. If you want an explicit standalone helper for CRS mismatch — the root cause described in Fixing EPSG Mismatches in rasterio.open — use:

import rasterio
import geopandas as gpd
from shapely.geometry import mapping


def get_polygon_in_raster_crs(raster_path: str, polygon_path: str) -> list:
    """Return polygon geometry in the raster's CRS as a list suitable for rasterio.mask."""
    gdf = gpd.read_file(polygon_path)
    with rasterio.open(raster_path) as src:
        raster_crs = src.crs

    if not gdf.crs.equals(raster_crs):
        gdf = gdf.to_crs(raster_crs)

    gdf["geometry"] = gdf.geometry.make_valid()
    return [mapping(geom) for geom in gdf.geometry]

Pass the returned list directly as the shapes argument to rasterio.mask.mask().

Variant 2 — Clip a multi-polygon (e.g., a protected-area network with holes)

Real administrative boundaries often contain multiple disjoint polygons or polygons with interior holes (donut geometries). Pass all geometries as a list; rasterio.mask unions them internally during rasterization:

import rasterio
from rasterio.mask import mask
import geopandas as gpd


def clip_to_multipolygon(
    raster_path: str, polygon_path: str, output_path: str
) -> None:
    gdf = gpd.read_file(polygon_path)

    with rasterio.open(raster_path) as src:
        if not gdf.crs.equals(src.crs):
            gdf = gdf.to_crs(src.crs)

        # Keep each polygon separate in the list — rasterio takes the union
        shapes = [geom for geom in gdf.geometry if geom is not None]
        fill_nodata = src.nodata if src.nodata is not None else -9999

        out_image, out_transform = mask(
            src, shapes, crop=True, all_touched=True,
            nodata=fill_nodata, filled=True
        )

        out_meta = src.meta.copy()
        out_meta.update({
            "height": out_image.shape[1],
            "width": out_image.shape[2],
            "transform": out_transform,
            "nodata": fill_nodata,
        })

        with rasterio.open(output_path, "w", **out_meta) as dest:
            dest.write(out_image)

Variant 3 — Memory-safe windowed clip for large scenes

When working with sub-metre drone orthomosaics or uncompressed raw-DN GeoTIFFs that are several gigabytes, avoid loading the full bounding box into RAM before masking. Use rasterio.windows to compute only the window you need, then apply the mask to the windowed read:

import rasterio
from rasterio.mask import mask
from rasterio.windows import from_bounds
import geopandas as gpd


def clip_large_raster(raster_path: str, polygon_path: str, output_path: str) -> None:
    gdf = gpd.read_file(polygon_path)

    with rasterio.open(raster_path) as src:
        if not gdf.crs.equals(src.crs):
            gdf = gdf.to_crs(src.crs)

        # Compute the window that covers the polygon bounding box
        minx, miny, maxx, maxy = gdf.total_bounds
        window = from_bounds(minx, miny, maxx, maxy, transform=src.transform)
        windowed_transform = src.window_transform(window)

        # Read only that window — avoids loading the whole scene
        windowed_data = src.read(window=window)

        fill_nodata = src.nodata if src.nodata is not None else -9999

        # Now mask the windowed sub-array
        # We need a temporary in-memory dataset for rasterio.mask to consume
        import tempfile, os
        from rasterio.io import MemoryFile

        sub_meta = src.meta.copy()
        sub_meta.update({
            "height": windowed_data.shape[1],
            "width": windowed_data.shape[2],
            "transform": windowed_transform,
        })

        with MemoryFile() as memfile:
            with memfile.open(**sub_meta) as mem_ds:
                mem_ds.write(windowed_data)

            with memfile.open() as mem_ds:
                shapes = list(gdf.geometry)
                out_image, out_transform = mask(
                    mem_ds, shapes, crop=True, all_touched=True,
                    nodata=fill_nodata, filled=True
                )

        out_meta.update({
            "height": out_image.shape[1],
            "width": out_image.shape[2],
            "transform": out_transform,
            "nodata": fill_nodata,
            "compress": "deflate",
        })

        with rasterio.open(output_path, "w", **out_meta) as dest:
            dest.write(out_image)

Common Errors

ValueError: Input shapes do not overlap raster The polygon bounding box does not intersect the raster extent. Confirm both are in the same CRS (print(src.crs, gdf.crs)) and that gdf.total_bounds falls within src.bounds. The most frequent cause is forgetting to call gdf.to_crs(src.crs) before masking.

rasterio.errors.WindowError: Window is out of bounds This surfaces in the windowed variant when the computed window extends beyond the raster edges. Clamp the window with window.intersection(src.window(src.bounds)) before reading.

Output is entirely nodata even though CRS looks correct Usually caused by a polygon dissolved to an empty geometry after make_valid() strips away degenerate rings. Print gdf.geometry.is_empty to identify the bad row, then inspect the source geometry with a desktop GIS tool. Alternatively, a sub-pixel polygon (feature smaller than one raster cell) produces zero included pixels under all_touched=False; switching to all_touched=True resolves it.