Mastering CRS Transformations in Rasterio

Mismatched projections are the most common source of silent data corruption in multi-source raster pipelines. When satellite imagery, LiDAR derivatives, and climate grids each arrive in a different Coordinate Reference System (CRS), pixel alignment breaks, area calculations are distorted, and downstream index results become meaningless. For the full architectural context of where CRS management fits in Python raster work, see Core Raster Fundamentals & STAC Mapping.

This page walks through a production-ready reprojection workflow in rasterio: from validating source metadata and normalizing CRS objects, to computing the correct affine transform, selecting a resampling kernel, and writing a Cloud-Optimized GeoTIFF output that is ready for streaming.


Prerequisites

Install dependencies

pip install "rasterio>=1.3.0" "pyproj>=3.0.0" "numpy>=1.21"

Library version table

Library Minimum version Why required
rasterio 1.3.0 calculate_default_transform parameter API stabilised in 1.3
pyproj 3.0.0 Required by rasterio’s CRS backend; provides CRS.from_user_input
numpy 1.21 Masked-array behaviour used during band-isolated warp
PROJ (system) 7.2 Datum grid shifts and accurate datum transformations

Conceptual prerequisites

GDAL environment configuration

Set these variables before importing rasterio to control memory allocation, threading, and datum-grid downloads:

import os

os.environ["GDAL_NUM_THREADS"] = "ALL_CPUS"   # parallelise warp kernel
os.environ["GDAL_CACHEMAX"] = "1024"           # MB of GDAL block cache
os.environ["PROJ_NETWORK"] = "ON"              # fetch remote datum grids when missing
os.environ["CPL_VSIL_CURL_ALLOWED_EXTENSIONS"] = ".tif,.tiff,.vrt"

Reprojection pipeline: how the steps connect

The six steps below form a deterministic sequence. Skipping validation or relying on implicit defaults frequently results in shifted rasters, clipped extents, or corrupted nodata masks.

CRS reprojection pipeline Six sequential boxes connected by arrows: Validate CRS, Normalize target CRS, Compute transform, Warp bands, Write COG output, Verify integrity ① Validate ② Normalize ③ Compute ④ Warp ⑤ Write COG ⑥ Verify source CRS target CRS transform + dimensions band-by-band tiled + overviews bounds + stats

Step-by-step workflow

Step 1 — Validate source metadata

Before computing any target geometry, confirm that the source file contains a valid CRS and a coherent affine transform. Legacy files may store the projection as a malformed WKT string, carry a None transform, or have conflicting EPSG codes embedded in different metadata fields.

import rasterio
from rasterio.crs import CRS

def inspect_source(src_path: str) -> dict:
    """
    Return a summary of CRS and affine metadata for diagnostic purposes.
    Raises ValueError for files that cannot safely be reprojected.
    """
    with rasterio.open(src_path) as src:
        if src.crs is None:
            raise ValueError(
                f"{src_path}: no CRS found. "
                "Use gdal_edit.py -a_srs EPSG:XXXX to assign one."
            )
        if src.transform is None or src.transform == rasterio.transform.IDENTITY:
            raise ValueError(
                f"{src_path}: identity or missing affine transform — "
                "pixel-to-world mapping cannot be determined."
            )
        return {
            "crs_wkt": src.crs.to_wkt(),
            "epsg": src.crs.to_epsg(),
            "transform": src.transform,
            "width": src.width,
            "height": src.height,
            "bounds": src.bounds,
            "nodata": src.nodata,
            "dtype": src.dtypes,
        }

When you encounter ambiguous or conflicting EPSG codes — a common issue with older Landsat scenes and exported QGIS files — the dedicated troubleshooting patterns in Fixing EPSG mismatches in rasterio.open walk through each diagnostic step.

Step 2 — Normalize the target CRS

Accept any CRS representation the caller may provide (EPSG integer, PROJ4 string, WKT, or an existing CRS object) and normalise it to a single canonical form. Comparing source and target CRS objects directly avoids redundant no-op warp operations.

from rasterio.crs import CRS

def resolve_crs(user_input: int | str | CRS) -> CRS:
    """
    Convert any CRS specification into a rasterio CRS object.
    Raises CRSError for unrecognised strings.
    """
    if isinstance(user_input, CRS):
        return user_input
    return CRS.from_user_input(user_input)

# Example: all three are equivalent
crs_a = resolve_crs(32633)               # UTM zone 33N
crs_b = resolve_crs("EPSG:32633")
crs_c = resolve_crs(CRS.from_epsg(32633))
assert crs_a == crs_b == crs_c

Step 3 — Compute destination transform and dimensions

calculate_default_transform derives the exact affine matrix and pixel grid dimensions that preserve the full spatial extent of the source without padding or clipping. Always pass explicit bounds rather than relying on the convenience overload that reads them from the open file handle, because explicit bounds are reproducible and testable.

from rasterio.warp import calculate_default_transform

def compute_target_geometry(src_path: str, dst_crs: CRS, resolution: tuple | None = None):
    """
    Return (transform, width, height) for the reprojected raster.
    Pass resolution=(x_res, y_res) in target CRS units to fix the output GSD.
    """
    with rasterio.open(src_path) as src:
        dst_transform, dst_width, dst_height = calculate_default_transform(
            src_crs=src.crs,
            dst_crs=dst_crs,
            width=src.width,
            height=src.height,
            left=src.bounds.left,
            bottom=src.bounds.bottom,
            right=src.bounds.right,
            top=src.bounds.top,
            resolution=resolution,   # None → preserve source GSD in target units
        )
    return dst_transform, dst_width, dst_height

Omitting resolution lets rasterio infer a pixel size that minimises area distortion at the centre of the scene. Pass an explicit resolution tuple whenever you need a fixed ground sampling distance — for example when aligning Sentinel-2 10 m bands with 20 m bands before band math operations with xarray.

Step 4 — Execute the warp

Reproject each band individually using rasterio.warp.reproject. Reading with masked=True preserves the nodata boundary during interpolation and prevents edge fill-values from bleeding into valid pixels.

import numpy as np
from rasterio.warp import reproject, Resampling

def warp_band(
    src,           # open rasterio DatasetReader
    band_index: int,
    dst_transform,
    dst_crs: CRS,
    dst_width: int,
    dst_height: int,
    resampling: Resampling,
    dst_nodata: float | None,
) -> np.ndarray:
    """
    Reproject a single band and return the destination array.
    """
    src_band = src.read(band_index, masked=True)
    dst_band = np.full(
        (dst_height, dst_width),
        fill_value=dst_nodata if dst_nodata is not None else src.nodata,
        dtype=src.dtypes[band_index - 1],
    )

    reproject(
        source=src_band,
        destination=dst_band,
        src_transform=src.transform,
        src_crs=src.crs,
        dst_transform=dst_transform,
        dst_crs=dst_crs,
        resampling=resampling,
        src_nodata=src.nodata,
        dst_nodata=dst_nodata if dst_nodata is not None else src.nodata,
    )
    return dst_band

Step 5 — Write COG-compliant output

Update the dataset profile with the new CRS, transform, and dimensions, then write each reprojected band. Build internal overviews before closing the file so the output is immediately streamable as a Cloud-Optimized GeoTIFF.

import logging
import rasterio
from rasterio.warp import calculate_default_transform, Resampling
from rasterio.crs import CRS
import numpy as np

logger = logging.getLogger(__name__)

def reproject_raster(
    src_path: str,
    dst_path: str,
    dst_crs: str | int | CRS,
    resolution: tuple[float, float] | None = None,
    resampling: Resampling = Resampling.bilinear,
    dst_nodata: float | None = None,
    compress: str = "lzw",
) -> None:
    """
    Reproject src_path to dst_crs and write a tiled, overview-bearing GeoTIFF.

    Args:
        src_path:   Path to the source raster.
        dst_path:   Path for the reprojected output.
        dst_crs:    Target CRS — EPSG int, PROJ string, WKT, or CRS object.
        resolution: Optional (x_res, y_res) in target CRS units.
        resampling: Resampling kernel — use Resampling.nearest for categorical data.
        dst_nodata: Override nodata for the output; inherits from source if None.
        compress:   GDAL compression driver (lzw, deflate, zstd).
    """
    target_crs = CRS.from_user_input(dst_crs)

    with rasterio.open(src_path) as src:
        if src.crs is None:
            raise ValueError("Source dataset lacks a valid CRS definition.")

        dst_transform, dst_width, dst_height = calculate_default_transform(
            src_crs=src.crs,
            dst_crs=target_crs,
            width=src.width,
            height=src.height,
            left=src.bounds.left,
            bottom=src.bounds.bottom,
            right=src.bounds.right,
            top=src.bounds.top,
            resolution=resolution,
        )

        effective_nodata = dst_nodata if dst_nodata is not None else src.nodata

        profile = src.profile.copy()
        profile.update({
            "crs": target_crs,
            "transform": dst_transform,
            "width": dst_width,
            "height": dst_height,
            "nodata": effective_nodata,
            "compress": compress,
            "tiled": True,
            "blockxsize": 256,
            "blockysize": 256,
            "interleave": "band",
        })

        with rasterio.open(dst_path, "w", **profile) as dst:
            for i in range(1, src.count + 1):
                src_band = src.read(i, masked=True)
                dst_band = np.full(
                    (dst_height, dst_width),
                    fill_value=effective_nodata,
                    dtype=src.dtypes[i - 1],
                )
                from rasterio.warp import reproject
                reproject(
                    source=src_band,
                    destination=dst_band,
                    src_transform=src.transform,
                    src_crs=src.crs,
                    dst_transform=dst_transform,
                    dst_crs=target_crs,
                    resampling=resampling,
                    src_nodata=src.nodata,
                    dst_nodata=effective_nodata,
                )
                dst.write(dst_band, i)

            # Build internal overviews — required for COG compliance
            dst.build_overviews([2, 4, 8, 16], Resampling.average)
            dst.update_tags(ns="rio_overview", resampling="AVERAGE")

    logger.info("Reprojected %s → %s (%s)", src_path, dst_path, target_crs.to_epsg())

Parameter reference

Parameter Type Default Usage note
dst_crs str | int | CRS Required. Accepts EPSG int, "EPSG:XXXX", WKT, PROJ4 dict, or CRS object
resolution tuple[float, float] | None None Pin output pixel size in target CRS units. None infers GSD from source
resampling Resampling bilinear See resampling table below. Use nearest for categorical data
dst_nodata float | None None Override nodata fill. Inherits from source when None
compress str "lzw" Passed directly to the GDAL driver. deflate or zstd for higher ratios
blockxsize / blockysize int 256 COG tile dimensions. 512 can improve throughput on high-bandwidth storage
interleave str "band" Band-sequential layout; use "pixel" for RGB display outputs only

Resampling kernel reference

Kernel Resampling enum Use case Notes
Nearest neighbour nearest Land cover, crop maps, any categorical raster Preserves original class values; no interpolation
Bilinear bilinear Reflectance, temperature, elevation 2×2 weighted average; smooths gradients
Cubic cubic High-resolution display imagery 4×4 cubic convolution; sharpest edges
Average average Downscaling / overview generation Mean of all contributing source pixels
Mode mode Discrete classes at coarser resolution Returns the most frequent value in the footprint

For continuous environmental variables, bilinear or cubic resampling preserves spectral continuity. However, when preparing data for band math operations with xarray, confirm that resampling has not introduced dynamic-range drift that would skew index calculations such as NDVI.


Verification and testing

Automated integrity check

Run this immediately after the warp. Relying on visual inspection alone misses subtle geometric drift that accumulates across multi-stage pipelines.

import rasterio
import numpy as np
from rasterio.crs import CRS


def validate_reprojection(
    src_path: str,
    dst_path: str,
    expected_crs: str | int | CRS,
    stat_tolerance: float = 0.005,
) -> dict:
    """
    Compare source and destination rasters. Raises AssertionError on failure.
    Returns a dict of key metrics for logging.
    """
    target_crs = CRS.from_user_input(expected_crs)

    with rasterio.open(src_path) as src, rasterio.open(dst_path) as dst:
        # CRS identity check
        assert dst.crs == target_crs, (
            f"CRS mismatch: expected {target_crs.to_epsg()}, "
            f"got {dst.crs.to_epsg()}"
        )

        # Pixel size sanity — destination must have a non-identity transform
        assert dst.transform.a > 0, "Destination x-pixel size is zero or negative"
        assert dst.transform.e < 0, "Destination y-pixel size is zero or positive (north-up required)"

        # Statistical preservation for band 1
        src_data = src.read(1, masked=True).compressed()
        dst_data = dst.read(1, masked=True).compressed()

        src_mean = float(np.mean(src_data)) if src_data.size else 0.0
        dst_mean = float(np.mean(dst_data)) if dst_data.size else 0.0

        if src_mean != 0:
            drift = abs(dst_mean - src_mean) / abs(src_mean)
            assert drift < stat_tolerance, (
                f"Statistical drift {drift:.4%} exceeds tolerance {stat_tolerance:.4%}. "
                "Check resampling kernel and nodata handling."
            )

        return {
            "src_crs": src.crs.to_epsg(),
            "dst_crs": dst.crs.to_epsg(),
            "src_bounds": dict(src.bounds._asdict()),
            "dst_bounds": dict(dst.bounds._asdict()),
            "src_transform": list(src.transform)[:6],
            "dst_transform": list(dst.transform)[:6],
            "band1_mean_src": src_mean,
            "band1_mean_dst": dst_mean,
            "nodata_preserved": src.nodata == dst.nodata,
        }

What to check in the output

  1. CRS alignmentdst.crs.to_epsg() matches the intended target.
  2. Transform pixel sizedst.transform.a equals your intended x-resolution; abs(dst.transform.e) equals y-resolution.
  3. Extent coverage — destination bounds encompass the projected source footprint. A significantly smaller extent indicates the antimeridian clipping problem described in the Troubleshooting section below.
  4. Statistical integrity — band mean, min, and max remain within ±0.5% for continuous data; exact match for categorical rasters.
  5. Internal overviewsdst.overviews(1) returns [2, 4, 8, 16]; an empty list means COG compliance failed.

Troubleshooting

PROJ_NETWORK required for datum shifts

Symptom: Coordinates shift by tens of metres after reprojection between NAD27 and WGS84 or between national datum systems.

Cause: Sub-metre-accurate datum transformations require NTv2 or VERTCON grid shift files (.tif / .gsb) that are not bundled with every PROJ installation.

Fix: Set PROJ_NETWORK=ON before importing rasterio so PROJ can download missing grids from the CDN on first use. Alternatively, install the proj-data package for offline availability.

import os
os.environ["PROJ_NETWORK"] = "ON"
import rasterio  # must be imported AFTER setting the env var

Output extent smaller than expected (antimeridian clipping)

Symptom: Reprojecting a global dataset to Web Mercator (EPSG:3857) produces an output that is half the expected width, or the eastern hemisphere disappears.

Cause: calculate_default_transform does not split geometries that cross ±180°. Bounding boxes that span the antimeridian are passed as-is and get incorrectly clipped.

Fix: Split the source into eastern and western hemispheres before reprojection, or reproject to an equal-area global CRS (e.g. EPSG:6933) that does not have a singularity at the antimeridian.

Resolution drift on large reprojections

Symptom: The reprojected raster has more or fewer pixels than expected, and the pixel footprint on the ground varies across the scene.

Cause: When resolution is None, calculate_default_transform selects a pixel size that minimises distortion at the scene centre. Over large extents or oblique projections the effective GSD varies.

Fix: Pass an explicit resolution=(x_res, y_res) in target CRS units. This enforces a constant ground sampling distance but may expand the output extent slightly to accommodate an integer number of pixels.

NoCRSError on legacy files

Symptom: rasterio.open succeeds but src.crs is None, causing calculate_default_transform to raise CRSError.

Cause: The file was written without an embedded spatial reference — common with older GeoTIFFs, raw binary rasters, or files exported from non-GIS software.

Fix: Assign the CRS externally using gdal_edit.py -a_srs EPSG:XXXX <file>, or pass it explicitly when writing the destination profile.

Memory exhaustion on wide multi-band rasters

Symptom: MemoryError or OOM kill when warping a 10-band, 10000×10000 raster.

Cause: Reading all bands simultaneously before warping allocates the entire source array in memory.

Fix: Reproject band by band (as shown in the workflow above), or use windowed reads to process spatial tiles independently. Windowed reprojection is covered in optimizing rasterio window reads for memory efficiency.


Advanced considerations for global datasets

Polar projections, antimeridian crossings, and multi-datum archives introduce complications beyond the standard workflow.

Polar stereographic projections (e.g. NSIDC EPSG:3413 or EPSG:3976) define pixel size in metres near the pole; calculate_default_transform handles them correctly, but the output extent in degrees can be misleading when you inspect dst.bounds. Always inspect the bounds in the projected CRS, not after converting back to WGS84.

Datum transformations — EPSG codes alone are insufficient when shifting between legacy national datums. Always verify whether a grid shift file is required for sub-metre accuracy before running production pipelines over archival datasets.

Resolution pinning across sensors — when stacking Sentinel-2 10 m, 20 m, and 60 m bands for a spectral index calculation pipeline, reproject all bands to a shared UTM CRS with a fixed resolution=(10, 10) before any band math. A single consistent affine transform across the stack eliminates pixel-offset errors that otherwise skew index results.


FAQ

Why does rasterio raise NoCRSError even though my file has a projection?

The file likely stores the CRS as an embedded WKT string that PROJ cannot parse, often because GDAL was built against an older PROJ version. Open the file with rasterio.open and print src.crs — if it is None, pass crs=CRS.from_epsg(XXXX) explicitly to the write profile or repair the file with gdal_edit.py -a_srs EPSG:XXXX.

Should I reproject to WGS84 or keep UTM for area calculations?

Keep a projected CRS such as UTM or equal-area (e.g. EPSG:6933 EASE-Grid 2.0) for area calculations — geographic CRS like WGS84 (EPSG:4326) uses angular degrees, which makes pixel area vary with latitude. Reproject to WGS84 only for display, tile serving, or export to formats that require geographic coordinates.

What resampling method should I use for a land-cover classification raster?

Always use nearest-neighbour resampling for categorical datasets such as land cover or crop type maps. Bilinear or cubic interpolation blends class labels and produces fractional values that do not correspond to any real category.


Deep-Dive Articles