Mastering CRS Transformations in Rasterio
Mismatched projections are the most common source of silent data corruption in multi-source raster pipelines. When satellite imagery, LiDAR derivatives, and climate grids each arrive in a different Coordinate Reference System (CRS), pixel alignment breaks, area calculations are distorted, and downstream index results become meaningless. For the full architectural context of where CRS management fits in Python raster work, see Core Raster Fundamentals & STAC Mapping.
This page walks through a production-ready reprojection workflow in rasterio: from validating source metadata and normalizing CRS objects, to computing the correct affine transform, selecting a resampling kernel, and writing a Cloud-Optimized GeoTIFF output that is ready for streaming.
Prerequisites
Install dependencies
pip install "rasterio>=1.3.0" "pyproj>=3.0.0" "numpy>=1.21"
Library version table
| Library | Minimum version | Why required |
|---|---|---|
rasterio |
1.3.0 | calculate_default_transform parameter API stabilised in 1.3 |
pyproj |
3.0.0 | Required by rasterio’s CRS backend; provides CRS.from_user_input |
numpy |
1.21 | Masked-array behaviour used during band-isolated warp |
| PROJ (system) | 7.2 | Datum grid shifts and accurate datum transformations |
Conceptual prerequisites
- Understand how the affine geotransform governs pixel resolution and scaling — reprojection replaces this matrix entirely, so you must know what each coefficient means before you override it.
- Know how to extract and parse raster metadata from source files, because improper CRS extraction is the most common failure point in automated ingestion pipelines.
- Understand the COG tiling and overview structure — block alignment and internal overviews interact directly with warp output and must be explicitly set.
GDAL environment configuration
Set these variables before importing rasterio to control memory allocation, threading, and datum-grid downloads:
import os
os.environ["GDAL_NUM_THREADS"] = "ALL_CPUS" # parallelise warp kernel
os.environ["GDAL_CACHEMAX"] = "1024" # MB of GDAL block cache
os.environ["PROJ_NETWORK"] = "ON" # fetch remote datum grids when missing
os.environ["CPL_VSIL_CURL_ALLOWED_EXTENSIONS"] = ".tif,.tiff,.vrt"
Reprojection pipeline: how the steps connect
The six steps below form a deterministic sequence. Skipping validation or relying on implicit defaults frequently results in shifted rasters, clipped extents, or corrupted nodata masks.
Step-by-step workflow
Step 1 — Validate source metadata
Before computing any target geometry, confirm that the source file contains a valid CRS and a coherent affine transform. Legacy files may store the projection as a malformed WKT string, carry a None transform, or have conflicting EPSG codes embedded in different metadata fields.
import rasterio
from rasterio.crs import CRS
def inspect_source(src_path: str) -> dict:
"""
Return a summary of CRS and affine metadata for diagnostic purposes.
Raises ValueError for files that cannot safely be reprojected.
"""
with rasterio.open(src_path) as src:
if src.crs is None:
raise ValueError(
f"{src_path}: no CRS found. "
"Use gdal_edit.py -a_srs EPSG:XXXX to assign one."
)
if src.transform is None or src.transform == rasterio.transform.IDENTITY:
raise ValueError(
f"{src_path}: identity or missing affine transform — "
"pixel-to-world mapping cannot be determined."
)
return {
"crs_wkt": src.crs.to_wkt(),
"epsg": src.crs.to_epsg(),
"transform": src.transform,
"width": src.width,
"height": src.height,
"bounds": src.bounds,
"nodata": src.nodata,
"dtype": src.dtypes,
}
When you encounter ambiguous or conflicting EPSG codes — a common issue with older Landsat scenes and exported QGIS files — the dedicated troubleshooting patterns in Fixing EPSG mismatches in rasterio.open walk through each diagnostic step.
Step 2 — Normalize the target CRS
Accept any CRS representation the caller may provide (EPSG integer, PROJ4 string, WKT, or an existing CRS object) and normalise it to a single canonical form. Comparing source and target CRS objects directly avoids redundant no-op warp operations.
from rasterio.crs import CRS
def resolve_crs(user_input: int | str | CRS) -> CRS:
"""
Convert any CRS specification into a rasterio CRS object.
Raises CRSError for unrecognised strings.
"""
if isinstance(user_input, CRS):
return user_input
return CRS.from_user_input(user_input)
# Example: all three are equivalent
crs_a = resolve_crs(32633) # UTM zone 33N
crs_b = resolve_crs("EPSG:32633")
crs_c = resolve_crs(CRS.from_epsg(32633))
assert crs_a == crs_b == crs_c
Step 3 — Compute destination transform and dimensions
calculate_default_transform derives the exact affine matrix and pixel grid dimensions that preserve the full spatial extent of the source without padding or clipping. Always pass explicit bounds rather than relying on the convenience overload that reads them from the open file handle, because explicit bounds are reproducible and testable.
from rasterio.warp import calculate_default_transform
def compute_target_geometry(src_path: str, dst_crs: CRS, resolution: tuple | None = None):
"""
Return (transform, width, height) for the reprojected raster.
Pass resolution=(x_res, y_res) in target CRS units to fix the output GSD.
"""
with rasterio.open(src_path) as src:
dst_transform, dst_width, dst_height = calculate_default_transform(
src_crs=src.crs,
dst_crs=dst_crs,
width=src.width,
height=src.height,
left=src.bounds.left,
bottom=src.bounds.bottom,
right=src.bounds.right,
top=src.bounds.top,
resolution=resolution, # None → preserve source GSD in target units
)
return dst_transform, dst_width, dst_height
Omitting resolution lets rasterio infer a pixel size that minimises area distortion at the centre of the scene. Pass an explicit resolution tuple whenever you need a fixed ground sampling distance — for example when aligning Sentinel-2 10 m bands with 20 m bands before band math operations with xarray.
Step 4 — Execute the warp
Reproject each band individually using rasterio.warp.reproject. Reading with masked=True preserves the nodata boundary during interpolation and prevents edge fill-values from bleeding into valid pixels.
import numpy as np
from rasterio.warp import reproject, Resampling
def warp_band(
src, # open rasterio DatasetReader
band_index: int,
dst_transform,
dst_crs: CRS,
dst_width: int,
dst_height: int,
resampling: Resampling,
dst_nodata: float | None,
) -> np.ndarray:
"""
Reproject a single band and return the destination array.
"""
src_band = src.read(band_index, masked=True)
dst_band = np.full(
(dst_height, dst_width),
fill_value=dst_nodata if dst_nodata is not None else src.nodata,
dtype=src.dtypes[band_index - 1],
)
reproject(
source=src_band,
destination=dst_band,
src_transform=src.transform,
src_crs=src.crs,
dst_transform=dst_transform,
dst_crs=dst_crs,
resampling=resampling,
src_nodata=src.nodata,
dst_nodata=dst_nodata if dst_nodata is not None else src.nodata,
)
return dst_band
Step 5 — Write COG-compliant output
Update the dataset profile with the new CRS, transform, and dimensions, then write each reprojected band. Build internal overviews before closing the file so the output is immediately streamable as a Cloud-Optimized GeoTIFF.
import logging
import rasterio
from rasterio.warp import calculate_default_transform, Resampling
from rasterio.crs import CRS
import numpy as np
logger = logging.getLogger(__name__)
def reproject_raster(
src_path: str,
dst_path: str,
dst_crs: str | int | CRS,
resolution: tuple[float, float] | None = None,
resampling: Resampling = Resampling.bilinear,
dst_nodata: float | None = None,
compress: str = "lzw",
) -> None:
"""
Reproject src_path to dst_crs and write a tiled, overview-bearing GeoTIFF.
Args:
src_path: Path to the source raster.
dst_path: Path for the reprojected output.
dst_crs: Target CRS — EPSG int, PROJ string, WKT, or CRS object.
resolution: Optional (x_res, y_res) in target CRS units.
resampling: Resampling kernel — use Resampling.nearest for categorical data.
dst_nodata: Override nodata for the output; inherits from source if None.
compress: GDAL compression driver (lzw, deflate, zstd).
"""
target_crs = CRS.from_user_input(dst_crs)
with rasterio.open(src_path) as src:
if src.crs is None:
raise ValueError("Source dataset lacks a valid CRS definition.")
dst_transform, dst_width, dst_height = calculate_default_transform(
src_crs=src.crs,
dst_crs=target_crs,
width=src.width,
height=src.height,
left=src.bounds.left,
bottom=src.bounds.bottom,
right=src.bounds.right,
top=src.bounds.top,
resolution=resolution,
)
effective_nodata = dst_nodata if dst_nodata is not None else src.nodata
profile = src.profile.copy()
profile.update({
"crs": target_crs,
"transform": dst_transform,
"width": dst_width,
"height": dst_height,
"nodata": effective_nodata,
"compress": compress,
"tiled": True,
"blockxsize": 256,
"blockysize": 256,
"interleave": "band",
})
with rasterio.open(dst_path, "w", **profile) as dst:
for i in range(1, src.count + 1):
src_band = src.read(i, masked=True)
dst_band = np.full(
(dst_height, dst_width),
fill_value=effective_nodata,
dtype=src.dtypes[i - 1],
)
from rasterio.warp import reproject
reproject(
source=src_band,
destination=dst_band,
src_transform=src.transform,
src_crs=src.crs,
dst_transform=dst_transform,
dst_crs=target_crs,
resampling=resampling,
src_nodata=src.nodata,
dst_nodata=effective_nodata,
)
dst.write(dst_band, i)
# Build internal overviews — required for COG compliance
dst.build_overviews([2, 4, 8, 16], Resampling.average)
dst.update_tags(ns="rio_overview", resampling="AVERAGE")
logger.info("Reprojected %s → %s (%s)", src_path, dst_path, target_crs.to_epsg())
Parameter reference
| Parameter | Type | Default | Usage note |
|---|---|---|---|
dst_crs |
str | int | CRS |
— | Required. Accepts EPSG int, "EPSG:XXXX", WKT, PROJ4 dict, or CRS object |
resolution |
tuple[float, float] | None |
None |
Pin output pixel size in target CRS units. None infers GSD from source |
resampling |
Resampling |
bilinear |
See resampling table below. Use nearest for categorical data |
dst_nodata |
float | None |
None |
Override nodata fill. Inherits from source when None |
compress |
str |
"lzw" |
Passed directly to the GDAL driver. deflate or zstd for higher ratios |
blockxsize / blockysize |
int |
256 |
COG tile dimensions. 512 can improve throughput on high-bandwidth storage |
interleave |
str |
"band" |
Band-sequential layout; use "pixel" for RGB display outputs only |
Resampling kernel reference
| Kernel | Resampling enum |
Use case | Notes |
|---|---|---|---|
| Nearest neighbour | nearest |
Land cover, crop maps, any categorical raster | Preserves original class values; no interpolation |
| Bilinear | bilinear |
Reflectance, temperature, elevation | 2×2 weighted average; smooths gradients |
| Cubic | cubic |
High-resolution display imagery | 4×4 cubic convolution; sharpest edges |
| Average | average |
Downscaling / overview generation | Mean of all contributing source pixels |
| Mode | mode |
Discrete classes at coarser resolution | Returns the most frequent value in the footprint |
For continuous environmental variables, bilinear or cubic resampling preserves spectral continuity. However, when preparing data for band math operations with xarray, confirm that resampling has not introduced dynamic-range drift that would skew index calculations such as NDVI.
Verification and testing
Automated integrity check
Run this immediately after the warp. Relying on visual inspection alone misses subtle geometric drift that accumulates across multi-stage pipelines.
import rasterio
import numpy as np
from rasterio.crs import CRS
def validate_reprojection(
src_path: str,
dst_path: str,
expected_crs: str | int | CRS,
stat_tolerance: float = 0.005,
) -> dict:
"""
Compare source and destination rasters. Raises AssertionError on failure.
Returns a dict of key metrics for logging.
"""
target_crs = CRS.from_user_input(expected_crs)
with rasterio.open(src_path) as src, rasterio.open(dst_path) as dst:
# CRS identity check
assert dst.crs == target_crs, (
f"CRS mismatch: expected {target_crs.to_epsg()}, "
f"got {dst.crs.to_epsg()}"
)
# Pixel size sanity — destination must have a non-identity transform
assert dst.transform.a > 0, "Destination x-pixel size is zero or negative"
assert dst.transform.e < 0, "Destination y-pixel size is zero or positive (north-up required)"
# Statistical preservation for band 1
src_data = src.read(1, masked=True).compressed()
dst_data = dst.read(1, masked=True).compressed()
src_mean = float(np.mean(src_data)) if src_data.size else 0.0
dst_mean = float(np.mean(dst_data)) if dst_data.size else 0.0
if src_mean != 0:
drift = abs(dst_mean - src_mean) / abs(src_mean)
assert drift < stat_tolerance, (
f"Statistical drift {drift:.4%} exceeds tolerance {stat_tolerance:.4%}. "
"Check resampling kernel and nodata handling."
)
return {
"src_crs": src.crs.to_epsg(),
"dst_crs": dst.crs.to_epsg(),
"src_bounds": dict(src.bounds._asdict()),
"dst_bounds": dict(dst.bounds._asdict()),
"src_transform": list(src.transform)[:6],
"dst_transform": list(dst.transform)[:6],
"band1_mean_src": src_mean,
"band1_mean_dst": dst_mean,
"nodata_preserved": src.nodata == dst.nodata,
}
What to check in the output
- CRS alignment —
dst.crs.to_epsg()matches the intended target. - Transform pixel size —
dst.transform.aequals your intended x-resolution;abs(dst.transform.e)equals y-resolution. - Extent coverage — destination bounds encompass the projected source footprint. A significantly smaller extent indicates the antimeridian clipping problem described in the Troubleshooting section below.
- Statistical integrity — band mean, min, and max remain within ±0.5% for continuous data; exact match for categorical rasters.
- Internal overviews —
dst.overviews(1)returns[2, 4, 8, 16]; an empty list means COG compliance failed.
Troubleshooting
PROJ_NETWORK required for datum shifts
Symptom: Coordinates shift by tens of metres after reprojection between NAD27 and WGS84 or between national datum systems.
Cause: Sub-metre-accurate datum transformations require NTv2 or VERTCON grid shift files (.tif / .gsb) that are not bundled with every PROJ installation.
Fix: Set PROJ_NETWORK=ON before importing rasterio so PROJ can download missing grids from the CDN on first use. Alternatively, install the proj-data package for offline availability.
import os
os.environ["PROJ_NETWORK"] = "ON"
import rasterio # must be imported AFTER setting the env var
Output extent smaller than expected (antimeridian clipping)
Symptom: Reprojecting a global dataset to Web Mercator (EPSG:3857) produces an output that is half the expected width, or the eastern hemisphere disappears.
Cause: calculate_default_transform does not split geometries that cross ±180°. Bounding boxes that span the antimeridian are passed as-is and get incorrectly clipped.
Fix: Split the source into eastern and western hemispheres before reprojection, or reproject to an equal-area global CRS (e.g. EPSG:6933) that does not have a singularity at the antimeridian.
Resolution drift on large reprojections
Symptom: The reprojected raster has more or fewer pixels than expected, and the pixel footprint on the ground varies across the scene.
Cause: When resolution is None, calculate_default_transform selects a pixel size that minimises distortion at the scene centre. Over large extents or oblique projections the effective GSD varies.
Fix: Pass an explicit resolution=(x_res, y_res) in target CRS units. This enforces a constant ground sampling distance but may expand the output extent slightly to accommodate an integer number of pixels.
NoCRSError on legacy files
Symptom: rasterio.open succeeds but src.crs is None, causing calculate_default_transform to raise CRSError.
Cause: The file was written without an embedded spatial reference — common with older GeoTIFFs, raw binary rasters, or files exported from non-GIS software.
Fix: Assign the CRS externally using gdal_edit.py -a_srs EPSG:XXXX <file>, or pass it explicitly when writing the destination profile.
Memory exhaustion on wide multi-band rasters
Symptom: MemoryError or OOM kill when warping a 10-band, 10000×10000 raster.
Cause: Reading all bands simultaneously before warping allocates the entire source array in memory.
Fix: Reproject band by band (as shown in the workflow above), or use windowed reads to process spatial tiles independently. Windowed reprojection is covered in optimizing rasterio window reads for memory efficiency.
Advanced considerations for global datasets
Polar projections, antimeridian crossings, and multi-datum archives introduce complications beyond the standard workflow.
Polar stereographic projections (e.g. NSIDC EPSG:3413 or EPSG:3976) define pixel size in metres near the pole; calculate_default_transform handles them correctly, but the output extent in degrees can be misleading when you inspect dst.bounds. Always inspect the bounds in the projected CRS, not after converting back to WGS84.
Datum transformations — EPSG codes alone are insufficient when shifting between legacy national datums. Always verify whether a grid shift file is required for sub-metre accuracy before running production pipelines over archival datasets.
Resolution pinning across sensors — when stacking Sentinel-2 10 m, 20 m, and 60 m bands for a spectral index calculation pipeline, reproject all bands to a shared UTM CRS with a fixed resolution=(10, 10) before any band math. A single consistent affine transform across the stack eliminates pixel-offset errors that otherwise skew index results.
FAQ
Why does rasterio raise NoCRSError even though my file has a projection?
The file likely stores the CRS as an embedded WKT string that PROJ cannot parse, often because GDAL was built against an older PROJ version. Open the file with rasterio.open and print src.crs — if it is None, pass crs=CRS.from_epsg(XXXX) explicitly to the write profile or repair the file with gdal_edit.py -a_srs EPSG:XXXX.
Should I reproject to WGS84 or keep UTM for area calculations?
Keep a projected CRS such as UTM or equal-area (e.g. EPSG:6933 EASE-Grid 2.0) for area calculations — geographic CRS like WGS84 (EPSG:4326) uses angular degrees, which makes pixel area vary with latitude. Reproject to WGS84 only for display, tile serving, or export to formats that require geographic coordinates.
What resampling method should I use for a land-cover classification raster?
Always use nearest-neighbour resampling for categorical datasets such as land cover or crop type maps. Bilinear or cubic interpolation blends class labels and produces fractional values that do not correspond to any real category.
Related
- Core Raster Fundamentals & STAC Mapping — parent section covering the full raster data model, COG structure, and STAC catalog navigation.
- Understanding Cloud-Optimized GeoTIFF Structure — internal overview layout and tiling requirements that reprojected outputs must satisfy.
- Fixing EPSG mismatches in rasterio.open — step-by-step diagnostics for ambiguous or conflicting CRS metadata in source files.
- Handling Pixel Resolution and Scaling — affine transform arithmetic and window-based reads that complement reprojection workflows.
- Band Math Operations with Xarray — computing spectral indices on multi-band stacks that must share a common CRS and transform after reprojection.
- Advanced Resampling and Upscaling Techniques — resampling kernel theory and sensor-specific guidance across the satellite processing pipeline.