Fixing EPSG Mismatches in rasterio.open

Direct answer: call src.crs.to_epsg() immediately after opening the file, compare it against your expected EPSG code, then either override the metadata tag via MemoryFile (zero resampling, milliseconds) or reproject the pixel grid with rasterio.warp.reproject (necessary only when coordinates truly diverge).

import rasterio
from rasterio.crs import CRS
from rasterio.io import MemoryFile

with rasterio.open("scene.tif") as src:
    if src.crs is None or src.crs.to_epsg() != 32633:
        target_crs = CRS.from_epsg(32633)
        memfile = MemoryFile()
        profile = src.profile | {"crs": target_crs}
        with memfile.open(**profile) as dst:
            dst.write(src.read())
        # keep `memfile` alive — freeing it invalidates `ds`
        ds = memfile.open()

This approach is part of the Mastering CRS Transformations in Rasterio workflow. For the broader context of raster ingestion architecture, see Core Raster Fundamentals & STAC Mapping.


Why EPSG Mismatches Occur During Ingestion

Remote sensing pipelines routinely ingest heterogeneous data: legacy GeoTIFFs, cloud-optimized archives, and STAC catalog assets. Mismatches concentrate around three root causes.

Header vs. data divergence. The TIFF tags declare one EPSG (e.g., EPSG:4326), but the pixel grid coordinates and affine transform clearly follow a projected system like EPSG:32633. This pattern is common when data is manually edited in desktop GIS software without updating projection metadata. You can catch it quickly: if src.crs claims EPSG:4326 (degree-valued) but src.transform.c is a seven-digit number, the label is wrong.

Deprecated or ambiguous WKT strings. Older GDAL versions wrote custom Well-Known Text strings that map to multiple EPSG identifiers. Modern PROJ rejects ambiguous definitions unless explicitly resolved, so src.crs.to_epsg() returns None even when a valid CRS is embedded.

Catalog metadata overrides. proj:epsg fields in STAC items frequently conflict with actual raster headers, especially after mosaicking or manual curation. Relying solely on catalog JSON without validating the binary payload introduces silent spatial drift — a particularly dangerous failure mode when feeding data into spectral index calculation pipelines or spatial joins.

The diagram below shows where a mismatch can enter the pipeline and the two correction branches available:

EPSG mismatch correction decision flow A flowchart showing rasterio.open leading to CRS validation, then branching into a fast metadata-only override path (MemoryFile, no resampling) and a slow reproject path (calculate_default_transform + warp). rasterio.open(path) read header src.crs.to_epsg() cross-check bounds EPSG correct? Use as-is no action needed coords diverge? MemoryFile override tag fix, no resample warp.reproject pixel grid transform yes no tag only diverge

Environment & Setup

Package Minimum version Why required
rasterio 1.3 rasterio.open, MemoryFile, warp.reproject
pyproj 3.4 CRS authority lookups, Transformer
numpy 1.24 Array operations on band data
pip install "rasterio>=1.3" "pyproj>=3.4" "numpy>=1.24"

Complete Working Example

The function below handles both the fast path (metadata tag wrong, coordinates correct) and the slow path (pixel grid genuinely needs transformation). It returns a (dataset, memfile) tuple so the caller can control the lifetime of the in-memory buffer.

import rasterio
from rasterio.crs import CRS
from rasterio.io import MemoryFile
from rasterio.warp import reproject, Resampling, calculate_default_transform


def open_with_verified_crs(
    filepath: str,
    target_epsg: int,
    force_reproject: bool = False,
) -> tuple[rasterio.DatasetReader, MemoryFile | None]:
    """
    Open a raster and guarantee the returned dataset carries the correct CRS.

    Parameters
    ----------
    filepath : str
        Path to a local GeoTIFF or any rasterio-supported format.
    target_epsg : int
        The EPSG code the caller expects (e.g. 32633 for UTM Zone 33N).
    force_reproject : bool
        If True, warp the pixel grid even when only the metadata tag differs.
        Use this when bounds validation confirms true coordinate divergence.

    Returns
    -------
    (dataset, memfile)
        Keep `memfile` alive in the caller's scope. If it is garbage-collected,
        the in-memory buffer is freed and `dataset` becomes invalid — which
        causes a segmentation fault, not a Python exception.
    """
    target_crs = CRS.from_epsg(target_epsg)

    with rasterio.open(filepath) as src:
        if src.crs is None:
            raise ValueError(
                f"No CRS metadata found in '{filepath}'. "
                "Assign a CRS explicitly before ingestion."
            )

        src_epsg = src.crs.to_epsg()  # returns None if ambiguous

        # --- Fast path: EPSG already matches; return a direct file handle ---
        if src_epsg == target_epsg and not force_reproject:
            return rasterio.open(filepath), None

        memfile = MemoryFile()

        if not force_reproject:
            # Metadata-only override: copy raw pixel data without any
            # resampling. Safe only when coordinates are already in the
            # target system but the TIFF tag is wrong (e.g. labelled 4326
            # while the affine values are clearly in UTM metres).
            profile = src.profile | {"crs": target_crs}
            with memfile.open(**profile) as dst:
                dst.write(src.read())  # single bulk read, no per-band loop
            return memfile.open(), memfile

        # --- Slow path: true coordinate transformation ---
        # calculate_default_transform derives the optimal output extent and
        # resolution so no source pixels are clipped or over-sampled.
        transform, width, height = calculate_default_transform(
            src.crs,
            target_crs,
            src.width,
            src.height,
            *src.bounds,  # left, bottom, right, top in src CRS units
        )
        profile = src.profile | {
            "crs": target_crs,
            "transform": transform,
            "width": width,
            "height": height,
        }
        with memfile.open(**profile) as dst:
            for band_idx in range(1, src.count + 1):
                reproject(
                    source=rasterio.band(src, band_idx),
                    destination=rasterio.band(dst, band_idx),
                    src_transform=src.transform,
                    src_crs=src.crs,
                    dst_transform=transform,
                    dst_crs=target_crs,
                    resampling=Resampling.bilinear,
                    # bilinear is appropriate for continuous data (reflectance,
                    # elevation); use Resampling.nearest for categorical data
                    # (land cover, cloud masks).
                )
        return memfile.open(), memfile

Usage:

ds, memfile = open_with_verified_crs("sentinel2_B04.tif", target_epsg=32633)
print(ds.crs.to_epsg())   # 32633
print(ds.bounds)
# keep ds and memfile in scope until you are done reading
ds.close()
if memfile:
    memfile.close()

Variant Patterns

Variant 1 — Sentinel-2 STAC asset with proj:epsg conflict

When loading assets via a STAC catalog query, the proj:epsg field in the item’s properties may disagree with the actual GeoTIFF header. This variant reads the STAC-declared EPSG, compares it against the header, and applies the override only when necessary:

import rasterio
from rasterio.crs import CRS
from rasterio.io import MemoryFile
import pystac


def open_stac_asset_verified(
    item: pystac.Item,
    asset_key: str = "B04",
) -> tuple[rasterio.DatasetReader, MemoryFile | None]:
    """
    Open a STAC asset, using the raster header as ground truth for the CRS.
    The STAC proj:epsg field is treated as a hint only.
    """
    asset_href = item.assets[asset_key].href
    stac_epsg: int | None = item.properties.get("proj:epsg")

    with rasterio.open(asset_href) as src:
        header_epsg = src.crs.to_epsg() if src.crs else None

    if stac_epsg and header_epsg and stac_epsg != header_epsg:
        # STAC metadata disagrees with the header; trust the header.
        # Log the discrepancy for audit trails.
        print(
            f"CRS conflict for {item.id}/{asset_key}: "
            f"STAC says EPSG:{stac_epsg}, header says EPSG:{header_epsg}. "
            f"Using header value."
        )

    # Validate bounds against the header EPSG regardless of STAC claim
    expected_epsg = header_epsg or stac_epsg
    if expected_epsg is None:
        raise ValueError(f"Cannot determine CRS for {item.id}/{asset_key}")

    return open_with_verified_crs(asset_href, target_epsg=expected_epsg)

Variant 2 — Batch pipeline with pyproj bounds validation

For batch ingestion of heterogeneous archives, cross-referencing src.bounds against the official EPSG geographic extent catches mislabelled files before they corrupt downstream outputs. The handling pixel resolution and scaling cluster covers how affine transform parameters interact with the CRS choice.

from pyproj import CRS as ProjCRS
from pyproj.aoi import AreaOfInterest
from pyproj.database import query_utm_crs_info
import rasterio


def validate_crs_by_bounds(filepath: str) -> dict:
    """
    Return a validation report: whether the embedded CRS is plausible
    given the raster's geographic extent.
    """
    with rasterio.open(filepath) as src:
        if src.crs is None:
            return {"valid": False, "reason": "no_crs"}

        epsg = src.crs.to_epsg()
        bounds = src.bounds

        # Convert bounds to WGS 84 degrees for the plausibility check
        proj_crs = ProjCRS.from_user_input(src.crs)
        wgs84_bounds = proj_crs.area_of_use

        if wgs84_bounds is None:
            return {"valid": None, "reason": "no_aou_data", "epsg": epsg}

        # A crude sanity check: the raster's geographic footprint must
        # overlap the CRS's declared area of use.
        in_range = (
            wgs84_bounds.west <= bounds.right
            and wgs84_bounds.east >= bounds.left
            and wgs84_bounds.south <= bounds.top
            and wgs84_bounds.north >= bounds.bottom
        )
        return {
            "valid": in_range,
            "epsg": epsg,
            "reason": "bounds_ok" if in_range else "bounds_outside_aou",
        }


# Example batch run
import pathlib

for tif in pathlib.Path("./scenes").glob("*.tif"):
    report = validate_crs_by_bounds(str(tif))
    if not report["valid"]:
        print(f"WARNING {tif.name}: {report}")

Variant 3 — Dask-friendly variant for large mosaics

When working with large mosaics processed via rasterio.vrt.WarpedVRT, the MemoryFile pattern does not scale. Instead, pass the correct CRS directly to WarpedVRT, which performs lazy on-demand warping without materialising the entire array:

import rasterio
from rasterio.crs import CRS
from rasterio.vrt import WarpedVRT

def open_warped_vrt(filepath: str, target_epsg: int) -> WarpedVRT:
    """
    Return a WarpedVRT that exposes the dataset in `target_epsg` without
    loading all pixels into memory. Compatible with rasterio's window-read
    interface used in Dask pipelines.
    """
    src = rasterio.open(filepath)
    # WarpedVRT wraps the source and handles on-demand pixel transformation.
    # No data is read until a window or array is requested.
    return WarpedVRT(src, crs=CRS.from_epsg(target_epsg))

Common Errors

AttributeError: 'NoneType' object has no attribute 'to_epsg' src.crs is None — the file has no CRS metadata at all. Assign one via the MemoryFile pattern above or reject the file early with a clear error message rather than propagating None downstream.

rasterio.errors.CRSError: Input is not a CRS CRS.from_epsg() received an invalid code (e.g. None or a deprecated authority string). Verify the integer before calling; EPSG codes are positive integers in the range 1024–32767 for common projected and geographic systems.

Segmentation fault after MemoryFile.close() The MemoryFile object was garbage-collected while a dataset handle opened from it was still in use. Always return (or store) both (dataset, memfile) and call memfile.close() only after dataset.close().