Handling Pixel Resolution and Scaling in Python

Pixel resolution governs the spatial fidelity of every geospatial dataset you touch — it controls analytical accuracy, storage size, and compute cost. In production remote sensing pipelines, resolution and scaling operations sit at the junction of data ingestion and analysis: you must parse affine metadata correctly, align grids before merging datasets, and convert raw digital numbers to physical units before any spatial interpolation. Getting the order of operations wrong silently corrupts downstream spectral indices, machine learning features, or time-series statistics. For the full architectural context of how this fits into raster data models, see Core Raster Fundamentals & STAC Mapping.

Prerequisites

Install the required libraries before starting:

pip install rasterio>=1.3 rioxarray>=0.15 xarray>=2023.6 numpy>=1.24
Library Min version Why required
rasterio 1.3 Affine transform parsing, reproject, calculate_default_transform
rioxarray 0.15 CRS-aware xarray extension; open_rasterio with masked nodata
xarray 2023.6 Labelled N-D arrays for multi-band and multi-temporal stacks
numpy 1.24 Array arithmetic for radiometric scaling

Conceptual prerequisites: you should be comfortable with how an affine geotransform maps pixel indices to ground coordinates (covered in Extracting and Parsing Raster Metadata), and you should understand how Cloud-Optimised GeoTIFF internal tiling affects block-aligned reads before you run chunked resampling over remote files.

How pixel resolution, GSD, and radiometric scaling relate

The diagram below shows the data flow from raw sensor storage through scaling and resampling to an analysis-ready output. The critical invariant is that radiometric scaling must precede spatial interpolation.

Resolution and scaling pipeline data flow Diagram showing five sequential stages: Raw DN (16-bit integer), Radiometric Scaling (apply scale_factor + add_offset), CRS Alignment (reproject to common grid), Spatial Resampling (bilinear/average/nearest), and Analysis-Ready Output (float32 reflectance). Arrows connect each stage left to right. Raw DN (16-bit int) Radiometric Scaling scale_factor + add_offset CRS Alignment reproject to common grid Spatial Resampling bilinear / average / nearest Analysis-ready output (float32 reflectance, target GSD) Must precede resampling

Ground Sampling Distance (GSD) is the physical ground area one pixel covers. The declared GSD in a file header can differ from the true optical GSD because of terrain displacement, off-nadir viewing, or processing decisions — always confirm against the product specification. Resampling algorithms must match the data type and analytical intent:

Method When to use Risk if misapplied
Resampling.nearest Categorical / label rasters (land-cover class, quality flag) Blocky artifacts on continuous data
Resampling.bilinear Upscaling continuous data (reflectance, elevation) Blurs edges; not suitable for labels
Resampling.cubic High-quality upscaling where edge sharpness matters Higher compute; can overshoot at sharp boundaries
Resampling.average Downscaling continuous data Aliasing if downscale factor is large without prior smoothing
Resampling.lanczos High-fidelity downscaling, final display outputs Slowest; ringing near hard edges

For a deeper comparison of these methods against Sentinel-2 imagery specifically, see Advanced Resampling and Upscaling Techniques.

Step-by-step workflow

Step 1 — Extract and validate affine transform metadata

Parse the source raster’s affine transform, CRS, dimensions, and embedded scaling metadata before touching the data. Validation catches non-orthogonal transforms (rotated or sheared grids) that will silently misalign pixels if left uncorrected.

import rasterio
from rasterio.transform import Affine

def validate_raster_metadata(src_path: str) -> dict:
    """
    Open a raster and return validated geometry metadata.

    Raises ValueError if the transform is non-orthogonal (rotation/shear present).
    """
    with rasterio.open(src_path) as src:
        transform: Affine = src.transform
        crs = src.crs

        # Affine components:
        #   a = pixel width (positive),  e = pixel height (negative for north-up)
        #   b, d = off-diagonal rotation terms (should be ~0 for north-up rasters)
        pixel_w = abs(transform.a)
        pixel_h = abs(transform.e)

        if not (abs(transform.b) < 1e-6 and abs(transform.d) < 1e-6):
            raise ValueError(
                f"Non-orthogonal transform detected (b={transform.b:.2e}, d={transform.d:.2e}). "
                "Correct rotation or shear before resampling."
            )

        # Read embedded scale/offset if present (e.g. NetCDF conventions)
        tags = src.tags()
        scale  = float(tags.get("scale_factor", 1.0))
        offset = float(tags.get("add_offset",   0.0))

        meta = {
            "crs": crs,
            "transform": transform,
            "pixel_width_m":  pixel_w,
            "pixel_height_m": pixel_h,
            "width": src.width,
            "height": src.height,
            "count": src.count,
            "dtype": src.dtypes[0],
            "nodata": src.nodata,
            "scale_factor": scale,
            "add_offset": offset,
        }

        print(
            f"CRS: {crs.to_epsg()} | GSD: {pixel_w:.1f}m × {pixel_h:.1f}m | "
            f"Shape: {src.height}×{src.width} | scale: {scale} offset: {offset}"
        )
        return meta

The b and d terms of the Affine matrix are the rotation/shear components. For standard north-up satellite products they are always near zero — if they are not, the raster has been delivered in a rotated grid (common for airborne SAR) and must be warped to a rectilinear grid first.

Step 2 — Align coordinate reference systems

Resolution adjustments fail silently when source and target datasets live in different projections. Even a shared EPSG code is insufficient if the origin point or pixel-boundary snapping convention differs between sensors. Before merging or resampling, reproject every input to a common CRS and snap pixel grids to a shared reference origin.

The full reprojection workflow — including handling of Proj network lookups and datum shift grids — is documented in Mastering CRS Transformations in Rasterio. The critical pattern when combining CRS alignment with resolution change is to derive the new transform from the original bounds rather than computing it manually, which avoids floating-point accumulation errors:

from rasterio.warp import calculate_default_transform

def compute_target_transform(
    src_path: str,
    target_crs: str,
    target_gsd: float,
) -> tuple:
    """
    Derive the affine transform for a reprojected + resampled output.

    Returns (new_transform, new_width, new_height).
    """
    with rasterio.open(src_path) as src:
        new_transform, new_width, new_height = calculate_default_transform(
            src_crs=src.crs,
            dst_crs=target_crs,
            width=src.width,
            height=src.height,
            left=src.bounds.left,
            bottom=src.bounds.bottom,
            right=src.bounds.right,
            top=src.bounds.top,
            resolution=target_gsd,  # override output pixel size
        )
    return new_transform, new_width, new_height

Passing resolution=target_gsd to calculate_default_transform adjusts the output grid to your desired pixel size in the target CRS units (metres for UTM, degrees for geographic CRS).

Step 3 — Apply radiometric scaling before resampling

Scaling factors bridge raw 16-bit integer digital numbers (DN) and physical units such as surface reflectance (0–1) or top-of-atmosphere radiance (W·m⁻²·sr⁻¹·μm⁻¹). The formula is always:

physical_value = DN × scale_factor + add_offset

This step must come before spatial interpolation. When you resample raw integers first, the interpolation kernel averages quantised values, producing fractional intermediates that violate the sensor’s radiometric calibration model. Post-scaling those averaged DNs can introduce errors up to half a DN across every pixel boundary — enough to corrupt NDVI or water-index calculations for heterogeneous land surfaces.

import numpy as np
import rioxarray

def apply_radiometric_scaling(
    file_path: str,
    scale: float = 0.0001,
    offset: float = 0.0,
    valid_range: tuple[float, float] = (0.0, 1.0),
):
    """
    Load a raster via rioxarray and convert raw DN to physical units.

    masked=True converts nodata pixels to NaN so they propagate cleanly
    through downstream xarray operations instead of corrupting statistics.
    """
    da = rioxarray.open_rasterio(file_path, masked=True)

    # Broadcasting works across any number of bands
    da_physical = da * scale + offset

    # Clip to the physically meaningful range (reflectance: 0–1)
    da_physical = da_physical.clip(*valid_range)

    return da_physical

Sentinel-2 L2A products store the quantification value (QUANTIFICATION_VALUE = 10000) and an optional RADIO_ADD_OFFSET (introduced in Processing Baseline 04.00) in scene-level XML metadata — always parse the actual metadata rather than hardcoding. Landsat Collection 2 Level-2 uses per-band REFLECTANCE_MULT_BAND_n and REFLECTANCE_ADD_BAND_n coefficients in the MTL file.

Step 4 — Resample to target resolution with memory-efficient chunking

Large rasters (continental-scale mosaics, full Sentinel-2 scenes) exceed available RAM if loaded entirely. The pattern below processes data band-by-band using rasterio.warp.reproject, which internally limits memory to a configurable working buffer size.

from rasterio.warp import reproject, Resampling

def resample_raster(
    src_path: str,
    dst_path: str,
    target_crs: str,
    target_gsd: float,
    resampling: Resampling = Resampling.bilinear,
) -> None:
    """
    Reproject and resample a raster to a new CRS and pixel size.

    Data must already be radiometrically scaled (float32) before calling this.
    """
    new_transform, new_width, new_height = compute_target_transform(
        src_path, target_crs, target_gsd
    )

    with rasterio.open(src_path) as src:
        profile = src.profile.copy()
        profile.update(
            crs=target_crs,
            transform=new_transform,
            width=new_width,
            height=new_height,
            dtype="float32",
            driver="GTiff",
            compress="deflate",
            predictor=3,       # optimal for float32 data
            tiled=True,
            blockxsize=256,
            blockysize=256,
            nodata=float("nan"),
        )

        with rasterio.open(dst_path, "w", **profile) as dst:
            for band_idx in range(1, src.count + 1):
                reproject(
                    source=rasterio.band(src, band_idx),
                    destination=rasterio.band(dst, band_idx),
                    src_transform=src.transform,
                    src_crs=src.crs,
                    dst_transform=new_transform,
                    dst_crs=target_crs,
                    resampling=resampling,
                    # warp_mem_limit controls peak RAM per band in MB
                    warp_mem_limit=512,
                )

For datasets that still exceed memory even band-by-band (e.g. very large uncompressed source files), window-based tiled reads eliminate the issue. The companion page Optimizing Rasterio Window Reads for Memory Efficiency covers generator-based chunking that aligns reads to internal tile boundaries, maximising block cache hits and minimising disk I/O.

Step 5 — Write optimised GeoTIFF and validate output

Default GeoTIFF profiles produce uncompressed, strip-organised files that perform poorly in cloud storage and GIS tools. The profile in step 4 already applies the key options; the table below explains each choice:

Profile key Value Rationale
tiled True Enables spatial indexing and partial reads without scanning full rows
compress deflate Lossless; better ratio than LZW for float data
predictor 3 Horizontal differencing for float32; 2 for integer data
blockxsize / blockysize 256 Matches cloud object storage transfer granularity
dtype float32 Preserves physical unit precision after scaling
nodata nan Propagates cleanly through xarray/numpy operations

After writing, validate the output using rasterio profile assertions and extent checks:

def validate_output(dst_path: str, expected_crs_epsg: int, expected_gsd: float) -> None:
    """Assert the resampled file matches the intended geometry."""
    with rasterio.open(dst_path) as dst:
        actual_epsg = dst.crs.to_epsg()
        actual_gsd_x = abs(dst.transform.a)
        actual_gsd_y = abs(dst.transform.e)

        assert actual_epsg == expected_crs_epsg, (
            f"CRS mismatch: expected EPSG:{expected_crs_epsg}, got EPSG:{actual_epsg}"
        )
        # Allow 0.1 m tolerance for floating-point GSD rounding
        assert abs(actual_gsd_x - expected_gsd) < 0.1, (
            f"GSD mismatch: expected {expected_gsd}m, got {actual_gsd_x:.4f}m"
        )
        assert abs(actual_gsd_y - expected_gsd) < 0.1, (
            f"GSD mismatch: expected {expected_gsd}m, got {actual_gsd_y:.4f}m"
        )

        print(
            f"Validation passed — EPSG:{actual_epsg}, "
            f"GSD: {actual_gsd_x:.2f}m × {actual_gsd_y:.2f}m, "
            f"shape: {dst.height}×{dst.width}"
        )

Parameter reference

Key arguments for rasterio.warp.reproject and calculate_default_transform:

Parameter Type Default Usage note
resampling Resampling Resampling.nearest Match to data type — see table above
warp_mem_limit int 0 (unlimited) Set to available RAM in MB to prevent OOM on large files
resolution float or (x, y) derived from input Pass to calculate_default_transform to override output GSD
num_threads int 1 Increase for multi-core resampling; GDAL uses OpenMP internally
target_aligned_pixels bool False Snap output grid to the target resolution origin; prevents sub-pixel drift when merging tiles

Verification and testing

A resampled raster is correctly produced when all of the following hold:

  1. Bounds match: rasterio.open(dst).bounds should be within one pixel of the expected geographic extent. Use rioxarray’s .rio.bounds() against a reference vector or the source extent.
  2. Value range is physical: After scaling, float32 values must lie within the sensor’s stated physical range. For surface reflectance: assert da.values.max() <= 1.0 + 1e-4.
  3. NaN distribution is consistent: Masked pixels must not have grown (bleed from bilinear interpolation across nodata boundaries) or shrunk (nodata values incorrectly filled). Compare cloud-mask pixel counts before and after.
  4. Affine passes the orthogonality check: Run validate_raster_metadata on the output and confirm b ≈ 0, d ≈ 0.
  5. Time-series alignment: When building multi-date stacks, confirm every layer has an identical transform with numpy.allclose(layer1.transform, layer2.transform) — silent misalignment is the most common cause of spurious temporal change signals.

Wrap steps 1–4 in a pytest fixture that runs as part of your CI pipeline. A single failed assertion here prevents weeks of debugging corrupted downstream models.

Troubleshooting

CRSError: Input is not a CRS when passing target_crs as a string

Cause: rasterio.warp.reproject accepts rasterio.crs.CRS objects or "EPSG:XXXX" strings, but not bare integers or PROJ4 strings in all versions. Fix: use rasterio.crs.CRS.from_epsg(32633) or the canonical string "EPSG:32633".


Resampled output has extent that differs by one pixel row/column

Cause: floating-point rounding in manually computed width/height from src.width * scale_factor. Fix: always derive dimensions from calculate_default_transform with explicit bounds rather than computing int(src.width * scale_factor) — the former solves for a consistent grid, the latter accumulates rounding error.


MemoryError during reproject on a 10 GB raster

Cause: warp_mem_limit=0 (unlimited) causes GDAL to attempt to allocate the entire output in one pass. Fix: set warp_mem_limit=512 (MB) and process band-by-band. For extremely large inputs, switch to window-based reads as described in Optimizing Rasterio Window Reads for Memory Efficiency.


Bilinear resampling bleeds valid reflectance values into cloud/shadow pixels

Cause: Resampling.bilinear interpolates across the nodata boundary, giving masked pixels a fractional valid value. Fix: run resampling with masked=True in rioxarray, or apply the cloud mask as an explicit nodata region before resampling so the interpolation kernel has no valid neighbours to borrow from.


PROJ_NETWORK warning and degraded accuracy for inter-datum reprojections

Cause: PROJ cannot reach its CDN to download datum shift grids, falling back to approximate transformations. Fix: set PROJ_NETWORK=ON in your environment, or pre-download grids with projsync. For disconnected environments, bundle the relevant .tif grid files. See Fixing EPSG Mismatches in rasterio.open for the full offline-PROJ setup.


Deep-Dive Articles