Handling Pixel Resolution and Scaling in Python
Pixel resolution governs the spatial fidelity of every geospatial dataset you touch — it controls analytical accuracy, storage size, and compute cost. In production remote sensing pipelines, resolution and scaling operations sit at the junction of data ingestion and analysis: you must parse affine metadata correctly, align grids before merging datasets, and convert raw digital numbers to physical units before any spatial interpolation. Getting the order of operations wrong silently corrupts downstream spectral indices, machine learning features, or time-series statistics. For the full architectural context of how this fits into raster data models, see Core Raster Fundamentals & STAC Mapping.
Prerequisites
Install the required libraries before starting:
pip install rasterio>=1.3 rioxarray>=0.15 xarray>=2023.6 numpy>=1.24
| Library | Min version | Why required |
|---|---|---|
rasterio |
1.3 | Affine transform parsing, reproject, calculate_default_transform |
rioxarray |
0.15 | CRS-aware xarray extension; open_rasterio with masked nodata |
xarray |
2023.6 | Labelled N-D arrays for multi-band and multi-temporal stacks |
numpy |
1.24 | Array arithmetic for radiometric scaling |
Conceptual prerequisites: you should be comfortable with how an affine geotransform maps pixel indices to ground coordinates (covered in Extracting and Parsing Raster Metadata), and you should understand how Cloud-Optimised GeoTIFF internal tiling affects block-aligned reads before you run chunked resampling over remote files.
How pixel resolution, GSD, and radiometric scaling relate
The diagram below shows the data flow from raw sensor storage through scaling and resampling to an analysis-ready output. The critical invariant is that radiometric scaling must precede spatial interpolation.
Ground Sampling Distance (GSD) is the physical ground area one pixel covers. The declared GSD in a file header can differ from the true optical GSD because of terrain displacement, off-nadir viewing, or processing decisions — always confirm against the product specification. Resampling algorithms must match the data type and analytical intent:
| Method | When to use | Risk if misapplied |
|---|---|---|
Resampling.nearest |
Categorical / label rasters (land-cover class, quality flag) | Blocky artifacts on continuous data |
Resampling.bilinear |
Upscaling continuous data (reflectance, elevation) | Blurs edges; not suitable for labels |
Resampling.cubic |
High-quality upscaling where edge sharpness matters | Higher compute; can overshoot at sharp boundaries |
Resampling.average |
Downscaling continuous data | Aliasing if downscale factor is large without prior smoothing |
Resampling.lanczos |
High-fidelity downscaling, final display outputs | Slowest; ringing near hard edges |
For a deeper comparison of these methods against Sentinel-2 imagery specifically, see Advanced Resampling and Upscaling Techniques.
Step-by-step workflow
Step 1 — Extract and validate affine transform metadata
Parse the source raster’s affine transform, CRS, dimensions, and embedded scaling metadata before touching the data. Validation catches non-orthogonal transforms (rotated or sheared grids) that will silently misalign pixels if left uncorrected.
import rasterio
from rasterio.transform import Affine
def validate_raster_metadata(src_path: str) -> dict:
"""
Open a raster and return validated geometry metadata.
Raises ValueError if the transform is non-orthogonal (rotation/shear present).
"""
with rasterio.open(src_path) as src:
transform: Affine = src.transform
crs = src.crs
# Affine components:
# a = pixel width (positive), e = pixel height (negative for north-up)
# b, d = off-diagonal rotation terms (should be ~0 for north-up rasters)
pixel_w = abs(transform.a)
pixel_h = abs(transform.e)
if not (abs(transform.b) < 1e-6 and abs(transform.d) < 1e-6):
raise ValueError(
f"Non-orthogonal transform detected (b={transform.b:.2e}, d={transform.d:.2e}). "
"Correct rotation or shear before resampling."
)
# Read embedded scale/offset if present (e.g. NetCDF conventions)
tags = src.tags()
scale = float(tags.get("scale_factor", 1.0))
offset = float(tags.get("add_offset", 0.0))
meta = {
"crs": crs,
"transform": transform,
"pixel_width_m": pixel_w,
"pixel_height_m": pixel_h,
"width": src.width,
"height": src.height,
"count": src.count,
"dtype": src.dtypes[0],
"nodata": src.nodata,
"scale_factor": scale,
"add_offset": offset,
}
print(
f"CRS: {crs.to_epsg()} | GSD: {pixel_w:.1f}m × {pixel_h:.1f}m | "
f"Shape: {src.height}×{src.width} | scale: {scale} offset: {offset}"
)
return meta
The b and d terms of the Affine matrix are the rotation/shear components. For standard north-up satellite products they are always near zero — if they are not, the raster has been delivered in a rotated grid (common for airborne SAR) and must be warped to a rectilinear grid first.
Step 2 — Align coordinate reference systems
Resolution adjustments fail silently when source and target datasets live in different projections. Even a shared EPSG code is insufficient if the origin point or pixel-boundary snapping convention differs between sensors. Before merging or resampling, reproject every input to a common CRS and snap pixel grids to a shared reference origin.
The full reprojection workflow — including handling of Proj network lookups and datum shift grids — is documented in Mastering CRS Transformations in Rasterio. The critical pattern when combining CRS alignment with resolution change is to derive the new transform from the original bounds rather than computing it manually, which avoids floating-point accumulation errors:
from rasterio.warp import calculate_default_transform
def compute_target_transform(
src_path: str,
target_crs: str,
target_gsd: float,
) -> tuple:
"""
Derive the affine transform for a reprojected + resampled output.
Returns (new_transform, new_width, new_height).
"""
with rasterio.open(src_path) as src:
new_transform, new_width, new_height = calculate_default_transform(
src_crs=src.crs,
dst_crs=target_crs,
width=src.width,
height=src.height,
left=src.bounds.left,
bottom=src.bounds.bottom,
right=src.bounds.right,
top=src.bounds.top,
resolution=target_gsd, # override output pixel size
)
return new_transform, new_width, new_height
Passing resolution=target_gsd to calculate_default_transform adjusts the output grid to your desired pixel size in the target CRS units (metres for UTM, degrees for geographic CRS).
Step 3 — Apply radiometric scaling before resampling
Scaling factors bridge raw 16-bit integer digital numbers (DN) and physical units such as surface reflectance (0–1) or top-of-atmosphere radiance (W·m⁻²·sr⁻¹·μm⁻¹). The formula is always:
physical_value = DN × scale_factor + add_offset
This step must come before spatial interpolation. When you resample raw integers first, the interpolation kernel averages quantised values, producing fractional intermediates that violate the sensor’s radiometric calibration model. Post-scaling those averaged DNs can introduce errors up to half a DN across every pixel boundary — enough to corrupt NDVI or water-index calculations for heterogeneous land surfaces.
import numpy as np
import rioxarray
def apply_radiometric_scaling(
file_path: str,
scale: float = 0.0001,
offset: float = 0.0,
valid_range: tuple[float, float] = (0.0, 1.0),
):
"""
Load a raster via rioxarray and convert raw DN to physical units.
masked=True converts nodata pixels to NaN so they propagate cleanly
through downstream xarray operations instead of corrupting statistics.
"""
da = rioxarray.open_rasterio(file_path, masked=True)
# Broadcasting works across any number of bands
da_physical = da * scale + offset
# Clip to the physically meaningful range (reflectance: 0–1)
da_physical = da_physical.clip(*valid_range)
return da_physical
Sentinel-2 L2A products store the quantification value (QUANTIFICATION_VALUE = 10000) and an optional RADIO_ADD_OFFSET (introduced in Processing Baseline 04.00) in scene-level XML metadata — always parse the actual metadata rather than hardcoding. Landsat Collection 2 Level-2 uses per-band REFLECTANCE_MULT_BAND_n and REFLECTANCE_ADD_BAND_n coefficients in the MTL file.
Step 4 — Resample to target resolution with memory-efficient chunking
Large rasters (continental-scale mosaics, full Sentinel-2 scenes) exceed available RAM if loaded entirely. The pattern below processes data band-by-band using rasterio.warp.reproject, which internally limits memory to a configurable working buffer size.
from rasterio.warp import reproject, Resampling
def resample_raster(
src_path: str,
dst_path: str,
target_crs: str,
target_gsd: float,
resampling: Resampling = Resampling.bilinear,
) -> None:
"""
Reproject and resample a raster to a new CRS and pixel size.
Data must already be radiometrically scaled (float32) before calling this.
"""
new_transform, new_width, new_height = compute_target_transform(
src_path, target_crs, target_gsd
)
with rasterio.open(src_path) as src:
profile = src.profile.copy()
profile.update(
crs=target_crs,
transform=new_transform,
width=new_width,
height=new_height,
dtype="float32",
driver="GTiff",
compress="deflate",
predictor=3, # optimal for float32 data
tiled=True,
blockxsize=256,
blockysize=256,
nodata=float("nan"),
)
with rasterio.open(dst_path, "w", **profile) as dst:
for band_idx in range(1, src.count + 1):
reproject(
source=rasterio.band(src, band_idx),
destination=rasterio.band(dst, band_idx),
src_transform=src.transform,
src_crs=src.crs,
dst_transform=new_transform,
dst_crs=target_crs,
resampling=resampling,
# warp_mem_limit controls peak RAM per band in MB
warp_mem_limit=512,
)
For datasets that still exceed memory even band-by-band (e.g. very large uncompressed source files), window-based tiled reads eliminate the issue. The companion page Optimizing Rasterio Window Reads for Memory Efficiency covers generator-based chunking that aligns reads to internal tile boundaries, maximising block cache hits and minimising disk I/O.
Step 5 — Write optimised GeoTIFF and validate output
Default GeoTIFF profiles produce uncompressed, strip-organised files that perform poorly in cloud storage and GIS tools. The profile in step 4 already applies the key options; the table below explains each choice:
| Profile key | Value | Rationale |
|---|---|---|
tiled |
True |
Enables spatial indexing and partial reads without scanning full rows |
compress |
deflate |
Lossless; better ratio than LZW for float data |
predictor |
3 |
Horizontal differencing for float32; 2 for integer data |
blockxsize / blockysize |
256 |
Matches cloud object storage transfer granularity |
dtype |
float32 |
Preserves physical unit precision after scaling |
nodata |
nan |
Propagates cleanly through xarray/numpy operations |
After writing, validate the output using rasterio profile assertions and extent checks:
def validate_output(dst_path: str, expected_crs_epsg: int, expected_gsd: float) -> None:
"""Assert the resampled file matches the intended geometry."""
with rasterio.open(dst_path) as dst:
actual_epsg = dst.crs.to_epsg()
actual_gsd_x = abs(dst.transform.a)
actual_gsd_y = abs(dst.transform.e)
assert actual_epsg == expected_crs_epsg, (
f"CRS mismatch: expected EPSG:{expected_crs_epsg}, got EPSG:{actual_epsg}"
)
# Allow 0.1 m tolerance for floating-point GSD rounding
assert abs(actual_gsd_x - expected_gsd) < 0.1, (
f"GSD mismatch: expected {expected_gsd}m, got {actual_gsd_x:.4f}m"
)
assert abs(actual_gsd_y - expected_gsd) < 0.1, (
f"GSD mismatch: expected {expected_gsd}m, got {actual_gsd_y:.4f}m"
)
print(
f"Validation passed — EPSG:{actual_epsg}, "
f"GSD: {actual_gsd_x:.2f}m × {actual_gsd_y:.2f}m, "
f"shape: {dst.height}×{dst.width}"
)
Parameter reference
Key arguments for rasterio.warp.reproject and calculate_default_transform:
| Parameter | Type | Default | Usage note |
|---|---|---|---|
resampling |
Resampling |
Resampling.nearest |
Match to data type — see table above |
warp_mem_limit |
int |
0 (unlimited) |
Set to available RAM in MB to prevent OOM on large files |
resolution |
float or (x, y) |
derived from input | Pass to calculate_default_transform to override output GSD |
num_threads |
int |
1 |
Increase for multi-core resampling; GDAL uses OpenMP internally |
target_aligned_pixels |
bool |
False |
Snap output grid to the target resolution origin; prevents sub-pixel drift when merging tiles |
Verification and testing
A resampled raster is correctly produced when all of the following hold:
- Bounds match:
rasterio.open(dst).boundsshould be within one pixel of the expected geographic extent. Userioxarray’s.rio.bounds()against a reference vector or the source extent. - Value range is physical: After scaling,
float32values must lie within the sensor’s stated physical range. For surface reflectance:assert da.values.max() <= 1.0 + 1e-4. - NaN distribution is consistent: Masked pixels must not have grown (bleed from bilinear interpolation across nodata boundaries) or shrunk (nodata values incorrectly filled). Compare cloud-mask pixel counts before and after.
- Affine passes the orthogonality check: Run
validate_raster_metadataon the output and confirmb ≈ 0, d ≈ 0. - Time-series alignment: When building multi-date stacks, confirm every layer has an identical transform with
numpy.allclose(layer1.transform, layer2.transform)— silent misalignment is the most common cause of spurious temporal change signals.
Wrap steps 1–4 in a pytest fixture that runs as part of your CI pipeline. A single failed assertion here prevents weeks of debugging corrupted downstream models.
Troubleshooting
CRSError: Input is not a CRS when passing target_crs as a string
Cause: rasterio.warp.reproject accepts rasterio.crs.CRS objects or "EPSG:XXXX" strings, but not bare integers or PROJ4 strings in all versions. Fix: use rasterio.crs.CRS.from_epsg(32633) or the canonical string "EPSG:32633".
Resampled output has extent that differs by one pixel row/column
Cause: floating-point rounding in manually computed width/height from src.width * scale_factor. Fix: always derive dimensions from calculate_default_transform with explicit bounds rather than computing int(src.width * scale_factor) — the former solves for a consistent grid, the latter accumulates rounding error.
MemoryError during reproject on a 10 GB raster
Cause: warp_mem_limit=0 (unlimited) causes GDAL to attempt to allocate the entire output in one pass. Fix: set warp_mem_limit=512 (MB) and process band-by-band. For extremely large inputs, switch to window-based reads as described in Optimizing Rasterio Window Reads for Memory Efficiency.
Bilinear resampling bleeds valid reflectance values into cloud/shadow pixels
Cause: Resampling.bilinear interpolates across the nodata boundary, giving masked pixels a fractional valid value. Fix: run resampling with masked=True in rioxarray, or apply the cloud mask as an explicit nodata region before resampling so the interpolation kernel has no valid neighbours to borrow from.
PROJ_NETWORK warning and degraded accuracy for inter-datum reprojections
Cause: PROJ cannot reach its CDN to download datum shift grids, falling back to approximate transformations. Fix: set PROJ_NETWORK=ON in your environment, or pre-download grids with projsync. For disconnected environments, bundle the relevant .tif grid files. See Fixing EPSG Mismatches in rasterio.open for the full offline-PROJ setup.
Related
- Extracting and Parsing Raster Metadata — reading affine transforms, nodata, and tag metadata from rasterio before resampling
- Mastering CRS Transformations in Rasterio — full reprojection workflow with datum shift handling
- Understanding Cloud-Optimised GeoTIFF Structure — how COG internal tiling affects block-aligned reads during resampling
- Advanced Resampling and Upscaling Techniques — algorithm comparison for Sentinel-2 super-resolution and sensor fusion
- Optimizing Rasterio Window Reads for Memory Efficiency — generator-based tiled reads for out-of-core resampling on large files