Optimizing rasterio window reads for memory efficiency

Read a large GeoTIFF in constant, predictable RAM by aligning rasterio.windows.Window to the file’s native block_shapes, passing out_dtype directly to src.read(), and operating on arrays in-place. The minimal pattern:

import rasterio
from rasterio.windows import Window
import numpy as np

with rasterio.open("sentinel2_B04.tif") as src:
    block_h, block_w = src.block_shapes[0]
    window = Window(0, 0, block_w, block_h)          # one native tile
    chunk = src.read(window=window, out_dtype=np.float32, masked=True)

That single call reads exactly one compressed tile, casts it to float32 inside GDAL, and tracks nodata without an extra allocation — your RAM footprint stays flat no matter how large the file is on disk.

Why this arises in remote sensing workflows

Multi-gigabyte satellite scenes, DEMs, and LiDAR-derived grids cannot be loaded into RAM in one call on a typical workstation. Yet the naive src.read() — no window, no dtype control — materialises the entire raster and silently promotes uint16 reflectance to float64, consuming up to eight times the raw pixel memory. This pattern is part of the Handling Pixel Resolution and Scaling workflow, which governs how spatial resolution choices propagate through every read, resample, and write operation in a pipeline. For the broader context of how raster storage is structured see Core Raster Fundamentals & STAC Mapping.

The problem is ubiquitous at the I/O boundary: processing scripts written to test on 100 MB tiles break silently at production when scenes are 4–12 GB. Block-aligned windowed reads are the correct architectural response.

Environment and setup

Package Minimum version Role on this page
rasterio 1.3 Windowed reads, block_shapes, masked reads
numpy 1.24 In-place array operations, dtype constants
psutil 5.9 Peak RSS monitoring (validation only)
pip install "rasterio>=1.3" "numpy>=1.24" "psutil>=5.9"

How block structure controls memory cost

Remote sensing formats — GeoTIFF, Cloud-Optimized GeoTIFF (COG) — store pixels in compressed tiles or strips rather than as one contiguous array. When you request a window that cuts across tile boundaries, GDAL must decompress every intersecting tile in full, extract the pixels that fall inside your window, and discard the rest. A window offset by a single pixel from the tile grid can force GDAL to decompress four tiles instead of one.

The diagram below shows a 4 × 3 tile grid. Two windows of identical pixel size have very different I/O costs depending on their alignment.

Block-aligned vs misaligned window reads Left panel: a read window perfectly aligned to a 2×2 tile region touches 4 tiles and decompresses only the data it needs. Right panel: a same-size window offset by half a tile touches 9 tiles, multiplying I/O and peak memory. Aligned window ← aligned window Tiles touched: 4 Decompressed: 4 ✓ Minimal I/O Misaligned window ← misaligned window Tiles touched: 9 Decompressed: 9 ✗ 2.25× excess I/O

Rasterio exposes the native tile dimensions through src.block_shapes, a list of (height, width) tuples — one per band. Aligning your chunk size to these values keeps GDAL operating at maximum efficiency.

Complete working example

The function below iterates a raster in block-aligned chunks, reads each one with out_dtype and masked=True, and processes in-place. It works on local GeoTIFFs and on COGs read over HTTP with equal efficiency.

import rasterio
from rasterio.windows import Window
import numpy as np
from collections.abc import Generator


def iter_windows_aligned(
    src_path: str,
    chunk_px: int = 2048,
    out_dtype: np.dtype = np.float32,
) -> Generator[tuple[np.ma.MaskedArray, Window], None, None]:
    """
    Yield (chunk, window) pairs covering the full raster.

    Windows are snapped to the dataset's native block grid so GDAL
    decompresses only the tiles it actually needs.  out_dtype is
    applied at the I/O layer — no intermediate array is ever allocated.
    """
    with rasterio.open(src_path) as src:
        # block_shapes is a list of (rows, cols) tuples, one per band.
        # Bands in a well-formed GeoTIFF share the same block shape.
        block_h, block_w = src.block_shapes[0]

        # Round chunk_px up to the nearest block multiple so windows
        # land on exact tile boundaries regardless of chunk_px choice.
        win_h = max(block_h, (chunk_px // block_h) * block_h)
        win_w = max(block_w, (chunk_px // block_w) * block_w)

        height, width = src.height, src.width

        for row_off in range(0, height, win_h):
            for col_off in range(0, width, win_w):
                # Clamp to raster extents — the last row/column of
                # windows is almost always smaller than win_h / win_w.
                h = min(win_h, height - row_off)
                w = min(win_w, width - col_off)

                window = Window(col_off, row_off, w, h)

                # out_dtype → GDAL casts during decompression (zero extra alloc).
                # masked=True → MaskedArray tracks nodata without a separate bool array.
                chunk: np.ma.MaskedArray = src.read(
                    window=window,
                    out_dtype=out_dtype,
                    masked=True,
                )

                # In-place operation: no new array allocated.
                # Replace fill_value with 0 so downstream math is safe.
                np.ma.fix_invalid(chunk, copy=False, fill_value=0)

                yield chunk, window


# --- Usage example ---
for chunk, win in iter_windows_aligned("dem_10m.tif", chunk_px=4096):
    # chunk shape: (bands, win_height, win_width)
    mean_elevation = float(chunk.mean())
    print(f"Window {win}: mean elevation = {mean_elevation:.1f} m")

Key decisions explained

  • win_h = max(block_h, (chunk_px // block_h) * block_h) — rounds chunk_px down to a block multiple, ensuring the window origin always falls on a tile boundary. Without this, a chunk_px=2000 against 256-px tiles produces misaligned offsets from the second iteration onwards.
  • out_dtype in src.read() — forces GDAL to cast during decompression. Calling .astype(np.float32) after the read allocates the full uint16 array first and then a second float32 copy; out_dtype avoids the first allocation entirely.
  • masked=True — returns numpy.ma.MaskedArray. This encodes nodata without allocating a separate boolean array of identical shape, and it propagates correctly through NumPy reductions.
  • np.ma.fix_invalid(chunk, copy=False) — normalises masked values in-place before arithmetic. Operations such as chunk * 0.0001 on unmasked fill values can inject extreme numbers into statistics.

Variant patterns

Variant 1 — Dtype promotion guard for spectral index pipelines

When computing a ratio such as NDVI that naturally produces float64 intermediate values, keep float32 throughout by assigning with the out parameter:

import rasterio
from rasterio.windows import Window
import numpy as np

with rasterio.open("sentinel2_8band.tif") as src:
    block_h, block_w = src.block_shapes[0]
    window = Window(0, 0, block_w * 4, block_h * 4)

    # Read NIR (band 4) and Red (band 3) as float32
    data = src.read(
        indexes=[4, 3],
        window=window,
        out_dtype=np.float32,
        masked=True,
    )
    nir, red = data[0], data[1]

    # Allocate output once; reuse in every iteration
    ndvi = np.empty_like(nir)
    np.divide(nir - red, nir + red + 1e-8, out=ndvi)
    # ndvi stays float32 — no silent float64 promotion

This pattern feeds directly into spectral index calculation pipelines where band-math chains can otherwise silently widen to float64 at the first arithmetic operator.

Variant 2 — Overlapping windows for neighbourhood filters

Convolution, gradient, and morphological filters need surrounding context at chunk edges to avoid seam artifacts. Expand each window by a pad, clamp to bounds, then slice back to the valid region after reading:

import rasterio
from rasterio.windows import Window
import numpy as np

PAD = 32  # pixel overlap on each side

with rasterio.open("slope_raster.tif") as src:
    block_h, block_w = src.block_shapes[0]
    height, width = src.height, src.width
    win_h, win_w = block_h * 8, block_w * 8

    for row_off in range(0, height, win_h):
        for col_off in range(0, width, win_w):
            # Valid (unpadded) extents
            h = min(win_h, height - row_off)
            w = min(win_w, width - col_off)

            # Padded extents — clamped to raster bounds
            p_col = max(0, col_off - PAD)
            p_row = max(0, row_off - PAD)
            p_col_end = min(width,  col_off + w + PAD)
            p_row_end = min(height, row_off + h + PAD)

            padded_window = Window(
                p_col, p_row,
                p_col_end - p_col,
                p_row_end - p_row,
            )
            chunk = src.read(window=padded_window, out_dtype=np.float32, masked=True)

            # Slice back to the valid region (exclude padding)
            row_slice = slice(row_off - p_row, row_off - p_row + h)
            col_slice = slice(col_off - p_col, col_off - p_col + w)
            valid = chunk[:, row_slice, col_slice]

            # Apply filter to `chunk`, extract results from `valid`
            _ = valid  # placeholder for downstream filter

Variant 3 — COG over HTTP (no local download)

Cloud-Optimized GeoTIFFs store tiles at known byte offsets. Block-aligned windows map directly to HTTP range requests, so the windowed-read pattern works unchanged over HTTPS or S3 — GDAL’s VSICURL driver fetches only the required byte ranges:

import rasterio
from rasterio.windows import Window
import numpy as np

COG_URL = "https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/36/N/YF/2023/7/S2A_36NYF_20230715_0_L2A/B04.tif"

with rasterio.open(COG_URL) as src:
    block_h, block_w = src.block_shapes[0]  # typically 512×512 for Sentinel-2 COGs
    window = Window(0, 0, block_w, block_h)  # fetch exactly one HTTP range
    chunk = src.read(window=window, out_dtype=np.float32, masked=True)
    print(f"Block shape: {block_h}×{block_w}  |  Chunk shape: {chunk.shape}")

STAC items discovered via pystac-client carry asset HREFs that can be passed directly here — no intermediate download step required.

Common errors

AttributeError: 'NoneType' object has no attribute '__iter__' when unpacking block_shapes Cause: the file has no internal tiling — strips are used instead, and block_shapes[0] is (1, width). Fix: guard with block_h, block_w = src.block_shapes[0]; block_w = block_w if block_w < src.width else 512 and set a sensible fallback chunk width.

Silent float64 promotion — memory use doubles mid-pipeline Cause: a scalar operation such as chunk * 0.0001 (Python float is float64) widens the array. Fix: use np.float32(0.0001) as the scalar, or pass out=chunk to keep the result in the same dtype buffer.

rasterio.errors.WindowError: window extends beyond the dataset bounds Cause: the clamping logic is wrong — usually min(win_w, width - col_off) was written as min(win_w, width). Fix: always subtract the current offset when clamping window height and width.