Seamless Mosaicking and Edge Blending in Python Raster Pipelines

When a study area spans multiple satellite acquisition footprints, naive concatenation produces visible seam lines wherever scenes overlap — radiometric offsets, slightly different acquisition angles, and boundary interpolation artifacts all contribute. True seamless mosaicking solves this by computing per-pixel blend weights from each scene’s distance to its own valid-data boundary, then computing a weighted average in every overlap zone. The result is a single raster where adjacent scenes fade into each other rather than snap. For the broader pipeline context, this workflow sits within Satellite Processing Workflows & Index Pipelines.


Seamless mosaicking pipeline stages Five pipeline stages shown as labelled boxes connected by arrows: Scene Inputs, Grid Alignment, Overlap + Weight Masks, Weighted Blend, Output GeoTIFF. Scene Inputs (L2 surface refl.) Grid Alignment CRS · res · dtype Overlap Detection + Distance-Weight Masks Weighted Blend Σ(V·W) / Σ(W) Output GeoTIFF atmos. corrected reproject if needed EDT per valid mask float32 accum. COG-ready

Prerequisites

Install the required libraries before running any of the code below.

pip install rasterio>=1.3 numpy>=1.24 scipy>=1.10 dask[array]>=2023.1
Library Min version Why required
rasterio 1.3.0 Geospatial I/O, window reads, merge primitives
numpy 1.24 Array math, accumulator buffers
scipy 1.10 distance_transform_edt for feather weights
dask[array] 2023.1 Out-of-core chunked reads (optional)

Conceptual prerequisites. All inputs must already be Level-2 surface reflectance or orthorectified products. Advanced Resampling and Upscaling Techniques explains how to align grids when scenes arrive at different spatial resolutions. Any cloud or shadow pixels should be flagged before blending — Cloud and Shadow Masking Strategies covers how to generate clean valid-data masks from Sentinel-2 SCL layers and Landsat QA_PIXEL bands. Use Automated Image Clipping and Cropping to trim inputs to a common extent with a buffer zone before passing them to the blending stage.


Step-by-step Workflow

Step 1 — Validate Metadata and Align the Output Grid

Parse the CRS, affine transform, dtype, and NoData value from every input file. Any mismatch in CRS or cell size must be resolved before computing weights — sub-pixel misalignment collapses the distance transform at scene edges and produces a jagged blended boundary.

import rasterio
from rasterio.crs import CRS

def validate_inputs(paths: list[str]) -> dict:
    """
    Check that all inputs share CRS, resolution, band count, and dtype.
    Returns a dict with the common profile keys, or raises ValueError.
    """
    reference: dict | None = None
    for path in paths:
        with rasterio.open(path) as src:
            meta = {
                "crs": src.crs,
                "res": src.res,
                "count": src.count,
                "dtype": src.dtypes[0],
                "nodata": src.nodata,
            }
            if reference is None:
                reference = meta
                continue
            if src.crs != reference["crs"]:
                raise ValueError(
                    f"{path}: CRS {src.crs} does not match reference {reference['crs']}. "
                    "Reproject with rasterio.warp.reproject before mosaicking."
                )
            if src.res != reference["res"]:
                raise ValueError(
                    f"{path}: resolution {src.res} != {reference['res']}. "
                    "Resample to a common grid first."
                )
            if src.dtypes[0] != reference["dtype"]:
                raise ValueError(
                    f"{path}: dtype {src.dtypes[0]} != {reference['dtype']}."
                )
    return reference  # type: ignore[return-value]

The validation gate is intentionally strict. Allowing mixed-resolution inputs silently degrades blend quality; it is better to raise early and reproject upstream.


Step 2 — Detect Overlap Zones and Build Distance-Weighted Masks

For each input, build a binary valid-data mask (1 = valid, 0 = NoData or cloud-masked), then apply scipy.ndimage.distance_transform_edt to assign each pixel a weight proportional to its distance from the nearest invalid pixel. Pixels deep inside a scene receive the highest weights; edge pixels taper to zero, producing the smooth feathered transition.

import numpy as np
from scipy.ndimage import distance_transform_edt

def compute_weight_mask(data: np.ndarray, nodata: float | None) -> np.ndarray:
    """
    Given a 2-D or 3-D masked array (bands, rows, cols),
    return a 2-D float32 weight array in [0, 1].
    """
    if data.ndim == 3:
        # All bands must be valid for the pixel to contribute
        if np.ma.is_masked(data):
            valid = (~np.ma.getmaskarray(data)).all(axis=0).astype(np.float32)
        else:
            if nodata is not None:
                valid = (data != nodata).all(axis=0).astype(np.float32)
            else:
                valid = np.ones(data.shape[1:], dtype=np.float32)
    else:
        valid = (data != nodata).astype(np.float32) if nodata is not None else np.ones_like(data, dtype=np.float32)

    edt = distance_transform_edt(valid)
    max_dist = float(edt.max())
    if max_dist == 0.0:
        return valid  # single-pixel or entirely invalid scene
    return (edt / max_dist).astype(np.float32)

The normalization step maps the EDT distances to [0, 1]. Scenes with narrow overlap zones will have steeper weight gradients; wider overlaps produce gentler fades. If seam artifacts remain visible after blending, widen the input clip buffer so that overlap zones span at least 20 pixels on each side of the seam line.


Step 3 — Apply Radiometric Harmonization and Execute the Weighted Blend

Accumulate weighted_sum = Σ(V_i × W_i) and weight_sum = Σ(W_i) across all inputs, then divide once at the end. Keeping running totals in float32 prevents dtype overflow even when inputs are uint16.

import rasterio
import numpy as np
from rasterio.transform import from_bounds
from rasterio.windows import from_bounds as window_from_bounds, Window
from scipy.ndimage import distance_transform_edt
from pathlib import Path


def blend_mosaic(
    input_paths: list[str],
    output_path: str,
    nodata: float | None = None,
) -> Path:
    """
    Build a seamless blended mosaic from multiple overlapping rasters.

    Assumes all inputs share CRS, pixel resolution, band count, and dtype.
    Run validate_inputs() first to enforce these preconditions.
    """
    if not input_paths:
        raise ValueError("At least one input raster is required.")

    # --- Establish output profile from the first file ---
    with rasterio.open(input_paths[0]) as src:
        profile = src.profile.copy()
        dtype = profile["dtype"]
        res_x, res_y = src.res
        if nodata is None:
            nodata = src.nodata
        profile.update(
            nodata=nodata,
            compress="deflate",
            tiled=True,
            blockxsize=256,
            blockysize=256,
            bigtiff="IF_SAFER",
        )

    # --- Compute the union bounding box ---
    all_bounds = []
    for path in input_paths:
        with rasterio.open(path) as src:
            all_bounds.append(src.bounds)

    union_left   = min(b.left   for b in all_bounds)
    union_bottom = min(b.bottom for b in all_bounds)
    union_right  = max(b.right  for b in all_bounds)
    union_top    = max(b.top    for b in all_bounds)

    out_width  = int(round((union_right - union_left) / res_x))
    out_height = int(round((union_top   - union_bottom) / res_y))
    transform  = from_bounds(
        union_left, union_bottom, union_right, union_top,
        out_width, out_height
    )
    profile.update(transform=transform, width=out_width, height=out_height)

    # --- Initialise float32 accumulators ---
    n_bands      = profile["count"]
    weighted_sum = np.zeros((n_bands, out_height, out_width), dtype=np.float32)
    weight_sum   = np.zeros((n_bands, out_height, out_width), dtype=np.float32)

    # --- Accumulate each scene's contribution ---
    for path in input_paths:
        with rasterio.open(path) as src:
            win = window_from_bounds(
                src.bounds.left, src.bounds.bottom,
                src.bounds.right, src.bounds.top,
                transform,
            )
            row_off = max(0, int(win.row_off))
            col_off = max(0, int(win.col_off))
            win_h   = min(out_height - row_off, int(round(win.height)))
            win_w   = min(out_width  - col_off, int(round(win.width)))
            out_win = Window(col_off, row_off, win_w, win_h)

            data = src.read(
                masked=True,
                out_shape=(src.count, win_h, win_w),
                resampling=rasterio.enums.Resampling.bilinear,
            )

            # All-bands valid mask → distance transform → normalised weights
            valid_2d = (~np.ma.getmaskarray(data)).all(axis=0).astype(np.float32)
            edt      = distance_transform_edt(valid_2d)
            max_dist = float(edt.max())
            weights  = edt / max_dist if max_dist > 0 else valid_2d

            for band_idx in range(n_bands):
                band_data = data.data[band_idx].astype(np.float32)
                weighted_sum[
                    band_idx,
                    row_off:row_off + win_h,
                    col_off:col_off + win_w,
                ] += band_data * valid_2d * weights
                weight_sum[
                    band_idx,
                    row_off:row_off + win_h,
                    col_off:col_off + win_w,
                ] += weights

    # --- Normalise and write ---
    fill = float(nodata) if nodata is not None else 0.0
    with np.errstate(divide="ignore", invalid="ignore"):
        blended = np.where(
            weight_sum > 0,
            weighted_sum / weight_sum,
            fill,
        ).astype(dtype)

    with rasterio.open(output_path, "w", **profile) as dst:
        dst.write(blended)

    return Path(output_path)

Key implementation details:

  • Windowed reads (out_shape resize) mean each scene is read at final resolution, so there is no intermediate reprojection step for scenes that share the grid.
  • bigtiff="IF_SAFER" allows outputs larger than 4 GB without a separate flag decision at runtime.
  • Per-band weight accumulation preserves spectral independence; bands with different NoData patterns (e.g. a striped sensor band) blend correctly without cross-band contamination.
  • np.errstate suppresses the division-by-zero warning for pixels outside all scenes’ extents.

Step 4 — Tune Key Parameters

The table below covers the arguments and internal constants that most affect blend quality and output file size.

Parameter Type Default Effect
nodata float | None read from first file Pixels equal to this value are treated as invalid; must be consistent across all inputs
compress str "deflate" "zstd" is faster at equal ratio for large mosaics; "lzw" maximises compatibility
blockxsize / blockysize int 256 Larger tiles (512) reduce file overhead for full-extent reads; smaller tiles suit spatial subsetting
Resampling.bilinear enum Use nearest for categorical rasters (land cover, cloud mask)
EDT normalization denominator float edt.max() Replacing max() with a fixed pixel distance (e.g. 50) caps the blend zone width and speeds computation

Step 5 — Validate Output Quality

After writing, reopen the mosaic and run these checks before passing it downstream.

import numpy as np
import rasterio

def validate_mosaic(output_path: str, nodata: float | None) -> dict:
    """
    Return a summary of quality metrics for a blended mosaic.
    Raises AssertionError if critical checks fail.
    """
    with rasterio.open(output_path) as dst:
        profile = dst.profile
        data    = dst.read(masked=True)

    results = {}

    # 1. Check for unexpected NaN values
    nan_count = int(np.isnan(data.data).sum())
    results["nan_pixels"] = nan_count
    assert nan_count == 0, f"Output contains {nan_count} NaN pixels — check float32 cast."

    # 2. Check NoData coverage matches expectations
    if nodata is not None:
        nodata_fraction = float((data.data == nodata).any(axis=0).mean())
        results["nodata_fraction"] = round(nodata_fraction, 4)

    # 3. Verify dtype preserved
    assert profile["dtype"] == data.dtype.name or np.issubdtype(
        data.dtype, np.floating
    ), f"Unexpected dtype: {data.dtype}"

    # 4. Check tiling and compression are present (COG-ready)
    assert profile.get("tiled"), "Output is not tiled — rewrite with tiled=True."
    assert profile.get("compress"), "No compression found in output profile."

    results["crs"]       = str(profile["crs"])
    results["transform"] = str(profile["transform"])
    results["shape"]     = (profile["count"], profile["height"], profile["width"])
    return results

A passing validation returns a results dict with nan_pixels: 0, nodata_fraction close to the expected border proportion, and tiled: True. Visual inspection using rasterio’s built-in plotting via show() or any GIS viewer should show no visible step change at scene boundaries.


Parameter Reference

Argument (blend_mosaic) Type Default Notes
input_paths list[str] required Ordered list of file paths; order does not affect output (unlike priority merge)
output_path str required Write path for the output GeoTIFF
nodata float | None None (reads from first file) Override if inputs have inconsistent NoData flags
Internal constant Where set Effect
compress="deflate" profile update Switch to "zstd" for ~20 % faster writes on large extents
blockxsize=256 profile update Increase to 512 for area-extent reads, decrease to 128 for point queries
bigtiff="IF_SAFER" profile update Enables BigTIFF automatically if output exceeds 4 GB
Resampling.bilinear windowed read Use Resampling.nearest for mask or categorical inputs

Verification and Testing

Expected output properties:

  • File opens without error and src.crs matches input CRS.
  • src.transform matches the union bounding box computed from inputs.
  • src.profile["tiled"] is True and src.profile["compress"] is set.
  • No NaN values in any band (np.isnan(data).sum() == 0).
  • Overlap zones show smooth intensity gradients, not step edges, when visualised.

Minimal integration test:

import numpy as np
import rasterio
from rasterio.transform import from_origin

def make_test_scene(path: str, value: float, left: float) -> None:
    transform = from_origin(left, 60.0, 0.0001, 0.0001)
    data = np.full((1, 100, 100), value, dtype=np.float32)
    with rasterio.open(
        path, "w", driver="GTiff", height=100, width=100,
        count=1, dtype="float32", crs="EPSG:4326",
        transform=transform, nodata=-9999,
    ) as dst:
        dst.write(data)

make_test_scene("/tmp/scene_a.tif", 3000.0, left=10.0)
make_test_scene("/tmp/scene_b.tif", 3200.0, left=10.005)  # 50-pixel overlap

result = blend_mosaic(
    ["/tmp/scene_a.tif", "/tmp/scene_b.tif"],
    "/tmp/mosaic_test.tif",
)

with rasterio.open(result) as dst:
    blended = dst.read(1)
    # Overlap zone (columns 50-100) should contain values between 3000 and 3200
    assert blended[50, 60] > 3000 and blended[50, 60] < 3200, \
        "Blend zone not interpolated — check weight mask logic."
    assert not np.isnan(blended).any(), "NaN pixels found in output."

print("Integration test passed.")

Troubleshooting

Seam still visible after blending

Cause: Residual radiometric offset between scenes that feathering cannot mask. The blend smooths the transition zone but cannot change absolute reflectance values; if scene A is systematically 200 DN brighter than scene B, the centre of the blend zone will still show a gradient shift. Fix: Apply histogram matching or relative radiometric normalisation before blending. Alternatively, use spectral index calculation pipelines (e.g. NDVI) instead of raw reflectance mosaics — ratio-based indices are self-normalising and suppress sensor offset artifacts.

out_shape resize silently truncates edge pixels

Cause: Floating-point rounding in window_from_bounds can produce a window with win_h or win_w one pixel shorter than the true scene extent. Fix: Add win_h = min(out_height - row_off, int(round(win.height)) + 1) and clamp to out_height. Alternatively, use rasterio.warp.reproject with an explicit destination array to eliminate window arithmetic entirely.

MemoryError on large extents

Cause: The weighted_sum accumulator allocates n_bands × out_height × out_width × 4 bytes up front. A continental mosaic at 10 m resolution can exceed available RAM. Fix: Process the output in horizontal strips. Iterate over row offsets, read only the intersecting portion of each input into the accumulator strip, blend, and write the strip via dst.write(data, window=...). For distributed environments, wrap the strip loop in a Dask delayed graph and submit it to distributed workers via dask.distributed.Client.compute.

CRS.from_epsg raises PROJ_NETWORK errors

Cause: Grid-shift datum transformations need the PROJ CDN if the transformation is not cached locally. Fix: Set PROJ_NETWORK=ON in the environment, or pre-download the required grids with projsync. For offline pipelines, choose source data that shares a native CRS (e.g. UTM zone-matched Sentinel-2 tiles) so no datum shift is required.

Output contains fill-value stripes at tile boundaries

Cause: The weight_sum accumulator remains zero in pixels outside all input extents, and the np.where fill path writes nodata there — correct behaviour. If stripes appear inside expected scene coverage, the row_off/col_off window offsets are misaligned. Fix: Print (row_off, col_off, win_h, win_w) for each scene and verify against src.bounds. A single off-by-one offset shifts the scene contribution by one pixel, leaving a one-pixel-wide nodata stripe at the seam.


Deep-Dive Articles