Seamless Mosaicking and Edge Blending in Python Raster Pipelines
When a study area spans multiple satellite acquisition footprints, naive concatenation produces visible seam lines wherever scenes overlap — radiometric offsets, slightly different acquisition angles, and boundary interpolation artifacts all contribute. True seamless mosaicking solves this by computing per-pixel blend weights from each scene’s distance to its own valid-data boundary, then computing a weighted average in every overlap zone. The result is a single raster where adjacent scenes fade into each other rather than snap. For the broader pipeline context, this workflow sits within Satellite Processing Workflows & Index Pipelines.
Prerequisites
Install the required libraries before running any of the code below.
pip install rasterio>=1.3 numpy>=1.24 scipy>=1.10 dask[array]>=2023.1
| Library | Min version | Why required |
|---|---|---|
rasterio |
1.3.0 | Geospatial I/O, window reads, merge primitives |
numpy |
1.24 | Array math, accumulator buffers |
scipy |
1.10 | distance_transform_edt for feather weights |
dask[array] |
2023.1 | Out-of-core chunked reads (optional) |
Conceptual prerequisites. All inputs must already be Level-2 surface reflectance or orthorectified products. Advanced Resampling and Upscaling Techniques explains how to align grids when scenes arrive at different spatial resolutions. Any cloud or shadow pixels should be flagged before blending — Cloud and Shadow Masking Strategies covers how to generate clean valid-data masks from Sentinel-2 SCL layers and Landsat QA_PIXEL bands. Use Automated Image Clipping and Cropping to trim inputs to a common extent with a buffer zone before passing them to the blending stage.
Step-by-step Workflow
Step 1 — Validate Metadata and Align the Output Grid
Parse the CRS, affine transform, dtype, and NoData value from every input file. Any mismatch in CRS or cell size must be resolved before computing weights — sub-pixel misalignment collapses the distance transform at scene edges and produces a jagged blended boundary.
import rasterio
from rasterio.crs import CRS
def validate_inputs(paths: list[str]) -> dict:
"""
Check that all inputs share CRS, resolution, band count, and dtype.
Returns a dict with the common profile keys, or raises ValueError.
"""
reference: dict | None = None
for path in paths:
with rasterio.open(path) as src:
meta = {
"crs": src.crs,
"res": src.res,
"count": src.count,
"dtype": src.dtypes[0],
"nodata": src.nodata,
}
if reference is None:
reference = meta
continue
if src.crs != reference["crs"]:
raise ValueError(
f"{path}: CRS {src.crs} does not match reference {reference['crs']}. "
"Reproject with rasterio.warp.reproject before mosaicking."
)
if src.res != reference["res"]:
raise ValueError(
f"{path}: resolution {src.res} != {reference['res']}. "
"Resample to a common grid first."
)
if src.dtypes[0] != reference["dtype"]:
raise ValueError(
f"{path}: dtype {src.dtypes[0]} != {reference['dtype']}."
)
return reference # type: ignore[return-value]
The validation gate is intentionally strict. Allowing mixed-resolution inputs silently degrades blend quality; it is better to raise early and reproject upstream.
Step 2 — Detect Overlap Zones and Build Distance-Weighted Masks
For each input, build a binary valid-data mask (1 = valid, 0 = NoData or cloud-masked), then apply scipy.ndimage.distance_transform_edt to assign each pixel a weight proportional to its distance from the nearest invalid pixel. Pixels deep inside a scene receive the highest weights; edge pixels taper to zero, producing the smooth feathered transition.
import numpy as np
from scipy.ndimage import distance_transform_edt
def compute_weight_mask(data: np.ndarray, nodata: float | None) -> np.ndarray:
"""
Given a 2-D or 3-D masked array (bands, rows, cols),
return a 2-D float32 weight array in [0, 1].
"""
if data.ndim == 3:
# All bands must be valid for the pixel to contribute
if np.ma.is_masked(data):
valid = (~np.ma.getmaskarray(data)).all(axis=0).astype(np.float32)
else:
if nodata is not None:
valid = (data != nodata).all(axis=0).astype(np.float32)
else:
valid = np.ones(data.shape[1:], dtype=np.float32)
else:
valid = (data != nodata).astype(np.float32) if nodata is not None else np.ones_like(data, dtype=np.float32)
edt = distance_transform_edt(valid)
max_dist = float(edt.max())
if max_dist == 0.0:
return valid # single-pixel or entirely invalid scene
return (edt / max_dist).astype(np.float32)
The normalization step maps the EDT distances to [0, 1]. Scenes with narrow overlap zones will have steeper weight gradients; wider overlaps produce gentler fades. If seam artifacts remain visible after blending, widen the input clip buffer so that overlap zones span at least 20 pixels on each side of the seam line.
Step 3 — Apply Radiometric Harmonization and Execute the Weighted Blend
Accumulate weighted_sum = Σ(V_i × W_i) and weight_sum = Σ(W_i) across all inputs, then divide once at the end. Keeping running totals in float32 prevents dtype overflow even when inputs are uint16.
import rasterio
import numpy as np
from rasterio.transform import from_bounds
from rasterio.windows import from_bounds as window_from_bounds, Window
from scipy.ndimage import distance_transform_edt
from pathlib import Path
def blend_mosaic(
input_paths: list[str],
output_path: str,
nodata: float | None = None,
) -> Path:
"""
Build a seamless blended mosaic from multiple overlapping rasters.
Assumes all inputs share CRS, pixel resolution, band count, and dtype.
Run validate_inputs() first to enforce these preconditions.
"""
if not input_paths:
raise ValueError("At least one input raster is required.")
# --- Establish output profile from the first file ---
with rasterio.open(input_paths[0]) as src:
profile = src.profile.copy()
dtype = profile["dtype"]
res_x, res_y = src.res
if nodata is None:
nodata = src.nodata
profile.update(
nodata=nodata,
compress="deflate",
tiled=True,
blockxsize=256,
blockysize=256,
bigtiff="IF_SAFER",
)
# --- Compute the union bounding box ---
all_bounds = []
for path in input_paths:
with rasterio.open(path) as src:
all_bounds.append(src.bounds)
union_left = min(b.left for b in all_bounds)
union_bottom = min(b.bottom for b in all_bounds)
union_right = max(b.right for b in all_bounds)
union_top = max(b.top for b in all_bounds)
out_width = int(round((union_right - union_left) / res_x))
out_height = int(round((union_top - union_bottom) / res_y))
transform = from_bounds(
union_left, union_bottom, union_right, union_top,
out_width, out_height
)
profile.update(transform=transform, width=out_width, height=out_height)
# --- Initialise float32 accumulators ---
n_bands = profile["count"]
weighted_sum = np.zeros((n_bands, out_height, out_width), dtype=np.float32)
weight_sum = np.zeros((n_bands, out_height, out_width), dtype=np.float32)
# --- Accumulate each scene's contribution ---
for path in input_paths:
with rasterio.open(path) as src:
win = window_from_bounds(
src.bounds.left, src.bounds.bottom,
src.bounds.right, src.bounds.top,
transform,
)
row_off = max(0, int(win.row_off))
col_off = max(0, int(win.col_off))
win_h = min(out_height - row_off, int(round(win.height)))
win_w = min(out_width - col_off, int(round(win.width)))
out_win = Window(col_off, row_off, win_w, win_h)
data = src.read(
masked=True,
out_shape=(src.count, win_h, win_w),
resampling=rasterio.enums.Resampling.bilinear,
)
# All-bands valid mask → distance transform → normalised weights
valid_2d = (~np.ma.getmaskarray(data)).all(axis=0).astype(np.float32)
edt = distance_transform_edt(valid_2d)
max_dist = float(edt.max())
weights = edt / max_dist if max_dist > 0 else valid_2d
for band_idx in range(n_bands):
band_data = data.data[band_idx].astype(np.float32)
weighted_sum[
band_idx,
row_off:row_off + win_h,
col_off:col_off + win_w,
] += band_data * valid_2d * weights
weight_sum[
band_idx,
row_off:row_off + win_h,
col_off:col_off + win_w,
] += weights
# --- Normalise and write ---
fill = float(nodata) if nodata is not None else 0.0
with np.errstate(divide="ignore", invalid="ignore"):
blended = np.where(
weight_sum > 0,
weighted_sum / weight_sum,
fill,
).astype(dtype)
with rasterio.open(output_path, "w", **profile) as dst:
dst.write(blended)
return Path(output_path)
Key implementation details:
- Windowed reads (
out_shaperesize) mean each scene is read at final resolution, so there is no intermediate reprojection step for scenes that share the grid. bigtiff="IF_SAFER"allows outputs larger than 4 GB without a separate flag decision at runtime.- Per-band weight accumulation preserves spectral independence; bands with different NoData patterns (e.g. a striped sensor band) blend correctly without cross-band contamination.
np.errstatesuppresses the division-by-zero warning for pixels outside all scenes’ extents.
Step 4 — Tune Key Parameters
The table below covers the arguments and internal constants that most affect blend quality and output file size.
| Parameter | Type | Default | Effect |
|---|---|---|---|
nodata |
float | None |
read from first file | Pixels equal to this value are treated as invalid; must be consistent across all inputs |
compress |
str |
"deflate" |
"zstd" is faster at equal ratio for large mosaics; "lzw" maximises compatibility |
blockxsize / blockysize |
int |
256 |
Larger tiles (512) reduce file overhead for full-extent reads; smaller tiles suit spatial subsetting |
Resampling.bilinear |
enum | — | Use nearest for categorical rasters (land cover, cloud mask) |
| EDT normalization denominator | float |
edt.max() |
Replacing max() with a fixed pixel distance (e.g. 50) caps the blend zone width and speeds computation |
Step 5 — Validate Output Quality
After writing, reopen the mosaic and run these checks before passing it downstream.
import numpy as np
import rasterio
def validate_mosaic(output_path: str, nodata: float | None) -> dict:
"""
Return a summary of quality metrics for a blended mosaic.
Raises AssertionError if critical checks fail.
"""
with rasterio.open(output_path) as dst:
profile = dst.profile
data = dst.read(masked=True)
results = {}
# 1. Check for unexpected NaN values
nan_count = int(np.isnan(data.data).sum())
results["nan_pixels"] = nan_count
assert nan_count == 0, f"Output contains {nan_count} NaN pixels — check float32 cast."
# 2. Check NoData coverage matches expectations
if nodata is not None:
nodata_fraction = float((data.data == nodata).any(axis=0).mean())
results["nodata_fraction"] = round(nodata_fraction, 4)
# 3. Verify dtype preserved
assert profile["dtype"] == data.dtype.name or np.issubdtype(
data.dtype, np.floating
), f"Unexpected dtype: {data.dtype}"
# 4. Check tiling and compression are present (COG-ready)
assert profile.get("tiled"), "Output is not tiled — rewrite with tiled=True."
assert profile.get("compress"), "No compression found in output profile."
results["crs"] = str(profile["crs"])
results["transform"] = str(profile["transform"])
results["shape"] = (profile["count"], profile["height"], profile["width"])
return results
A passing validation returns a results dict with nan_pixels: 0, nodata_fraction close to the expected border proportion, and tiled: True. Visual inspection using rasterio’s built-in plotting via show() or any GIS viewer should show no visible step change at scene boundaries.
Parameter Reference
| Argument (blend_mosaic) | Type | Default | Notes |
|---|---|---|---|
input_paths |
list[str] |
required | Ordered list of file paths; order does not affect output (unlike priority merge) |
output_path |
str |
required | Write path for the output GeoTIFF |
nodata |
float | None |
None (reads from first file) |
Override if inputs have inconsistent NoData flags |
| Internal constant | Where set | Effect |
|---|---|---|
compress="deflate" |
profile update | Switch to "zstd" for ~20 % faster writes on large extents |
blockxsize=256 |
profile update | Increase to 512 for area-extent reads, decrease to 128 for point queries |
bigtiff="IF_SAFER" |
profile update | Enables BigTIFF automatically if output exceeds 4 GB |
Resampling.bilinear |
windowed read | Use Resampling.nearest for mask or categorical inputs |
Verification and Testing
Expected output properties:
- File opens without error and
src.crsmatches input CRS. src.transformmatches the union bounding box computed from inputs.src.profile["tiled"]isTrueandsrc.profile["compress"]is set.- No NaN values in any band (
np.isnan(data).sum() == 0). - Overlap zones show smooth intensity gradients, not step edges, when visualised.
Minimal integration test:
import numpy as np
import rasterio
from rasterio.transform import from_origin
def make_test_scene(path: str, value: float, left: float) -> None:
transform = from_origin(left, 60.0, 0.0001, 0.0001)
data = np.full((1, 100, 100), value, dtype=np.float32)
with rasterio.open(
path, "w", driver="GTiff", height=100, width=100,
count=1, dtype="float32", crs="EPSG:4326",
transform=transform, nodata=-9999,
) as dst:
dst.write(data)
make_test_scene("/tmp/scene_a.tif", 3000.0, left=10.0)
make_test_scene("/tmp/scene_b.tif", 3200.0, left=10.005) # 50-pixel overlap
result = blend_mosaic(
["/tmp/scene_a.tif", "/tmp/scene_b.tif"],
"/tmp/mosaic_test.tif",
)
with rasterio.open(result) as dst:
blended = dst.read(1)
# Overlap zone (columns 50-100) should contain values between 3000 and 3200
assert blended[50, 60] > 3000 and blended[50, 60] < 3200, \
"Blend zone not interpolated — check weight mask logic."
assert not np.isnan(blended).any(), "NaN pixels found in output."
print("Integration test passed.")
Troubleshooting
Seam still visible after blending
Cause: Residual radiometric offset between scenes that feathering cannot mask. The blend smooths the transition zone but cannot change absolute reflectance values; if scene A is systematically 200 DN brighter than scene B, the centre of the blend zone will still show a gradient shift. Fix: Apply histogram matching or relative radiometric normalisation before blending. Alternatively, use spectral index calculation pipelines (e.g. NDVI) instead of raw reflectance mosaics — ratio-based indices are self-normalising and suppress sensor offset artifacts.
out_shape resize silently truncates edge pixels
Cause: Floating-point rounding in window_from_bounds can produce a window with win_h or win_w one pixel shorter than the true scene extent.
Fix: Add win_h = min(out_height - row_off, int(round(win.height)) + 1) and clamp to out_height. Alternatively, use rasterio.warp.reproject with an explicit destination array to eliminate window arithmetic entirely.
MemoryError on large extents
Cause: The weighted_sum accumulator allocates n_bands × out_height × out_width × 4 bytes up front. A continental mosaic at 10 m resolution can exceed available RAM.
Fix: Process the output in horizontal strips. Iterate over row offsets, read only the intersecting portion of each input into the accumulator strip, blend, and write the strip via dst.write(data, window=...). For distributed environments, wrap the strip loop in a Dask delayed graph and submit it to distributed workers via dask.distributed.Client.compute.
CRS.from_epsg raises PROJ_NETWORK errors
Cause: Grid-shift datum transformations need the PROJ CDN if the transformation is not cached locally.
Fix: Set PROJ_NETWORK=ON in the environment, or pre-download the required grids with projsync. For offline pipelines, choose source data that shares a native CRS (e.g. UTM zone-matched Sentinel-2 tiles) so no datum shift is required.
Output contains fill-value stripes at tile boundaries
Cause: The weight_sum accumulator remains zero in pixels outside all input extents, and the np.where fill path writes nodata there — correct behaviour. If stripes appear inside expected scene coverage, the row_off/col_off window offsets are misaligned.
Fix: Print (row_off, col_off, win_h, win_w) for each scene and verify against src.bounds. A single off-by-one offset shifts the scene contribution by one pixel, leaving a one-pixel-wide nodata stripe at the seam.
Related
- Removing Seams in Multi-Scene Mosaics with Feathering — detailed treatment of the distance-transform feathering algorithm and alternative gradient-based blending strategies.
- Cloud and Shadow Masking Strategies — generate clean valid-data masks from Sentinel-2 SCL and Landsat QA_PIXEL before feeding inputs to the blend stage.
- Advanced Resampling and Upscaling Techniques — align scenes at different resolutions to a common grid before mosaicking.
- Temporal Aggregation and Time-Series Analysis — blended mosaics serve as the per-period baseline when building consistent time-series stacks.
- Automated Image Clipping and Cropping — clip and buffer inputs to the target extent before passing them to the blend pipeline.