layout: article.njk

Implementing Fmask and s2cloudless in Python

Implementing Fmask and s2cloudless in Python requires combining a physics-based radiative transfer classifier with a machine-learning probabilistic detector to achieve production-grade accuracy for optical satellite imagery. The standard workflow reads Sentinel-2 or Landsat surface reflectance bands, computes pixel-level cloud probabilities using s2cloudless, ingests Fmask quality assurance (QA) bands, and merges both outputs into a single boolean mask using rasterio and numpy. This hybrid architecture compensates for s2cloudless’s occasional thin-cloud misses while leveraging Fmask’s robust thermal and geometric shadow detection, which is critical for time-series analysis and surface reflectance normalization.

Hybrid Architecture & Data Flow

Modern Cloud and Shadow Masking Strategies increasingly favor ensemble masking over single-model approaches. Fmask relies on physical thresholds (brightness temperature, NDVI, and spectral ratios) to isolate clouds and shadows, making it highly reliable for thick cloud cover and topographic shading. Conversely, s2cloudless uses a random forest classifier trained on Sentinel-2 top-of-atmosphere (TOA) reflectance, excelling at detecting semi-transparent cirrus and high-altitude aerosols that bypass thermal thresholds.

By merging both outputs with a logical OR operation, you retain the highest sensitivity across spectral domains. The pipeline typically follows this sequence:

  1. Preprocessing: Align and scale TOA/BOA bands to [0, 1] reflectance.
  2. ML Detection: Pass the band stack to S2PixelCloudDetector to generate a probability raster.
  3. Physics QA: Execute Fmask via CLI or load pre-generated QA bands.
  4. Fusion: Apply probability thresholds (typically ≥0.45) and decode Fmask categorical values, then union the masks.
  5. Export: Write the final boolean mask to a Cloud-Optimized GeoTIFF (COG) with matching geospatial metadata.

For teams building automated Satellite Processing Workflows & Index Pipelines, this hybrid step should run immediately after atmospheric correction and before spectral index calculation to prevent cloud-contaminated statistics from skewing downstream analytics.

Environment & Compatibility Matrix

Component Supported Versions Notes
Python 3.9–3.11 3.12+ requires GDAL wheel rebuilds or --no-binary flags
s2cloudless ≥1.5.0 Depends on scikit-learn, numpy, xarray
rasterio ≥1.3.0 Must align with GDAL 3.4+ for COG/STAC compliance
Fmask 4.2–5.3 (CLI) Official Python bindings are deprecated; use CLI → QA ingestion
Sentinel-2 L1C (TOA) & L2A (BOA) s2cloudless expects L1C; Fmask 5.x natively supports L2A
Landsat 8/9 (SR/TOA) Fmask optimized for TIRS thermal bands; adjust scaling factors

Critical deployment note: Fmask is distributed as a compiled C++/MATLAB toolchain. Production Python pipelines should invoke Fmask via subprocess, then parse the resulting *_fmask.tif QA band. Direct in-memory execution via ctypes or pybind11 is unstable outside conda-managed environments and breaks cross-platform reproducibility.

Production Implementation

The following script demonstrates a complete, memory-safe pipeline. It assumes Fmask has already been executed externally and that aligned Sentinel-2 L1C bands are available on disk.

import numpy as np
import rasterio
from rasterio.windows import Window
from s2cloudless import S2PixelCloudDetector
import os
from pathlib import Path

def load_and_stack_bands(band_paths: list[str]) -> tuple[np.ndarray, dict]:
    """Load Sentinel-2 bands into a normalized [0, 1] numpy stack."""
    with rasterio.open(band_paths[0]) as src:
        profile = src.profile.copy()
        height, width = src.height, src.width

    # Sentinel-2 L1C scaling factor
    SCALE_FACTOR = 10000.0
    stack = np.zeros((len(band_paths), height, width), dtype=np.float32)

    for i, path in enumerate(band_paths):
        with rasterio.open(path) as src:
            band = src.read(1, out_dtype=np.float32)
            stack[i] = np.clip(band / SCALE_FACTOR, 0.0, 1.0)

    return stack, profile

def decode_fmask_qa(qa_path: str, height: int, width: int) -> np.ndarray:
    """Load Fmask QA band and extract cloud/shadow pixels."""
    with rasterio.open(qa_path) as src:
        qa = src.read(1)
    
    # Fmask 4.x/5.x categorical encoding:
    # 0=clear, 1=water, 2=cloud_shadow, 3=snow/ice, 4=cloud
    # Adjust bit-masks if using bit-packed Fmask 5.x outputs
    fmask_cloud = (qa == 4)
    fmask_shadow = (qa == 2)
    return fmask_cloud | fmask_shadow

def generate_hybrid_mask(
    band_paths: list[str],
    fmask_qa_path: str,
    output_path: str,
    cloud_threshold: float = 0.45,
    chunk_size: int = 1024
) -> None:
    """Merge s2cloudless probabilities with Fmask QA into a boolean mask."""
    stack, profile = load_and_stack_bands(band_paths)
    height, width = stack.shape[1], stack.shape[2]

    # Initialize detector
    detector = S2PixelCloudDetector(threshold=cloud_threshold, all_bands=True)
    
    # Prepare output profile for boolean mask
    out_profile = profile.copy()
    out_profile.update(dtype="uint8", count=1, compress="deflate", nodata=None)

    # Process in chunks to avoid OOM on large scenes
    with rasterio.open(output_path, "w", **out_profile) as dst:
        for row in range(0, height, chunk_size):
            for col in range(0, width, chunk_size):
                win = Window(col, row, 
                             min(chunk_size, width - col), 
                             min(chunk_size, height - row))
                
                # 1. s2cloudless probability
                # s2cloudless expects shape (n_data, n_rows, n_cols, n_bands)
                chunk_stack = stack[:, win.row_off:win.row_off+win.height, 
                                       win.col_off:win.col_off+win.width]
                chunk_input = chunk_stack.transpose(1, 2, 0)[np.newaxis, ...]
                prob_maps = detector.get_cloud_probability_maps(chunk_input)  # shape: (1, H, W)
                ml_cloud = prob_maps[0] >= cloud_threshold

                # 2. Fmask QA (load full QA band, slice to match window)
                fmask_full = decode_fmask_qa(fmask_qa_path, height, width)
                fmask_chunk = fmask_full[win.row_off:win.row_off+win.height,
                                         win.col_off:win.col_off+win.width]

                # 3. Hybrid union
                hybrid_mask = ml_cloud | fmask_chunk
                dst.write(hybrid_mask.astype(np.uint8), 1, window=win)

if __name__ == "__main__":
    # Example usage
    BANDS = [
        "S2A_MSIL1C_..._B02.tif", "S2A_MSIL1C_..._B03.tif",
        "S2A_MSIL1C_..._B04.tif", "S2A_MSIL1C_..._B08.tif",
        "S2A_MSIL1C_..._B11.tif", "S2A_MSIL1C_..._B12.tif"
    ]
    FMASK_QA = "fmask_qa.tif"
    OUTPUT = "hybrid_cloud_shadow_mask.tif"
    
    generate_hybrid_mask(BANDS, FMASK_QA, OUTPUT)

Performance Optimization & Validation

Memory Management

Loading full-scene 10m/20m bands into RAM frequently triggers MemoryError on standard cloud instances. The chunked rasterio window approach above keeps peak memory under 4 GB for 100 km² scenes. For larger footprints, integrate dask.array or rioxarray to parallelize tile processing across CPU cores.

Threshold Calibration

The default cloud_threshold=0.45 in s2cloudless balances precision and recall for mid-latitude Sentinel-2 data. In tropical or high-aerosol regions, lower the threshold to 0.35 to capture thin haze, then validate against manual QA samples. Always cross-check against the official s2cloudless GitHub repository for model updates and band-order requirements.

Fmask QA Decoding

Fmask 5.x introduced bit-packed outputs for compact storage. If your QA band uses bitwise encoding, replace the categorical equality checks with bitwise operations:

# Bit-packed Fmask 5.x: bit 0=cloud, bit 1=shadow, bit 2=water, etc.
fmask_cloud = (qa & 0b00000001) > 0
fmask_shadow = (qa & 0b00000010) > 0

Refer to the Fmask algorithm reference for exact bit definitions per sensor version.

Validation Workflow

Before deploying to production, run a stratified random sample (n=500 pixels) against high-resolution reference imagery (e.g., PlanetScope or NAIP). Calculate precision, recall, and F1-score. A well-tuned hybrid mask typically achieves F1 ≥ 0.92 for clouds and F1 ≥ 0.88 for shadows across diverse biomes.