
Using pystac-client to filter Sentinel-2 imagery by date

To filter Sentinel-2 imagery by date using pystac-client, open a Client pointed at a public STAC API endpoint, pass a datetime parameter written as a start/end interval (YYYY-MM-DD/YYYY-MM-DD, or full RFC 3339 timestamps), and specify the target collection ID (sentinel-2-l2a for surface reflectance, or sentinel-2-l1c for top-of-atmosphere reflectance where the catalog provides it). Client.search() returns an ItemSearch object; its items() method follows API pagination automatically and yields pystac Item objects ready for metadata inspection or direct ingestion into raster I/O pipelines.

This workflow is the standard approach when Querying STAC Catalogs Programmatically for time-series analysis, change detection, or cloud-free compositing.

Environment & Setup

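| Component | Requirement | Notes |
|---|---|---|
| Python | ≥3.9 | pystac-client drops support for end-of-life Python versions. Use 3.10+ for production. |
| pystac-client | ≥0.7.0 | Targets STAC API v1.0.0. Provides ItemSearch.items() with automatic pagination and the max_items search cap. |
| STAC API | v1.0.0 compliant | Microsoft Planetary Computer and Element 84 Earth Search (hosted on AWS) work out of the box; other v1.0.0-compliant endpoints, such as ESA-operated catalogs, should also work. |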

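Install dependencies via pip. Quote the version specifier so the shell does not treat > as an output redirect. The planetary-computer package provides the URL signing used below; stackstac and rasterio are only needed for the raster integration section:

pip install "pystac-client>=0.7.0" planetary-computer stackstac rasterio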

Complete Working Example

The following function queries the Microsoft Planetary Computer STAC API, applies a temporal filter and a cloud-cover threshold, optionally constrains the search by bounding box, and returns a list of pystac Item objects whose asset URLs are signed for download.

from __future__ import annotations  # lets `list[float] | None` hints work on Python 3.9

import logging

import planetary_computer
import pystac
import pystac_client

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def fetch_sentinel2_by_date(
    start_date: str,
    end_date: str,
    bbox: list[float] | None = None,
    max_cloud_cover: float = 20.0,
    max_items: int = 100,
) -> list[pystac.Item]:
    """
    Query the Planetary Computer STAC API for Sentinel-2 L2A imagery within a
    date range and below a cloud-cover threshold. Returns a list of STAC Items
    whose asset hrefs are signed for download.
    """
    # Fail fast on malformed input before any network call
    if bbox is not None and len(bbox) != 4:
        raise ValueError("bbox must contain exactly 4 values: [min_lon, min_lat, max_lon, max_lat]")

    try:
        # sign_inplace appends a short-lived SAS token to every asset href
        client = pystac_client.Client.open(STAC_URL, modifier=planetary_computer.sign_inplace)
    except Exception as e:
        raise ConnectionError(f"Failed to connect to STAC API: {e}") from e

    # "start/end" interval; pystac-client expands bare dates to full RFC 3339
    # timestamps, so the end date includes the whole day
    dt_range = f"{start_date}/{end_date}"

    search_params = {
        "collections": ["sentinel-2-l2a"],
        "datetime": dt_range,
        "query": {"eo:cloud_cover": {"lt": max_cloud_cover}},  # Query Extension, evaluated server-side
        "limit": 50,             # page size per request
        "max_items": max_items,  # cap on total items across all pages
    }
    if bbox is not None:
        search_params["bbox"] = bbox

    try:
        search = client.search(**search_params)
        # items() lazily follows "next" links; list() materializes at most max_items
        items = list(search.items())
        logging.info("Retrieved %d Sentinel-2 items for %s", len(items), dt_range)
        return items
    except Exception as e:
        logging.error("Search failed: %s", e)
        return []


if __name__ == "__main__":
    results = fetch_sentinel2_by_date(
        start_date="2023-06-01",
        end_date="2023-06-30",
        bbox=[-122.5, 37.6, -122.0, 37.9],
        max_cloud_cover=15.0,
        max_items=50,
    )

    if results:
        first = results[0]
        print(f"Item ID: {first.id}")
        print(f"Acquisition: {first.properties['datetime']}")
        print(f"Cloud Cover: {first.properties['eo:cloud_cover']}%")
        # Planetary Computer keys Sentinel-2 bands by band name ("B04" is red);
        # other catalogs, such as Earth Search, use common names like "red"
        print(f"Red Band: {first.assets['B04'].href}")

Parameter Breakdown & STAC Compliance

1. Temporal Filtering (datetime)

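The datetime parameter takes a single instant or a start/end interval separated by /. The STAC API specification expects RFC 3339 timestamps (for example 2023-06-01T00:00:00Z/2023-06-30T23:59:59Z); pystac-client also accepts bare dates and expands them for you, so 2023-06-01/2023-06-30 covers June 1 through the end of June 30. Use .. for an open end: YYYY-MM-DD/.. matches everything from that date onward, and ../YYYY-MM-DD matches everything up to it. Both ends of the interval are inclusive.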
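pystac-client performs this date expansion internally. When you need the exact interval string, for logging or for a client other than pystac-client, a standard-library sketch of the same rule might look like this. `to_rfc3339_interval` is a hypothetical helper, not part of pystac-client:

```python
from __future__ import annotations

from datetime import date, datetime, time, timezone


def to_rfc3339_interval(start: date | None, end: date | None) -> str:
    """Build a STAC datetime interval from calendar dates.

    The start expands to 00:00:00Z and the end to 23:59:59Z so the last day is
    fully included; None becomes the open-ended marker "..".
    """
    if start is None and end is None:
        raise ValueError("at least one end of the interval must be set")

    def fmt(d: date, t: time) -> str:
        # Attach a UTC clock time to the date and emit a Z-suffixed timestamp
        return datetime.combine(d, t, tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    lo = ".." if start is None else fmt(start, time(0, 0, 0))
    hi = ".." if end is None else fmt(end, time(23, 59, 59))
    return f"{lo}/{hi}"


print(to_rfc3339_interval(date(2023, 6, 1), date(2023, 6, 30)))
# 2023-06-01T00:00:00Z/2023-06-30T23:59:59Z
```

The resulting string can be passed unchanged as datetime= to client.search().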

2. Collection Targeting

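Catalogs typically publish Sentinel-2 as two processing levels, although availability varies: Planetary Computer, for example, exposes only sentinel-2-l2a, while Earth Search also serves sentinel-2-l1c.

  • sentinel-2-l2a: Bottom-of-atmosphere surface reflectance (recommended for analysis)
  • sentinel-2-l1c: Top-of-atmosphere reflectance (input for custom atmospheric correction pipelines)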

3. Cloud Cover Filtering

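A temporal query alone returns every scene in the range, including heavily clouded ones. The query parameter uses the STAC API Query Extension to filter on the eo:cloud_cover property server-side, so only matching items are paged back to the client. Query is an optional extension, so confirm that the target catalog advertises it in its conformance classes; catalogs that implement the Filter Extension accept an equivalent CQL2 expression through the filter parameter.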
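To make the server-side condition concrete, the sketch below builds the same "cloud cover below N percent" test in both dialects as plain Python dicts. The function name `cloud_cover_filters` is hypothetical; the dict shapes follow the Query Extension and the CQL2-JSON encoding used by the Filter Extension:

```python
import json


def cloud_cover_filters(max_cloud_cover: float) -> tuple[dict, dict]:
    """Return the same 'eo:cloud_cover below N' condition in two dialects."""
    # STAC API Query Extension: {property: {operator: value}}
    query = {"eo:cloud_cover": {"lt": max_cloud_cover}}
    # STAC API Filter Extension, CQL2-JSON encoding of eo:cloud_cover < N
    cql2 = {"op": "<", "args": [{"property": "eo:cloud_cover"}, max_cloud_cover]}
    return query, cql2


query, cql2 = cloud_cover_filters(15.0)
print(json.dumps(cql2))
# {"op": "<", "args": [{"property": "eo:cloud_cover"}, 15.0]}
```

Either value can then be handed to pystac-client: client.search(..., query=query) for catalogs that implement the Query Extension, or client.search(..., filter=cql2, filter_lang="cql2-json") for catalogs that implement the Filter Extension.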

4. Pagination & Memory Management

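pystac-client automatically follows next links across paginated API responses, requesting limit items per page. Because items() is lazy, memory grows only as you consume or materialize results (for example with list()), but an uncapped spatiotemporal query can still pull thousands of items. Pass max_items to Client.search() to bound the total, or slice the iterator explicitly.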
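The capping idea can be sketched without network access. Here the hypothetical iter_items generator stands in for pystac-client's page-following against an in-memory fake API, and itertools.islice plays the role of max_items; this models the pattern, not pystac-client's internals:

```python
from itertools import islice
from typing import Iterator, Optional

# Hypothetical in-memory "API": page token -> (items on that page, next token)
FAKE_PAGES: dict[Optional[str], tuple[list[str], Optional[str]]] = {
    None: (["item-0", "item-1", "item-2"], "p2"),
    "p2": (["item-3", "item-4", "item-5"], "p3"),
    "p3": (["item-6", "item-7"], None),
}
fetched_pages: list[Optional[str]] = []


def iter_items() -> Iterator[str]:
    """Yield items page by page, following the 'next' token until it runs out."""
    token: Optional[str] = None
    while True:
        fetched_pages.append(token)  # record each simulated HTTP request
        page, token = FAKE_PAGES[token]
        yield from page
        if token is None:
            return


# Cap the total at 4 items: because the generator is lazy, page "p3" is never requested
capped = list(islice(iter_items(), 4))
print(capped, len(fetched_pages))
# ['item-0', 'item-1', 'item-2', 'item-3'] 2
```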

Production Best Practices

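  • Validate Bounding Boxes: Ensure coordinates follow [min_lon, min_lat, max_lon, max_lat] order in WGS 84 degrees. A box crossing the antimeridian is written with min_lon > max_lon, which not every API handles; splitting it into two boxes or using a polygon (intersects) query is safer.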
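  • Use Signed URLs: Planetary Computer assets live in Azure Blob Storage and return 403 errors unless the URL carries a SAS token. Sign them with the planetary-computer package, either per item with planetary_computer.sign(item) or for every search result by passing modifier=planetary_computer.sign_inplace to pystac_client.Client.open().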
  • Defer Asset Loading: STAC Item objects are lightweight metadata containers. Only resolve .assets["band"].href when ready to stream data into rasterio, xarray, or odc-stac.
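  • Handle Missing Assets: Asset keys are catalog-specific (Planetary Computer uses B04 for the red band, Earth Search uses red), and not every item carries every asset. Use item.assets.get("key") and check for None instead of indexing directly, which raises KeyError.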
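The first and last checks can be bundled into two small helpers. Both are hypothetical sketches written against plain sequences and mappings, so they work on a pystac Item's assets dict without importing pystac; validate_bbox also reports whether the box crosses the antimeridian so the caller can split it:

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence


def validate_bbox(bbox: Sequence[float]) -> bool:
    """Check a [min_lon, min_lat, max_lon, max_lat] box.

    Returns True if the box crosses the antimeridian, False otherwise.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must be [min_lon, min_lat, max_lon, max_lat]")
    min_lon, min_lat, max_lon, max_lat = bbox
    if not all(-180.0 <= x <= 180.0 for x in (min_lon, max_lon)):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not all(-90.0 <= y <= 90.0 for y in (min_lat, max_lat)):
        raise ValueError("latitudes must lie in [-90, 90] (check the axis order)")
    if min_lat > max_lat:
        raise ValueError("min_lat must not exceed max_lat")
    # min_lon > max_lon is how GeoJSON/STAC encode a box that wraps across 180°
    return min_lon > max_lon


def first_asset(assets: Mapping[str, Any], candidates: Sequence[str]) -> Any | None:
    """Return the asset stored under the first candidate key present, or None."""
    for key in candidates:
        asset = assets.get(key)
        if asset is not None:
            return asset
    return None
```

For example, first_asset(item.assets, ["B04", "red"]) returns the red-band asset on both Planetary Computer (B04) and Earth Search (red) items, or None if neither key exists.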

Integrating with Raster Workflows

Once filtered, STAC items map directly to array operations. Understanding how spatial metadata, coordinate reference systems, and band ordering translate from JSON to NumPy arrays is essential for reproducible pipelines. Refer to Core Raster Fundamentals & STAC Mapping for detailed guidance on aligning STAC assets with GDAL/rasterio conventions.

For bulk loading, combine pystac-client with stackstac or odc-stac:

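import rasterio
import stackstac
from rasterio.windows import Window

# `results` is the list returned by fetch_sentinel2_by_date() above; its asset
# hrefs are already signed. Planetary Computer keys the RGB bands as B04/B03/B02.

# Stack the items into a lazy (time, band, y, x) xarray DataArray in UTM zone 10N
da = stackstac.stack(results, assets=["B04", "B03", "B02"], epsg=32610)

# Or open a single band with rasterio and read only a 512 x 512 window
with rasterio.open(results[0].assets["B04"].href) as src:
    profile = src.profile
    block = src.read(1, window=Window(0, 0, 512, 512))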
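The stackstac call above hardcodes epsg=32610 (WGS 84 / UTM zone 10N), which suits the San Francisco bounding box from the search example. For other areas, a small helper can derive the UTM EPSG code from a point such as the bbox centre. utm_epsg is a hypothetical sketch that uses the regular 6-degree zone grid and ignores the Norway and Svalbard exceptions; many catalogs also record each scene's native CRS in the proj:epsg item property, which is another option:

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """Return the WGS 84 / UTM EPSG code for the zone containing (lon, lat).

    Uses the regular 6-degree zone grid; the Norway/Svalbard exceptions are ignored.
    """
    if not -180.0 <= lon <= 180.0 or not -80.0 <= lat <= 84.0:
        raise ValueError("UTM covers longitudes [-180, 180] and latitudes [-80, 84]")
    # Zones are numbered 1..60 eastward from 180°W; lon=180 wraps back to zone 1
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    # 326xx = northern hemisphere, 327xx = southern hemisphere
    return (32600 if lat >= 0 else 32700) + zone


print(utm_epsg(-122.25, 37.75))
# 32610
```

For the example bbox, the centre (-122.25, 37.75) falls in zone 10 north, so the helper reproduces the hardcoded value.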

The official pystac-client documentation provides additional examples for advanced query composition, authentication handling, and catalog crawling.