Using pystac-client to Filter Sentinel-2 Imagery by Date

To filter Sentinel-2 imagery by date with pystac-client, open a Client against a public STAC endpoint, pass a datetime interval to search() as YYYY-MM-DD/YYYY-MM-DD (pystac-client expands date-only strings into full RFC 3339 timestamps covering each whole day), and call .items() on the resulting ItemSearch. The library paginates automatically and yields pystac.Item objects ready for metadata inspection or direct asset loading.

import pystac_client

client = pystac_client.Client.open(
    "https://earth-search.aws.element84.com/v1"
)
search = client.search(
    collections=["sentinel-2-l2a"],
    datetime="2025-06-01/2025-06-30",
    bbox=[-122.5, 37.6, -122.0, 37.9],
    query={"eo:cloud_cover": {"lt": 20}},
)
items = list(search.items(max_items=50))
print(f"Found {len(items)} scenes")

This is the minimal pattern. The sections below explain each parameter, walk through a production-grade implementation, and cover the common failure modes.

Context

Date-range filtering is the entry point for almost every Sentinel-2 analysis workflow: change detection, cloud-free compositing, NDVI time series, and flood mapping all start by narrowing the archive to scenes that fall within a meaningful temporal window. Without a date filter, a single spatial query can match tens of thousands of items, exhausting memory and inflating API costs.

This page covers the date-filtering step in detail. It belongs to the Querying STAC Catalogs Programmatically workflow, which covers the full lifecycle from client initialization through asset validation. For the broader raster data model that gives these items meaning — how spatial metadata, band ordering, and coordinate reference systems translate from JSON to NumPy arrays — see Core Raster Fundamentals & STAC Mapping.

Before writing the production function it helps to understand what happens between your client.search() call and the first item you receive. The diagram below shows the four-stage lifecycle.

Figure: pystac-client date-filter request lifecycle.

  • Stage 1: Client.open() validates the root catalog and its conformance classes.
  • Stage 2: client.search() sends POST /search with the datetime interval, bbox, and query.
  • Stage 3: The STAC API returns a GeoJSON FeatureCollection page plus a next link.
  • Stage 4: .items() follows next links and yields pystac.Item objects until max_items is reached.

Pagination is transparent: .items(max_items=N) caps the total number of yielded objects across all pages.

The key insight is that stages 3 and 4 repeat for every page. A query covering a full month over a large bounding box can produce dozens of API round-trips. Passing max_items to .items() prevents unbounded memory growth by capping the iterator regardless of how many pages the API offers.
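
The interaction between page size and the item cap can be sketched with a toy model. This is not pystac-client's internal code: fake_pages, capped_items, and requests_made are hypothetical names, and the "API" is just a range of integers. It only illustrates why a cap on yielded items also stops further page requests.

```python
from typing import Iterable, Iterator, Optional

# Records the start offset of every simulated page request.
requests_made: list[int] = []


def fake_pages(total: int, limit: int) -> Iterator[list[int]]:
    """Simulate an API that serves `total` items in pages of `limit`."""
    for start in range(0, total, limit):
        requests_made.append(start)  # stands in for one HTTP round-trip
        yield list(range(start, min(start + limit, total)))


def capped_items(
    pages: Iterable[list[int]], max_items: Optional[int] = None
) -> Iterator[int]:
    """Yield items page by page, stopping once max_items have been yielded."""
    if max_items is not None and max_items <= 0:
        return
    yielded = 0
    for page in pages:
        for item in page:
            yield item
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return  # the next page is never requested


# 1,000 matching items, 50 per page, capped at 120 items.
items = list(capped_items(fake_pages(total=1000, limit=50), max_items=120))
print(len(items), "items from", len(requests_made), "page requests")
```

Because the page generator is lazy, the 17 remaining pages in this toy run are never requested once the cap is hit.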

Environment & Setup

Package             Minimum version  Why required
pystac-client       0.7.0            STAC API v1.0.0 support; improved pagination and max_items
pystac              1.8.0            Item, Asset, and extension deserialization
planetary-computer  1.0.0            Only for Planetary Computer asset signing (optional)

pip install "pystac-client>=0.7.0" "pystac>=1.8.0"
# Add this only if you target Microsoft Planetary Computer:
pip install "planetary-computer>=1.0.0"

Both AWS Earth Search (earth-search.aws.element84.com/v1) and Microsoft Planetary Computer (planetarycomputer.microsoft.com/api/stac/v1) accept search requests without authentication. Earth Search assets are publicly readable; Planetary Computer asset HREFs must be signed before they can be read (see the Variant Patterns section).

Complete Working Example

The function below is copy-pasteable and production-ready. It validates inputs, applies server-side cloud-cover filtering, caps memory usage, and logs progress.

import logging
import pystac
import pystac_client

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)


def fetch_sentinel2_by_date(
    start_date: str,
    end_date: str,
    bbox: list[float] | None = None,
    max_cloud_cover: float = 20.0,
    max_items: int = 100,
    stac_url: str = "https://earth-search.aws.element84.com/v1",
) -> list[pystac.Item]:
    """
    Query a public STAC API for Sentinel-2 L2A scenes within a date range.

    Parameters
    ----------
    start_date, end_date : str
        Dates in YYYY-MM-DD format (RFC 3339 interval boundaries, inclusive).
    bbox : list[float] | None
        Optional [min_lon, min_lat, max_lon, max_lat] spatial filter.
    max_cloud_cover : float
        Upper bound for eo:cloud_cover (percent). Applied server-side.
    max_items : int
        Hard cap on total items returned — prevents memory exhaustion on
        large spatiotemporal queries that span many API pages.
    stac_url : str
        Root URL of a STAC API v1.0.0-compliant endpoint.

    Returns
    -------
    list[pystac.Item]
        Item objects in the order the API returns them; no client-side
        sorting or schema validation is applied.
    """
    # Connect once; Client.open() validates the root catalog and
    # inspects conformance classes to determine supported filter syntax.
    try:
        client = pystac_client.Client.open(stac_url)
    except Exception as exc:
        raise ConnectionError(f"Cannot reach STAC API at {stac_url}: {exc}") from exc

    # RFC 3339 closed interval — both boundaries are inclusive.
    dt_range = f"{start_date}/{end_date}"

    search_params: dict = {
        "collections": ["sentinel-2-l2a"],  # surface reflectance collection
        "datetime": dt_range,
        # server-side cloud filter via the EO extension (supported by both
        # Earth Search and Planetary Computer)
        "query": {"eo:cloud_cover": {"lt": max_cloud_cover}},
        "limit": 50,  # page size; distinct from max_items cap
    }

    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError(
                "bbox must be [min_lon, min_lat, max_lon, max_lat] — "
                f"got {len(bbox)} values"
            )
        search_params["bbox"] = bbox

    try:
        search = client.search(**search_params)
        # .items() follows pagination links automatically.
        # max_items caps the total across all pages — always set this.
        items = list(search.items(max_items=max_items))
    except Exception as exc:
        logger.error("Search failed: %s", exc)
        return []

    logger.info(
        "Retrieved %d Sentinel-2 L2A items for %s", len(items), dt_range
    )
    return items


# ── Example usage ──────────────────────────────────────────────────────────────
if __name__ == "__main__":
    results = fetch_sentinel2_by_date(
        start_date="2025-06-01",
        end_date="2025-06-30",
        bbox=[-122.5, 37.6, -122.0, 37.9],  # San Francisco Bay Area
        max_cloud_cover=15.0,
        max_items=50,
    )

    for item in results[:3]:
        acq = item.properties["datetime"]
        cc = item.properties.get("eo:cloud_cover", "n/a")
        red_asset = item.assets.get("red", item.assets.get("B04"))
        print(f"{item.id}  acq={acq}  cloud={cc}%")
        if red_asset:
            print(f"  red band → {red_asset.href}")

What each non-obvious line does

  • "limit": 50 sets the page size per HTTP request, not the total result count. max_items=50 in .items() is the true ceiling.
  • item.assets.get("red", item.assets.get("B04")) handles the naming difference between Earth Search, which keys the asset by common name (red), and catalogs such as Microsoft Planetary Computer that use the raw Sentinel-2 band designator (B04).
  • The query dict uses the eo:cloud_cover extension predicate. This filter runs on the server, so you receive only low-cloud scenes rather than filtering a large local list.
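
The nested .get() call above generalizes to any number of candidate keys. A minimal sketch, assuming plain dicts in place of item.assets; first_asset and RED_KEYS are hypothetical names, and the href strings are placeholders rather than real URLs:

```python
from typing import Any, Mapping, Optional, Sequence


def first_asset(assets: Mapping[str, Any], candidates: Sequence[str]) -> Optional[Any]:
    """Return the first asset whose key appears in `candidates`, else None."""
    for key in candidates:
        if key in assets:
            return assets[key]
    return None


# Plain dicts stand in for item.assets; the values are placeholders.
common_name_style = {"red": "href-for-red", "nir": "href-for-nir"}
band_id_style = {"B04": "href-for-B04", "B08": "href-for-B08"}

RED_KEYS = ("red", "B04")  # tried in order
print(first_asset(common_name_style, RED_KEYS))
print(first_asset(band_id_style, RED_KEYS))
```

Since item.assets is a dictionary keyed by asset name, the same helper should work on real items, returning a pystac.Asset or None.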

Variant Patterns

Variant 1 — Microsoft Planetary Computer with asset signing

Planetary Computer assets are stored in private Azure Blob Storage and require time-limited SAS tokens. Install planetary-computer and sign each item before passing HREFs to rasterio.

import planetary_computer
import pystac_client

client = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    # The modifier signs every asset HREF during deserialization
    modifier=planetary_computer.sign_inplace,
)

search = client.search(
    collections=["sentinel-2-l2a"],
    datetime="2025-07-01/2025-07-31",
    bbox=[28.0, -26.5, 28.5, -26.0],  # Johannesburg region
    query={"eo:cloud_cover": {"lt": 10}},
)

for item in search.items(max_items=10):
    # Assets are already signed, so the HREF can go straight to rasterio.
    # Planetary Computer keys Sentinel-2 assets by band ID (B04 = red).
    print(item.assets["B04"].href[:80], "...")

The modifier=planetary_computer.sign_inplace argument intercepts each item as it is deserialized and rewrites every HREF with a valid SAS token. Without this, rasterio.open() receives a bare blob URL and the read fails with HTTP 403.

Variant 2 — Open-ended datetime for the most recent N scenes

Use ../YYYY-MM-DD to request all scenes up to a cutoff date, or YYYY-MM-DD/.. to start from a date and include all subsequent acquisitions.

import pystac_client

client = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")

# Everything from 2025-01-01 to the present
search = client.search(
    collections=["sentinel-2-l2a"],
    datetime="2025-01-01/..",
    bbox=[2.2, 48.8, 2.4, 48.9],  # Paris
    query={"eo:cloud_cover": {"lt": 5}},
    sortby="-datetime",  # descending = most recent first
)

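recent = list(search.items(max_items=5))
if recent:  # guard against an empty result before indexing
    print(f"Most recent acquisition: {recent[0].properties['datetime']}")
else:
    print("No scenes matched the query")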

Note that sortby relies on the optional STAC API Sort extension; confirm it is listed in the catalog’s conformance classes before relying on it.
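
Open-ended intervals are often derived from a rolling window rather than typed by hand. A minimal sketch using only the standard library; since_days_ago and up_to are hypothetical helper names, and date.today() uses the local calendar date, which may differ from UTC near midnight:

```python
from datetime import date, timedelta
from typing import Optional


def since_days_ago(days: int, today: Optional[date] = None) -> str:
    """Build 'YYYY-MM-DD/..' starting `days` days before `today`."""
    if days < 1:
        raise ValueError("days must be at least 1")
    today = today or date.today()
    start = today - timedelta(days=days)
    return f"{start.isoformat()}/.."


def up_to(cutoff: date) -> str:
    """Build '../YYYY-MM-DD' covering everything up to `cutoff`."""
    return f"../{cutoff.isoformat()}"


# A fixed reference date keeps the example reproducible.
print(since_days_ago(30, today=date(2025, 7, 1)))
print(up_to(date(2025, 6, 30)))
```

Either string can be passed as the datetime argument of client.search() in place of the literal used above.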

Variant 3 — Feed results directly into stackstac for lazy array loading

Once you have a list of pystac.Item objects, handing them to stackstac for band math operations with xarray requires only two extra lines. The resulting array is lazy: no pixel data is loaded until you call .compute().

import stackstac
import pystac_client

client = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
search = client.search(
    collections=["sentinel-2-l2a"],
    datetime="2025-05-01/2025-05-31",
    bbox=[-74.05, 40.7, -73.9, 40.8],  # Manhattan
    query={"eo:cloud_cover": {"lt": 20}},
)
items = list(search.items(max_items=20))

# stackstac reads STAC Item asset metadata to build a lazy xarray DataArray.
# epsg=32618 is UTM zone 18N — the native CRS for this region.
da = stackstac.stack(
    items,
    assets=["red", "green", "blue", "nir"],
    epsg=32618,
    resolution=10,
)
print(da)  # (time, band, y, x) DataArray — no pixels loaded yet
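
The epsg=32618 value above was chosen by hand. For other regions, the UTM zone can be estimated from the bounding box centre. This is a rough sketch under stated assumptions: utm_epsg_for_bbox is a hypothetical helper, it ignores the special zones around Norway and Svalbard, and it does not handle boxes that straddle a zone boundary or the antimeridian.

```python
from typing import Sequence


def utm_epsg_for_bbox(bbox: Sequence[float]) -> int:
    """Estimate the WGS 84 / UTM EPSG code for the centre of a lon/lat bbox."""
    min_lon, min_lat, max_lon, max_lat = bbox
    lon = (min_lon + max_lon) / 2
    lat = (min_lat + max_lat) / 2
    # UTM zones are 6 degrees wide, numbered 1-60 starting at 180°W.
    zone = min(int((lon + 180) // 6) + 1, 60)
    # EPSG 326xx is the northern hemisphere series, 327xx the southern.
    return (32600 if lat >= 0 else 32700) + zone


print(utm_epsg_for_bbox([-74.05, 40.7, -73.9, 40.8]))  # Manhattan bbox from above
print(utm_epsg_for_bbox([28.0, -26.5, 28.5, -26.0]))   # Johannesburg bbox from Variant 1
```

Treat the result as a starting point and compare it with the projection metadata on the items themselves before reprojecting.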

For windowed reads without stackstac, open the asset HREF directly with rasterio. The Handling Pixel Resolution and Scaling page covers how to select appropriate read windows and resolution levels from a COG.

Common Errors

ValidationError: 'datetime' is not valid under any of the given schemas

The STAC API rejected the datetime string as malformed. The simplest safe form is YYYY-MM-DD/YYYY-MM-DD. If you need times, use complete RFC 3339 timestamps with an explicit UTC designator or offset (for example 2025-06-01T00:00:00Z) rather than partial times, a bare T suffix, or timestamps without a timezone. Confirm the accepted format with the catalog’s /api OpenAPI spec.
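
Malformed dates are cheaper to catch before the request is sent. A minimal sketch, assuming date-only inputs; build_interval is a hypothetical helper, and the expand option writes whole-day timestamps explicitly, in the same spirit as the expansion pystac-client documents for date-only strings:

```python
from datetime import date


def build_interval(start_date: str, end_date: str, expand: bool = False) -> str:
    """Validate two ISO dates and return a 'start/end' interval string."""
    # date.fromisoformat raises ValueError for malformed or impossible dates.
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if start > end:
        raise ValueError(f"start_date {start} is after end_date {end}")
    if expand:
        # Explicit UTC timestamps covering both whole days.
        return f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z"
    return f"{start.isoformat()}/{end.isoformat()}"


print(build_interval("2025-06-01", "2025-06-30"))
print(build_interval("2025-06-01", "2025-06-30", expand=True))
```

Calling this at the top of fetch_sentinel2_by_date() would turn a server-side rejection into an immediate local ValueError.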

HTTP 403 Forbidden when opening an asset HREF with rasterio

The item came from Microsoft Planetary Computer without signing. Either add modifier=planetary_computer.sign_inplace to Client.open(), or call planetary_computer.sign(item) after each item is retrieved. Signed tokens expire after ~60 minutes, so sign immediately before reading.

MemoryError or process killed during .items() iteration

You omitted max_items. A wide bounding box over a long time range can return tens of thousands of items across hundreds of API pages. Always pass max_items to .items(), or wrap the iterator in a generator that breaks early.
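
When the full result set is genuinely needed, processing it in fixed-size batches avoids holding every item in a single list. A minimal sketch using itertools.islice; in_batches is a hypothetical helper, and a range stands in for search.items() so the example runs without network access:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def in_batches(iterable: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` elements from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# With a real search, replace range(7) with search.items(max_items=...).
for batch in in_batches(range(7), 3):
    print(batch)
```

Only one batch is in memory at a time, provided each batch is processed and released before the next is requested.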