Using pystac-client to Filter Sentinel-2 Imagery by Date
To filter Sentinel-2 imagery by date with pystac-client, open a Client against a public STAC endpoint and pass a datetime interval as two slash-separated dates (YYYY-MM-DD/YYYY-MM-DD). pystac-client expands these dates into full RFC 3339 timestamps. Cap the total result count with max_items, then call .items() on the resulting ItemSearch. The library paginates automatically and yields pystac.Item objects ready for metadata inspection or direct asset loading.
```python
import pystac_client

client = pystac_client.Client.open(
    "https://earth-search.aws.element84.com/v1"
)
search = client.search(
    collections=["sentinel-2-l2a"],
    datetime="2025-06-01/2025-06-30",
    bbox=[-122.5, 37.6, -122.0, 37.9],
    query={"eo:cloud_cover": {"lt": 20}},
    max_items=50,  # hard cap on items yielded across all pages
)
items = list(search.items())
print(f"Found {len(items)} scenes")
```
This is the minimal pattern. The sections below explain each parameter, walk through a production-grade implementation, and cover the common failure modes.
Context
Date-range filtering is the entry point for almost every Sentinel-2 analysis workflow. Change detection, cloud-free compositing, NDVI time series, and flood mapping all start by narrowing the archive to scenes that fall within a meaningful temporal window. Without a date filter, a single spatial query can match tens of thousands of items, which exhausts memory and multiplies the number of API requests.
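For long time series, one option is to split a multi-month range into one query per calendar month so that each search stays small. The helper below is a hypothetical, stdlib-only sketch (monthly_windows is not part of pystac-client). Each string it returns is an inclusive YYYY-MM-DD/YYYY-MM-DD interval that could be passed as the datetime argument of client.search().

```python
from datetime import date, timedelta


def monthly_windows(start: date, end: date) -> list[str]:
    """Split [start, end] into inclusive YYYY-MM-DD/YYYY-MM-DD windows,
    one per calendar month (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not be after end")
    windows = []
    current = start
    while current <= end:
        # First day of the following month
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Last day of this month, clipped to the overall end date
        window_end = min(next_month - timedelta(days=1), end)
        windows.append(f"{current.isoformat()}/{window_end.isoformat()}")
        current = next_month
    return windows


print(monthly_windows(date(2025, 1, 15), date(2025, 3, 10)))
# ['2025-01-15/2025-01-31', '2025-02-01/2025-02-28', '2025-03-01/2025-03-10']
```

Looping over these windows issues several bounded searches instead of one open-ended search. The trade-off is more client-side bookkeeping.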
This page covers the date-filtering step in detail. It belongs to the Querying STAC Catalogs Programmatically workflow, which covers the full lifecycle from client initialization through asset validation. For the broader raster data model that gives these items meaning — how spatial metadata, band ordering, and coordinate reference systems translate from JSON to NumPy arrays — see Core Raster Fundamentals & STAC Mapping.
How pystac-client Resolves a Date-Filtered Search
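Before writing the production function, it helps to understand what happens between your client.search() call and the first item you receive. Client.open() first fetches the API landing page, including its conformance classes. client.search() then builds an ItemSearch object without sending any request. Only when you iterate .items() does pystac-client send the search request. It then deserializes the returned page into pystac.Item objects and follows the page's "next" link to fetch the following page.
The key point is that the request and deserialization steps repeat for every page. A query covering a full month over a large bounding box can produce dozens of API round-trips. Passing max_items to client.search() prevents unbounded memory growth, because the iterator stops after that many items no matter how many pages the API offers.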
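The number of round-trips follows from simple arithmetic. With a page size of limit and a cap of max_items, the client needs roughly ceil(min(total_matches, max_items) / limit) page requests. The stdlib sketch below simulates that loop against a fake paged API. fake_search_pages, collect, and the numbers used are hypothetical; real pages arrive as GeoJSON FeatureCollections with a "next" link.

```python
import math
from typing import Iterator


def fake_search_pages(total_matches: int, limit: int) -> Iterator[list[int]]:
    """Hypothetical stand-in for a paged STAC search: each yield is one
    HTTP round-trip returning up to `limit` item IDs."""
    for offset in range(0, total_matches, limit):
        yield list(range(offset, min(offset + limit, total_matches)))


def collect(total_matches: int, limit: int, max_items: int) -> tuple[int, int]:
    """Return (items_kept, pages_requested), stopping once max_items is hit."""
    items: list[int] = []
    pages = 0
    for page in fake_search_pages(total_matches, limit):
        pages += 1  # one simulated request per page
        for item_id in page:
            items.append(item_id)
            if len(items) >= max_items:
                return len(items), pages  # later pages are never requested
    return len(items), pages


# 10,000 matches, 50 per page, capped at 120 items -> only 3 requests
print(collect(total_matches=10_000, limit=50, max_items=120))  # (120, 3)
print(math.ceil(min(10_000, 120) / 50))  # 3
```

Because the page generator is lazy, stopping early also stops the simulated requests. Without the cap, the same loop would make 200 requests.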
Environment & Setup
| Package | Minimum version | Why required |
|---|---|---|
| pystac-client | 0.7.0 | STAC API v1.0.0 support; improved pagination and max_items |
| pystac | 1.8.0 | Item, Asset, and extension deserialization |
| planetary-computer | 1.0.0 | Only for Planetary Computer asset signing (optional) |
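```shell
pip install "pystac-client>=0.7.0" "pystac>=1.8.0"
# Add this only if you target Microsoft Planetary Computer:
pip install "planetary-computer>=1.0.0"
# Add this only for the stackstac variant (lazy xarray loading):
pip install stackstac
```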
Both AWS Earth Search (earth-search.aws.element84.com/v1) and Microsoft Planetary Computer (planetarycomputer.microsoft.com/api/stac/v1) work without authentication. Earth Search assets are publicly readable; Planetary Computer assets require signing (see the Variant Patterns section).
Complete Working Example
The function below is copy-pasteable and production-ready. It validates inputs, applies server-side cloud-cover filtering, caps memory usage, and logs progress.
```python
from __future__ import annotations  # allows `list[float] | None` on Python < 3.10

import datetime as dt
import logging

import pystac
import pystac_client

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)


def fetch_sentinel2_by_date(
    start_date: str,
    end_date: str,
    bbox: list[float] | None = None,
    max_cloud_cover: float = 20.0,
    max_items: int = 100,
    stac_url: str = "https://earth-search.aws.element84.com/v1",
) -> list[pystac.Item]:
    """
    Query a public STAC API for Sentinel-2 L2A scenes within a date range.

    Parameters
    ----------
    start_date, end_date : str
        Dates in YYYY-MM-DD format (interval boundaries, both inclusive).
    bbox : list[float] | None
        Optional [min_lon, min_lat, max_lon, max_lat] spatial filter.
    max_cloud_cover : float
        Strict upper bound for eo:cloud_cover (percent). Applied server-side.
    max_items : int
        Hard cap on total items returned; prevents memory exhaustion on
        large spatiotemporal queries that span many API pages.
    stac_url : str
        Root URL of a STAC API v1.0.0-compliant endpoint.

    Returns
    -------
    list[pystac.Item]
        Item objects, sorted by acquisition datetime descending.
    """
    # Fail fast on malformed or reversed dates instead of waiting for the server.
    try:
        start = dt.date.fromisoformat(start_date)
        end = dt.date.fromisoformat(end_date)
    except ValueError as exc:
        raise ValueError(f"Dates must be YYYY-MM-DD: {exc}") from exc
    if start > end:
        raise ValueError(f"start_date {start_date} is after end_date {end_date}")

    # Connect once; Client.open() fetches the landing page, whose conformance
    # classes tell pystac-client which search features the API supports.
    try:
        client = pystac_client.Client.open(stac_url)
    except Exception as exc:
        raise ConnectionError(f"Cannot reach STAC API at {stac_url}: {exc}") from exc

    # Closed interval; both boundaries are inclusive.
    dt_range = f"{start_date}/{end_date}"
    search_params: dict = {
        "collections": ["sentinel-2-l2a"],  # surface reflectance collection
        "datetime": dt_range,
        # Server-side cloud filter on the EO extension's eo:cloud_cover
        # property (supported by both Earth Search and Planetary Computer)
        "query": {"eo:cloud_cover": {"lt": max_cloud_cover}},
        "limit": 50,  # page size per request; distinct from max_items
        "max_items": max_items,  # total cap across all pages; always set this
    }
    if bbox is not None:
        if len(bbox) != 4:
            raise ValueError(
                "bbox must be [min_lon, min_lat, max_lon, max_lat]; "
                f"got {len(bbox)} values"
            )
        search_params["bbox"] = bbox

    try:
        search = client.search(**search_params)
        # .items() follows pagination links automatically and stops
        # once max_items items have been yielded.
        items = list(search.items())
    except Exception as exc:
        logger.error("Search failed: %s", exc)
        return []

    # The API's default order is not guaranteed, so sort newest first locally.
    items.sort(key=lambda item: item.properties.get("datetime", ""), reverse=True)
    logger.info("Retrieved %d Sentinel-2 L2A items for %s", len(items), dt_range)
    return items


# ── Example usage ──────────────────────────────────────────────────────────────
if __name__ == "__main__":
    results = fetch_sentinel2_by_date(
        start_date="2025-06-01",
        end_date="2025-06-30",
        bbox=[-122.5, 37.6, -122.0, 37.9],  # San Francisco Bay Area
        max_cloud_cover=15.0,
        max_items=50,
    )
    for item in results[:3]:
        acq = item.properties["datetime"]
        cc = item.properties.get("eo:cloud_cover", "n/a")
        # Earth Search v1 keys the band "red"; Planetary Computer uses "B04".
        red_asset = item.assets.get("red", item.assets.get("B04"))
        print(f"{item.id} acq={acq} cloud={cc}%")
        if red_asset is not None:
            print(f"  red band → {red_asset.href}")
```
What each non-obvious line does
- "limit": 50 sets the page size per HTTP request, not the total result count. max_items, passed to client.search(), is the true ceiling: .items() stops yielding once that many items have arrived.
- item.assets.get("red", item.assets.get("B04")) handles the asset-key difference between Earth Search v1 (red) and catalogs such as Microsoft Planetary Computer, which use the raw Sentinel-2 band designator (B04).
- The query dict applies a query-extension predicate to the EO extension's eo:cloud_cover property. This filter runs on the server, so you receive only low-cloud scenes instead of filtering a large list locally.
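The same fallback idea extends to other bands. The sketch below uses a hypothetical helper that works on a plain asset dictionary (key → asset dict), the shape found under item.to_dict()["assets"]. The mapping covers only Earth Search v1 common names and the Sentinel-2 designators used by Planetary Computer, and the example HREFs are placeholders.

```python
from typing import Mapping, Optional

# Hypothetical mapping from common band names to candidate asset keys:
# Earth Search v1 uses common names, Planetary Computer uses designators.
BAND_KEYS = {
    "blue": ("blue", "B02"),
    "green": ("green", "B03"),
    "red": ("red", "B04"),
    "nir": ("nir", "B08"),
}


def find_band_href(assets: Mapping[str, dict], band: str) -> Optional[str]:
    """Return the href of the first matching asset key, or None if absent."""
    for key in BAND_KEYS[band]:
        asset = assets.get(key)
        if asset is not None:
            return asset["href"]
    return None


earth_search_assets = {"red": {"href": "https://example.com/es/red.tif"}}
pc_assets = {"B04": {"href": "https://example.com/pc/B04.tif"}}
print(find_band_href(earth_search_assets, "red"))  # https://example.com/es/red.tif
print(find_band_href(pc_assets, "red"))            # https://example.com/pc/B04.tif
print(find_band_href(pc_assets, "nir"))            # None
```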
Variant Patterns
Variant 1 — Microsoft Planetary Computer with asset signing
Planetary Computer assets are stored in private Azure Blob Storage and require time-limited SAS tokens. Install planetary-computer and sign each item before passing HREFs to rasterio.
```python
import planetary_computer
import pystac_client

client = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    # The modifier signs every asset HREF during deserialization
    modifier=planetary_computer.sign_inplace,
)
search = client.search(
    collections=["sentinel-2-l2a"],
    datetime="2025-07-01/2025-07-31",
    bbox=[28.0, -26.5, 28.5, -26.0],  # Johannesburg region
    query={"eo:cloud_cover": {"lt": 10}},
    max_items=10,
)
for item in search.items():
    # Assets are pre-signed, so the HREF can go straight to rasterio.
    # Planetary Computer keys the red band as "B04", not "red".
    print(item.assets["B04"].href[:80], "...")
```
The modifier=planetary_computer.sign_inplace argument intercepts each item as it is deserialized and rewrites every asset HREF with a valid SAS token. Without it, rasterio.open() receives a bare blob URL and fails with HTTP 403.
Variant 2 — Open-ended datetime for the most recent N scenes
Use ../YYYY-MM-DD to request all scenes up to a cutoff date, or YYYY-MM-DD/.. to start from a date and include all subsequent acquisitions.
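If you build these strings programmatically, a small helper keeps the open-ended forms consistent. build_interval below is a hypothetical, stdlib-only sketch. None on either side becomes "..", and reversed intervals are rejected before any request is sent. Fully open intervals are also rejected, as a design choice, because they would not filter anything.

```python
from datetime import date
from typing import Optional


def build_interval(start: Optional[date] = None, end: Optional[date] = None) -> str:
    """Build a STAC datetime interval; None means an open end (hypothetical helper)."""
    if start is None and end is None:
        raise ValueError("at least one bound is required; '../..' filters nothing")
    if start is not None and end is not None and start > end:
        raise ValueError(f"start {start} is after end {end}")
    left = start.isoformat() if start is not None else ".."
    right = end.isoformat() if end is not None else ".."
    return f"{left}/{right}"


print(build_interval(start=date(2025, 1, 1)))               # 2025-01-01/..
print(build_interval(end=date(2024, 12, 31)))               # ../2024-12-31
print(build_interval(date(2025, 6, 1), date(2025, 6, 30)))  # 2025-06-01/2025-06-30
```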
```python
import pystac_client

client = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")

# Everything from 2025-01-01 to the present
search = client.search(
    collections=["sentinel-2-l2a"],
    datetime="2025-01-01/..",
    bbox=[2.2, 48.8, 2.4, 48.9],  # Paris
    query={"eo:cloud_cover": {"lt": 5}},
    sortby="-datetime",  # descending = most recent first
    max_items=5,
)
recent = list(search.items())
if recent:
    print(f"Most recent acquisition: {recent[0].properties['datetime']}")
else:
    print("No scenes matched the filters")
```
Note that sortby relies on the optional STAC API Sort extension. Confirm that the extension is listed in the catalog's conformance classes before relying on it.
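One way to handle catalogs without the Sort extension is to check the landing page's conformsTo list and, if sorting is not advertised, sort the retrieved items locally. The stdlib sketch below works on plain dictionaries. The "#sort" suffix test assumes conformance URIs follow the https://api.stacspec.org/v1.0.0/item-search#sort pattern, and the sample landing page and items are made up. Local sorting only orders what was already retrieved. Without server-side sorting, a max_items cap returns an arbitrary subset, so widen the cap or narrow the date window to get the true most recent scenes.

```python
def supports_sort(landing_page: dict) -> bool:
    """True if any advertised conformance class looks like the Sort extension."""
    return any(uri.endswith("#sort") for uri in landing_page.get("conformsTo", []))


def newest_first(items: list[dict]) -> list[dict]:
    """Sort GeoJSON item dicts by properties.datetime, newest first.
    RFC 3339 UTC strings in the same format sort correctly as text."""
    return sorted(items, key=lambda it: it["properties"]["datetime"], reverse=True)


landing = {"conformsTo": ["https://api.stacspec.org/v1.0.0/core"]}
items = [
    {"id": "a", "properties": {"datetime": "2025-03-02T10:56:21Z"}},
    {"id": "b", "properties": {"datetime": "2025-06-15T10:56:19Z"}},
    {"id": "c", "properties": {"datetime": "2025-01-20T10:56:30Z"}},
]
if not supports_sort(landing):
    items = newest_first(items)
print([it["id"] for it in items])  # ['b', 'a', 'c']
```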
Variant 3 — Feed results directly into stackstac for lazy array loading
Once you have a list of pystac.Item objects, handing them to xarray via stackstac takes only two extra lines: the import and a stackstac.stack() call. The call returns a lazy DataArray ready for band math operations with xarray, and no pixel data is loaded until you call .compute().
```python
import pystac_client
import stackstac

client = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
search = client.search(
    collections=["sentinel-2-l2a"],
    datetime="2025-05-01/2025-05-31",
    bbox=[-74.05, 40.7, -73.9, 40.8],  # Manhattan
    query={"eo:cloud_cover": {"lt": 20}},
    max_items=20,
)
items = list(search.items())

# stackstac reads STAC Item asset metadata to build a lazy xarray DataArray.
# epsg=32618 is UTM zone 18N, the native CRS for this region.
da = stackstac.stack(
    items,
    assets=["red", "green", "blue", "nir"],
    epsg=32618,
    resolution=10,
)
print(da)  # (time, band, y, x) DataArray; no pixels loaded yet
```
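The EPSG code in the comment above comes from the standard UTM zone formula, zone = floor((lon + 180) / 6) + 1. Codes are 326xx in the northern hemisphere and 327xx in the southern hemisphere. utm_epsg below is a hypothetical sketch that ignores the irregular zones around Norway and Svalbard. Its result could feed the epsg argument of stackstac.stack().

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat).

    Ignores the special zones around Norway and Svalbard.
    """
    if not -180.0 <= lon <= 180.0 or not -80.0 <= lat <= 84.0:
        raise ValueError("coordinates outside the UTM longitude/latitude range")
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)  # lon=180 -> zone 60
    return (32600 if lat >= 0 else 32700) + zone


# Centre of the Manhattan bbox used above
print(utm_epsg(-73.975, 40.75))  # 32618
print(utm_epsg(28.25, -26.25))   # 32735 (Johannesburg)
```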
For windowed reads without stackstac, open the asset HREF directly with rasterio. The Handling Pixel Resolution and Scaling page covers how to select appropriate read windows and resolution levels (overviews) from a COG.
Common Errors
ValidationError: 'datetime' is not valid under any of the given schemas
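The STAC API rejected the datetime value. Give each boundary either as a date (YYYY-MM-DD), which pystac-client expands to a full-day RFC 3339 timestamp before sending the request, or as a complete RFC 3339 timestamp with an explicit timezone, such as 2025-06-01T00:00:00Z. Use .. for an open end, and make sure the start is not later than the end. Confirm the accepted format in the catalog's /api OpenAPI spec.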
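To see what a date-only interval becomes on the wire, the hypothetical helper below reproduces the full-day expansion that pystac-client documents for YYYY-MM-DD boundaries. The start expands to 00:00:00Z and the end to 23:59:59Z. It is a stdlib sketch that handles only YYYY-MM-DD and "..", and it also performs the start-before-end check.

```python
from datetime import date


def expand_date_interval(interval: str) -> str:
    """Expand 'YYYY-MM-DD/YYYY-MM-DD' (either side may be '..') into
    full-day RFC 3339 UTC timestamps (hypothetical helper)."""
    left, sep, right = interval.partition("/")
    if not sep:
        raise ValueError("expected 'start/end'")
    # date.fromisoformat raises ValueError for anything that is not a date
    start = None if left == ".." else date.fromisoformat(left)
    end = None if right == ".." else date.fromisoformat(right)
    if start is not None and end is not None and start > end:
        raise ValueError(f"start {start} is after end {end}")
    left_out = ".." if start is None else f"{start.isoformat()}T00:00:00Z"
    right_out = ".." if end is None else f"{end.isoformat()}T23:59:59Z"
    return f"{left_out}/{right_out}"


print(expand_date_interval("2025-06-01/2025-06-30"))
# 2025-06-01T00:00:00Z/2025-06-30T23:59:59Z
```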
HTTP 403 Forbidden when opening an asset HREF with rasterio
The item came from Microsoft Planetary Computer without signing. Either add modifier=planetary_computer.sign_inplace to Client.open(), or sign each item after retrieving it with item = planetary_computer.sign(item), which returns a signed copy. Signed tokens expire after ~60 minutes, so sign immediately before reading.
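Because the token expires, long-running jobs may want to check an HREF before reading it. Azure SAS URLs carry their expiry in the se query parameter as a UTC timestamp. The stdlib sketch below parses that parameter and reports whether the URL should be re-signed. The helper names, the 5-minute safety margin, and the sample URL are hypothetical, and the sketch assumes se uses the YYYY-MM-DDTHH:MM:SSZ form.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def sas_expiry(href: str) -> Optional[datetime]:
    """Return the SAS expiry ('se' parameter) as an aware UTC datetime, or None."""
    values = parse_qs(urlparse(href).query).get("se")  # parse_qs URL-decodes %3A
    if not values:
        return None  # unsigned URL
    return datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def needs_resign(href: str, now: datetime, margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if the URL is unsigned or expires within `margin` of `now`."""
    expiry = sas_expiry(href)
    return expiry is None or expiry - now <= margin


url = "https://example.blob.core.windows.net/c/B04.tif?se=2025-07-01T12%3A00%3A00Z&sig=abc"
now = datetime(2025, 7, 1, 11, 0, tzinfo=timezone.utc)
print(needs_resign(url, now))                          # False: 60 minutes left
print(needs_resign(url, now + timedelta(minutes=58)))  # True: only 2 minutes left
```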
MemoryError or process killed during .items() iteration
You omitted max_items. A wide bounding box over a long time range can return tens of thousands of items across hundreds of API pages. Always pass max_items to client.search(), or iterate .items() lazily and break out early instead of materializing everything with list().
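If the stopping point is only known while iterating, consume the iterator lazily. itertools.islice stops pulling items after exactly n, and a plain break works for conditions discovered along the way. In the sketch below, a hypothetical counting generator (fake_items) stands in for search.items() to show how many items were actually produced.

```python
from itertools import islice
from typing import Iterator


def fake_items(total: int, stats: dict) -> Iterator[dict]:
    """Hypothetical stand-in for search.items(): yields items lazily and
    counts how many were actually produced."""
    for i in range(total):
        stats["produced"] += 1
        yield {"id": f"scene-{i}", "properties": {"eo:cloud_cover": i % 30}}


# Fixed cap: islice stops pulling after exactly 5 items.
stats = {"produced": 0}
first_five = list(islice(fake_items(100_000, stats), 5))
print(len(first_five), stats["produced"])  # 5 5

# Condition known only while iterating: break as soon as it is met.
stats_low = {"produced": 0}
low_cloud = []
for item in fake_items(100_000, stats_low):
    if item["properties"]["eo:cloud_cover"] < 3:
        low_cloud.append(item)
    if len(low_cloud) == 3:
        break
print(len(low_cloud), stats_low["produced"])  # 3 3
```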
Related
- Querying STAC Catalogs Programmatically — the parent workflow, covering the full lifecycle: client initialization, pagination strategies, asset validation, and error handling.
- Core Raster Fundamentals & STAC Mapping — how STAC metadata maps to raster data models, COG byte layout, and CRS conventions.
- Band Math Operations with xarray — load the items retrieved here into lazy xarray DataArrays and compute NDVI or other spectral indices.
- Cloud and Shadow Masking Strategies — apply SCL-based or s2cloudless masks to the date-filtered scenes before analysis.
- Handling Pixel Resolution and Scaling — open individual COG assets with rasterio windowed reads after retrieving items here.