- Tracks P95 latency per route template (e.g. /users/{id})
- Learns a baseline dynamically from recent traffic
- Detects spikes using rate-of-change (not just static thresholds)
- Computes a 0–100 health score with trend direction (improving / stable / degrading)
- Stores events in Redis Streams for replay and debugging
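To make the rate-of-change idea concrete, here is a minimal toy sketch (not the library's actual implementation, names are mine): it flags a spike when the relative rise of P95 over a short rolling window exceeds a slope threshold, even while the absolute value is still well under a static alert line.

```python
from collections import deque

class RateOfChangeDetector:
    """Toy sketch of rate-of-change spike detection.

    Flags degradation when the relative rise across the rolling window
    exceeds `slope_threshold`, independent of any static limit.
    """

    def __init__(self, window: int = 5, slope_threshold: float = 0.5):
        self.samples = deque(maxlen=window)      # recent P95 samples (ms)
        self.slope_threshold = slope_threshold   # e.g. 0.5 = 50% rise over the window

    def observe(self, p95_ms: float) -> bool:
        self.samples.append(p95_ms)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history to judge a trend
        oldest = self.samples[0]
        rise = (self.samples[-1] - oldest) / max(oldest, 1e-9)
        return rise > self.slope_threshold

detector = RateOfChangeDetector(window=5, slope_threshold=0.5)
spiking = False
for p95 in [200, 220, 260, 340, 480]:  # ramping latency, still far below 1000ms
    spiking = detector.observe(p95)
# spiking is now True: the ramp trips the slope check while a static
# 1000ms threshold would not have fired yet.
```

This is the shape of the "early window" effect described below: on a ramp, the slope trips before the absolute threshold does.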
One interesting result: In synthetic load tests (gradual latency ramp from ~200ms to ~1200ms over 60 seconds, with a P95 warning threshold at 1000ms), rate-of-change detection consistently surfaced degradation slightly before static threshold alerts. The window is small, but it was often enough to notice system stress before crossing alert thresholds.
Design constraints:
- Near-zero overhead on the request path (async, fire-and-forget writes)
- Must fail silently if Redis is unavailable
- No external monitoring stack required (runs in-app)
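The first two constraints can be sketched together, assuming a redis-py async client (the helper names here are illustrative, not the library's API): schedule the stream write as a background task so the request returns immediately, and swallow Redis errors so an unavailable backend never surfaces to the caller.

```python
import asyncio
import logging

logger = logging.getLogger("alert-engine")

async def record_event(redis, stream: str, fields: dict) -> None:
    """Fire-and-forget write: swallow Redis errors so a broken metrics
    backend never breaks the request path."""
    try:
        # XADD appends to a Redis Stream; an approximate maxlen caps memory use.
        await redis.xadd(stream, fields, maxlen=10_000, approximate=True)
    except Exception:
        logger.debug("metrics write failed; dropping event", exc_info=True)

def emit(redis, stream: str, fields: dict) -> None:
    # Schedule without awaiting so the middleware can return immediately.
    asyncio.get_running_loop().create_task(record_event(redis, stream, fields))

class DownRedis:
    """Stand-in client that always fails, to exercise the silent-failure path."""
    async def xadd(self, *args, **kwargs):
        raise ConnectionError("redis unavailable")

async def main():
    emit(DownRedis(), "alerts", {"route": "/users/{id}", "p95_ms": "480"})
    await asyncio.sleep(0.01)  # let the background task run (and fail quietly)
    return "request path unaffected"

result = asyncio.run(main())
```

The trade-off is that events are dropped on failure rather than queued, which matches the "monitoring must never degrade the service" stance.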
Example usage:

```python
app.add_middleware(RequestMetricsMiddleware, alert_engine=engine)
```
Context: This is part of a larger system I'm building that integrates cloud services with mobile money APIs (EcoCash, etc.), where partial failures and latency spikes are common. It's still early: it hasn't been tested under real production traffic yet.
Curious how others are handling early degradation detection in FastAPI or similar systems.

Repo: https://github.com/Tandem-Media/fastapi-alertengine
PyPI: https://pypi.org/project/fastapi-alertengine/