> … always found backpressure tricky to handle without locks.
1. Lock-Free Parts (Atomics)
The ring buffer relies on atomic.Uint64 for the head/tail cursors. Hot-swapping the processor chain uses atomic.Pointer[T], so the worker loop never blocks, even during a config reload:
    // Worker reads config without locks
    currentChain := p.chain.Load()
    currentChain.Process(data)

    // Config updates swap the pointer atomically
    p.chain.Store(newChain)
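For the cursor side, here's a stripped-down single-producer/single-consumer sketch of the pattern. The names and the slot type are made up for illustration (this isn't the actual StreamGate code), and it assumes Go 1.19+ for atomic.Uint64:

    // SPSC ring buffer sketch: one goroutine calls Push, one calls Pop.
    // Capacity must be a power of two so the mask indexing works.
    import "sync/atomic"

    type slot = []byte // stand-in for the real event type

    type ringBuffer struct {
        head    atomic.Uint64 // next index the consumer reads
        tail    atomic.Uint64 // next index the producer writes
        mask    uint64        // len(entries) - 1
        entries []slot
    }

    func (r *ringBuffer) Push(e slot) bool {
        tail := r.tail.Load()
        if tail-r.head.Load() == uint64(len(r.entries)) {
            return false // full: the caller decides what to do (see backpressure below)
        }
        r.entries[tail&r.mask] = e
        r.tail.Store(tail + 1) // publish only after the slot is written
        return true
    }

    func (r *ringBuffer) Pop() (slot, bool) {
        head := r.head.Load()
        if head == r.tail.Load() {
            return nil, false // empty
        }
        e := r.entries[head&r.mask]
        r.head.Store(head + 1)
        return e, true
    }

Because each cursor is only ever written by one side, plain Load/Store is enough; a multi-producer version would need CAS loops on the cursors.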
I intentionally avoided channels for the data path. Benchmarks showed that at >100k msgs/sec, the channel allocation/locking overhead was 2-3x costlier than the ring buffer approach.
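If you want to see the shape of that comparison, a benchmark along these lines works (sketch only, reusing the hypothetical ringBuffer/slot from above and placed in a _test.go file; this is not the exact harness behind the 2-3x figure):

    import "testing"

    var msg = slot("log line")

    func BenchmarkChannelPath(b *testing.B) {
        ch := make(chan slot, 1<<16)
        go func() { // drain
            for range ch {
            }
        }()
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            ch <- msg
        }
        close(ch)
    }

    func BenchmarkRingPath(b *testing.B) {
        r := &ringBuffer{mask: 1<<16 - 1, entries: make([]slot, 1<<16)}
        stop := make(chan struct{})
        go func() { // drain
            for {
                select {
                case <-stop:
                    return
                default:
                    r.Pop()
                }
            }
        }()
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            for !r.Push(msg) { // spin briefly if the consumer falls behind
            }
        }
        close(stop)
    }

The absolute numbers depend heavily on payload type, GOMAXPROCS, and how the consumer is scheduled; the point is only how the two paths get exercised.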
2. Handling Backpressure: I Don't (Intentionally)
You're right that backpressure is tricky without locks. My approach was to eliminate the need for it:
Drop-on-full: If the buffer is saturated, new logs are dropped (non-blocking; whatever is already buffered is kept).
Fail-open circuit breaker: If the buffer is >80% full, processing is bypassed so it drains faster.
Philosophy: For observability, dropping overflow is better than blocking ingestion.
The honest answer: This isn't "wait-free" in the academic sense; a WaitGroup is still used in the output fanout. But the ingestion→buffering path has zero mutexes.
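In code, that policy is just a couple of cheap checks around Push. A sketch, again with made-up names and building on the ringBuffer above:

    // Backpressure policy sketch: drop-on-full plus the >80% fail-open bypass.
    type pipeline struct {
        buf     *ringBuffer
        dropped atomic.Uint64
    }

    // forwardRaw stands in for "write straight to the output, skip processing".
    func (p *pipeline) forwardRaw(e slot) { /* ... */ }

    func (p *pipeline) ingest(e slot) {
        used := p.buf.tail.Load() - p.buf.head.Load()
        fill := float64(used) / float64(len(p.buf.entries))

        // Fail-open: above ~80% full, bypass processing so the buffer drains.
        if fill > 0.8 {
            p.forwardRaw(e)
            return
        }

        // Drop-on-full: never block the caller.
        if !p.buf.Push(e) {
            p.dropped.Add(1) // count it and move on
        }
    }

The fill-level read is only approximate (both cursors can move while you compute it), but for a threshold check that's fine.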
sandeepk235•2h ago
Hey HN,
I built StreamGate to solve a specific problem: our observability bills (Datadog/Splunk) were scaling linearly with traffic, mostly due to low-value "noise" logs (DEBUG, health checks, etc.) that we rarely queried.
Existing solutions were either "all-or-nothing" agents or heavy Java/Enterprise pipelines. I wanted something lightweight that could sit at the network edge and act as a smart valve.
The Architecture: It uses a Split-Plane design:
Control Plane (Python): Handles config validation and API requests (FastAPI).
Data Plane (Go): Handles the hot path.
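The planes really only meet at config hand-off: the Python side validates rules, the Go side swaps them in atomically. As a rough illustration (the endpoint, wire format, and type names below are simplified placeholders, not the real API), the Go end of that hand-off can be as small as:

    // Illustrative only: a minimal rule-reload endpoint on the data plane.
    import (
        "encoding/json"
        "net/http"
        "sync/atomic"
    )

    type Rule struct {
        Field   string `json:"field"`
        Matches string `json:"matches"`
        Action  string `json:"action"` // e.g. "drop", "sample", "keep"
    }

    type Chain struct {
        Rules []Rule `json:"rules"`
    }

    var chain atomic.Pointer[Chain]

    func reloadHandler(w http.ResponseWriter, r *http.Request) {
        var c Chain
        if err := json.NewDecoder(r.Body).Decode(&c); err != nil {
            http.Error(w, "bad config: "+err.Error(), http.StatusBadRequest)
            return
        }
        chain.Store(&c) // workers pick the new chain up on their next Load()
        w.WriteHeader(http.StatusNoContent)
    }

Wire that up with http.HandleFunc and the control plane just POSTs after validation; the hot path never takes a lock, it only ever does the pointer swap described under "Lock-Free" below.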
Technical Implementation Details: Instead of standard Go channels (which I found created too much GC pressure at high throughput), I implemented a fixed-size ring buffer using sync/atomic primitives.
Lock-Free: The hot path uses atomic pointers to swap configuration rules, so we can "hot reload" rules without stopping the world or acquiring a mutex.
Fail-Open: It implements a circuit breaker; if the ring buffer fills past ~80%, it degrades to pass-through mode rather than dropping logs or blocking the app.
Performance: On my local machine (M2), it handles ~200k events/sec with minimal allocation.
I’d love critical feedback on the ring buffer implementation or the split-plane approach.