Show HN: a small API layer for real-time AI streaming, retries, and debugging

4•akarshc•2w ago

Comments

akarshc•2w ago

While building AI features that rely on real-time streaming responses, I kept running into failures that were hard to reason about once things went async.

Requests would partially stream, providers would throttle or fail mid-stream, and retry logic ended up scattered across background jobs, webhooks, and request handlers.

I built ModelRiver as a thin API layer that sits between an app and AI providers and centralizes streaming, retries, failover, and request-level debugging in one place.

It’s early and opinionated, and there are tradeoffs. Happy to answer technical questions or hear how others are handling streaming reliability in production AI apps.

arxgo•2w ago

Why not just handle this in the application with queues and background jobs?

akarshc•2w ago

Queues work well before or after a request, but they’re awkward once a response is already streaming. This layer exists mainly to handle failures during a stream without spreading that logic across handlers, workers, and client code.

amalv•2w ago

At what point does adding this layer become more complex than just handling streaming failures directly in the app?

akarshc•2w ago

If streaming behavior is still product-specific and changing fast, this adds friction. It only pays off once failure handling stabilizes and starts repeating across the system.

kxbnb•1w ago

The pain of failures that are "hard to reason about once things went async" is real. Centralizing the retry/failover logic makes sense.

One pattern I've found useful: having a read-only view of what's actually hitting the wire before any retry logic kicks in. When you can see the raw request/response as it happens, you can tell whether the issue is your payload, the provider throttling, or something in between.

We built toran.sh for this - it's a transparent proxy that shows exactly what goes out and comes back in real-time. Different layer than what you're doing (you handle the orchestration, we just show the traffic), but they complement each other.

Curious how you handle visibility into what's actually being sent during partial stream failures?

akarshc•1w ago

Totally agree, that “what actually hit the wire?” view is critical once things go async.

ModelRiver already has this covered via request logs. Every request captures the full lifecycle, the exact payload sent to the provider, streaming chunks as they arrive, partial responses, errors, retries, and the final outcome. Even if a stream fails midway, you can still inspect what was sent and what came back before the failure.

So you can clearly tell whether the issue is payload shape, provider throttling, or a mid stream failure, before any retry or failover logic kicks in. That wire level visibility is core to how we approach debugging async AI requests.

Show HN: Mermaid Formatter – CLI and library to auto-format Mermaid diagrams

RFCs vs. READMEs: The Evolution of Protocols

Kanchipuram Saris and Thinking Machines

Chinese chemical supplier causes global baby formula recall

I've used AI to write 100% of my code for a year as an engineer

Looking for 4 Autistic Co-Founders for AI Startup (Equity-Based)

AI-native capabilities, a new API Catalog, and updated plans and pricing

What changed in tech from 2010 to 2020?

From Human Ergonomics to Agent Ergonomics

Advanced Inertial Reference Sphere

Toyota Developing a Console-Grade, Open-Source Game Engine with Flutter and Dart

Typing for Love or Money: The Hidden Labor Behind Modern Literary Masterpieces

Show HN: A longitudinal health record built from fragmented medical data

CoreWeave's $30B Bet on GPU Market Infrastructure

Creating and Hosting a Static Website on Cloudflare for Free

"The Stanford scam proves America is becoming a nation of grifters"

Elon Musk on Space GPUs, AI, Optimus, and His Manufacturing Method

X (Twitter) is back with a new X API Pay-Per-Use model

Zlob.h 100% POSIX and glibc compatible globbing lib that is faste and better

Show HN: Deterministic signal triangulation using a fixed .72% variance constant

Scientists Discover Levitating Time Crystals You Can Hold, Defy Newton’s 3rd Law

When Michelangelo Met Titian

Solving NYT Pips with DLX

Baldur's Gate to be turned into TV series – without the game's developers

Interview with 'Just use a VPS' bro (OpenClaw version) [video]

EchoJEPA: Latent Predictive Foundation Model for Echocardiography

Disablling Go Telemetry

Effective Nihilism

The UK government didn't want you to see this report on ecosystem collapse

No 10 blocks report on impact of rainforest collapse on food prices

Show HN: Mermaid Formatter – CLI and library to auto-format Mermaid diagrams

RFCs vs. READMEs: The Evolution of Protocols

Kanchipuram Saris and Thinking Machines

Chinese chemical supplier causes global baby formula recall

I've used AI to write 100% of my code for a year as an engineer

Looking for 4 Autistic Co-Founders for AI Startup (Equity-Based)

AI-native capabilities, a new API Catalog, and updated plans and pricing

What changed in tech from 2010 to 2020?

From Human Ergonomics to Agent Ergonomics

Advanced Inertial Reference Sphere

Toyota Developing a Console-Grade, Open-Source Game Engine with Flutter and Dart

Typing for Love or Money: The Hidden Labor Behind Modern Literary Masterpieces

Show HN: A longitudinal health record built from fragmented medical data

CoreWeave's $30B Bet on GPU Market Infrastructure

Creating and Hosting a Static Website on Cloudflare for Free

"The Stanford scam proves America is becoming a nation of grifters"

Elon Musk on Space GPUs, AI, Optimus, and His Manufacturing Method

X (Twitter) is back with a new X API Pay-Per-Use model

Zlob.h 100% POSIX and glibc compatible globbing lib that is faste and better

Show HN: Deterministic signal triangulation using a fixed .72% variance constant

Scientists Discover Levitating Time Crystals You Can Hold, Defy Newton’s 3rd Law

When Michelangelo Met Titian

Solving NYT Pips with DLX

Baldur's Gate to be turned into TV series – without the game's developers

Interview with 'Just use a VPS' bro (OpenClaw version) [video]

EchoJEPA: Latent Predictive Foundation Model for Echocardiography

Disablling Go Telemetry

Effective Nihilism

The UK government didn't want you to see this report on ecosystem collapse

No 10 blocks report on impact of rainforest collapse on food prices

Show HN: a small API layer for real-time AI streaming, retries, and debugging

Comments