Sync vs. async vs. event-driven AI requests: what works in production

https://modelriver.com/how-modelriver-works/event-driven-async

4•akarshc•1w ago

Comments

akarshc•1w ago

I’m one of the builders. Once AI requests moved beyond simple sync calls, we kept running into the same problems in production: retries hiding failures, async flows that were hard to reason about, frontend state drifting, and providers timing out mid-request.

This page breaks down the three request patterns we see teams actually using in production (sync, async, and event-driven async), how data flows in each case, and why we ended up favoring an event-driven approach for interactive, streaming apps.

Happy to answer questions or go deeper on any part of the architecture.

vishaal_007•1w ago

I’m another founder on this. One thing that surprised us while building AI features was how often the hard problems weren’t about model choice, but about request lifecycle. Once you introduce streaming, retries, and multiple providers, a lot of implicit assumptions in typical request–response code stop holding.

We kept seeing teams reinvent similar patterns in slightly different ways, especially around correlating events, handling partial failures, and keeping the frontend in sync with what actually happened on the backend. The goal with this writeup was to make those tradeoffs explicit and show what’s actually happening on the wire in each approach.

Curious to hear how others here are handling long-lived or streaming AI requests in production, especially once things start failing in non-obvious ways.

amalv•1w ago

If a team adopts this pattern and later decides to remove ModelRiver, how hard is it to unwind? Are the request and event models close to provider APIs or fairly opinionated?

akarshc•1w ago

This was something we were careful about. The request and event models are intentionally close to what most providers already expose, rather than introducing a completely new abstraction.

Teams usually integrate it incrementally in front of existing calls. If you remove it, you’re mostly deleting the orchestration layer and keeping your provider integrations and client logic. You lose centralized retries and observability, but you’re not stuck rewriting your entire request model.

If adopting it requires a full rewrite, that’s usually a sign it’s being applied too broadly.

aparnavalsan43•1w ago

In practice, where does the event-driven approach break down? What kinds of workloads still fit better with simple sync or queue-based async?

vishaal_007•1w ago

In practice, event-driven starts to feel like overkill when requests are short-lived and failures are cheap. If a call is fast, idempotent, and the user isn’t waiting on partial output, a simple sync request is usually the clearest solution.

Queue-based async still works well for batch jobs, offline processing, or anything where latency and ordering aren’t user-visible. The event-driven approach mainly pays off once you have long-lived or interactive requests where failures can happen mid-response and you care about what the user actually sees.

aparnavalsan43•1w ago

That makes sense. How do you decide early on which requests are likely to “grow into” needing an event-driven approach, versus staying simple sync or queue-based long term?

vishaal_007•1w ago

In our experience, it usually comes down to whether the request has user-visible state over time. If the response is something you can treat as atomic and either succeed or fail cleanly, it tends to stay simple.

The requests that “grow” tend to share a few signals early on: they stream partial results, they take long enough that the frontend needs progress updates, or failures start happening after something has already been shown to the user. Another common signal is when retries stop being transparent and you start needing to explain to users what actually happened.

Once those patterns show up, teams usually end up reworking the flow anyway. The event-driven approach just makes that lifecycle explicit earlier, instead of letting it emerge implicitly and painfully over time.

GopikaDilip•1w ago

How do you reason about retries and correctness once a stream has already started? For example, how do you avoid duplicated or missing tokens if a provider fails mid-stream?

akarshc•1w ago

This is one of the harder problems, and there isn’t a perfect answer.

The main thing we try to avoid is pretending mid-stream retries are the same as pre-request retries. Once a stream has started, we treat it as a sequence of events with checkpoints rather than a single opaque response. Retries are scoped to known safe boundaries, and anything ambiguous is surfaced explicitly instead of silently re-emitting tokens.

In other words, correctness is prioritized over pretending the stream is seamless. If we can’t guarantee no duplication, we make that visible rather than hide it.

I replaced the front page with AI slop and honestly it's an improvement

Economists vs. Technologists on AI

Life at the Edge

RISC-V Vector Primer

Show HN: Invoxo – Invoicing with automatic EU VAT for cross-border services

A Tale of Two Standards, POSIX and Win32 (2005)

Ask HN: Is the Downfall of SaaS Started?

Flirt: The Native Backend

OpenAI's Latest Platform Targets Enterprise Customers

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

Big Tech's AI Push Is Costing More Than the Moon Landing

The AI boom is causing shortages everywhere else

Suno, AI Music, and the Bad Future [video]

Ask HN: How are researchers using AlphaFold in 2026?

Running the "Reflections on Trusting Trust" Compiler

Watermark API – $0.01/image, 10x cheaper than Cloudinary

Now send your marketing campaigns directly from ChatGPT

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

Show HN: Hibana – choreography-first protocol safety for Rust

Haniri: A live autonomous world where AI agents survive or collapse

GPT-5.3-Codex System Card [pdf]

Atlas: Manage your database schema as code

Geist Pixel

Show HN: MCP to get latest dependency package and tool versions

The better you get at something, the harder it becomes to do

Show HN: WP Float – Archive WordPress blogs to free static hosting

Show HN: I Hacked My Family's Meal Planning with an App

Sony BMG copy protection rootkit scandal

The Future of Systems

I replaced the front page with AI slop and honestly it's an improvement

Economists vs. Technologists on AI

Life at the Edge

RISC-V Vector Primer

Show HN: Invoxo – Invoicing with automatic EU VAT for cross-border services

A Tale of Two Standards, POSIX and Win32 (2005)

Ask HN: Is the Downfall of SaaS Started?

Flirt: The Native Backend

OpenAI's Latest Platform Targets Enterprise Customers

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

Big Tech's AI Push Is Costing More Than the Moon Landing

The AI boom is causing shortages everywhere else

Suno, AI Music, and the Bad Future [video]

Ask HN: How are researchers using AlphaFold in 2026?

Running the "Reflections on Trusting Trust" Compiler

Watermark API – $0.01/image, 10x cheaper than Cloudinary

Now send your marketing campaigns directly from ChatGPT

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

Show HN: Hibana – choreography-first protocol safety for Rust

Haniri: A live autonomous world where AI agents survive or collapse

GPT-5.3-Codex System Card [pdf]

Atlas: Manage your database schema as code

Geist Pixel

Show HN: MCP to get latest dependency package and tool versions

The better you get at something, the harder it becomes to do

Show HN: WP Float – Archive WordPress blogs to free static hosting

Show HN: I Hacked My Family's Meal Planning with an App

Sony BMG copy protection rootkit scandal

The Future of Systems

Sync vs. async vs. event-driven AI requests: what works in production

Comments