Staff Engineer at Ably here. Over the past few months I've been speaking to engineers building AI assistants, copilots, and agentic workflows (over 40 companies at this point), with particular focus on cloud-hosted agents.
I expected the hard problems to be in model selection, prompt engineering, and orchestration. Instead, the same infrastructure challenges kept coming up: realtime sync between agents and end clients is surprisingly painful to get right.
- Managing and scaling WebSocket or SSE connections between agents and clients
- Buffering messages server-side and implementing replay logic for reconnecting clients
- Tracking what each client has received across multiple devices
- Ensuring continuity between historical and live responses on initial load
- Routing reconnecting clients to the correct agent instance in distributed deployments
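The buffering and replay piece in particular gets rebuilt in-house again and again. A minimal sketch of the idea (all names hypothetical, assuming each message carries a monotonically increasing serial per channel):

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    serial: int   # monotonically increasing within the channel
    data: str

@dataclass
class ChannelBuffer:
    """Server-side buffer so reconnecting clients can replay missed messages."""
    messages: list = field(default_factory=list)
    next_serial: int = 0

    def publish(self, data: str) -> Message:
        msg = Message(self.next_serial, data)
        self.next_serial += 1
        self.messages.append(msg)
        return msg

    def replay_after(self, last_seen: int) -> list:
        # On reconnect the client sends the last serial it received;
        # everything after that is replayed in order.
        return [m for m in self.messages if m.serial > last_seen]

buf = ChannelBuffer()
for tok in ["Hello", " ", "world"]:
    buf.publish(tok)

missed = buf.replay_after(0)  # client saw serial 0, then dropped
print([m.data for m in missed])  # → [' ', 'world']
```

Doing this correctly across multiple devices per user, and across distributed agent instances, is where the real pain starts.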
Ably AI Transport solves this: it's a drop-in transport layer that sits between your agents and end-user devices.
We found that many companies were already using Ably Pub/Sub to tackle these problems. The pub/sub pattern decouples agents from clients: agents publish to channels, clients subscribe, Ably handles delivery and replay.
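The decoupling can be pictured with a toy channel (this is the pattern, not the Ably API):

```python
class Channel:
    """Toy pub/sub channel: the agent never holds a reference to any client."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        # Delivery is the channel's job, not the agent's.
        for cb in self.subscribers:
            cb(message)

received = []
ch = Channel()
ch.subscribe(received.append)   # client A
ch.subscribe(lambda m: None)    # client B, e.g. a second device
ch.publish("token")             # agent publishes without knowing who listens
print(received)  # → ['token']
```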
AI Transport makes this easier - for example, we've added message appends for efficient token streaming, and annotations for attaching metadata like citations.
A few of the interesting technical pieces:
- Channel-oriented architecture: In connection-oriented setups, the connection pokes the agent into life. If the connection drops, on reconnection the agent must figure out what state each client is missing. AI Transport uses channels instead: agents publish, clients subscribe, the channel handles replay. Presence events let agents detect when users go online/offline or connect from multiple devices.
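The presence side of that bullet can be modelled as a set keyed by user, with one entry per connection. A toy version (names are illustrative, not the SDK's):

```python
class PresenceSet:
    """Tracks which connections each user (clientId) currently has, so an
    agent can tell online/offline and multi-device sessions apart."""
    def __init__(self):
        self.connections = {}  # clientId -> set of connectionIds

    def enter(self, client_id, connection_id):
        self.connections.setdefault(client_id, set()).add(connection_id)

    def leave(self, client_id, connection_id):
        conns = self.connections.get(client_id, set())
        conns.discard(connection_id)
        if not conns:
            # last device gone -> user is offline
            self.connections.pop(client_id, None)

    def is_online(self, client_id):
        return client_id in self.connections

    def device_count(self, client_id):
        return len(self.connections.get(client_id, ()))

p = PresenceSet()
p.enter("alice", "conn-1")
p.enter("alice", "conn-2")      # same user, second device
print(p.device_count("alice"))  # → 2
p.leave("alice", "conn-1")
p.leave("alice", "conn-2")
print(p.is_online("alice"))     # → False
```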
- Identity in decoupled pub/sub: Users authenticate with your server, which issues JWTs with embedded clientId and capabilities. Agents receive messages with cryptographically verified identity. User claims (ably.channel.* in the JWT) appear in message.extras.userClaim - useful for HITL workflows where you verify an approver's role before executing a tool call.
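For the HITL case, the agent-side check reduces to reading the verified claim off the message rather than trusting anything the client sent in the payload. A sketch, with the exact claim layout assumed for illustration:

```python
def handle_approval(message, execute_tool):
    """Gate a tool call on the sender's verified claims.

    `message` mirrors the shape described above: identity (clientId) and
    user claims arrive alongside the message, already verified by the
    transport, so the agent never trusts client-supplied payload fields.
    """
    claim = message.get("extras", {}).get("userClaim", {})
    if claim.get("role") != "approver":
        return "rejected: sender lacks approver role"
    return execute_tool(message["data"]["toolCall"])

msg = {
    "clientId": "alice",
    "data": {"toolCall": {"name": "refund", "amount": 40}},
    "extras": {"userClaim": {"role": "approver"}},
}
result = handle_approval(msg, lambda call: f"executed {call['name']}")
print(result)  # → executed refund
```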
- Token streaming with appends: New message.append operation lets you build a response incrementally. Clients joining mid-stream get a message.update with the complete response so far, then receive subsequent appends. Channel history contains one compacted message per response.
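The append semantics can be sketched in miniature (a toy model of the behaviour described above, not the actual API):

```python
class StreamedResponse:
    """Toy model of append-based token streaming with compaction."""
    def __init__(self):
        self.text = ""
        self.live_subscribers = []

    def subscribe(self, on_event):
        # A client joining mid-stream first gets a snapshot of the
        # response so far, then each subsequent append.
        if self.text:
            on_event(("update", self.text))
        self.live_subscribers.append(on_event)

    def append(self, token):
        self.text += token
        for on_event in self.live_subscribers:
            on_event(("append", token))

    def history(self):
        # History holds one compacted message per response.
        return [self.text]

resp = StreamedResponse()
resp.append("The answer ")
events = []
resp.subscribe(events.append)   # client joins mid-stream
resp.append("is 42.")
print(events)          # → [('update', 'The answer '), ('append', 'is 42.')]
print(resp.history())  # → ['The answer is 42.']
```

The practical upshot is that clients never have to stitch partial token events back together from history themselves.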
- Annotations for citations: Attach citation metadata to responses without modifying content. Clients can subscribe to individual annotations or obtain them on demand via REST. Ably also automatically aggregates annotations into summaries (e.g. count by domain name), which are delivered to clients in realtime.
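The count-by-domain aggregation mentioned above amounts to something like this (a sketch with an assumed annotation shape; Ably computes the summary server-side):

```python
from collections import Counter
from urllib.parse import urlparse

def summarize_citations(annotations):
    """Aggregate citation annotations into a count-by-domain summary."""
    return Counter(urlparse(a["url"]).netloc for a in annotations)

annotations = [
    {"type": "citation", "url": "https://example.com/a"},
    {"type": "citation", "url": "https://example.com/b"},
    {"type": "citation", "url": "https://ably.com/docs"},
]
print(summarize_citations(annotations))
# → Counter({'example.com': 2, 'ably.com': 1})
```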
- Messaging patterns: Docs include patterns for tool calls, human-in-the-loop approval flows, and chain-of-thought streaming.
Docs: https://ably.com/docs/ai-transport
If you're building AI UX, I'd love to hear what problems you've hit and what you've built in-house.
Thanks!
Mike Christensen