OpenAI's WebRTC problem

https://moq.dev/blog/webrtc-is-the-problem/

84•atgctg•1d ago

Comments

giancarlostoro•1h ago

Probably because WebTransport is the lesser known alternative to WebRTC.

est•43m ago

WebTransport requires some specific server setup.

Cloudflare doesn't support WebTransport well.

r2vcap•1h ago

This is frustratingly one-sided writing. Yeah, WebRTC has limitations, but relying on a standard buys you a lot of correctness and reduces long-term engineering cost. The fact that WebRTC is complicated does not mean it is wrong; it means real-time media over the public internet is complicated.

Also, networking is inherently stateful. NAT traversal, jitter buffers, congestion control, packet loss, codec state, encryption, and session routing do not disappear because you put audio over TCP or WebSocket. Pretending otherwise is not architectural clarity. It is just moving the complexity somewhere less visible.
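To make the jitter buffer point concrete, here is a minimal sketch of a fixed-delay playout schedule, the kind of logic that still has to live somewhere even when audio rides over TCP or WebSocket. The function name `playout_schedule`, the 20 ms frame size, and the 60 ms delay are illustrative assumptions, not anything taken from the article or from a real library.

```python
from typing import List, Tuple


def playout_schedule(
    arrivals: List[Tuple[int, int]],
    frame_ms: int = 20,  # assumed frame duration
    delay_ms: int = 60,  # assumed fixed buffering delay
) -> List[Tuple[int, int, bool]]:
    """Return (seq, play_at_ms, on_time) for each (seq, arrival_ms) pair.

    Playback of the first received frame starts delay_ms after it arrives;
    each later frame is due frame_ms after the one before it.
    """
    if not arrivals:
        return []
    first_seq, base = arrivals[0]  # sequence number and arrival time of the first frame
    schedule = []
    for seq, arrival_ms in arrivals:
        play_at = base + delay_ms + (seq - first_seq) * frame_ms
        # A frame that arrives after its slot has to be dropped or concealed.
        schedule.append((seq, play_at, arrival_ms <= play_at))
    return sorted(schedule)


# Frame 2 arrives 130 ms in, but its slot was at 100 ms, so it is late.
result = playout_schedule([(0, 0), (1, 25), (2, 130), (3, 95)])
```

A larger `delay_ms` rescues more late frames at the cost of added latency, and that trade-off exists regardless of the transport underneath.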

Waterluvian•30m ago

“How hard can it be?” the strawman asked.

It’s 2026 and teleconferencing is still such a shit show. There’s billions of dollars to be had and Zoom is at best mediocre, and it can be as bad as Microsoft Whatchamacallit. I’ve never not seen teleconferencing be a ham-handed mess.

charcircuit•22m ago

QUIC is also a standard.

tekacs•20m ago

You might have noticed that the author started the blog post explaining themselves:

  Like 6 years ago I wrote a WebRTC SFU at Twitch.
  Originally we used Pion (Go) just like OpenAI,
  but forked after benchmarking revealed that it was too slow.
  I ended up rewriting every protocol, because of course I did!

  Just a year ago, I was at Discord and I rewrote the WebRTC SFU in Rust.
  Because of course I did! You’re probably noticing a trend.

  Fun Fact: WebRTC consists of ~45 RFCs dating back to the early 2000s.
  And some de-facto standards that are technically drafts (ex. TWCC, REMB).
  Not a fun fact when you have to implement them all.

  You should consider me a Certified WebRTC Expert.
  Which is why I never, never want to use WebRTC again.

I think that they've done more than enough of 'trying the normal way' to be warranted in having an opinion the other way, don't you think?

awkii•47m ago

This poor soul. There are few protocols I hate implementing more than WebRTC. Getting a simple client going means you need to quickly acclimate to SDP, TURN/STUN, ICE candidates, offers, peer-to-peer protocols, and the complex handshake that is implemented from scratch each time. I can't imagine re-writing the whole trenchcoat of protocols and unintended "best-practices".

jgalt212•41m ago

Have you attempted to use the Microsoft Graph API to interact with email?

edoceo•13m ago

Ugh. Who decided to Graph all the things?

Sean-Der•34m ago

What platforms were you targeting that you found it painful? Sorry it was frustrating.

I hope it’s getting better with education/more libraries. It’s also amazing how easy Codex etc… can burn through it now

moomoo11•13m ago

i like livekit for this reason and their ceo is cool

Giefo6ah•37m ago

Yet another victim of IPv4, and you still find countless detractors of IPv6 on every thread where it's mentioned.

whattheheckheck•20m ago

How would IPv6 handle it?

tardedmeme•6m ago

You just send packets to the other party's address and they send packets back to yours.

spongebobstoes•10m ago

IPv4 support is necessary, but IPv6 isn't

fidotron•32m ago

> WebRTC is designed to degrade and drop my prompt during poor network conditions

If you want real time, that's what you are going to deal with. If you don't want real time and instead imagine everything as STT -> Prompt -> TTS then maybe you shouldn't even be sending audio on the wire at all.

telman17•29m ago

Yep. Maybe there's some additional configuration I'm missing to mitigate the delay but clients don't seem to want to deal with the delay with STT -> Prompt -> TTS. They'll happily suffer occasional quality issues if the conversation feels "real".

lpln3452•15m ago

I haven't really experienced disconnections while using ChatGPT. Gemini is the frustrating part. Simply backgrounding the app (and the web version too) and resuming it causes the response or the conversation with an assigned ID to disappear. Haha.

Sean-Der•13m ago

I believe Gemini uses WebSockets? I have the same experience with heavy/custom applications that try to roll their own media stuff.

You run into issues around AudioContext and resumption etc... it's a PITA to have to handle all those corner cases :(

spongebobstoes•12m ago

this misses a few key things but hits on many others

webrtc is a bad protocol, without a doubt. I do like websockets as an easy alternative, but you do need to reinvent decent portions of webrtc as a result

I like the idea of MoQ but it's not widely used. probably worth experimenting with, especially as video enters the chat

> and then a GPU pretends to talk to you via text-to-speech

OpenAI is speech-to-speech, there is no TTS in voice mode

> It takes a minimum of 8* round trips (RTT) to establish a WebRTC connection

signalling can be done long ahead of time, though I don't see this mentioned in the OpenAI blog. I also saw some new webrtc extensions that should reduce setup time further
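As a rough back-of-the-envelope sketch of why the round-trip count matters: setup time scales linearly with RTT. The 8 round trips is the figure quoted from the article; the 50 ms RTT and the reduced count of 3 are made-up numbers for comparison only, not measurements of any real stack.

```python
def setup_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Time spent on handshakes before any media can flow."""
    return round_trips * rtt_ms


# Hypothetical comparison: everything done on demand vs. part of the
# handshake (e.g. signalling) completed ahead of time.
cold = setup_latency_ms(8, 50)
warm = setup_latency_ms(3, 50)
saved = cold - warm
```

On a mobile link with a much higher RTT the same difference in round trips grows proportionally, which is why moving signalling off the critical path helps.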

ultimately though, it comes down to

> It’s not like LLMs are particularly responsive anyway

I expect to see a shift in how S2S models work to be lower latency like the new voice API models that OpenAI announced

to be fair, the new models were released the day after this MoQ blog was published

Sean-Der•10m ago

Responding to some technical points first, but then after that I do see a future that isn't WebRTC. I don't think it matches where WebTransport+WebCodecs etc is going though.

> …but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate

This is the opposite of the feedback I get. Users want instant responses. If you have delay in generating responses/interruptions it kills the magic. You also don't want to send faster than real-time. If the user interrupts the model you just wasted a bunch of bandwidth sending 3 minutes of audio (but only played 10 seconds)
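A quick sketch of the waste described above, plus what real-time pacing looks like. The function names, the 32 kbit/s audio bitrate, and the 20 ms frame size are assumptions picked for illustration; only the 3 minutes sent and 10 seconds played come from the comment.

```python
from typing import List


def wasted_bytes(sent_s: float, played_s: float, bitrate_bps: int = 32_000) -> int:
    """Bytes that were sent but never played because the user interrupted."""
    unplayed_s = max(sent_s - played_s, 0)
    return int(unplayed_s * bitrate_bps / 8)


def paced_send_times_ms(n_frames: int, frame_ms: int = 20) -> List[int]:
    """Send offsets for real-time pacing: one frame every frame_ms."""
    return [i * frame_ms for i in range(n_frames)]


# 3 minutes sent up front, user interrupts after 10 seconds of playback.
burst_waste = wasted_bytes(sent_s=180, played_s=10)

# With pacing, at most what has been played (plus a small buffer) is on the wire.
paced_waste = wasted_bytes(sent_s=10, played_s=10)
first_frames = paced_send_times_ms(4)
```

Under these assumptions the burst wastes 680,000 bytes per interruption, while the paced sender wastes essentially nothing.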

> TTS is faster than real-time

https://research.nvidia.com/labs/adlr/personaplex/ Voice AI for the latest/aspirational is moving away from what the author describes. It is trickled in/out at 20ms

> We really hope the user’s source IP/port never changes, because we broke that functionality.

That is supported. When traffic for a known ufrag comes in from a new IP, it's handled.

> It takes a minimum of 8* round trips (RTT)

That's wrong. https://datatracker.ietf.org/doc/draft-hancke-webrtc-sped/

> I’d just stream audio over WebSockets

You lose stuff like AEC. You also push complexity on clients. The simplicity of WebRTC (createOffer -> setRemoteDescription) is what lets people onboard easily. Lots of developers struggled with Realtime API + web sockets (lots of code and having to do stuff by hand)

----

I think if I had my choice I would pick the Offer/Answer model and then do QUIC instead of DTLS+SCTP. Maybe do RTP over QUIC? I personally don't feel strongly about the protocol itself. I don't know how to ship code to multiple clients (and customers' clients) with a much larger code footprint.
