"A timeout occurred. Error code 524. Visit cloudflare.com for more information. api.novita.ai: Host Error. What happened? The origin web server timed out responding to this request."
In other words: the model is new and smart, it thinks for a long time, the context is huge, the server doesn't respond for ages, and our beloved proxy Cloudflare (which has decided to replace the entire Internet) happily kills the connection. (Yes, the request body contains "stream": true.)
Tell me (this is a rhetorical question): who exactly decided that in the OpenAI protocol streaming is the only way to keep a long request alive? Which FAANG genius was that? In my opinion, long-running requests to a server should work like this:
1. The client sends a request and immediately receives a task identifier.
2. The client polls the server for the task status and reads the response (buffered on the server) in chunks.
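The submit-then-poll pattern can be sketched as an in-process toy. Everything here is hypothetical: in a real service `submit_task` and `poll_task` would be HTTP endpoints (say, `POST /v1/tasks` returning a task id, and `GET /v1/tasks/{id}?cursor=N` returning buffered chunks), and the worker thread stands in for slow model inference.

```python
import threading
import time
import uuid

# In-memory task store; a real server would use a database or a cache.
_tasks: dict[str, dict] = {}

def submit_task(prompt: str) -> str:
    """Kick off long-running work in the background, return a task id at once."""
    task_id = uuid.uuid4().hex
    _tasks[task_id] = {"chunks": [], "done": False}

    def worker() -> None:
        for word in prompt.split():        # stand-in for slow model inference
            time.sleep(0.01)
            _tasks[task_id]["chunks"].append(word)
        _tasks[task_id]["done"] = True     # set only after the last chunk

    threading.Thread(target=worker, daemon=True).start()
    return task_id

def poll_task(task_id: str, cursor: int) -> tuple[list[str], int, bool]:
    """Return chunks buffered since `cursor`, the new cursor, and a done flag."""
    t = _tasks[task_id]
    done = t["done"]                 # read the flag first so the tail isn't lost
    chunks = t["chunks"][cursor:]
    return chunks, cursor + len(chunks), done

# Client side: many short, independent requests instead of one fragile
# long-lived connection. No proxy timeout can kill a 50 ms poll.
tid = submit_task("the quick brown fox")
cursor, out, done = 0, [], False
while not done:
    chunks, cursor, done = poll_task(tid, cursor)
    out.extend(chunks)
    time.sleep(0.02)
print(" ".join(out))
```

Note that each poll is a fresh, short request, so it sails through any idle-timeout proxy; the only state that must survive between polls is the task id and the cursor.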
Is that hard to implement? Does it require ten interview rounds? Why is it that my boring enterprise API, when working with, shall we say, leisurely third-party services, worked exactly the way described above? And why didn't these hyper-smart AI people (the world experts in high-speed matrix multiplication) do the same?
Tell me, what do you think: how exactly does Novita expect people to use long-thinking models if their proxy has a 60-second timeout?
After all, the SSH protocol has special empty keep-alive packets to prevent timeout disconnects. TCP has keep-alive packets.
Windows 2000 already supported SIO_KEEPALIVE_VALS for sockets. That was twenty-five years ago. Supported, as you can see, by Windows (the system that all real hackers despise), and modern data scientists despise it too (they have MacBooks, blue hair, and were born in 1999).
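For the record, the knob in question is still there, and Linux exposes the same three parameters as socket options. A minimal sketch (the idle/interval values are arbitrary examples):

```python
import socket
import sys

def enable_keepalive(sock: socket.socket, idle_s: int = 30, interval_s: int = 10) -> None:
    """Ask the kernel to send empty keep-alive probes on an idle TCP connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if sys.platform == "win32":
        # The same knob Windows 2000 shipped as SIO_KEEPALIVE_VALS:
        # (on/off, idle time in ms, probe interval in ms).
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
    else:
        # Linux spells the same three parameters as separate TCP options.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
```

(On macOS the idle option is named `TCP_KEEPALIVE` instead of `TCP_KEEPIDLE`; the sketch ignores that for brevity.) The point stands: keeping an idle connection alive has been a solved, one-syscall problem for a quarter of a century.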
So why haven't the geniuses of the AI industry thought of keep-alive in SSE? The SSE spec even has a built-in mechanism for it: any line starting with a colon is a comment that clients must ignore. Their API could just send empty comment events so Cloudflare wouldn't kill the connection.
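Here is roughly what that would look like on the server side. This is a self-contained sketch, not any vendor's actual code: `slow_model` simulates a model that "thinks" silently before emitting its first token, and the wrapper injects SSE comment frames whenever the model has been quiet for too long.

```python
import queue
import threading
import time
from typing import Iterator

# An SSE comment frame; per the spec, clients must silently ignore it,
# but any idle-timeout proxy in the middle sees it as live traffic.
HEARTBEAT = ": keep-alive\n\n"

def sse_with_heartbeat(tokens: Iterator[str], heartbeat_s: float = 0.05) -> Iterator[str]:
    """Yield SSE data frames, inserting comment frames while the model is silent."""
    q: queue.Queue = queue.Queue()
    DONE = object()

    def pump() -> None:               # drain the (blocking) token source
        for tok in tokens:
            q.put(tok)
        q.put(DONE)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            item = q.get(timeout=heartbeat_s)
        except queue.Empty:
            yield HEARTBEAT           # nothing from the model yet: send a no-op
            continue
        if item is DONE:
            return
        yield f"data: {item}\n\n"

def slow_model() -> Iterator[str]:    # stand-in for a model that thinks first
    time.sleep(0.2)
    yield "hello"

frames = list(sse_with_heartbeat(slow_model()))
print(frames)
```

With a 0.2-second silent "thinking" phase and a 0.05-second heartbeat interval, the output contains several `: keep-alive` frames followed by `data: hello`. That is the entire fix: a handful of bytes of no-op traffic, and the 524 never happens.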