For those not following: they are running inference on their own chips way faster than any of the competition, which I find really useful for refactoring old code. It's like 3 seconds vs 40 seconds with Claude and the like.
Their system had been hit or miss: it would often get overloaded and kick down to slower speeds, but when it worked, it was nearly instant. Then they suddenly announced a collaboration with OpenAI and took down the non-OpenAI models, eventually pulling gpt-oss as well. I tried Gemini Flash in their absence, but it is nowhere near as fast.
So, now they are back with the GLM and Qwen models. I wish their communication were better and they wouldn't leave users in the dark, but regardless, they have a wonderful product, and it's back online.