I work at a mid-sized startup dealing with latency issues in customer-facing flows that use LLMs. Using OSS-120B seems preferable to 5-mini or Anthropic models in many cases when we need speed, intelligence, and cost control. Is there some catch here beyond needing to acquire higher rate limits?
Comments
jpau•5h ago
I love Cerebras. I also love that they've started to scale rate limits to useful levels (which is relatively new).
I still don't know how long they'll support our chosen model.
On Oct 22 I got an email saying that
```
- qwen-3-coder-480b will be available until Nov 5, 2025
- qwen-3-235b-a22b-thinking-2507 will be available until Nov 14, 2025
```
That's not a lot of notice!
I don't want to spend all my time benchmarking replacement models for features I already built, and I don't want my users' experience disrupted every few months.