I think raw inference speed will be a key differentiator in AI tool quality from a user's perspective, especially in use cases where model quality is already good enough for human-in-the-loop usage.
NOTE: On OpenRouter, Qwen3-Coder requests are currently averaging $0.30/1M input tokens and $1.20/1M output tokens. That's so much cheaper that I wouldn't be surprised if open-weight models start eating Google's, Anthropic's, and OpenAI's lunch. https://openrouter.ai/qwen/qwen3-coder
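For scale: assuming Anthropic's list price for Sonnet is still around $3/1M input and $15/1M output (my recollection, not verified against current pricing), that puts Qwen3-Coder at roughly 10x cheaper on input and over 12x cheaper on output.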
For code generation, this does seem pretty useful with something like Qwen3-Coder-480B, if it generates good enough code for your purposes.
But for chat, I wonder: does this kind of speed call for models that behave quite differently from current ones? At virtually instant speeds, I sometimes find myself wanting much shorter answers. Maybe a model designed and trained for concision, with a context spanning many, many turns, would be a uniquely useful option on this kind of hardware.
But I guess the hardware is really aimed at training, and the inference-as-a-service offering is basically a powerful form of marketing?
retreatguru•11h ago
I'd like to try this out: use Claude Code as the interface, set up claude-code-router to connect to Cerebras's Qwen3 Coder, and see a 20x speedup. The speed difference might make up for the slightly lower intelligence compared to Sonnet or Opus.
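If the routing works the way I understand it from the claude-code-router README, the setup is just a JSON config (~/.claude-code-router/config.json) pointing at Cerebras's OpenAI-compatible endpoint. A rough sketch; the endpoint path, model name, and exact schema here are my assumptions and may not match the current docs:

    {
      "Providers": [
        {
          "name": "cerebras",
          "api_base_url": "https://api.cerebras.ai/v1/chat/completions",
          "api_key": "YOUR_CEREBRAS_API_KEY",
          "models": ["qwen-3-coder-480b"]
        }
      ],
      "Router": {
        "default": "cerebras,qwen-3-coder-480b"
      }
    }

Then you'd run Claude Code through the router and requests should go to Cerebras instead of Anthropic.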
I don't see Qwen3 Coder available from Cerebras on OpenRouter yet: https://openrouter.ai/provider/cerebras