
OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
590•klaussilveira•11h ago•170 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
896•xnx•16h ago•544 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
93•matheusalmeida•1d ago•22 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
20•helloplanets•4d ago•13 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
26•videotopia•4d ago•0 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
200•isitcontent•11h ago•24 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
199•dmpetrov•11h ago•91 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
312•vecti•13h ago•136 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
353•aktau•17h ago•176 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
22•romes•4d ago•2 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
354•ostacke•17h ago•92 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
458•todsacerdoti•19h ago•229 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
7•bikenaga•3d ago•1 comment

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
80•quibono•4d ago•18 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
256•eljojo•14h ago•154 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
53•kmm•4d ago•3 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
390•lstoll•17h ago•263 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
231•i5heu•14h ago•177 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
120•SerCe•7h ago•98 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
136•vmatsiiako•16h ago•59 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
68•phreda4•10h ago•12 comments

Zlob.h 100% POSIX and glibc compatible globbing lib that is faster and better

https://github.com/dmtrKovalenko/zlob
12•neogoose•4h ago•7 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
25•gmays•6h ago•7 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
44•gfortaine•9h ago•13 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
271•surprisetalk•3d ago•37 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1043•cdrnsf•20h ago•431 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
171•limoce•3d ago•90 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
60•rescrv•19h ago•22 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
89•antves•1d ago•64 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
14•denuoweb•1d ago•2 comments

Cerebras Code

https://www.cerebras.ai/blog/introducing-cerebras-code
449•d3vr•6mo ago

Comments

namanyayg•6mo ago
I was waiting for more subscription-based services to pop up to compete with the inference providers at the commodity level.

I think a lot more companies will follow suit and the competition will make pricing much better for the end user.

congrats on the launch Cerebras team!

sophia01•6mo ago
My understanding is that the coding agents people use can be modified to plug into any LLM provider's API?

The difference here seems to be that Cerebras does not appear to offer Qwen3-Coder through their API! So now there is a crazy fast (and apparently good, too?) model that they only provide if you pay the crazy monthly sub?

baq•6mo ago
define 'crazy'.

it's two kilotokens per second. that's fast.

ttoinou•6mo ago
I’d say super fast
bangaladore•6mo ago
It's more than 10x faster than the fastest alternative. And roughly 50x the average alternative.

Certainly, somewhere between fast and crazy.

amelius•6mo ago
It generates code faster than I can inspect it.

In other words, it's needlessly fast.

pxc•6mo ago
You might be able to use the extra time to have it run formatters and linters, execute the code in a VM before you inspect it, or revise it for compliance with a style guide you've written, retrying up to 5 times until the conditions are met - something like that.

So maybe there's something useful to do with the extra speed. But it does seem more "useful" for vibe coding than for writing usable/good code.

pxc•6mo ago
You can still get it pay-as-you-go on OpenRouter, afaict, and the billing section of the Cerebras Cloud account I just created has a section for Qwen3-Coder-480B as well.
sophia01•6mo ago
Yeah, just checked; apparently it is available as a preview (not on the main models/pricing page).
social_quotient•6mo ago
Exactly! You can use tools like https://github.com/musistudio/claude-code-router which let you use other LLMs.

The way I would use this $50 Cerebras offering is as a delegate for high-token-count items like documentation, lint fixing, and other operations, both to speed up the workflow and to release some back pressure on Anthropic/Claude so you don't hit your limits as quickly… especially with the new weekly throttle coming. The $50 jump seems very reasonable; as for the 1k completions a day, I'd really want to get a feel for how chatty it is.

I suppose that's how it starts, but if the model is competent and fast, the speed alone might push you to delegate more to it (maybe sub-agent tasks).
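
For concreteness, claude-code-router is driven by a JSON config with provider and routing sections; here is a hypothetical sketch of the delegation described above, written as Python that emits the config (the schema follows the project's README from memory, and the Cerebras endpoint URL and model ids are assumptions worth double-checking):

  import json, os

  # Hypothetical claude-code-router config: keep Claude as the default,
  # route long-context/bulk work to a Cerebras-hosted Qwen3-Coder.
  # Schema, endpoint URL, and model ids are assumptions, not gospel.
  config = {
      "Providers": [{
          "name": "cerebras",
          "api_base_url": "https://api.cerebras.ai/v1/chat/completions",
          "api_key": os.environ.get("CEREBRAS_API_KEY", ""),
          "models": ["qwen-3-coder-480b"],
      }],
      "Router": {
          # "provider,model" strings; the anthropic provider block is omitted
          "default": "anthropic,claude-sonnet-4",
          "longContext": "cerebras,qwen-3-coder-480b",  # docs, lint fixes, bulk edits
      },
  }

  os.makedirs(os.path.expanduser("~/.claude-code-router"), exist_ok=True)
  with open(os.path.expanduser("~/.claude-code-router/config.json"), "w") as f:
      json.dump(config, f, indent=2)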

JackYoustra•6mo ago
I've been waiting on this for a LONG time. Integration with Cursor when Cerebras released their earlier models was patchy at best, even through OpenRouter. It's nice to finally see official support, although I'm a bit worried that, long-term, the time spent on bash/MCP calls will end up dominating.

Still, definitely the right direction!

EDIT: doesn't seem to be anything but a first-party API with a monthly plan.

ktsakas•6mo ago
Does it work with claude-code-router? I was getting API errors this week trying to use Qwen3 on Cerebras through OpenRouter with claude-code-router.
amirhirsch•6mo ago
API Error: 422 {"error":{"message":"Error from provider: {\"message\":\"body.messages.0.system.content: Input should be a valid string\",\"type\":\"invalid_request_error\",\"param\":\"validation_error\",\"code\":\"wrong_api_format\"}
amirhirsch•6mo ago
I ended up getting it working by copying the transformer in this issue: https://github.com/musistudio/claude-code-router/issues/407

It hits the request-per-minute limit instantly, and then you wait a minute.

nubela•6mo ago
Did you make payment? I also found it unusable due to rate limits. Not sure if it is because I was on the free trial.
d4rkp4ttern•6mo ago
I really wish the Qwen3 folks would put up an Anthropic-compatible API like the Kimi and GLM/Zai folks cleverly did; that makes their models trivially usable in Claude Code via this dead-simple setup:

https://github.com/pchalasani/claude-code-tools?tab=readme-o...

thanhhaimai•6mo ago
> running at speeds of up to 2,000 tokens per second, with a 131k-token context window, no proprietary IDE lock-in, and no weekly limits!

I was excited, then I read this:

> Send up to 1,000 messages per day—enough for 3–4 hours of uninterrupted vibe coding.

I don't mind paying for services I use. But it's hard to take this seriously when the first paragraph's claims contradict the fine print.

amirhirsch•6mo ago
The distinction is from Claude Code's weekly limits.
sneilan1•6mo ago
Claude Code's usage limits are hard to pin down; it's not easy to understand them. I've found that when I run into too much Opus usage I switch to Sonnet, but I've never run into a usage limit with Sonnet 4 yet.
sneilan1•6mo ago
1,000 messages per day should be plenty as a daily development driver. I use Claude Code with Sonnet 4 exclusively, and as far as I can tell I do not send more than 1,000 messages per day. I am certainly not pressing enter 1,000 times, though! Maybe more messages are being sent under the hood than I realize?
thanhhaimai•6mo ago
The issue is not whether the limit is too high or too low. What turned me away was that they claimed "no weekly limits" as a selling point without mentioning that they changed it to a daily limit.

I understand it's a sales tactic. But it seems less than forthcoming, and that makes it hard for me to trust the rest of the claims.

twothreeone•6mo ago
I don't see what's hard to understand about this: other providers have weekly limits and daily limits. If you max out your daily limit every day, you might still hit your weekly limit after 3-4 days of usage, meaning you cannot send more for the rest of the week. This is saying that no such weekly limit exists on top of the daily one. E.g. see https://techcrunch.com/2025/07/28/anthropic-unveils-new-rate...
brandall10•6mo ago
Claude Code does not have a daily limit; it has 5-hour windows that reset. On the $100 plan it's pretty hard to hit a window limit with Sonnet unless you're using multiple/subagents. The $200 plan is better suited to those who do that or want to use a significant amount of Opus.

Also, the weekly-limit selling point is silly - it almost certainly only impacts those who are abusing the service, i.e. running it 24/7.

b2m9•6mo ago
At this point I'm afraid to ask, but I'll do it anyway:

How do Claude's rate limits actually work?

I'm not a Pro/Max5/Max20 subscriber, only light API usage for Anthropic - so it's likely that I don't really understand the limits there.

For example, community reports suggest that Anthropic's message limit for Max 5 translates to roughly 88k tokens per 5-hour window (there's variance, but it's somewhere in the 80-120k ballpark based on system load; also assuming Sonnet, not Opus). A normal user probably won't consume more than 250k tokens per day on this subscription. That's like 5M tokens for a month of 20 active days, which doesn't justify the 100 USD subscription cost. This also doesn't square with Anthropic's statement that users can consume 10,000+ USD of usage on the Max 20 tier.

I'm clearly misunderstanding Claude's rate limits here. Can someone enlighten me? Is the 5-hour window somehow per task/instance and not per account?

ixel•6mo ago
With Anthropic's Claude subscriptions, many people treat tokens as the measure of the usage limit, but I doubt that's what Anthropic really uses. Why do I say this? Well, there are multiple models (Haiku, Sonnet and Opus), and we all know Opus is the most expensive and burns through the subscription's usage limit the fastest. I'd theorise that Anthropic has some kind of internal credit value (perhaps as simple as USD) which they allocate with some variance based on things like overall system load.

Anyway, my personal experience on Max 20x is that, with Opus at least, on a busy day I could burn through 150 to 200 million tokens using Claude Code for development work. Split that into 5-hour windows, and assuming I used 2 or 3 windows in a day, that still works out to well into the millions of tokens per window. So I'm not sure the 88k tokens per 5-hour window on Max 5x is really as small as that; maybe the recent apparent reductions in usage limits have dropped it into that ballpark. Originally I saw Max 5x as a heavy-usage Sonnet plan and Max 20x as a heavy-usage Opus plan, but with the new additional weekly usage limit arriving on August 28th, I'd see the plans as moderate-to-heavy-usage Sonnet for Max 5x, and heavy-usage Sonnet with multiple concurrent agents for Max 20x.

TLDR: I strongly suspect that Claude subscription usage limits are based on some kind of internal credit value, perhaps USD, not specifically tokens, and which model you use determines how fast this "credit" is depleted.

The usage limits currently apply per account, based on a 5-hour window starting from the first message sent in a new window. From August 28th there's an additional weekly limit, which looks like it will primarily restrict Opus usage.

brandall10•6mo ago
It's probably best to look at it as credit-based, with credits mapping to tokens at a per-model scale (i.e. an Opus token takes 5x the credits of a Sonnet token).
asaddhamani•6mo ago
Claude now does have a weekly limit, so if you hit your weekly (undisclosed, dynamic) limit in 2 days, you're unable to use the service for the next 5 days. That is what Cerebras is referencing with "no weekly limits". Claude has session count limits, dynamic limits within each session, and now weekly limits on top of all that.
brandall10•6mo ago
Please read my full comment.

Cerebras is jumping on a marketing faux pas by Anthropic. I say this because of the point you bring up about monthly session limits: no one on the Claude subreddit has reported being hit by them, despite many going way over. These are checks to deal with abusive accounts.

d3vr•6mo ago
> no one on the Claude subreddit has yet to report being hit by this despite many going way over that

Because it hasn't gone into effect yet: "From August 28, we’ll introduce new weekly limits that’ll mitigate these problems while impacting as few customers as possible." [0]

[0] https://xcancel.com/AnthropicAI/status/1949898514844307953#m

brandall10•6mo ago
I already explained why it's not a legitimate market differentiator.
brendoelfrendo•6mo ago
It's not hard to understand, but I think there's a compelling argument that stacking daily AND weekly limits is user-hostile, and that differing limits across pricing tiers make it harder to tell at a glance what you actually need. It's not that Cerebras has a feature worth advertising; it's that everyone else has introduced an anti-feature that has become the norm.
twothreeone•6mo ago
I don't know, I thought it was useful info given the context of the market. When I buy any service (e.g. a phone line), I'd like to know the highlights that differentiate that particular provider from others. And it didn't seem like this was front and center in their marketing; their output speed sure seems like the killer feature. This was just another item mentioned at the end of a sentence that says a number of other things and provides additional clarity about the endpoint. ¯\_(ツ)_/¯
SamDc73•6mo ago
Still not sure if it's 1,000 messages or API calls, though; if messages, that's good.
diggan•6mo ago
Neither, it seems. The blog post says "Send up to N messages per day", but the FAQ (https://cerebras-inference.help.usepylon.com/articles/346886...) says:

> How do you calculate messages per day? Actual number of messages per day depends on token usage per request. Estimates based on average requests of ~8k tokens each for a median user.

So it seems there is a token limit? But they're not clear about what exactly it is. Haven't tried subscribing; just going by the publicly available information.
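
For what it's worth, here is the back-of-the-envelope math on the FAQ's own numbers (the ~8k average is the FAQ's figure; the ~24k heavy-request size is reported elsewhere in this thread):

  # Implied daily token budget on the $50 plan, per the FAQ's numbers.
  messages_per_day = 1_000        # advertised message limit
  avg_tokens_per_message = 8_000  # FAQ: "average requests of ~8k tokens"

  implied_budget = messages_per_day * avg_tokens_per_message
  print(f"{implied_budget:,} tokens/day")  # 8,000,000 tokens/day

  # Agentic requests that resend history run much larger, so the
  # effective message count shrinks accordingly.
  print(f"{implied_budget // 24_000:,} requests/day at ~24k tokens each")  # 333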

bluelightning2k•6mo ago
Logically it can only be API-call based, as you bring your own IDE plugin. So there's no possibility it's based on any UI-level concept such as top-level messages. The subscription wouldn't even necessarily know.
itsafarqueue•6mo ago
Your “one enter” press might generate dozens or even hundreds of messages in an agent. Every file read, re-read, read-a-bit-more, edit, whoops-re-edit, ls, grep, etc. counts as a message.
superasn•6mo ago
Pretty sure this is there to prevent this[1] from happening to them

[1] https://www.viberank.app/

bravesoul2•6mo ago
That's a CO2 emissions leaderboard!
LudwigNagasena•6mo ago
That’s almost no CO2 emissions at all. Here is a CO2 emissions leaderboard (need to sort by the correct column): https://celebrityprivatejettracker.com/leaderboard/
wraptile•6mo ago
The number one has 32k, which is equivalent to 64,000 commercial transatlantic flight trips (per person). For reference, 2024's record summer saw 140k flights.
yowlingcat•6mo ago
A commercial transatlantic flight costs $0.50 per person?
HighGoldstein•6mo ago
For a moment I thought it might be the presidential plane, which would explain the emissions, but no, for some reason Trump's personal plane is a whole-ass Boeing 757.
bravesoul2•6mo ago
I'm surprised there hasn't been dick-swinging pressure for some billionaire (the type who can't remember how many billions, but whose net worth probably begins with a 1, per Benford's law) to get a Dreamliner as their private jet.
sunaookami•6mo ago
Inference != training
echelon•6mo ago
Oh my god. That's insane.

The anti-AI people would be pulling their pitchforks out against these people.

Would there be any way of compiling this without people's consent? Looking at GitHub public repos, etc.?

I imagine a future where we're all automatically profiled like this. Kind of like perverse employee tracking software.

stingraycharles•6mo ago
The pro-AI people are as well, as these people are all on the Claude Max plan, and they’re just burning through resources for internet lols, while ruining the fun for the rest of us. It’s the tragedy of the commons at work.
kristjansson•6mo ago
It’s a true statement - no weekly limits, just a daily limit. Easier to work with when you can only get locked out of your tool for 23h59m
brandall10•6mo ago
The CC weekly limits are in place to thwart abuse. This bit of marketing isn't particularly useful, as that limit primarily impacts those who are running it at all hours.

OTOH, 5-hour limits are far superior to daily limits when both can realistically be hit.

bluelightning2k•6mo ago
What a productive second you must have had
bongodongobob•6mo ago
The weekly limit is the daily limit x 7.
handfuloflight•6mo ago
You're going to send 1,000 messages in 1 minute?
newswasboring•6mo ago
Slightly funny in light of this https://www.catherinejue.com/fast
attentive•6mo ago
To put this into perspective, github copilot Business license is 300 "premium" requests a MONTH.
Paradigma11•6mo ago
But one premium request includes all subrequests from tool use and/or internal follow up requests.
Palmik•6mo ago
Yes, to differentiate from Claude Code, which has 5-hour-window limits as well as weekly limits on top.
weitendorf•6mo ago
We're just doing usage-based pricing for our AI devtools product because it's the only way to square the circle of "as much access to an expensive thing as you want, at a reasonable price".

It's harder to set up, lends itself to lower margins, and consumers generally prefer more predictable/simpler pricing, but so many AI devtools products have pissed off their users by throttling their "unlimited" plan-based pricing that I think it's now seen as a yellow flag.

dude250711•6mo ago
[flagged]
fishsticks89•6mo ago
It will just be replaced by more vibe code in the future. Code is like toilet paper now.
reactordev•6mo ago
Nah, we’ll have a Legacy Coder agent to fix vibe coding agents so you’ll be supervising those. Yey…
andrewmutz•6mo ago
If you review every change as it goes, vibe-coded results are often better than human-only ones, and written much faster.
jbc1•6mo ago
If you’re reviewing every change then what does “vibe coding” even mean?
chpatrick•6mo ago
It's not like human code doesn't need review.
jazzyjackson•6mo ago
In my experience, "vibe coding" refers to the folks who run whatever the AI produced and, if it does what they expect without throwing errors, ship it. If it throws errors, they plug those back into the chatbot until the code stops throwing errors.

The whole point of vibe coding is working faster than you would on your own. If you're reviewing it carefully and understand how it works, you might as well have written it by hand.

tjr•6mo ago
Even if it appears to do what you want, but you don't actually read and understand the code, how do you know it doesn't do something else also? Maybe something you don't want?
jdiff•6mo ago
Irrelevant in vibe coding. If it walks like a duck and quacks like a duck, you don't go looking for extra heads, eyes, fingers, tongues, or tails. You ship it then throw repl.it under the bus when it blows up.
steve_adams_86•6mo ago
If I'm not mistaken, vibe coding is supposed to be when you don't review at all; you just let 'er rip. Reviewing the AI's code is just... like if coding were riding a bike, and you got an electric bike. Kind of. It doesn't seem like vibes to me.
0xfaded•6mo ago
This is me seeing co-workers' PRs :(
teaearlgraycold•6mo ago
I call this "half vibe coding" (it needs a better term). For instances where you know how you'll solve a problem but don't want to type it all out, it's great. I tend to comb through the output; even the SOTA models will make pretty bad performance mistakes, poor maintenance decisions, etc. But it's super useful for getting over the hump of getting started on something.
dxxvi•6mo ago
I agree 100%.
dang•6mo ago
"Don't be snarky."

"Don't be curmudgeonly."

https://news.ycombinator.com/newsguidelines.html

sneilan1•6mo ago
I'm so excited to see a real competitor to Claude Code! Gemini CLI, while decent, does not have a $200/month pricing model; it charges per API access, and Codex is the same. I'm trying to get into https://cloud.cerebras.ai/ to try the $50/month plan, but I can't even get in.
bangaladore•6mo ago
Unless I'm misunderstanding something, Cerebras Code is not equivalent to Claude Code or Gemini CLI. It's a strange name for a subscription to an API endpoint.

You take your Cerebras Code endpoint and configure XYZ CLI tool or IDE plugin to point at it.

sneilan1•6mo ago
Oh, so this is not an integrated command-line tool like Claude Code? I assumed it was something where Cerebras released a decent prompt and command-line agent setup. A lot of Claude Code's value is how polished it is and how much work went into the prompt design.
unshavedyak•6mo ago
There is, I believe, a fork of Gemini CLI that works like Claude Code, or so it looks on YouTube.
d4rkp4ttern•6mo ago
Yes, it's called OpenCode, and today I was surprised to learn it works with Claude Pro/Max subscriptions:

https://opencode.ai/docs/

dcre•6mo ago
OpenCode is not a fork of the Gemini CLI. It is its own thing.
d4rkp4ttern•6mo ago
Ah I see, didn’t know
unshavedyak•6mo ago
https://github.com/QwenLM/Qwen3-Coder is the Gemini CLI fork I was referring to, I think. Not positive.
unshavedyak•6mo ago
Wait, no, looks like that's https://github.com/QwenLM/qwen-code
wordofx•6mo ago
This doesn't feel like a competitor. Amp does, though.
flashblaze•6mo ago
I don't hear about Amp often. Have you tried it? How does it compare to Claude Code?
wordofx•6mo ago
It's really good. I was discussing it with a friend recently who said he thinks it works out cheaper because it takes fewer loops to get things right. I've been having better success with it, so I'd recommend it over Claude Code for now.
alfalfasprout•6mo ago
2k tokens/second is insane. While I'm very much against vibe coding, such performance essentially means you can get near-GitHub-Copilot speed with drastically better quality.

For in-editor use, that's game-changing.

itsafarqueue•6mo ago
At full pace that means 62 mins until you hit the daily cap.
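
The arithmetic behind that figure, assuming the 7.5M-token daily cap that subscribers report further down the thread:

  # Minutes to exhaust the reported daily token cap at full speed.
  daily_cap_tokens = 7_500_000  # cap reported by $50-plan subscribers
  tokens_per_second = 2_000     # advertised peak throughput

  minutes = daily_cap_tokens / tokens_per_second / 60
  print(f"{minutes:.1f} minutes")  # 62.5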
kvemkon•6mo ago
Reminds me of how a high write speed on an SSD (1.5 GB/s sustained to TLC) means a 1 TB SSD's warranty allowance (600 TB written) runs out in less than 5 days instead of 5 years.
sliken•6mo ago
Or Comcast's 1 Gb service with a 1 TB limit, which means you hit the cap in about 140 minutes at full rate.
knicholes•6mo ago
It says it works with your favorite IDE. How do you (the reader) plan to use this? I use Cursor, but I'm not sure if this replaces my need to pay for Cursor, or if I need to pay for Cursor AND this, and add in the LLM.

Or is VS Code pretty good at this point? Or is there something better? These are the only two ways I'd know how to actually consume this with any success.

alfalfasprout•6mo ago
Any plugin that allows using an OpenAI-compatible endpoint should work fine (e.g. RooCode, Cline, etc. for VS Code).

Personally, I use CodeCompanion on Neovim.

Maybe not the best solution for vibe coders, but for serious engineers using these tools for AI-assisted development, OpenAI API compatibility means total flexibility.
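
For anyone who hasn't wired this up before: the setup is just an OpenAI-style client with a different base URL. A minimal sketch, with the base URL and model id taken from Cerebras's public docs as I recall them (verify both before relying on this):

  # Minimal sketch: point any OpenAI-compatible client at the endpoint.
  from openai import OpenAI

  client = OpenAI(
      base_url="https://api.cerebras.ai/v1",  # assumed endpoint
      api_key="YOUR_CEREBRAS_API_KEY",
  )

  resp = client.chat.completions.create(
      model="qwen-3-coder-480b",              # assumed model id
      messages=[{"role": "user", "content": "Write a binary search in Python."}],
  )
  print(resp.choices[0].message.content)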

HardCodedBias•6mo ago
This has to be a monstrous money loser.

If they can maintain this pricing level, and if Qwen3-Coder is as good as people say, then they will have an enormous hit on their hands. A massive money-losing hit, but a hit.

Very interesting!

PS: Did they reduce the context window? It looks like it.

kristopolous•6mo ago
They are a hardware company. They have a custom chip they are running it on.

The $200/month is their "poor person" product for people who can't shell out $500k on one of their rigs.

https://www.cerebras.ai/system

HardCodedBias•6mo ago
I know. These things are unbelievable machines. The people at Cerebras are fearless w.r.t. taking on difficult hardware challenges.

But this will certainly be a money loser. They have likely been waiting for an open-source model that somewhat conforms to their hardware's limitations and gives acceptable recommendations.

It looks like they have found it with Qwen. We'll see!

bdcravens•6mo ago
OpenAI lost around $5B last year.

https://www.lesswrong.com/posts/CCQsQnCMWhJcCFY9x/openai-los...

UnPerson-Alpha2•6mo ago
Honest question: what are you assuming about the cost structure that makes you so sure it's a money loser? Can you break down your assumptions?
ahmadyan•6mo ago
Why?

On the $200 plan there's a 40M-token cap per day, so assuming API pricing, the max usage is $12/day, or $360 per month (assuming the user maxes out usage every day and doesn't hit the 1,000-message limit first).

That's relatively standard subscription-vs-API pricing; I believe they are making money from this and counting on people comparing it to Claude Code, which is a much more generous offer.

supernova8•6mo ago
How is this even possible?
kristopolous•6mo ago
It's their own hardware : https://www.cerebras.ai/blog/cerebras-cs3
unshavedyak•6mo ago
In case I'm missing something: why wouldn't it be possible?

Claude and Gemini have similar offerings for a similar/same price, I thought. E.g. if Claude Code can do it for $200/mo, why can't Cerebras?

(Honest question, trying to understand the challenge for Cerebras that you're pointing to.)

edit: Maybe it's the speed? 2k tokens/s sounds... fast, much faster than Claude. Is that what you're referring to?

UnPerson-Alpha2•6mo ago
He was just making an exclamation, like "wow, incredible!".
meepmorp•6mo ago
They make frisbee-sized CPUs.
sliken•6mo ago
Indeed. Pretty much all silicon today comes on ~12" wafers, broken into chip-sized pieces; each chip is tested, and the ones that fail are thrown away.

Cerebras uses the entire 12" wafer and builds in redundancy so that, at current defect rates, a large fraction of the wafers are usable. This allows a huge level of parallelism and a large amount of on-board RAM, and removes the need to move data on/off the wafer. So the available bandwidth is insane, and inference is mostly bandwidth-limited.

clbrmbr•6mo ago
At $200/month the comparable should be Opus 4, not Sonnet 4.
rowanG077•6mo ago
Not really. With Opus 4 you will burn into the thousands a month with serious usage. I tested it yesterday, and 5 hours of use was $60. Extrapolating, you'd easily hit $1K+.
lordofgibbons•6mo ago
Are you comparing Opus via API-based usage vs. Opus via the $200/mo plan?
rowanG077•6mo ago
I didn't know Anthropic offered a fixed-price version.
unshavedyak•6mo ago
Super curious to see some comparisons to Claude Code, especially Opus, since they're primarily comparing it to Sonnet in that graph.
dpkirchner•6mo ago
For those that have tried this, what kind of time-to-first-token latency are you seeing?
txyx303•6mo ago
feels very low compared to claude/gpt for me
anonym29•6mo ago
I had 9 seconds earlier with Cline. That said, the output file I had requested came out at over 122KB in 58.690 seconds, so I was approaching 2KB per second even factoring in the high TTFT.
M4v3R•6mo ago
The high TTFT (around 5-6 seconds) is what kills the excitement for me. Sure, once it starts outputting it's crazy fast, so it's good for generating single-file prototypes, but as soon as you try to use it in Cline or any other agentic loop you'll be waiting on API requests constantly, and it's a real bottleneck.
hollerith•6mo ago
TTFT == time to first token.

(I would've just said, "the throughput is fantastic, but the latency is about 3 times higher than other offerings".)

crawshaw•6mo ago
If you would like to try this in a coding agent (we find the qwen3-coder model works really well in agents!), we have been experimenting with Cerebras Code in Sketch. We just pushed support, so you can run it with the latest version, 0.0.33:

  brew install boldsoftware/tap/sketch
  CEREBRAS_API_KEY=...
  sketch --model=qwen3-coder-cerebras -skaband-addr=
Our experience is it seems overloaded right now, to the point where we have better results with our usual hosted version:

  sketch --model=qwen
lvl155•6mo ago
Their hardware is incredible. Why aren’t more investors lining up for this in this environment?
dmitrygr•6mo ago
Contradictions do not exist. Whenever you think that you are facing a contradiction, check your premises. You will find that one of them is wrong.
thfuran•6mo ago
Neither do perfectly efficient, perfectly rational markets.
sejje•6mo ago
A perfectly efficient market would be a bad premise, sure.
orbifold•6mo ago
In this case the hardware is a nightmare to program.
dmitrygr•6mo ago
Bingo
arisAlexis•6mo ago
Or just bad marketing vs the Goliath (Nvidia)
Invictus0•6mo ago
Kurt Gödel would like a word
no_flaks_given•6mo ago
This model is heavily quantized and the quality isn't great, but that was necessary because, just like everyone else except Nvidia and AMD, they shat the bed: they went for super crazy fast compute and not much memory, assuming that models would plateau at a few billion parameters.

Last year 70B parameters was considered huge, and a good place to standardize around.

Today we have 1T-parameter models, and we know capability still scales linearly with parameters.

So next year we might have 10T-parameter LLMs, and these guys will still be playing catch-up.

All that matters for inference right now is how many HBM chips you can stack, and that's it.

smallerize•6mo ago
Cerebras doesn't normally quantize the models. Do you have more information about this?
d3vr•6mo ago
It's FP8 [0]

[0]: https://xcancel.com/CerebrasSystems/status/19513503371867015...

lxe•6mo ago
Is this available as a Cline/Roo Code integration? I think it might be on OpenRouter too.
d3vr•6mo ago
Cline support added in v3.20.4: https://github.com/cline/cline/releases/tag/v3.20.4

Roo Code support added in v3.25.5: https://github.com/RooCodeInc/Roo-Code/releases/tag/v3.25.5

Cerebras has also been added as a provider for Qwen 3 Coder in OpenRouter: https://openrouter.ai/qwen/qwen3-coder?sort=throughput

d3vr•6mo ago
BTW you can also go through HuggingFace: https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct?i...
scosman•6mo ago
Groq also probably has this in the works. Fun times.
Consumer-Basics•6mo ago
Verified or just a thought?
Flux159•6mo ago
Tried this out with Cline using my own API key (Cerebras is also available as a provider for Qwen3 Coder via OpenRouter here: https://openrouter.ai/qwen/qwen3-coder) and realized that without caching this becomes very expensive very quickly. Specifically, after each new tool call you're sending the entire previous message history as input tokens, which are priced at $2/1M via the API, just like output tokens.

The quality is also not quite what Claude Code gave me, but the speed is definitely way faster. If Cerebras supported caching and reduced token pricing for cache hits, I think I would run this more, but right now it's too expensive per agent run.
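
To make "very expensive very quickly" concrete, here is an illustrative cost model for a single agent run (the $2/1M rate is from above; the token counts are made up):

  # Without prompt caching, every tool call resends the full history,
  # so cumulative input tokens grow roughly quadratically with turns.
  price_per_token = 2 / 1_000_000  # $2 per 1M tokens
  history = 10_000                 # illustrative starting context
  per_turn = 3_000                 # illustrative tokens added per tool call

  total_input = 0
  for _ in range(30):              # a modest 30-tool-call agent run
      total_input += history       # whole history resent as input
      history += per_turn

  print(f"{total_input:,} input tokens -> ${total_input * price_per_token:.2f}")
  # 1,605,000 input tokens -> $3.21, before counting output tokens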

Havoc•6mo ago
This seems to be rate-limited by message, not token, so the lack of caching may matter less.
Flux159•6mo ago
The lack of caching causes the price to increase with each message or tool call in a chat, because you need to send the entire history back after every tool call. Since there isn't any discount for cached tokens, you're looking at very expensive chat threads.
NitpickLawyer•6mo ago
Yes, but the new "thing" now is "agentic", where the driver is tool use. At every point where the LLM decides to use a tool, a new request gets sent. So for a simple task where the model needs to edit one function down the tree, there might be 10 calls: the 1st with the task, 2-5 for read_file, then the model starts writing code, 6-7 trying to run it, 8 fixing something, and so on...
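
In sketch form, the loop looks like this; every pass is a separate billable request carrying the full history (the llm/tools interfaces are illustrative):

  # Why one task fans out into many requests: each tool result goes
  # back to the model as a fresh request with the whole conversation.
  def run_agent(task, llm, tools):
      messages = [{"role": "user", "content": task}]
      while True:
          reply = llm(messages)           # one billable request per pass
          messages.append(reply)
          call = reply.get("tool_call")
          if call is None:                # no tool requested: final answer
              return reply["content"]
          result = tools[call["name"]](**call["args"])
          messages.append({"role": "tool", "content": result})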
itsafarqueue•6mo ago
Yup. If you’ve ever watched a 60+ minute agent loop spawning sub agents, your “one message” prompt leaves you several hundred messages in the hole.
andhuman•6mo ago
No, it's by token. The FAQ says this:

> Actual number of messages per day depends on token usage per request. Estimates based on average requests of ~8k tokens each for a median user.

https://cerebras-inference.help.usepylon.com/articles/346886...

jtbayly•6mo ago
How did you find that? Are you sure it applies to Cerebras Code Pro or Max?
sysmax•6mo ago
Adding entire files into the context window and letting the AI sift through it is a very wasteful approach.

It was adopted because trying to generate diffs with AI opens a whole new can of worms, but there's a very efficient approach in between: slice the files on the symbol level.

So if the AI only needs the declaration of foo() and the definition of bar(), the entire file can be collapsed like this:

  class MyClass {
    void foo();
    
    void bar() {
        //code
    }
  }
Any AI-suggested changes are then easy to merge back (renamings are the only notable exception), so it works really fast.

I am currently working on an editor that combines this approach with the ability to step back and forth between edits, and it works really well. I absolutely love the Cerebras platform (they have a free tier directly and a pay-as-you-go offering via OpenRouter). It can get very annoying refactorings done in one or two seconds from single-sentence prompts, and it usually costs about half a cent per refactoring in tokens. It's also great for things like applying known algorithms to spread-out data structures, where including all the files would kill the context window, but pulling in individual types works just fine with a fraction of the tokens.

If you don't mind the shameless plug, there's more explanation of how it works here: https://sysprogs.com/CodeVROOM/documentation/concepts/symbol...
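
A toy version of the collapse step, to show the shape of the idea (Python-only and top-level functions only; a real tool would use a proper per-language parser):

  import ast

  # Keep the body of the one function the model asked for; reduce every
  # other function to a "..." stub. Requires Python 3.9+ for ast.unparse.
  def collapse(source: str, keep: str) -> str:
      tree = ast.parse(source)
      for node in ast.walk(tree):
          if isinstance(node, ast.FunctionDef) and node.name != keep:
              node.body = [ast.Expr(ast.Constant(...))]
      return ast.unparse(tree)

  sample = "def foo():\n    return expensive()\n\ndef bar():\n    return foo() + 1\n"
  print(collapse(sample, keep="bar"))
  # def foo():
  #     ...
  # def bar():
  #     return foo() + 1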

postalcoder•6mo ago
This works if your code is exceptionally well composed. Anything less can lead to Looney Tunes levels of goofiness, especially if there are as few as one or two lines of crucial context elsewhere in the file.

This approach saves tokens in theory, but I find it can lead to waste as the model tries to figure out why things aren't working, when loading the full file would have solved the problem in a single step.

sysmax•6mo ago
It greatly depends on the type of work you delegate to the AI. If you ask it to add one entire feature at a time, file-level context could work better, but the time and costs go up very fast, and it's harder to review.

What works for me (adding features to huge interconnected projects) is to think about what classes, algorithms and interfaces I want to add, and then give very brief prompts like "split class into abstract base + child like this" and "add another child supporting x, y and z".

So I still make all the key decisions myself, but I get to skip typing the most annoying and repetitive parts. Also, the code doesn't look much different from what I would have written by hand; it just gets done about 5x faster.

DrBenCarson•6mo ago
Yep, and it collapses in the enterprise. The code you're referencing might well be from some niche vendor's bloated library with multiple incoherent abstractions, etc. Context is necessarily big.
sysmax•6mo ago
Ironically, that's how I got the whole idea of symbol-level edits. I was working on a project like that and realized that a lot of the work is actually fairly small edits. But to do one right, you need to look through a bunch of classes, abstraction layers, and similar implementations, and then keep in your head how to get an instance of X from a pointer to Y, etc. Very annoying, repetitive work.

I tried copy-pasting all the relevant parts into ChatGPT with instructions like "add support for X to Y, similar to Z", and it got it pretty much right each time. The bottleneck was really pasting things into the context window and merging the changes back. So I made a GUI that automated it: it showed links on top of functions/classes to quickly attach them to the context window, either as declarations only or as editable chunks.

That worked faster, but navigating to definitions and manually clicking on them still looked like an unnecessary step. But if you asked the model "hey, don't follow these instructions yet, just tell me which symbols you need to complete them", it would give reasonable machine-readable results. And then it's easy to look those up at the symbol level and do the actual edit with them.

It doesn't do magic, but it takes most of the effort out of producing the first draft of an edit, which you can then verify, tweak, and step through in a debugger.

hooo•6mo ago
Totally agree with your view on symbolic context injection. Is this how things are done with code/dev AI right now, if you consider the state of the art?
seunosewa•6mo ago
They search for the token of interest (e.g. with grep -n), then read that line and the next 50 lines or so, continuing until they reach the end.
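
That pattern is tiny in code; something like this, where the 50-line window is the agent's choice:

  # Grep-then-window read: find the symbol of interest, then hand the
  # model that line plus the next N lines instead of the whole file.
  def read_around(path: str, needle: str, window: int = 50) -> str:
      lines = open(path, encoding="utf-8").read().splitlines()
      for i, line in enumerate(lines):
          if needle in line:
              return "\n".join(lines[i : i + window])
      return ""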
BenGosub•6mo ago
If they say it costs $50 per month, why do you need to make additional payments?
davidweatherall•6mo ago
$50 per month is their SaaS plan that lets you make 1,000 requests per day. The OpenRouter cost is the raw API cost if you use qwen3-coder via the pay-as-you-go model in Cline.
seunosewa•6mo ago
The Cerebras.ai plan offers a flat fee of $50 or $200.

The API price is not a reason to reject the subscription price.

dedene•6mo ago
The flat fee is for a fixed max amount of tokens per day. Not requests, tokens.
beastman82•6mo ago
The API price is not very relevant to this flat-fee service announcement.

In fact, it seems obvious that you should use the flat-fee model instead.

waldrews•6mo ago
Does caching make as much sense as a cost-saving measure on Cerebras hardware as it does on mainstream GPUs? Caching should be preferred if SSD→VRAM is dramatically cheaper than recalculation. If Cerebras is optimized for massively parallel compute with fixed weights, and not a lot of memory bandwidth into or out of the big wafer, it might actually make sense to price per token without a caching discount. Could someone from the company (or otherwise familiar with it) comment on the tradeoff?
cellis•6mo ago
What are the token prices?
anonym29•6mo ago
$2/Mtok in and out, but no caching discounts.
deevus•6mo ago
I'm finding myself switching between subscriptions to ChatGPT, T3 Chat, DeepSeek, Claude Code etc. Their subscription models aren't compatible with making it easy to take your data with you. I wish I could try this out and import all my data.
atkailash•6mo ago
I use regular Cerebras for the plan stage in Cline, so I'm very excited to try this out.
attentive•6mo ago
Attn: Cerebras

Any attempt to deal with "<think>" in the code gets it replaced with "<tool_call>".

Both in inference.cerebras.ai chat and API.

Same model on chat.qwen.ai doesn't do it.

unraveller•6mo ago
Some users who signed up for Pro ($50/mo) are reporting further limitations beyond those advertised.

>While they advertise a 1,000-request limit, the actual daily constraint is a 7.5 million-token limit. [1]

That assumes an average of 7.5k tokens/request, whereas in their marketing videos they show API requests ballooning by ~24k tokens per request. Still lower than the API price.

[1] https://old.reddit.com/r/LocalLLaMA/comments/1mfeazc/cerebra...
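
Running both per-request figures against the reported cap shows how elastic "requests per day" really is:

  # Requests/day implied by the 7.5M-token cap at the two request
  # sizes cited above: the FAQ's ~7.5k vs the ~24k seen in demos.
  daily_cap = 7_500_000
  for tokens_per_request in (7_500, 24_000):
      print(f"{tokens_per_request} tok/req -> {daily_cap // tokens_per_request} requests/day")
  # 7500 tok/req -> 1000 requests/day
  # 24000 tok/req -> 312 requests/day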

itsafarqueue•6mo ago
They bait-and-switched their FAQ after the fact, too. Come on, Cerebras, it's only VC money you're burning here in the first place; let's see some commitment to winning market share. :money: :fire:
apwell23•6mo ago
shocking..
nickandbro•6mo ago
Had a similar experience. I got rate-limited as well, even when I was well below 1M tokens. When it's working it's nice, but I can't use it as a replacement for Cursor until higher rate limits are granted.
esafak•6mo ago
They should just host all the latest open source models FTW.
segmondy•6mo ago
FYI, you are probably going to use up your tokens: there's a total limit of tokens per day, so in about 300 requests it's feasible to use it all up. See https://www.reddit.com/r/LocalLLaMA/comments/1mfeazc/cerebra...
exclipy•6mo ago
Windsurf also has Cerebras/Qwen3-Coder: 1,000 user messages per month for $15.

https://x.com/windsurf/status/1951340259192742063

bluelightning2k•6mo ago
This is awesome. I still use Windsurf and like it. Their tab model is really good.
another_twist•6mo ago
How does context buildup work for these code-generating machines, generally? Do the programs just use human notes + current code directly? Are there specific ranking steps that need to be done?
jedisct1•6mo ago
I'm a little bit confused.

I subscribed to the $50 plan. It's super fast for sure, but rate limits kick in after just a couple of requests, completely defeating the point of the fast responses.

Did I miss something?

saberience•6mo ago
OK, it's fast, but rate limits seem to kick in extremely quickly, the results are less good than Claude Code's, and it ends up more expensive?

Who is the intended audience for Cerebras?

ritenuto•6mo ago
While I'm also curious, I'm fine with having a mostly inferior alternative too. This is a dynamic market with some big players already; having more options is beneficial, if only as a way to prevent others from doing a rug pull.
ixel•6mo ago
The usage limit on Cerebras Code is rather tight: the $50 plan apparently gives you 7.5 million tokens per day, which doesn't last long. This also isn't clearly advertised on the plans prior to purchasing.
d3vr•6mo ago
Yeah, really disappointing. Hopefully they'll reconsider this limit, because as it stands it really isn't usable, especially with agentic tools (e.g. opencode).
romanovcode•6mo ago
> and no weekly limits!

No weekly limits so far. Just wait: if you get the same traction as Claude, you'll follow the same playbook they did.

scosman•6mo ago
Anyone get this working in Cursor? I can connect OpenRouter just fine, but Cerebras just errors out instantly. The same URL/key works via curl, so it's some sort of Cerebras/Cursor compatibility issue.
dlojudice•6mo ago
Same here. Got this message on the Cerebras Discord:

> Yeah I filed a ticket with Cursor

> They have problems with OpenAI customization

hereme888•6mo ago
So for <$1.70/day I can hire a programmer at something like Claude Sonnet 4 level? I know it's got its quirks and limits and needs supervision, but it's like 20x cheaper than an average programmer.
tbarbugli•6mo ago
Of course it depends on where you would hire; for me (NL) it's more than 100x more cost-efficient.
rbitar•6mo ago
This token throughput is incredible and is going to set a new bar in the industry. The main issue with the Cerebras Code plan is that requests per minute are throttled, and with agentic coding systems each tool call is treated as a new "message", so you can easily hit the API limits (10 messages/minute).

One workaround we're using now that seems to work is to use Claude for all tasks but delegate specific tools to the cerebras/qwen-3-coder-480b model to generate files and handle other token-heavy tasks, avoiding a spike in the total number of requests. This has cost and latency consequences (and adds complexity to the code), but until those throttle limits are lifted it seems to be a good combo. I also find that Claude has better tool selection when the number of tools required is > 15, which our current setup has.
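
In code, that routing rule is just a dispatch on the tool name; a sketch with illustrative model ids and tool classification:

  # Delegation workaround: Claude orchestrates and picks tools, while
  # token-heavy generation tools run on the fast Cerebras model.
  # Model ids and the HEAVY_TOOLS set are illustrative assumptions.
  HEAVY_TOOLS = {"generate_file", "write_docs", "bulk_refactor"}

  def pick_model(tool_name: str) -> str:
      if tool_name in HEAVY_TOOLS:
          return "cerebras/qwen-3-coder-480b"  # cheap, fast bulk output
      return "anthropic/claude-sonnet-4"       # better tool selection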