
Why is Zig so cool?

https://nilostolte.github.io/tech/articles/ZigCool.html
266•vitalnodo•7h ago•144 comments

Snapchat open-sources Valdi a cross-platform UI framework

https://github.com/Snapchat/Valdi
192•yehiaabdelm•6h ago•45 comments

Becoming a Compiler Engineer

https://rona.substack.com/p/becoming-a-compiler-engineer
184•lalitkale•9h ago•79 comments

Myna: Monospace typeface designed for symbol-heavy programming languages

https://github.com/sayyadirfanali/Myna
244•birdculture•12h ago•103 comments

How did I get here?

https://how-did-i-get-here.net/
194•zachlatta•11h ago•34 comments

Immutable Software Deploys Using ZFS Jails on FreeBSD

https://conradresearch.com/articles/immutable-software-deploy-zfs-jails
54•vermaden•6h ago•20 comments

Ruby Solved My Problem

https://newsletter.masilotti.com/p/ruby-already-solved-my-problem
211•joemasilotti•12h ago•78 comments

Why I love OCaml (2023)

https://mccd.space/posts/ocaml-the-worlds-best/
314•art-w•16h ago•217 comments

Cerebras Code now supports GLM 4.6 at 1000 tokens/sec

https://www.cerebras.ai/code
66•nathabonfim59•7h ago•39 comments

How to find your ideal customer, right away

https://www.reifyworks.com/writing/2023-01-30-iicp
14•mrbbk•4d ago•2 comments

YouTube Removes Windows 11 Bypass Tutorials, Claims 'Risk of Physical Harm'

https://news.itsfoss.com/youtube-removes-windows-11-bypass-tutorials/
543•WaitWaitWha•10h ago•190 comments

Can you save on LLM tokens using images instead of text?

https://pagewatch.ai/blog/post/llm-text-as-image-tokens/
13•lpellis•6d ago•4 comments

Show HN: Find matching acrylic paints for any HEX color

https://acrylicmatch.com/
13•dotspencer•4d ago•6 comments

FSF40 Hackathon

https://www.fsf.org/events/fsf40-hackathon
71•salutis•4d ago•2 comments

How a devboard works (and how to make your own)

https://kaipereira.com/journal/build-a-devboard
63•kaipereira•8h ago•8 comments

Running a 68060 CPU in Quadra 650

https://github.com/ZigZagJoe/Macintosh-Q650-68060
27•zdw•5h ago•1 comments

Venn Diagram for 7 Sets

https://moebio.com/research/sevensets/
114•bramadityaw•3d ago•24 comments

Mullvad: Shutting down our search proxy Leta

https://mullvad.net/en/blog/shutting-down-our-search-proxy-leta
104•holysoles•6h ago•57 comments

Transducer: Composition, abstraction, performance (2018)

https://funktionale-programmierung.de/en/2018/03/22/transducer.html
91•defmarco•3d ago•3 comments

Angel Investors, a Field Guide

https://www.jeanyang.com/posts/angel-investors-a-field-guide/
128•azhenley•14h ago•27 comments

Local First Htmx

https://elijahm.com/posts/local_first_htmx/
15•srid•4h ago•8 comments

Using the Web Monetization API for fun and profit

https://blog.tomayac.com/2025/11/07/using-the-web-monetization-api-for-fun-and-profit/
48•tomayac•8h ago•11 comments

Blood, Brick and Legend: The Chemistry of Dracula's Castle

https://news.research.gatech.edu/2025/10/31/blood-brick-and-legend-chemistry-draculas-castle
4•dhfbshfbu4u3•4d ago•0 comments

Ribir: Non-intrusive GUI framework for Rust/WASM

https://github.com/RibirX/Ribir
55•adamnemecek•10h ago•7 comments

Oddest ChatGPT leaks yet: Cringey chat logs found in Google Analytics tool

https://arstechnica.com/tech-policy/2025/11/oddest-chatgpt-leaks-yet-cringey-chat-logs-found-in-g...
45•vlod•3h ago•11 comments

Why I love my Boox Palma e-reader

https://minimal.bearblog.dev/why-i-love-my-boox-palma-e-reader/
53•pastel5•5d ago•28 comments

Analysis of Hedy Lamarr's Contribution to Spread-Spectrum Communication

https://researchers.one/articles/24.01.00001v4
52•drmpeg•7h ago•37 comments

Shell Grotto: England's mysterious underground seashell chamber

https://boingboing.net/2025/09/05/shell-grotto-englands-mysterious-underground-seashell-chamber.html
19•the-mitr•3d ago•6 comments

VLC's Jean-Baptiste Kempf Receives the European SFS Award 2025

https://fsfe.org/news/2025/news-20251107-01.en.html
292•kirschner•10h ago•52 comments

James Watson has died

https://www.nytimes.com/2025/11/07/science/james-watson-dead.html
285•granzymes•11h ago•157 comments

Cerebras Code now supports GLM 4.6 at 1000 tokens/sec

https://www.cerebras.ai/code
66•nathabonfim59•7h ago

Comments

alyxya•3h ago
It would be nice if there were more information on that page. I assume this is just the output token generation speed. Is it using speculative decoding to get to 1000 tokens/sec? Is there lossy quantization being used to speed things up? The number of tokens per second a model can generate is relatively low on the list of things I care about, when model/inference quality and the harness play a much bigger role in how I feel about using a coding agent.
cschneid•3h ago
Yes this is the output speed. Code just flashes onto the page, it's pretty impressive.

They've claimed repeatedly in their discord that they don't quantize models.

The speed of things does change how you interact with it I think. I had this new GLM model hooked up to opencode as the harness with their $50/mo subscription plan. It was seriously fast to answer questions, although there are still big pauses in workflow when the per-minute request cap is hit.

I got a meaningful refactor done, maybe a touch faster than I would have in Claude Code + Sonnet - but my human interaction with it felt like the slow part.

alyxya•3h ago
The human interaction part is one of the main limitations to speed, where the more autonomous a model can be, the faster it is for me.
behnamoh•3h ago
If they don't quantize the model, how do they achieve these speeds? Groq also says they don't quantize models (and I want to believe them) but we literally have no way to prove they're right.

This is important because their premium $50 plan (as opposed to $20 for Claude Pro or ChatGPT Plus) should be justified by the speed. GLM 4.6 is fine, but I don't think it's at the GPT-5/Claude Sonnet 4.5 level yet, so if I'm paying $50 for it on Cerebras it should be mainly for the speed.

What kind of workflow justifies this? I'm genuinely curious.

cschneid•3h ago
Apparently they have custom hardware built around absolutely gigantic chips - an entire wafer at a time. Presumably they keep the whole model right on chip, in what is effectively L3 cache, so the memory bandwidth is absurdly high, allowing very fast inference.

It's more expensive to get the same raw compute than from a cluster of Nvidia chips, but those clusters don't have the same peak throughput.
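The bandwidth argument can be sanity-checked with a back-of-envelope model: decode speed for a memory-bound model is roughly bandwidth divided by the bytes streamed per token. All the numbers below are illustrative assumptions, not published specs for Cerebras hardware or GLM 4.6.

```python
# Back-of-envelope decode speed for a memory-bound model:
# tokens/sec ≈ memory bandwidth / bytes of active weights read per token.
# Every number here is a hypothetical for illustration only.

def decode_tokens_per_sec(bandwidth_gb_s: float,
                          active_params_b: float,
                          bytes_per_param: float) -> float:
    """Ideal decode rate if each token must stream the active weights once."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical HBM-based accelerator: ~3 TB/s, MoE with ~32B active params at 16-bit.
hbm = decode_tokens_per_sec(3_000, 32, 2)
# Hypothetical on-chip SRAM with ~1000x the bandwidth, same model.
sram = decode_tokens_per_sec(3_000_000, 32, 2)

print(f"HBM-bound:  ~{hbm:.0f} tok/s per stream")
print(f"SRAM-bound: ~{sram:.0f} tok/s per stream")
```

The point of the sketch is the ratio, not the absolute values: if weights sit in on-chip memory instead of external DRAM, per-stream decode speed scales with the bandwidth gap, which is how a wafer-scale part could sustain speeds no HBM-fed GPU can match for a single stream.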

As for price as a coder, I'm giving the $50 plan a shot for a month. I haven't figured out how to adapt my workflow to the faster speeds yet (I'm also still learning and setting up opencode).

bigyabai•3h ago
For $50/month, it's a non-starter. I hope they can find a way to use all this excess bandwidth to put out a $10 equivalent to Claude Code instead of a 1000 tok/s party trick I can't use properly.
typpilol•56m ago
I feel the same and it's also why I can't understand all these people using small local models.

Every local model I've used, and even most open-source ones, is just not good.

behnamoh•37m ago
The only good-enough models I still use are gpt-oss-120b-mxfp4 (not 20b) and glm-4.6 at q8 (not q4).

Quantization ruins models, and some models aren't that smart to begin with.
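A toy illustration of why lower-bit quantization hurts: round-tripping weights through symmetric per-tensor quantization loses precision that grows as the bit width shrinks. This is a simplified sketch, not how any particular inference stack (q4/q8/mxfp4) actually implements it.

```python
# Minimal symmetric per-tensor quantization round-trip, to show the
# precision gap between 8-bit and 4-bit. Toy example only.

def quant_roundtrip(weights, bits):
    qmax = 2 ** (bits - 1) - 1               # 127 for int8, 7 for int4
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# A toy "weight tensor" of 1000 values in [-1, 1].
weights = [(-1) ** i * (i / 1000) for i in range(1, 1001)]

err8 = mean_abs_error(weights, quant_roundtrip(weights, 8))
err4 = mean_abs_error(weights, quant_roundtrip(weights, 4))
print(f"int8 mean abs error: {err8:.6f}")
print(f"int4 mean abs error: {err4:.6f}")   # noticeably larger
```

Real schemes use per-group scales, outlier handling, and non-uniform formats to claw back quality, but the basic trade-off the comment describes - fewer bits, more rounding error - is the same.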

nine_k•2h ago
> What kind of workflow justifies this?

Think about waiting for compilation to complete: the difference between 5 minutes and 15 seconds is dramatic.

Same applies to AI-based code-wrangling tasks. The preserved concentration may be well worth the $50, especially when paid by your employer.

behnamoh•1h ago
They should offer a free trial so we can build confidence in the model quality (e.g., to make sure it's not nerfed, quantized, context-limited, etc.).
conception•1h ago
A trial is literally front and center on their website.
NitpickLawyer•28m ago
You can usually use them through things like OpenRouter. Load some credits there and use the API in your preferred IDE like you'd use any provider. For some quick tests it'd probably be <$5 for a few coding sessions, so you can check out the capabilities and see if it's worth it for you.
behnamoh•10m ago
openrouter charges me $12 on a $100 credit...
NitpickLawyer•31m ago
> What kind of workflow justifies this? I'm genuinely curious.

Any workflow where verification is faster/cheaper than generation. If you have a well-tested piece of code and want to "refactor it to use such and such paradigm", you can run n fast-model queries and pick the best one.

My colleagues who do frontend use faster models (not this one specifically, but they did try fast-code-1) to build components. Someone worked out a workflow with worktrees where the model generates n variants of a component and displays them next to each other. A human can choose at a glance which one they like, and sometimes pick and choose from multiple variants (something like passing it to Claude and saying "keep the styling of component A but the data management of component B"). At the end of the day it's faster/cheaper than having Claude Code do all that work.
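The n-variant flow described above can be sketched as: fire off n independent generations in parallel, then present them side by side for a human to pick from. `generate_variant` below is a hypothetical stand-in for a real agent invocation run in its own git worktree; the names and structure are illustrative, not any specific tool's API.

```python
# Sketch of the "generate n variants, human picks one" workflow.
# generate_variant is a placeholder: a real version would create a git
# worktree per variant and run a coding agent inside it.

from concurrent.futures import ThreadPoolExecutor

def generate_variant(prompt: str, variant_id: int) -> dict:
    # Hypothetical agent call; returns a labeled stub for illustration.
    return {
        "id": variant_id,
        "branch": f"variant-{variant_id}",
        "summary": f"{prompt} (take {variant_id})",
    }

def fan_out(prompt: str, n: int) -> list[dict]:
    # Fast inference makes it cheap to run all n generations at once,
    # since human review is quicker than sequential generation.
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(generate_variant, prompt, i) for i in range(n)]
        return [f.result() for f in futures]

for v in fan_out("refactor the date-picker component", 3):
    print(v["branch"], "->", v["summary"])
```

The design hinge is exactly the parent comment's point: this only pays off when verification (a glance at rendered components) is much cheaper than generation, so throwing away n-1 variants costs almost nothing.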

niklassheth•3h ago
This is more evidence that Cognition's SWE-1.5 is a GLM-4.6 finetune
prodigycorp•2h ago
Can you provide more context for this? (E.g., was SWE-1.5 released recently? Is it considered good? Is it considered fast? Was there speculation about the underlying model? How does this prove it's a GLM finetune?)
mhuffman•2h ago
I suspect they are referencing the 950tok/s claim on Cognition's page.
prodigycorp•2h ago
Ah. Thx. Blogpost for others: https://cognition.ai/blog/swe-1-5

Takeaway is that this is a Sonnet-ish model at 10x the speed.

NitpickLawyer•23m ago
People saw Chinese characters in generations made by swe-1.5 (Windsurf's model) and also in the one made by Cursor. This led to suspicions that the models are finetunes of Chinese models (which makes sense, as there aren't many strong US/EU coding models out there). GLM-4.5/4.6 are the "strongest" open coding models atm (with dsv3.2 and Qwen somewhat behind), so that's where the speculation came from. Cerebras serving them at roughly the same speeds kinda adds to that story (e.g. if it were something heavier like dsv3 or Kimi K2, it would be slower).
nl•2h ago
Not at all. Any model with a somewhat similar architecture and roughly similar size should run at the same speed on Cerebras.

It's like saying Llama 3.2 3B and Gemma 4B are finetunes of each other because they run at similar speeds on Nvidia hardware.

gatienboquet•3h ago
Vibe Slopping at 1000 tokens per second
mmaunder•2h ago
Yeah honestly having max cognitive capability is #1 for me. Faster tokens is a distant second. I think anyone working on creating valuable unique IP feels this way.
conception•1h ago
This is where agents actually shine. Having a smart model write code and plan is great, and then having Cerebras do the command-line work, write documents effectively instantly, and handle other simple tasks speeds things up quite a bit.
lordofgibbons•2h ago
At what quantization? And if it is in fact quantized below fp8, how is the performance impacted on all the various benchmarks?
renewiltord•2h ago
Unfortunately for me, the models on Cerebras weren't as good as Claude Code: speedy, but I needed to iterate more. Codex is trustworthy and slow; Claude is better at iterating. But none of the Cerebras models at the $50 tier were worth anything for me. They would have been something if they'd come out earlier, but we have these alternatives now.
elzbardico•2h ago
I don't care. I want LLMs to help with the boring stuff, the toil. It may not be as intelligent as Claude, but if it takes care of the boring stuff, and is fast while doing it, I am happy. Use it surgically: do the top-down design yourself, and just let it fill in the blanks.
renewiltord•1h ago
Give it a crack. It took a lot of iteration for it to write decent code. If you figure out differences in prompting technique, do share. I was really hoping for the speed to improve a lot of execution - because that’s genuinely the primary problem for me. Unfortunately, speed is great but quality wasn’t great for me.

Good luck. Maybe it’ll do well in some self-directed agent loop.

Flux159•2h ago
I was able to sign up for the Max plan and start using it via opencode. It does a way better job than Qwen3 Coder, in my opinion. Still extremely fast, but in less than an hour I used 7M input tokens, so with a single agent running I could easily pass the 120M daily token limit. The speed difference compared to Claude Code is significant, though - to the point where I'm not waiting for generation most of the time, I'm waiting for my tests to run.

For reference, each new request needs to send all previous messages - tool calls force new requests too. So it's essentially cumulative when you're chatting with an agent: my opencode agent's context window is only 50% used at 72k tokens, but Cerebras's tracking online shows that I've used 1M input tokens and 10k output tokens already.

NitpickLawyer•42m ago
> For reference, each new request needs to send all previous messages - tool calls force new requests too. So it's essentially cumulative when you're chatting with an agent: my opencode agent's context window is only 50% used at 72k tokens, but Cerebras's tracking online shows that I've used 1M input tokens and 10k output tokens already.

This is how every "chatbot" / "agentic flow" / etc works behind the scenes. That's why I liked that "you should build an agent" post a few days ago. It gets people to really understand what's behind the curtain. It's requests all the way down, sometimes with more context added, sometimes with less (subagents & co).
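The cumulative-billing effect both commenters describe is easy to model: if every request resends the full history, total billed input tokens grow roughly quadratically with the number of turns even while the live context stays modest. The turn sizes below are made up to roughly match the 72k-context/1M-billed observation above.

```python
# Toy model of agent-loop token accounting: each request resends the
# entire prior conversation, so billed input tokens accumulate much
# faster than the visible context window grows.

def billed_input_tokens(turn_sizes: list[int]) -> int:
    """Total input tokens billed if every turn resends all messages so far."""
    total, history = 0, 0
    for size in turn_sizes:
        history += size   # this turn's new messages join the context
        total += history  # the whole context is sent as this request's input
    return total

turns = [1500] * 48  # hypothetical: 48 tool-call rounds of ~1.5k tokens each
context = sum(turns)
billed = billed_input_tokens(turns)
print(f"live context: {context:,} tokens")   # what the agent UI shows
print(f"billed input: {billed:,} tokens")    # what the provider meters
```

With n turns of similar size the billed total is about n/2 times the final context, which is why prompt caching and aggressive context compaction matter so much for agent-loop pricing.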

elzbardico•2h ago
The $50/month Cerebras Code plan, first with qwen-420, now with GLM, is my secret weapon.

Stalin used to say that in war, "quantity has a quality all its own". And I think that for coding agents, speed is a quality all its own too.

Maybe not for blind vibe coding, but if you are a developer and are able to understand and change the code the agent generates, the fast feedback of fast inference is a game changer. I don't care if Claude is better than GLM 4.6; fast iterations are king for me now.

It is like moving from DSL to gigabit fiber FTTH.

divmain•2h ago
I have been an AI-coding skeptic for some time. I always acknowledged LLMs as useful for solving specific problems and making certain things possible that weren't possible before. But I've not been surprised to see AI fail to live up to the hype. And I never had a personally magical moment - an experience that shifted my perspective à la the peak end rule.

I've been using GLM 4.6 on Cerebras for the last week or so, since they began the transition, and I've been blown away.

I'm not a vibe coder; when I use AI coding tools, they're in the hot path. They save me time when whipping up a bash script and I can't remember the exact syntax, or for finding easily falsifiable answers that would otherwise take me a few minutes of reading. But, even though GLM 4.6 is not as smart as Sonnet 4.5, it is smart enough. And because it is so fast on Cerebras, I genuinely feel that it augments my own ability and productivity; the raw speed has considerably shifted the tipping point of time-savings for me.

YMMV, of course. I'm very precise with the instructions I provide. And I'm constantly interleaving my own design choices into the process - I usually have a very clear idea in my mind of what the end result should look like - so, in the end, the code ends up how I would have written it without AI. But building happens much faster.

No affiliation with Cerebras, just a happy customer. Just upgraded to the $200/mo plan - and I'll admit that I was one that scoffed when folks jumped on the original $200/mo Claude plan. I think this particular way of working with LLMs just fits well with how I think and work.

mythz•53m ago
AI moves so fast that vibe coding still has a negative stigma attached to it, but even after 25 years of development, I'm not able to match the productivity of getting AI to implement the features I want. It's basically like sending multiple devs off to do work for you: you just tell them what you want and provide iterative feedback until they implement all the features, in the way you want, and fix all the issues you find along the way - and they can create the tests and all the automation and deployment scripts too.

This is clearly the future of software development, but the models are so good at the moment that the future is possible now. I'm still getting used to it and having to rethink my entire dev workflow for maximum productivity, and while I wouldn't unleash AI agents on a decade-old code base, all my new web apps will likely end up AI-first unless there's a very good reason it wouldn't provide a net benefit.

namanyayg•34m ago
Exactly - Codex gpt-5-high is quite like sending out smart devs. It still makes mistakes, and when it does they're extremely stupid ones, but I now treat the code it generates as throwaway and just reroll when it does something dumb.
dust42•20m ago
It just depends on what you are doing. A greenfield React app in TypeScript with a CRUD API behind it? The LLMs are a mind-blowing assistant, and 1000 t/s is crazy.

You are doing embedded development, or anything else not as mainstream as web dev? LLMs are still useful but no longer mind-blowing, and they often produce hallucinations; you need to read every line of their output. 1000 t/s is crazy, but no longer always in a good way.

You are doing stuff which the LLMs haven't seen yet? You are on your own. There is quite a bit of irony in the fact that the devs of llama.cpp barely use AI - just have a look at the development of support for Qwen3-Next-80B [1].

[1] https://github.com/ggml-org/llama.cpp/pull/16095

almostgotcaught•7m ago
I've said it before, but no one takes it seriously: LLMs are only useful if you're building something that's already in the training set, i.e. already a commodity. In which case, why are you building it?
whiterook6•3m ago
It's not that the product you're building is a commodity; it's that the tools you're using to build it are. Why not build a landing page using HTML, CSS, and Tailwind? Why not use Swift to make an app? Why not write an AWS Lambda in JavaScript?
odie5533•43m ago
I find the fast models good for rapidly iterating UI changes with voice chat - like "add some padding above the text box" or "right-align the button". But I find the fast models useless for deep coding work. A fast model has its place, just not at $50/month. Cursor has Composer 1 and Grok Code Fast for free; I'm not sure what $50/month gets me that those don't. I liked the stealth Supernova model a lot too.
gardnr•37m ago
GLM 4.6 isn't a "fast" model. It does well in benchmarks vs Sonnet 4.5.

Cerebras makes a giant chip that runs inference at unreal speeds. I suspect they run their cloud service more as an advertising mechanism for their core business: hardware. You can hear the founder describing their journey:

https://podcasts.apple.com/us/podcast/launching-the-fastest-...

bn-l•19m ago
Composer and Grok Code Fast are not free.