frontpage.

Made with ♥ by @iamnishanth

Open Source @Github



Show HN: 1-Bit Bonsai, the First Commercially Viable 1-Bit LLMs

https://prismml.com/
61•PrismML•3h ago

Comments

yodon•1h ago
Is Bonsai 1 Bit or 1.58 Bit?
woadwarrior01•1h ago
1-bit g128 with a shared 16-bit scale for every group. So, effectively 1.125 bit.
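
The 1.125-bit figure follows directly from the group layout described above; a quick sanity check:

```python
# Back-of-the-envelope check of the "effectively 1.125 bit" claim:
# each group of 128 weights stores 128 sign bits plus one shared FP16 scale.
group_size = 128            # weights per group (g128)
sign_bits = group_size * 1  # 1 bit per weight
scale_bits = 16             # one shared FP16 scale per group

bits_per_weight = (sign_bits + scale_bits) / group_size
print(bits_per_weight)  # 1.125
```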
stogot•1h ago
What is the value of a 1 bit? For those of us that do not know.
trebligdivad•1h ago
Speed and density.
jacquesm•1h ago
That you can process many operations with a single instruction.
SwellJoe•1h ago
0 or 1
jjcm•54m ago
Technically not in this case, or not effectively. Each 0 or 1 is scaled by an FP16 factor shared across its group of 128 bits, so the effective value fluctuates from group to group.
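
A minimal sketch of what that layout implies for dequantization. The group size and packing here are assumptions based on the thread's description, not PrismML's actual code:

```python
import numpy as np

GROUP = 128  # bits (weights) per group, per the description above

def dequantize(bits: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Map each stored bit b to a signed weight: scale * (+1 if b else -1).

    bits:   uint8 array of 0/1 values, length divisible by GROUP
    scales: float16 array, one shared scale per group of GROUP bits
    """
    signs = bits.astype(np.float32) * 2.0 - 1.0          # {0,1} -> {-1,+1}
    per_weight_scale = np.repeat(scales.astype(np.float32), GROUP)
    return signs * per_weight_scale

# Two groups of 128 bits, each with its own FP16 scale.
bits = np.random.randint(0, 2, size=256).astype(np.uint8)
scales = np.array([0.02, 0.05], dtype=np.float16)
w = dequantize(bits, scales)
```

Every weight in a group has the same magnitude (the group's scale); only the sign varies per bit.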
syntaxing•1h ago
Super interesting. Building their llama.cpp fork on my Jetson Orin Nano to test this out.
alyxya•1h ago
I expect the trend of large machine learning models to move toward bits rather than operating on floats. There's a lot of inefficiency in floats: weights are typically something like normally distributed, so most values cluster in a small range, which makes float storage and computation wasteful. The foundation of neural networks may be rooted in real-valued functions, which are simulated with floats, but float operations are just bitwise operations underneath. The only issue is that GPUs operate on floats and standard ML theory works over real numbers.
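
The intuition about clustered values can be made concrete: for roughly normally distributed weights, keeping only the sign plus a single scale (the mean absolute value, as in BitNet-style binarization) already preserves most of the signal. A toy illustration, not anyone's production code:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096)  # weights cluster near zero, like real layers

# Binarize: keep only the sign, plus one scalar scale. The mean absolute
# value is the L2-optimal scale for sign quantization.
scale = np.abs(w).mean()
w_1bit = scale * np.sign(w)

rel_err = np.linalg.norm(w - w_1bit) / np.linalg.norm(w)
print(f"relative L2 error: {rel_err:.2f}")  # ~0.6 despite ~16x compression vs FP16
```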
OutOfHere•1h ago
How do I run this on Android?
najarvg•11m ago
Pocket Pal is what I've seen used before. I've also recently heard about "Off Grid", but I haven't read any reviews or tried it personally, so caveat emptor. We'll see if the community has other suggestions.
Archit3ch•1h ago
Doesn't Jevons paradox dictate larger 1-bit models?
_fw•1h ago
What’s the trade-off? If it’s smaller, faster and more efficient, is the cost worse performance? Layman here, curious to know.
kvdveer•58m ago
Their own (presumably cherry-picked) benchmarks put their models near the 'middle of the market' models (Llama3 3B, Qwen3 1.7B), not competing with Claude, ChatGPT, or Gemini. These are not models you'd want to directly interact with, but they can be very useful for things like classification, simple summarization, or translation tasks.

These models are quite impressive for their size: even an older Raspberry Pi would be able to handle them.

There's still a lot of uses for this kind of model.

adityashankar•27m ago
If you look at their whitepaper (https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-b...) you'll notice that it does have some tradeoffs due to model intelligence being reduced (page 10)

The average of MMLU Redux, MuSR, GSM8K, HumanEval+, IFEval, and BFCLv3 for this model is 70.5, compared to 79.3 for Qwen3. That said, the model is also 16x smaller and 6x faster on a 4090, so it's a pretty respectable tradeoff.

I'd be interested in fine tuning code here personally

jjcm•56m ago
1 bit with a FP16 scale factor every 128 bits. Fascinating that this works so well.

I tried a few things with it. Got it driving Cursor, which in itself was impressive - it handled some tool usage. Via Cursor I had it generate a few web page tests.

On a Monte Carlo simulation of pi, it got the logic correct but failed to build an interface to start the test. Requesting changes mostly worked, but it left behind some stray symbols that caused things to fail. Required a bit of manual editing.
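
For reference, the logic the model was asked to get right is small. This is my own sketch of the task, not the model's output:

```python
import random

def estimate_pi(n: int, seed: int = 0) -> float:
    """Estimate pi by sampling points in the unit square and counting
    how many land inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

print(estimate_pi(100_000))  # ~3.14; error shrinks like 1/sqrt(n)
```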

Tried a Simon Willison pelican as well - very abstract, not recognizable at all as a bird or a bicycle.

Pictures of the results here: https://x.com/pwnies/status/2039122871604441213

There doesn't seem to be a demo link on their webpage, so here's a llama.cpp running on my local desktop if people want to try it out. I'll keep this running for a couple hours past this post: https://unfarmable-overaffirmatively-euclid.ngrok-free.dev

adityashankar•45m ago
Here's the Google Colab link, https://colab.research.google.com/drive/1EzyAaQ2nwDv_1X0jaC5... since the ngrok link likely got DDoSed by the number of people coming along.
jjcm•40m ago
Good call. Right now though traffic is low (1 req per min). With the speed of completion I should be able to handle ~100x that, but if the ngrok link doesn't work, definitely use the Google Colab link.
adityashankar•37m ago
The link didn't work for me personally, but that may be a bandwidth issue with me fighting for a connection in the EU
uf00lme•26m ago
The speed is impressive. I wish it could be set up for something similar to speculative decoding.
najarvg•24m ago
Thanks for sharing the link to your instance. It was blazing fast in responding. I tried throwing a few things at it, with the following results:

1. Generating an R script to take a city and country name, find its lat/long, and map it using ggmaps. It generated a pretty decent script (could be more optimal, but impressive for the model size) with warnings about using geojson if possible.

2. Generating a LaTeX script to display the Gaussian integral equation. It generated a (I think) non-standard version using probability distribution functions instead of the general version, but I still give it points for that. It gave explanations of the formula and parameters, as well as instructions on how to compile the script using bash, etc.

3. Generating a LaTeX script to display the Euler identity equation - this one it nailed.

Strongly agree that the knowledge density is impressive for a 1-bit model with such a small size and such blazing fast responses.
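
For comparison, the standard forms those prompts were after. The "probability distribution" version the model reportedly produced would be the normalized variant:

```latex
% General Gaussian integral
\[ \int_{-\infty}^{\infty} e^{-a x^2}\,dx = \sqrt{\frac{\pi}{a}}, \qquad a > 0 \]

% Normalized (probability density) variant
\[ \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,
   e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1 \]

% Euler's identity
\[ e^{i\pi} + 1 = 0 \]
```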

jjcm•20m ago
> Was blazing fast in responding.

I should note this is running on an RTX 6000 pro, so it's probably at the max speed you'll get for "consumer" hardware.

najarvg•18m ago
I must add that I also tried the standard "should I walk or drive to the carwash 100 meters away to wash the car" question, and it made the usual error of suggesting a walk given the distance and health reasons, etc. But then this does not claim to be a reasoning model, and I did not expect, in the remotest case, for this to be answered correctly. Even previous-generation larger reasoning models struggle with this.
hmokiguess•24m ago
Wow, that was cooler than I expected. Curious to embed this for some lightweight semantic workflows now.
hatthew•44m ago
I feel like it's a little disingenuous to compare against full-precision models. Anyone concerned about model size and memory usage is surely already using at least an 8 bit quantization.

Their main contribution seems to be hyperparameter tuning, and they don't compare against other quantization techniques of any sort.

volume_tech•13m ago
The speed is not just about storage -- at 1-bit you are reading roughly 16x less data from DRAM per forward pass compared to FP16. On memory-bandwidth-constrained hardware that is usually the actual bottleneck, so the speedup scales pretty directly.
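
The bandwidth arithmetic, roughly. The parameter count here is a made-up example, and the 1.125 bits/weight follows from the g128 layout discussed upthread:

```python
# On bandwidth-bound hardware, bytes read from DRAM per forward pass is
# roughly the size of the weights, so bits-per-weight translates almost
# directly into tokens/sec.
params = 2e9                        # hypothetical 2B-parameter model
fp16_bytes = params * 16 / 8        # 16 bits per weight
onebit_bytes = params * 1.125 / 8   # 1 bit + shared FP16 scale per 128

print(fp16_bytes / onebit_bytes)  # ~14.2x less data per pass (~16x vs pure 1-bit)
```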
keyle•13m ago
Extremely cool!

Can't wait to give it a spin with ollama; if ollama could list it as a model, that would be helpful.

ariwilson•11m ago
Very cool and works pretty well!

The Claude Code Source Leak: fake tools, frustration regexes, undercover mode

https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/
686•alex000kim•11h ago•275 comments

Ministack (Replacement for LocalStack)

https://ministack.org/
110•kerblang•3h ago•17 comments

A dot a day keeps the clutter away

https://scottlawsonbc.com/post/dot-system
94•scottlawson•3h ago•32 comments

OpenAI closes funding round at an $852B valuation

https://www.cnbc.com/2026/03/31/openai-funding-round-ipo.html
286•surprisetalk•4h ago•262 comments

Show HN: 1-Bit Bonsai, the First Commercially Viable 1-Bit LLMs

https://prismml.com/
62•PrismML•3h ago•28 comments

4D Doom

https://github.com/danieldugas/HYPERHELL
106•chronolitus•4d ago•22 comments

Ordinary Lab Gloves May Have Skewed Microplastic Data

https://nautil.us/ordinary-lab-gloves-may-have-skewed-microplastic-data-1279386
37•WaitWaitWha•3h ago•8 comments

Slop is not necessarily the future

https://www.greptile.com/blog/ai-slopware-future
166•dakshgupta•10h ago•300 comments

Cohere Transcribe: Speech Recognition

https://cohere.com/blog/transcribe
151•gmays•8h ago•49 comments

Teenage Engineering's PO-32 acoustic modem and synth implementation

https://github.com/ericlewis/libpo32
77•ericlewis•3d ago•19 comments

Open source CAD in the browser (Solvespace)

https://solvespace.com/webver.pl
277•phkahler•11h ago•91 comments

TinyLoRA – Learning to Reason in 13 Parameters

https://arxiv.org/abs/2602.04118
10•sorenjan•4d ago•0 comments

I Traced My Traffic Through a Home Tailscale Exit Node

https://tech.stonecharioteer.com/posts/2026/tailscale-exit-nodes/
62•stonecharioteer•4h ago•33 comments

Back to FreeBSD – Part 2 – Jails

https://hypha.pub/back-to-freebsd-part-2
32•vermaden•4d ago•7 comments

OkCupid gave 3M dating-app photos to facial recognition firm, FTC says

https://arstechnica.com/tech-policy/2026/03/okcupid-match-pay-no-fine-for-sharing-user-photos-wit...
321•whiteboardr•6h ago•75 comments

Learn Something Old Every Day, Part XVIII: How Does FPU Detection Work?

https://www.os2museum.com/wp/learn-something-old-every-day-part-xviii-how-does-fpu-detection-work/
16•kencausey•3d ago•0 comments

Show HN: Forkrun – NUMA-aware shell parallelizer (50×–400× faster than parallel)

https://github.com/jkool702/forkrun
102•jkool702•4d ago•24 comments

GitHub's Historic Uptime

https://damrnelson.github.io/github-historical-uptime/
374•todsacerdoti•5h ago•100 comments

Show HN: Postgres extension for BM25 relevance-ranked full-text search

https://github.com/timescale/pg_textsearch
87•tjgreen•8h ago•30 comments

From 300KB to 69KB per Token: How LLM Architectures Solve the KV Cache Problem

https://news.future-shock.ai/the-weight-of-remembering/
76•future-shock-ai•3d ago•5 comments

Nematophagous Fungus

https://en.wikipedia.org/wiki/Nematophagous_fungus
35•lordgilman•4d ago•5 comments

Why the US Navy won't blast the Iranians and 'open' Strait of Hormuz

https://responsiblestatecraft.org/iran-strait-of-hormuz/
149•KoftaBob•15h ago•403 comments

Axios compromised on NPM – Malicious versions drop remote access trojan

https://www.stepsecurity.io/blog/axios-compromised-on-npm-malicious-versions-drop-remote-access-t...
1763•mtud•21h ago•712 comments

Audio tapes reveal mass rule-breaking in Milgram's obedience experiments

https://www.psypost.org/audio-tapes-reveal-mass-rule-breaking-in-milgram-s-obedience-experiments-...
198•lentoutcry•3d ago•125 comments

A Primer on Long-Duration Life Support

https://mceglowski.substack.com/p/a-primer-on-long-duration-life-support
64•zdw•4d ago•15 comments

Super Micro Computer Investors Look for Exits

https://catenaa.com/markets/equities/super-micro-computer-investors-look-for-exits/
25•malindasp•4h ago•13 comments

Accidentally created my first fork bomb with Claude Code

https://www.droppedasbaby.com/posts/2602-01/
55•offbyone42•16h ago•14 comments

GitHub Monaspace Case Study

https://lettermatic.com/custom/monaspace-case-study
110•homebrewer•9h ago•37 comments

Combinators

https://tinyapl.rubenverg.com/docs/info/combinators
128•tosh•12h ago•39 comments

Microsoft: Copilot is for entertainment purposes only

https://www.microsoft.com/en-us/microsoft-copilot/for-individuals/termsofuse
446•lpcvoid•10h ago•164 comments