Show HN: Efficient LLM Architectures for 32GB RAM (Ternary and Sparse Inference)

https://github.com/opengraviton/graviton-native
2•fatihturker•2h ago
Hi HN,

I’ve been exploring how far large language models can be pushed on machines with limited memory.

I built an experimental runtime and architecture approach focused on making extremely large models more feasible on systems with around 32GB of RAM.

The core idea is combining several efficiency techniques:

- ternary weight representation {-1, 0, +1} (~1.58 bits per weight)
- sparse execution that skips zero weights
- memory-mapped layer streaming from NVMe storage
- lightweight tensor unpacking optimized for Apple Silicon
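To make the first two techniques concrete, here is a minimal sketch in plain Python. It is illustrative only and is not taken from the graviton-native code: the function names are hypothetical, and the packing scheme (five base-3 digits per byte, i.e. 1.6 bits per weight, close to the ~1.58 figure) is one common way to store ternary values, not necessarily the one the runtime uses.

```python
# Illustrative sketch; names and packing layout are assumptions, not the runtime's API.

def pack_ternary(weights):
    """Pack weights in {-1, 0, +1} into bytes, five weights per byte."""
    packed = bytearray()
    for start in range(0, len(weights), 5):
        group = weights[start:start + 5]
        value = 0
        # Little-endian base 3: the first weight is the least significant digit.
        for position, w in enumerate(group):
            value += (w + 1) * 3 ** position  # map -1, 0, +1 to digits 0, 1, 2
        packed.append(value)  # at most 3**5 - 1 = 242, so it fits in one byte
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` ternary weights from packed bytes."""
    weights = []
    for byte in packed:
        for _ in range(5):
            if len(weights) == count:
                break
            weights.append(byte % 3 - 1)  # digits 0, 1, 2 back to -1, 0, +1
            byte //= 3
    return weights


def sparse_ternary_dot(weights, activations):
    """Dot product that skips zero weights and needs no multiplications."""
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 0:
            continue  # sparse execution: zero weights cost nothing
        total += x if w > 0 else -x
    return total


row = [1, 0, -1, 0, 0, 1, -1]
packed = pack_ternary(row)
print(len(packed))  # 2 bytes hold 7 weights
print(unpack_ternary(packed, len(row)) == row)  # True
print(sparse_ternary_dot(row, [0.5, 2.0, 1.5, 3.0, 4.0, 1.0, 0.25]))  # -0.25
```

A real kernel would do this with vectorized lookups rather than Python loops; the point here is only that ternary weights reduce a dot product to additions and subtractions over the nonzero entries.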

Instead of keeping the entire model in RAM, weights can be streamed from fast SSD storage and unpacked during execution. This shifts the bottleneck from memory capacity toward storage bandwidth and compute efficiency.
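The streaming idea can be sketched with Python's standard `mmap` module. This is a toy under stated assumptions, not the runtime's actual file format: the fixed `LAYER_BYTES` size, the flat file layout, and the function names are all hypothetical.

```python
import mmap
import os
import tempfile

LAYER_BYTES = 4  # hypothetical fixed size of one packed layer, tiny for the demo


def write_demo_model(path, layers):
    """Write equally sized layers back to back into a flat file."""
    with open(path, "wb") as f:
        for layer in layers:
            assert len(layer) == LAYER_BYTES
            f.write(layer)


def stream_layers(path, num_layers):
    """Yield one layer's bytes at a time from a memory-mapped file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for index in range(num_layers):
                start = index * LAYER_BYTES
                # Slicing copies only this layer; the OS pages it in on demand,
                # so the whole file never has to be resident in RAM at once.
                yield view[start:start + LAYER_BYTES]


def run_demo():
    layers = [bytes([i] * LAYER_BYTES) for i in range(3)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.bin")
        write_demo_model(path, layers)
        # Consume the generator fully so the mapping is closed before cleanup.
        return list(stream_layers(path, len(layers)))


print(run_demo())
```

In this pattern only the layer currently being executed needs to be unpacked, which is why the limit moves from how much RAM is installed to how fast the SSD can deliver the next layer.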

Early experiments show roughly 8.5x compression compared to FP16 weights (for example, TinyLlama-1.1B shrinks from ~2.05GB to ~0.24GB with ternary packing).
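As a sanity check on these figures, the short calculation below estimates weight storage at different bit widths. It assumes a nominal 1.1e9 parameters and that "GB" here means GiB (2^30 bytes); both are assumptions made for the arithmetic, not details confirmed by the measurements.

```python
import math

GIB = 2 ** 30  # assumption: sizes are reported in GiB


def model_size_gib(num_params, bits_per_weight):
    """Storage needed for the weights alone, ignoring metadata and padding."""
    return num_params * bits_per_weight / 8 / GIB


params = 1.1e9  # nominal TinyLlama-1.1B parameter count (assumed)
fp16 = model_size_gib(params, 16)
ideal_ternary = model_size_gib(params, math.log2(3))  # ~1.58 bits per weight
two_bit = model_size_gib(params, 2)  # simplest packing: 2 bits per weight

print(f"FP16: {fp16:.2f}  ideal ternary: {ideal_ternary:.2f}  2-bit: {two_bit:.2f}")
# FP16: 2.05  ideal ternary: 0.20  2-bit: 0.26
```

Under these assumptions the FP16 estimate matches the ~2.05GB figure, and the ~0.24GB packed size falls between the ideal ternary bound and a plain 2-bit encoding.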

The project is still experimental, but the goal is to explore whether extreme compression + sparsity + SSD streaming can make much larger models practical on consumer machines.

Paper: https://opengraviton.github.io/paper.html

Runtime: https://github.com/opengraviton/graviton-native

I’d really appreciate feedback from people working on inference engines, quantization, or efficient model architectures.

Comments

fatihturker•2h ago
One question I'm interested in exploring:

If models become heavily compressed and streamed from SSD, where do people think the real bottleneck moves: storage bandwidth, memory bandwidth, or kernel efficiency?
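One rough way to frame the storage-bandwidth side of this question is an upper bound on decode speed when weights are re-read from SSD for every token. The sketch below is a back-of-envelope model only: the function name, the example sizes, and the bandwidth figure are hypothetical, and it ignores RAM caching, compute time, and I/O overhead.

```python
def max_tokens_per_second(model_bytes, ssd_bytes_per_second, fraction_read=1.0):
    """Upper bound on tokens/s if each token must stream its weights from SSD.

    fraction_read is the share of the model actually read per token
    (1.0 for a dense pass; lower if some layers or blocks are skipped or cached).
    """
    bytes_per_token = model_bytes * fraction_read
    if bytes_per_token <= 0:
        raise ValueError("bytes read per token must be positive")
    return ssd_bytes_per_second / bytes_per_token


# Hypothetical numbers: a 10 GB packed model on a 5 GB/s NVMe drive.
print(max_tokens_per_second(10e9, 5e9))       # 0.5
print(max_tokens_per_second(10e9, 5e9, 0.5))  # 1.0
```

If measured throughput sits well below this bound, the limit is more likely in unpacking or kernel efficiency than in the drive itself.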
