I’ve been exploring how far large language models can be pushed on machines with limited memory.
I built an experimental runtime and model architecture aimed at making very large models feasible on systems with around 32GB of RAM.
The core idea is combining several efficiency techniques:
ternary weight representation {-1, 0, +1} (~1.58 bits per weight), sparse execution that skips zero weights, memory-mapped layer streaming from NVMe storage, and lightweight tensor unpacking optimized for Apple Silicon.
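To make the first two techniques concrete, here's a minimal sketch of one plausible ternary encoding: since 3^5 = 243 ≤ 256, five weights in {-1, 0, +1} fit in a single byte (~1.6 bits per weight), and a dot product over unpacked weights can skip zeros and replace multiplies with add/subtract. This is my illustration, not the runtime's actual format.

```python
def pack_ternary(weights):
    """Pack ternary weights (-1/0/+1) into bytes, 5 per byte (base-3)."""
    packed = bytearray()
    for i in range(0, len(weights), 5):
        value = 0
        for w in reversed(weights[i:i + 5]):
            value = value * 3 + (w + 1)  # map -1/0/+1 -> digits 0/1/2
        packed.append(value)
    return bytes(packed)

def unpack_ternary(packed, count):
    """Recover `count` ternary weights from the packed bytes."""
    out = []
    for byte in packed:
        for _ in range(5):
            out.append(byte % 3 - 1)  # map digits 0/1/2 back to -1/0/+1
            byte //= 3
    return out[:count]

def ternary_dot(weights, x):
    """Sparse dot product: zero weights are skipped entirely, and the
    survivors need only additions and subtractions, no multiplies."""
    return sum(xi if w > 0 else -xi for w, xi in zip(weights, x) if w != 0)
```

A real kernel would unpack with lookup tables or SIMD rather than per-trit division, but the encoding idea is the same.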
Instead of keeping the entire model in RAM, weights can be streamed from fast SSD storage and unpacked during execution. This shifts the bottleneck from memory capacity toward storage bandwidth and compute efficiency.
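The streaming idea can be sketched with a plain `mmap`: map the weight file read-only and slice out one layer's bytes at a time, letting the OS page data in from the SSD on demand instead of resident RAM holding the whole model. Names and the toy file layout below are hypothetical.

```python
import mmap
import os
import tempfile

def read_layer(path, offset, nbytes):
    """Memory-map the weight file and view one layer's bytes; only the
    touched pages are faulted in from storage, not the whole file."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return bytes(mm[offset:offset + nbytes])
        finally:
            mm.close()

# Usage: write a fake 3-layer weight file, then stream only the middle layer.
layers = [b"\x01" * 8, b"\x02" * 8, b"\x03" * 8]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"".join(layers))
layer1 = read_layer(tmp.name, offset=8, nbytes=8)
os.unlink(tmp.name)
```

In practice you would keep the mapping open across layers and rely on the page cache (plus hints like `madvise`) rather than remapping per read, but the access pattern is the same.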
Early experiments show significant compression compared to FP16 weights (for example TinyLlama-1.1B shrinking from ~2.05GB to ~0.24GB with ternary packing).
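A quick back-of-envelope check shows those figures are about what the encoding predicts (assuming 1.1e9 parameters; real checkpoints keep some tensors, e.g. embeddings, in higher precision, so the measured packed size lands slightly above this lower bound):

```python
# FP16 is 2 bytes/weight; 5-per-byte ternary packing is ~1.6 bits/weight.
params = 1.1e9
fp16_gib = params * 2 / 2**30
ternary_gib = params * 1.6 / 8 / 2**30
print(f"FP16: {fp16_gib:.2f} GiB, ternary-packed: {ternary_gib:.2f} GiB")
# prints: FP16: 2.05 GiB, ternary-packed: 0.20 GiB
```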
The project is still experimental, but the goal is to explore whether extreme compression + sparsity + SSD streaming can make much larger models practical on consumer machines.
Paper: https://opengraviton.github.io/paper.html
Runtime: https://github.com/opengraviton/graviton-native
I’d really appreciate feedback from people working on inference engines, quantization, or efficient model architectures.
fatihturker•2h ago
If models become heavily compressed and streamed from SSD, where do people think the real bottleneck moves to — storage bandwidth, memory bandwidth, or kernel efficiency?