Show HN: Run 500B+ Parameter LLMs Locally on a Mac Mini

https://github.com/opengraviton/graviton
3•fatihturker•8h ago
Hi HN, I built OpenGraviton, an open-source AI inference engine that pushes the limits of running extremely large LLMs on consumer hardware. By combining 1.58-bit ternary quantization, dynamic sparsity with Top-K pruning and MoE routing, and mmap-based layer streaming, OpenGraviton can run models far larger than your system RAM, even on a Mac Mini.

Early benchmarks: TinyLlama-1.1B drops from ~2GB (FP16) to ~0.24GB with ternary quantization. At 140B scale, models that normally require ~280GB fit within ~35GB packed. Optimized for Apple Silicon with Metal + C++ tensor unpacking, plus speculative decoding for faster generation.

Check benchmarks, architecture, and details here: https://opengraviton.github.io
GitHub: https://github.com/opengraviton

This project isn't just about squeezing massive models onto tiny hardware; it's about democratizing access to giant LLMs without cloud costs. Feedback, forks, and ideas are very welcome!
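To make the size figures in the post concrete, here is a minimal sketch of ternary quantization with 2-bit packing. It is not taken from the OpenGraviton source: the absmean scaling rule, the four-codes-per-byte layout, and every function name are assumptions chosen for illustration. Under that assumed layout, 140B weights at 2 bits each come to 35GB, which matches the quoted figure; for 1.1B weights the same layout gives about 0.275GB, slightly above the quoted ~0.24GB, so the real format may pack more tightly.

```python
def ternary_quantize(weights):
    """Map each weight to -1, 0, or +1 plus one shared scale (absmean, assumed)."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def pack_2bit(codes):
    """Pack four ternary codes per byte, storing each as (code + 1) in 2 bits."""
    out = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        out[i // 4] |= (c + 1) << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed, n):
    """Recover the first n ternary codes from a packed byte string."""
    return [((packed[i // 4] >> (2 * (i % 4))) & 0b11) - 1 for i in range(n)]


def packed_size_gb(n_params, bits_per_weight):
    """Storage for the weights alone, in decimal gigabytes (scales ignored)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    codes, scale = ternary_quantize([0.9, -0.05, -1.2, 0.4, 0.0])
    packed = pack_2bit(codes)
    print(codes, len(packed), unpack_2bit(packed, len(codes)) == codes)
    print(packed_size_gb(140e9, 16), packed_size_gb(140e9, 2))
```

The 2-bit layout wastes one of the four code values per weight, which is the price of byte-aligned unpacking; a denser encoding closer to 1.58 bits per weight is possible but makes unpacking more involved.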

Comments

ryanholtdev•7m ago
Running a Mac Mini M4 as a home server for a bunch of automation scripts right now. The mmap-based layer streaming is the part I'm most curious about -- how does latency look when you're streaming layers from disk mid-inference? I'd expect throughput to degrade sharply once you exceed unified memory, but maybe the Top-K sparsity masks enough of the weight accesses that it's not as bad as sequential streaming would be. What's the actual tokens/sec at 140B scale on the base Mac Mini config?
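For readers unfamiliar with the mechanism being asked about, the following is a minimal sketch of mmap-based layer streaming in general, not OpenGraviton's implementation. The flat little-endian float32 file layout, the tiny layer size, the toy elementwise "layer", and all function names are hypothetical. The point it shows is that only the bytes of the layer currently being read need to be resident; the operating system pages them in from disk on first touch, which is also where the latency the comment asks about would come from.

```python
import mmap
import os
import struct
import tempfile

LAYER_FLOATS = 4  # hypothetical layer size, kept tiny for illustration


def write_layers(path, layers):
    """Write layers back to back as little-endian float32 values."""
    with open(path, "wb") as f:
        for layer in layers:
            f.write(struct.pack(f"<{len(layer)}f", *layer))


def stream_layers(path, n_layers, layer_floats=LAYER_FLOATS):
    """Yield one layer at a time from a memory-mapped weight file."""
    layer_bytes = layer_floats * 4
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for i in range(n_layers):
                start = i * layer_bytes
                # Slicing touches only this layer's pages in the mapping.
                chunk = mm[start:start + layer_bytes]
                yield struct.unpack(f"<{layer_floats}f", chunk)


def forward(path, n_layers, x):
    """Toy forward pass: scale activations elementwise by each layer in turn."""
    for weights in stream_layers(path, n_layers):
        x = [a * w for a, w in zip(x, weights)]
    return x


def demo():
    """Round-trip two small layers through a temporary file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "weights.bin")
        write_layers(path, [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]])
        return forward(path, 2, [1.0, 1.0, 1.0, 1.0])


if __name__ == "__main__":
    print(demo())
```

This sketch says nothing about real throughput: once the working set exceeds memory, each layer read can become a disk read, so tokens/sec would depend on SSD bandwidth and on how much of each layer is actually touched.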

Show HN: VS Code Agent Kanban: Task Management for the AI-Assisted Developer

https://www.appsoftware.com/blog/introducing-vs-code-agent-kanban-task-management-for-the-ai-assi...
49•gbro3n•4h ago•21 comments

Show HN: Run autoresearch on a gaming PC (Windows and RTX GPUs fork)

https://github.com/jsegov/autoresearch-win-rtx
2•segov•1h ago•0 comments

Show HN: Skir – like Protocol Buffer but better

https://skir.build/
97•gepheum•22h ago•53 comments

Show HN: I built a real-time OSINT dashboard pulling 15 live global feeds

https://github.com/BigBodyCobain/Shadowbroker
276•vancecookcobxin•20h ago•109 comments

Show HN: Mcp2cli – One CLI for every API, 96-99% fewer tokens than native MCP

https://github.com/knowsuchagency/mcp2cli
132•knowsuchagency•10h ago•91 comments

Show HN: Engram – open-source persistent memory for AI agents (Bun and SQLite)

https://github.com/zanfiel/engram
2•zanfiel•4h ago•1 comment

Show HN: I built a site where strangers leave kind voice notes for each other

https://kindvoicenotes.com
24•thepaulthomson•17h ago•16 comments

Show HN: Husky hook that blocks Git push until you do your pushups

https://git-push.app
5•zimboy•6h ago•1 comment

Show HN: Reviving a 20-year-old puzzle game Chromatron with Ghidra and AI

https://quesma.com/blog/chromatron-recompiled/
21•stared•2d ago•7 comments

Show HN: Eyot, A programming language where the GPU is just another thread

https://cowleyforniastudios.com/2026/03/08/announcing-eyot/
74•steeleduncan•1d ago•14 comments

Show HN: cursor-tg – Run Cursor Cloud Agents from Telegram

https://github.com/tb5z035i/cursor-tg
3•tb5z035i•6h ago•0 comments

Show HN: WolfStack – Proxmox-like server management in a single Rust binary

https://wolfscale.org/
28•wolfsoftware•20h ago•2 comments

Show HN: Botais (Battle of the AI's) – Competitive Snake Game for LLMs

https://botais.sello.dev
3•giza182•6h ago•3 comments

Show HN: Curiosity – DIY 6" Newtonian Reflector Telescope

https://curiosity-telescope.vercel.app/
80•big_Brain69•1d ago•19 comments

Show HN: OpenMeters – A fast and free audio metering/visualization suite

https://github.com/httpsworldview/openmeters
11•httpsworldview•14h ago•0 comments

Show HN: Finsight – A Privacy First, AI Credit Card and Bank Statement Analyzer

https://github.com/AJ/FinSight
3•aj•7h ago•1 comment

Show HN: OxiMedia – Pure Rust Reconstruction of FFmpeg and OpenCV

https://github.com/cool-japan/oximedia
10•kitasan•16h ago•8 comments

Show HN: Bunway – Express-compatible web framework for Bun

https://bunway.jointops.dev/
3•rockstarsb•8h ago•0 comments

Show HN: ANSI-Saver – A macOS Screensaver

https://github.com/lardissone/ansi-saver
102•lardissone•2d ago•37 comments

Show HN: µJS, a 5KB alternative to Htmx and Turbo with zero dependencies

https://mujs.org
155•amaury_bouchard•2d ago•81 comments

Show HN: U-Claw – An Offline Installer USB for OpenClaw in China

https://www.u-claw.org/
4•17vibe•6h ago•0 comments

Show HN: Environment Variable Checker

https://github.com/Chrilleweb/dotenv-diff
7•chrillemn•17h ago•0 comments

Show HN: AlphaPerch – Track product execution for companies you follow using AI

https://alphaperch.com
3•sebasnar•12h ago•0 comments

Show HN: Compose Launcher – A macOS app to run multiple Docker Compose files

https://github.com/yingbo/compose-launcher
3•yingbo•12h ago•0 comments

Show HN: Moongate – Ultima Online server emulator in .NET 10 with Lua scripting

https://github.com/moongate-community/moongatev2
286•squidleon•3d ago•164 comments

Show HN: Kula – Lightweight, self-contained Linux server monitoring tool

https://github.com/c0m4r/kula
89•c0m4r•2d ago•56 comments

Show HN: Claude-replay – A video-like player for Claude Code sessions

https://github.com/es617/claude-replay
101•es617•2d ago•34 comments

Show HN: I open-sourced my Steam game, 100% written in Lua, engine is also open

https://github.com/willtobyte/reprobate
60•delduca•2d ago•22 comments

Show HN: ChatML - Run Claude Code Parallel Sessions in a Desktop app

https://github.com/chatml/chatml
4•mcastilho•14h ago•6 comments