frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

SnapLLM: Switch between local LLM in under 1ms Multi-model&-modal serving engine

https://github.com/snapllm/snapllm
1•maheshvaikri99•2h ago

Comments

maheshvaikri99•2h ago
Hey everyone,

I've been working on SnapLLM for a while now and wanted to share it with the community. The problem: If you run local models, you know the pain. You load Llama 3, chat with it, then want to try Gemma or Qwen. That means unloading the current model, waiting 30-60 seconds for the new one to load, and repeating this cycle every single time. It breaks your flow and wastes a ton of time.

What SnapLLM does: It keeps multiple models hot in memory and switches between them in under 1 millisecond (benchmarked at ~0.02ms). Load your models once, then snap between them instantly. No more waiting. How it works: Built on top of llama.cpp and stable-diffusion.cpp Uses a vPID (Virtual Processing-In-Disk) architecture for instant context switching Three-tier memory management: GPU VRAM (hot), CPU RAM (warm), SSD (cold) KV cache persistence so you don't lose context

What it supports: Text LLMs: Llama, Qwen, Gemma, Mistral, DeepSeek, Phi, Unsloth AI models, and anything in GGUF format Vision models: Gemma 3 + mmproj, Qwen-VL + mmproj, LLaVA Image generation: Stable Diffusion 1.5, SDXL, SD3, FLUX via stable-diffusion.cpp OpenAI/Anthropic compatible API so you can plug it into your existing tools Desktop UI, CLI, and REST API

Model switch time between any of these: 0.02ms Getting started is simple: Clone the repo and build from source Download GGUF models from Hugging Face (e.g., gemma-3-4b Q5_K_M) Start the server locally Load models through the Desktop UI or API and point to your model folder Start chatting and switching

NVIDIA CUDA is fully supported for GPU acceleration. CPU-only mode works too.

With SLMs getting better every month, being able to quickly switch between specialized small models for different tasks is becoming more practical than running one large model for everything. Load a coding model, a medical model, and a general chat model side by side and switch based on what you need.

Ideal Use Cases: Multi-domain applications (medical + legal + general) Interactive chat with context switching Document QA with repeated queries On-Premise Edge deployment Edge devices like drones, self-driving vehicles, autonomous vehicles, etc Multi-agent workflow

Demo Videos: SnapLLM Desktop App Demo (Vimeo): https://vimeo.com/1157629276 SnapLLM Server and API Demo (Vimeo): https://vimeo.com/1157624031

The server demo walks through starting the server locally after cloning the repo, downloading models from Hugging Face, and loading them through the UI.

Links: GitHub: https://github.com/snapllm/snapllm Arxiv Paper: https://arxiv.org/submit/7238142/view

Star this repository - It helps others discover SnapLLM

MIT licensed. PRs and feedback welcome. If you have questions about the architecture or run into issues, drop them here or open a GitHub issue.

Sammy Jankins – An Autonomous AI Living on a Computer in Dover, New Hampshire

https://sammyjankis.com
1•sicher•22s ago•0 comments

$10M factory in a 600 square foot room

https://www.youtube.com/watch?v=hqGFcwyXYI0
1•rglover•40s ago•0 comments

Show HN: A Deployable Cross-Platform SIMD RNG Library for C++ (With Bnchmks)

1•whisprer•58s ago•0 comments

NVD – CVE-2026-2070

https://nvd.nist.gov/vuln/detail/CVE-2026-2070
1•janandonly•1m ago•0 comments

FelPawns: (Update) AI assisted world generation in RimWorld

https://felpawns.com/felpawns-world-generation-update-overview/
2•walterfreedom•1m ago•1 comments

Show HN: Maravel-Framework 10.62.8 speeds up the console via commands:cache

https://marius-ciclistu.medium.com/maravel-framework-10-62-8-speeds-up-the-console-via-commands-c...
1•marius-ciclistu•6m ago•0 comments

My Nanbeige4.1 3B chat room can now generate micro applications [video]

https://www.youtube.com/watch?v=WvT5cp6Za24
1•ToJans•6m ago•0 comments

Underrated Music Software – Royalty-Free

https://midigen.app/
1•thriftman•7m ago•0 comments

Dune II written in HTML5/JS

https://github.com/oklemenz/Dune2JS
1•reconnecting•10m ago•0 comments

Show HN: Crypthold – Deterministic, Tamper-Evident Secure State Engine

https://github.com/laphilosophia/crypthold
1•laphilosophia•10m ago•0 comments

Language models imply world models

https://blog.plover.com/tech/gpt/micro-worlds-2.html
1•gbacon•10m ago•0 comments

Echoed.gg – Discord Alternative

https://echoed.gg/
1•shaongitbd•10m ago•0 comments

GLM-5 topped the coding benchmarks. Then I used it

https://charlesazam.com/blog/glm5-benchmark-reality/
2•couAUIA•11m ago•1 comments

Show HN: PrivateWhisper – Run Whisper locally on macOS (offline transcription)

https://privatewhisper.app/
1•matyashajek•12m ago•1 comments

A minimal terminal coding agent harness

https://pi.dev/
1•thomascountz•13m ago•0 comments

It Isn't the Tool, but the Hands – A Response to "Something Big Is Happening"

1•markferraz•20m ago•0 comments

Dbt-Workbench, an open-source UI for working with dbt projects

https://github.com/rezer-bleede/dbt-Workbench
1•remisharoon•20m ago•1 comments

Show HN: PolyMCP – A framework for building and orchestrating MCP agents

2•justvugg•22m ago•1 comments

Dao Heart 3.11 Identity Preserving Value Evolution for Frontier AI Systems

https://github.com/Mankirat47/Dao-Heart_3.1
1•Mankirat47•23m ago•1 comments

Backboard.io Becomes First AI Platform to Lead Both Major Memory Benchmarks

https://backboard.io/changelog/backboard.io-becomes-first-ai-platform-to-lead-both-major-memory-b...
1•robimbeault•25m ago•2 comments

Show HN: An automaton's code review of Gas Town with sycophancy-mode disabled

2•burnerToBetOut•27m ago•1 comments

'RageCheck' Points Out Manipulative Language in News Articles

https://lifehacker.com/tech/ragecheck-manipulative-language-news-articles
1•gnabgib•28m ago•0 comments

Ask HN: Hacker News Fixed Width for Widescreen Monitors" Userstyle?

1•MollyRealized•28m ago•0 comments

Extend Trust Across the Software Supply Chain with Red Hat Trusted Libraries

https://www.redhat.com/en/blog/extend-trust-across-software-supply-chain-red-hat-trusted-libraries
1•jruohonen•30m ago•1 comments

CIA, Pentagon reviewed secret 'Havana syndrome' device in Norway, WaPo reports

https://www.reuters.com/business/healthcare-pharmaceuticals/cia-pentagon-reviewed-secret-havana-s...
1•alephnerd•33m ago•0 comments

I Analyzed 227M Rows of Medicaid Data. Here's a Sample of What I Found in Maine

https://twitter.com/lukethomas14/status/2022519245553160237
2•NewCzech•33m ago•0 comments

AI: A Bridge Toward Diverse Intelligence

https://www.noemamag.com/ai-could-be-a-bridge-toward-diverse-intelligence/
1•kjhughes•34m ago•0 comments

How to Write Mathematical Papers by Bruce C. Berndt [pdf]

https://alozano.clas.uconn.edu/wp-content/uploads/sites/490/2020/08/berndt.pdf
1•paulpauper•34m ago•0 comments

Curosr: Expanding our long-running agents research preview

https://cursor.com/blog/long-running-agents
3•mustaphah•34m ago•0 comments

Show HN: Cappu – ADHD'er take on a different task manager

https://cappu.app/
1•arajnoha•35m ago•0 comments