
Show HN: VaultAI – 42 AI models on a portable SSD, works offline ($399)

https://vaultai.us/
2•laramie_co•1h ago

Comments

laramie_co•1h ago
Hey HN — maker here.

Quick context on why I built this: I was personally spending ~$1,200/month on Claude API (I use it for everything — coding, writing, analysis). That’s $14,400/year. Even if I dropped to a single $20/month sub, every query I send tells Anthropic what I’m working on, what I’m thinking about, what problems I have.

I wanted Claude-quality inference without the cloud dependency. That's not possible yet at frontier-model scale, but what IS possible locally is surprisingly good.

What’s actually on it:

- 15 chat models — Qwen3-235B-A22B (MoE, fast on consumer hardware), LLaMA 3.3 70B Q4, DeepSeek-R1 32B, Gemma 3 27B, Phi-4
- 14 vision models — drag in an image, get analysis locally. No upload to any server.
- 5 coding assistants — Qwen2.5-Coder 32B is genuinely impressive for local inference
- 3 image generators via ComfyUI — FLUX.1-schnell, Z-Image-Turbo, Qwen-Image. This is the part I haven't seen anyone else ship as plug-and-play. Most local image gen setups require 30 min of dependency hell.
- RAG/knowledge base via Qdrant — drop in PDFs, docs, notes; semantic search across them locally (rough sketch of the flow after this list)
- Medical AI (MedGemma), uncensored models (Dolphin, Nous-Hermes, Abliterated variants)
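To make the RAG item concrete: the flow is embed locally, then vector-search locally, and nothing leaves the machine. A minimal sketch in TypeScript, assuming Ollama's embeddings endpoint on its default port and Qdrant's REST search API; the "docs" collection and the embedding model are illustrative placeholders, not our shipped config:

    // Sketch only: embed a query with a local Ollama model, then
    // semantic-search a Qdrant collection. The collection name ("docs")
    // and embedding model are placeholders, not VaultAI's actual setup.
    async function searchDocs(query: string) {
      // 1. Embed the query via Ollama's local embeddings API.
      const embRes = await fetch("http://localhost:11434/api/embeddings", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: "nomic-embed-text", prompt: query }),
      });
      const { embedding } = await embRes.json();

      // 2. Find the nearest document chunks in Qdrant.
      const searchRes = await fetch(
        "http://localhost:6333/collections/docs/points/search",
        {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ vector: embedding, limit: 5, with_payload: true }),
        },
      );
      const { result } = await searchRes.json();
      return result; // top matches, resolved entirely on-device
    }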

The target user: Anyone paying $200+/month across Claude, ChatGPT, Midjourney, GitHub Copilot, Perplexity. That’s $2,400/year with zero privacy. The Core tier ($399) pays for itself in 2 months if you cancel just one sub.

Stack: Tauri + Rust + React frontend. Ollama for inference. Qdrant for vector search. ComfyUI for image gen. Everything runs from the SSD — unplug and zero trace on the host machine.
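To show what the inference path looks like at the API level: the UI just talks to Ollama's local HTTP server. A minimal sketch of that pattern (my illustration, not the exact app code; the model tag is one example of a pre-loaded model):

    // Sketch of the inference path: one HTTP call to the local
    // Ollama server. No network access beyond localhost.
    async function ask(prompt: string): Promise<string> {
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "qwen2.5-coder:32b", // example tag; any pre-loaded model works
          prompt,
          stream: false, // one JSON response instead of a token stream
        }),
      });
      const { response } = await res.json();
      return response;
    }

In practice you'd stream tokens for responsiveness; stream: false just keeps the example short.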

Why not just software? Three reasons:
1. Downloading all of this takes 6-8 hours and 300+ GB — we pre-load
2. It's portable between machines (home Mac, work PC, travel laptop)
3. Zero install, zero config — double-click and it works

Honest limitations:
- 8GB RAM minimum, 16GB recommended for the larger models
- Not a GPU box — these are quantized models optimized for CPU/unified memory (rough memory math below)
- Not a replacement for frontier models (GPT-4, Claude Opus) for the hardest tasks
- First batch is 25 units. This is validation.
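On the quantization point: the back-of-envelope rule is weights ≈ parameter count × bits per weight / 8, before KV cache and runtime overhead. A toy calculator with rule-of-thumb numbers (estimates, not benchmarks):

    // Rough weight-memory estimate for a quantized model.
    // Rule of thumb only: ignores KV cache, context length, and
    // runtime overhead, which all add real headroom on top.
    function weightMemoryGB(paramsBillions: number, bitsPerWeight: number): number {
      const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8);
      return bytes / 1024 ** 3;
    }

    console.log(weightMemoryGB(70, 4).toFixed(1)); // 70B @ Q4 -> ~32.6 GB
    console.log(weightMemoryGB(27, 4).toFixed(1)); // 27B @ Q4 -> ~12.6 GB
    console.log(weightMemoryGB(14, 4).toFixed(1)); // 14B @ Q4 -> ~6.5 GB

Which is the honest version of the RAM line above: the 7B-14B class fits the 8-16GB floor, while the 70B-class models want a machine with serious unified memory.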

Tiers:
- Core: 20 models, 256GB — $399
- Pro: 34 models, 512GB — $599
- Ultra: 42 models, 1TB — $899

Happy to answer questions about the architecture, model quantization tradeoffs, or anything else.