Quick context on why I built this: I was personally spending ~$1,200/month on Claude API (I use it for everything — coding, writing, analysis). That’s $14,400/year. Even if I dropped to a single $20/month sub, every query I send tells Anthropic what I’m working on, what I’m thinking about, what problems I have.
I wanted Claude-quality inference without the cloud dependency. That’s not possible yet for large models, but what IS possible locally is surprisingly good.
What’s actually on it:
- 15 chat models (a minimal call sketch follows this list) — Qwen3-235B-A22B (MoE: only ~22B params active per token, which is why it's usable on consumer hardware), Llama 3.3 70B Q4, DeepSeek-R1 32B, Gemma 3 27B, Phi-4
- 14 vision models — drag in an image, get analysis locally. No upload to any server.
- 5 coding assistants — Qwen2.5-Coder 32B is genuinely impressive for local inference
- 3 image generators via ComfyUI — FLUX.1-schnell, Z-Image-Turbo, Qwen-Image. This is the part I haven’t seen anyone else ship as plug-and-play; most local image gen setups require 30+ minutes of dependency hell (see the queueing sketch after this list).
- RAG/knowledgebase via Qdrant — drop in PDFs, docs, notes; semantic search across them locally
- Medical AI (MedGemma), uncensored models (Dolphin, Nous-Hermes, Abliterated variants)
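To make the chat bullet concrete: all of the text models are served through Ollama's local HTTP API, so any client on the machine can talk to them with a plain fetch and nothing leaves localhost. A minimal sketch (the `phi4` tag is just an example; any model already pulled onto the drive works the same way):

```typescript
// Minimal sketch: send one chat turn to a local Ollama server.
// Assumes Ollama is listening on its default port 11434 and that
// the model tag below ("phi4") is one of the models on the drive.
async function chatLocally(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "phi4",
      messages: [{ role: "user", content: prompt }],
      stream: false, // one JSON response instead of a token stream
    }),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = await res.json();
  return data.message.content; // nothing left the machine
}

chatLocally("Summarize why local inference preserves privacy.")
  .then(console.log)
  .catch(console.error);
```

Set `stream: true` and read the body incrementally if you want tokens as they arrive.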
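The image-gen piece works the same way: ComfyUI exposes a small HTTP endpoint for queueing workflows, which is what makes it scriptable from the same frontend. A rough sketch, assuming ComfyUI is on its default port 8188 and that `flux_workflow.json` is a graph you exported yourself via ComfyUI's "Save (API Format)" option (both are illustrative assumptions, not the exact wiring that ships on the drive):

```typescript
import { readFile } from "node:fs/promises";

// Rough sketch: queue a pre-built workflow on a local ComfyUI instance.
// "flux_workflow.json" is assumed to be a graph exported via ComfyUI's
// "Save (API Format)" option; 8188 is ComfyUI's default port.
async function queueImageJob(workflowPath: string): Promise<string> {
  const workflow = JSON.parse(await readFile(workflowPath, "utf8"));
  const res = await fetch("http://127.0.0.1:8188/prompt", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: workflow }),
  });
  if (!res.ok) throw new Error(`ComfyUI returned ${res.status}`);
  const { prompt_id } = await res.json();
  return prompt_id; // poll /history/{prompt_id} to fetch the finished image
}

queueImageJob("flux_workflow.json").then((id) => console.log("queued:", id));
```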
The target user: Anyone paying $200+/month across Claude, ChatGPT, Midjourney, GitHub Copilot, Perplexity. That’s $2,400/year with zero privacy. The Core tier ($399) pays for itself in two months at that level of spend.
Stack: Tauri + Rust + React frontend. Ollama for inference. Qdrant for vector search. ComfyUI for image gen. Everything runs from the SSD — unplug and zero trace on the host machine.
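For the RAG piece specifically, Qdrant runs as another local service with a plain REST API, so "semantic search across your PDFs" is an embed-then-search round trip that never leaves the machine. A minimal sketch, assuming a collection named `notes` already holds your document chunks and that embeddings come from a local Ollama embedding model (`nomic-embed-text` here is an example tag, not necessarily what ships on the drive):

```typescript
// Minimal local RAG query sketch: embed the question with a local
// Ollama embedding model, then search a local Qdrant collection.
// Assumptions: Qdrant on its default port 6333, and a collection
// called "notes" whose vectors came from the same embedding model.
async function searchNotes(question: string) {
  // 1. Embed the query locally via Ollama's embeddings endpoint.
  const embRes = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: question }),
  });
  const { embedding } = await embRes.json();

  // 2. Nearest-neighbour search in Qdrant, entirely on this machine.
  const searchRes = await fetch(
    "http://localhost:6333/collections/notes/points/search",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ vector: embedding, limit: 5, with_payload: true }),
    },
  );
  const { result } = await searchRes.json();
  return result; // top-5 chunks with their payloads (source doc, text, etc.)
}

searchNotes("What did my Q3 notes say about vendor pricing?").then(console.log);
```

Ingestion is the same pattern in reverse: chunk each document, embed the chunks, and upsert the vectors into the collection with the chunk text as payload.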
Why not just software? Three reasons:
1. Downloading all of this takes 6-8 hours and 300+ GB — we pre-load
2. It’s portable between machines (home Mac, work PC, travel laptop)
3. Zero install, zero config — double-click and it works
Honest limitations:
- 8GB RAM minimum, 16GB recommended for the larger models
- Not a GPU box — these are quantized models optimized for CPU/unified memory (rough memory math in the sketch after this list)
- Not a replacement for frontier models (GPT-4, Claude Opus) for the hardest tasks
- First batch is 25 units. This is validation.
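On the quantization point above: the reason these models fit on ordinary hardware at all is that weights are stored at roughly 4 to 5 bits each instead of 16. A back-of-the-envelope estimator for weight memory only (KV cache and runtime overhead come on top; these are approximations, not measured figures):

```typescript
// Back-of-the-envelope weight memory for a quantized model:
// params (billions) x bits-per-weight / 8 ≈ gigabytes of weights,
// ignoring KV cache, activations, and runtime overhead.
function approxWeightGB(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * bitsPerWeight) / 8;
}

console.log(approxWeightGB(27, 4.5).toFixed(1)); // Gemma 3 27B at ~Q4   -> ~15.2 GB
console.log(approxWeightGB(70, 4.5).toFixed(1)); // Llama 3.3 70B at ~Q4 -> ~39.4 GB
console.log(approxWeightGB(7, 16).toFixed(1));   // a 7B model in FP16   -> 14.0 GB
```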
Tiers:
- Core: 20 models, 256GB — $399
- Pro: 34 models, 512GB — $599
- Ultra: 42 models, 1TB — $899
Happy to answer questions about the architecture, model quantization tradeoffs, or anything else.