frontpage.

Hi HN! I built LLMKube, a Kubernetes operator for deploying GPU-accelerated LLMs in production. One command gets you from zero to inference with full observability.

Why this exists: Regulated industries (healthcare, defense, finance) need air-gapped LLM deployments, but existing tools are either single-node only (Ollama) or lack GPU optimization and SLO enforcement. LLMKube bridges the gap.

What's working:

- 17x speedup with NVIDIA GPUs (64 tok/s on Llama 3.2 3B vs 4.6 tok/s CPU)

- One command: llmkube deploy llama-3b --gpu (auto CUDA setup, scheduling, layer offloading)

- Production observability: Prometheus + Grafana + DCGM GPU metrics out of the box

- OpenAI-compatible API endpoints

- Terraform configs for GKE GPU clusters with auto-scale to zero

Tech: Kubernetes CRDs, llama.cpp with CUDA, NVIDIA GPU Operator, cost-optimized spot instances (~$50-150/mo dev workloads).

Status: v0.2.0 production-ready for single-GPU deployments on standard K8s clusters. Multi-GPU and multi-node model sharding on the roadmap.

Apache 2.0 licensed. Would love feedback from anyone running LLMs in production!

Website: https://llmkube.com

GitHub: https://github.com/Defilan/LLMKube

Show HN: Engineering Perception with Combinatorial Memetics

Show HN: Steam Daily – A Wordle-like daily puzzle game for Steam fans

The Anthropic Hive Mind

Just Started Using AmpCode

LLM as an Engineer vs. a Founder?

Crosstalk inside cells helps pathogens evade drugs, study finds

Show HN: Design system generator (mood to CSS in <1 second)

Show HN: 26/02/26 – 5 songs in a day

Toroidal Logit Bias – Reduce LLM hallucinations 40% with no fine-tuning

Top AI models fail at >96% of tasks

The Science of the Perfect Second (2023)

Bob Beck (OpenBSD) on why vi should stay vi (2006)

Show HN: a glimpse into the future of eye tracking for multi-agent use

The Optima-l Situation: A deep dive into the classic humanist sans-serif

Barn Owls Know When to Wait

Implementing TCP Echo Server in Rust [video]

LicGen – Offline License Generator (CLI and Web UI)

Service Degradation in West US Region

The Janitor on Mars

Bringing Polars to .NET

Adventures in Guix Packaging

Show HN: We had 20 Claude terminals open, so we built Orcha

Your Best Thinking Is Wasted on the Wrong Decisions

Warcraftcn/UI – UI component library inspired by classic Warcraft III aesthetics

Trump Vodka Becomes Available for Pre-Orders

Velocity of Money

Stop building automations. Start running your business

You can't QA your way to the frontier

Show HN: PalettePoint – AI color palette generator from text or images

Robust and Interactable World Models in Computer Vision [video]