The Problem
Every team rebuilds the same Kubernetes infrastructure: networking, certificates, monitoring, databases, storage. The existing solutions either lock you into a vendor ecosystem or dump you into raw Kubernetes complexity. We wanted the control of self-hosting without weeks of setup.
Architecture
Our system uses two agent types:
Server agents run on VM hosts and communicate with our backend via gRPC bidirectional streams. When users request a cluster node, the agent provisions a KVM-based VM and bootstraps it.
Node agents run on each Kubernetes node and handle cluster operations, monitoring, and service installations.
Key insight: gRPC streams initiated by agents eliminate firewall configuration and public IP requirements. Agents reach out to our backend, not vice versa.
Why KVM?
- Battle-tested, works great with Ubuntu
- Solid Go bindings via libvirt
- Excellent GPU passthrough for AI workloads like Ollama
- Good isolation/performance balance
Sometimes boring technology is the right choice.
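GPU passthrough in particular is pleasantly boring with libvirt: it's a single hostdev entry in the domain XML. A sketch (the PCI address is an example, not a real host):

```xml
<!-- Attach a host GPU to the guest via VFIO PCI passthrough.
     Replace bus/slot/function with the GPU's actual PCI address
     (see `lspci`); libvirt handles the VFIO binding when managed='yes'. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```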
Provisioning Flow
1. User clicks "Create Cluster"
2. Backend selects available server agents
3. gRPC commands sent to provision VMs
4. KVM VMs spin up (Ubuntu Cloud 24.04, 30-60 seconds)
5. Node agents install and connect
6. Kubernetes bootstrap with kubeadm + Cilium
7. WireGuard mesh established between nodes
8. Storage configured (OpenEBS + Longhorn)
9. Cluster ready (5-10 minutes total)
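Steps 4-5 boil down to cloud-init: the server agent renders a per-node payload and injects it when the VM boots. A hypothetical sketch (the hostname and download URL are placeholders, not our real endpoints):

```yaml
#cloud-config
# Illustrative only: the real payload is generated per node by the
# server agent. The install URL below is a placeholder.
hostname: node-1
package_update: true
runcmd:
  # Fetch and start the node agent, which then dials out to the backend.
  - curl -fsSL https://backend.example/install-node-agent.sh | bash
```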
The WireGuard Decision
We manage WireGuard at the OS level, not Kubernetes level. Why?
- Same VPN secures both K8s traffic and SSH access
- Nodes communicate securely even if Kubernetes fails
- Simpler troubleshooting with separated layers
- Easier multi-cluster peering (coming soon)
Our backend orchestrates WireGuard configs across nodes via the agents. Centrally coordinated, locally executed.
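Concretely, each node ends up with a rendered WireGuard config; a sketch with placeholder keys and addresses:

```ini
# /etc/wireguard/wg0.conf -- rendered by the node agent from the
# backend's desired state. Keys, IPs, and endpoints are placeholders.
[Interface]
PrivateKey = <node-private-key>
Address = 10.8.0.2/24
ListenPort = 51820

[Peer]
PublicKey = <peer-public-key>
AllowedIPs = 10.8.0.3/32
Endpoint = 203.0.113.7:51820
PersistentKeepalive = 25
```

The backend only decides the peer list and addressing; a standard `wg-quick up wg0` on the node does the rest.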
Version Management Hell
The hardest problem? Keeping 20+ services compatible across updates.
We offer one-click installation of: PostgreSQL, MySQL, ClickHouse, Kafka, RabbitMQ, MinIO, Longhorn, Harbor, Traefik, Grafana, Prometheus, Ollama, LiteLLM, Open WebUI, and more.
Each has opinions about K8s versions, storage, and networking. We use Helm charts, operators, and custom YAML as appropriate. The real work is maintaining compatibility matrices and testing every combination.
Deployment Models
RunOS Cloud: Managed dedicated servers with fixed 8 CPU/16GB instances (free trial credits available). KVM handles VM provisioning, with GPU passthrough for AI workloads. Security is locked down tightly while we're in early access.
Bring Your Own Node: Run node agents on any hardware. Complete tenant isolation since you control infrastructure.
Coming soon: Self-managed VM hosts with custom sizing.
What's Next
Agent code will be open source. One company runs three production clusters already. Common feedback: "I can't believe how fast I went from zero to a working cluster with Postgres, Kafka, and monitoring."
We're planning weekly updates here on HackerNews about new features, technical challenges, and production lessons.
Try it at runos.com - free trial credits for 8 CPU threads and 16GB memory.
Questions? Happy to discuss architecture in the comments.