llmtop is a real-time terminal dashboard for LLM inference workers. It scrapes the Prometheus /metrics endpoints that vLLM, SGLang, and LMCache already expose and shows everything in one view: KV cache usage, queue depth, TTFT/ITL latencies (P50/P99 from histogram buckets), token throughput, prefix cache hit rates. Color-coded — red means go fix it.
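The P50/P99 figures come from Prometheus histogram buckets, which only store cumulative counts per upper bound; the quantile has to be interpolated. A minimal Go sketch of that estimation (same linear-interpolation approach as PromQL's `histogram_quantile`; the bucket values below are made up for illustration, not real llmtop output):

```go
package main

import "fmt"

// bucket mirrors one Prometheus histogram bucket: the cumulative count
// of observations with value <= upperBound (the "le" label).
type bucket struct {
	upperBound float64 // seconds
	count      float64 // cumulative observation count
}

// quantile estimates quantile q from cumulative buckets, interpolating
// linearly inside the bucket the rank falls into. Buckets must be sorted
// by upperBound, with the last entry acting as +Inf.
func quantile(q float64, buckets []bucket) float64 {
	total := buckets[len(buckets)-1].count
	rank := q * total
	prevBound, prevCount := 0.0, 0.0
	for _, b := range buckets[:len(buckets)-1] {
		if b.count >= rank {
			frac := (rank - prevCount) / (b.count - prevCount)
			return prevBound + (b.upperBound-prevBound)*frac
		}
		prevBound, prevCount = b.upperBound, b.count
	}
	// Rank fell in the +Inf bucket: best we can do is the last finite bound.
	return buckets[len(buckets)-2].upperBound
}

func main() {
	// Hypothetical TTFT histogram: 40 requests <= 0.1s, 90 <= 0.5s, 99 <= 1s, 100 total.
	ttft := []bucket{{0.1, 40}, {0.5, 90}, {1.0, 99}, {1e9, 100}}
	fmt.Printf("P50 ~ %.3fs, P99 ~ %.3fs\n", quantile(0.50, ttft), quantile(0.99, ttft))
}
```

Because values inside a bucket are assumed uniformly distributed, the estimate is only as good as the bucket boundaries the server exports.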
```
brew install InfraWhisperer/tap/llmtop
# or
go install github.com/InfraWhisperer/llmtop/cmd/llmtop@latest
```
Single binary, no Prometheus server needed, no Grafana, no config. Just run llmtop and it auto-discovers local workers.
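The announcement doesn't spell out how auto-discovery works; a plausible minimal sketch is probing well-known default ports on localhost for a live `/metrics` endpoint. The port list here is an assumption (vLLM commonly serves on 8000, SGLang on 30000), not llmtop's actual logic:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// discover probes candidate localhost ports for a Prometheus /metrics
// endpoint and returns the URLs that answer 200 OK. Hypothetical sketch;
// llmtop's real discovery may differ.
func discover(ports []int) []string {
	client := &http.Client{Timeout: 500 * time.Millisecond}
	var found []string
	for _, p := range ports {
		url := fmt.Sprintf("http://localhost:%d/metrics", p)
		resp, err := client.Get(url)
		if err != nil {
			continue // nothing listening, or not speaking HTTP
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			found = append(found, url)
		}
	}
	return found
}

func main() {
	// Assumed default ports: 8000 (vLLM), 30000 (SGLang).
	for _, url := range discover([]int{8000, 30000}) {
		fmt.Println("worker:", url)
	}
}
```

The short timeout keeps startup snappy when no worker is listening on a probed port.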
Written in Go with Bubble Tea. Kubernetes pod auto-discovery and a GPU metrics view are up next.