You run evals with a Python library (pip install valohai-llm), results stream in, and you can compare up to 6 configurations side by side. Group by any dimension (model, category, difficulty) to see where each model excels.
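Roughly, the flow looks like this (a minimal sketch only; the class and method names below are illustrative placeholders, not necessarily the exact API):

    # Illustrative sketch -- Eval, run(), and compare() are placeholder names,
    # not guaranteed to match the actual valohai-llm API.
    from valohai_llm import Eval  # assumed import path

    ev = Eval(
        dataset="qa_cases.jsonl",                    # your eval cases
        models=["gpt-4o-mini", "claude-3.5-haiku"],  # configurations to compare
        metrics=["exact_match", "latency_ms"],
    )

    results = ev.run()                    # results stream in as cases complete
    results.compare(group_by="category")  # or "model", "difficulty", ...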
It doesn't do tracing or production observability; for now it's just eval tracking and comparison. What's cool is that you can define the parameters you want to test and run a sweep across all of them.
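As a concrete (hypothetical) example of what a sweep expands into, each combination of the parameters becomes its own tracked configuration; the parameter names here are made up for illustration:

    # Sketch of how a parameter sweep expands into individual eval configurations.
    from itertools import product

    params = {
        "model": ["gpt-4o-mini", "claude-3.5-haiku"],
        "temperature": [0.0, 0.7],
        "prompt_version": ["v1", "v2"],
    }

    # Every combination gets its own eval run: 2 x 2 x 2 = 8 configurations
    configs = [dict(zip(params, combo)) for combo in product(*params.values())]
    for cfg in configs:
        print(cfg)  # in practice, each cfg would be one eval run tracked for comparison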
Feedback welcome, especially from anyone who compares models and runs evals regularly!