I built Search Bench as a small experiment to compare search engines without showing which engine produced which results. It was inspired by the idea behind the LLM Arena, but applied to search.
How it works:
1. You enter a query.
2. You see two result sets side-by-side (search engine names hidden).
3. You pick which is better, or mark them as similar.
Methodology:
- Each vote is a pairwise comparison; a tie counts as half a win for each engine.
- Ratings come from a Bradley–Terry model: ability scores are updated iteratively and normalized so their geometric mean is 1 (a sketch of the update follows this list).
- Final scores are mapped to an Elo-like scale, 1500 + 400 * log10(ability), so they read like Elo ratings but are derived from the Bradley–Terry fit.
- Pair selection is adaptive: it prioritizes under-sampled search engines and close matchups via an uncertainty × closeness weighting (also sketched below).
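For the curious, here's roughly what the rating step could look like: the standard minorization-maximization update for Bradley–Terry abilities, with ties counted as half-wins, a geometric-mean normalization, and the log scaling from above. This is a simplified sketch; the function names, epsilon smoothing, and convergence check are illustrative, not the actual Search Bench code.

```python
import math
from collections import defaultdict


def bradley_terry(comparisons, iters=200, tol=1e-8, eps=1e-6):
    """Fit Bradley-Terry ability scores from pairwise votes.

    `comparisons` is a list of (engine_a, engine_b, outcome) tuples,
    where outcome is 1.0 if a won, 0.0 if b won, and 0.5 for a tie
    (so a tie counts as half a win for each side).
    """
    if not comparisons:
        return {}
    engines = sorted({e for a, b, _ in comparisons for e in (a, b)})
    ability = {e: 1.0 for e in engines}

    # Fractional wins per engine and comparison counts per unordered pair.
    wins = defaultdict(float)
    games = defaultdict(float)
    for a, b, outcome in comparisons:
        wins[a] += outcome
        wins[b] += 1.0 - outcome
        games[frozenset((a, b))] += 1.0

    for _ in range(iters):
        new = {}
        for i in engines:
            # Minorization-maximization update for Bradley-Terry abilities.
            denom = sum(
                games[frozenset((i, j))] / (ability[i] + ability[j])
                for j in engines
                if j != i and games[frozenset((i, j))] > 0
            )
            # eps keeps an engine with zero recorded wins from collapsing to 0.
            new[i] = (wins[i] + eps) / denom if denom > 0 else ability[i]

        # Normalize so the geometric mean of abilities is 1.
        gm = math.exp(sum(math.log(p) for p in new.values()) / len(new))
        new = {e: p / gm for e, p in new.items()}

        converged = max(abs(new[e] - ability[e]) for e in engines) < tol
        ability = new
        if converged:
            break

    return ability


def display_score(ability):
    """Map a Bradley-Terry ability onto the Elo-like display scale."""
    return 1500 + 400 * math.log10(ability)
```

The geometric-mean normalization is what makes the display scale behave like Elo: abilities average to 1 in log space, so the average displayed score sits at 1500 while the pairwise win probabilities still come from the Bradley–Terry fit.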
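And a similarly hedged sketch of the adaptive pair selection. The specific terms here (inverse square root of the pair's comparison count for uncertainty, and distance of the predicted win probability from 0.5 for closeness) are stand-ins to illustrate the uncertainty × closeness idea, not the exact weights Search Bench uses.

```python
import math
import random


def pick_pair(engines, ability, pair_counts, rng=random):
    """Pick the next matchup, weighting toward under-sampled engines
    (uncertainty) and evenly matched pairs (closeness)."""
    pairs, weights = [], []
    for i, a in enumerate(engines):
        for b in engines[i + 1:]:
            n = pair_counts.get(frozenset((a, b)), 0)
            # Fewer past comparisons for this pair -> higher uncertainty weight.
            uncertainty = 1.0 / math.sqrt(n + 1)
            # Predicted win probability near 0.5 -> higher closeness weight.
            p = ability[a] / (ability[a] + ability[b])
            closeness = 1.0 - 2.0 * abs(p - 0.5)
            pairs.append((a, b))
            weights.append(uncertainty * closeness)
    # Sample one pair in proportion to its combined weight.
    return rng.choices(pairs, weights=weights, k=1)[0]
```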
This definitely isn't an objective ranking: queries and voters are self-selected, results vary by context, and what counts as “better” depends on the person. Right now, the dataset is small (≈200 comparisons, mostly from me), so I'm especially interested in seeing:
- Whether results change with more independent voters.
- Whether there's a real quality signal at scale, or if most differences disappear once brand bias is removed.
If you have a minute, comparing a few queries yourself would be very helpful! I'd also appreciate critique, especially around statistical validity, bias sources, aggregation methods, or ways this could be gamed or misinterpreted.