For context:
When evaluating structured outputs, you often want composable comparison logic so you can compare meaningfully across different types of output (free text, enums, ints, and all the other JSON types). You also often want to compare arrays as multisets -- order-agnostic, pairwise matching between elements -- as in the sketch below.
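To make the multiset idea concrete, here's a minimal sketch (my illustration, not structeval's actual code): score every pairing of elements between two arrays and keep the alignment that maximizes total similarity. The function names and the brute-force approach are hypothetical; a real implementation would use something like the Hungarian algorithm instead of trying every permutation.

```python
from itertools import permutations

def compare(a, b):
    """Toy element comparator: 1.0 on equality, else 0.0.
    In practice this is where pluggable, type-aware logic goes."""
    return 1.0 if a == b else 0.0

def multiset_score(xs, ys, cmp=compare):
    """Best average pairwise score over all alignments of ys onto xs.
    Brute force (O(n!)) -- fine for small arrays, illustration only."""
    if len(xs) != len(ys):
        return 0.0  # simplistic length-mismatch penalty
    if not xs:
        return 1.0
    best = max(sum(cmp(x, y) for x, y in zip(xs, perm))
               for perm in permutations(ys))
    return best / len(xs)

print(multiset_score([1, 2, 3], [3, 1, 2]))  # 1.0 -- order is ignored
print(multiset_score([1, 2, 3], [3, 1, 9]))  # ~0.67 -- one element differs
```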
What it is: a CLI and Python library (I called it "structeval", not to be confused with the LLM eval framework of the same name -- I may rename it!) that supports order-agnostic pairwise matching, customizable comparison logic, and recursive metric aggregation. It can also be used to compare outputs when sampling from an LLM with N>1, e.g. to measure semantic entropy or to find the "median" result. Since it works as a generic JSON tool without requiring a schema, it could also serve, at least in principle, as a more configurable (and quirkier :) ) alternative to a generic diffing tool like jd.
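As a hedged sketch of what "recursive metric aggregation" means conceptually (again, my illustration rather than the library's real interface): scores bubble up from leaves, objects average over their keys, and lists are aligned order-agnostically as above.

```python
from itertools import permutations

def json_score(expected, actual):
    """Recursively compare two JSON-like values; return a score in [0, 1]."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        keys = set(expected) | set(actual)
        if not keys:
            return 1.0
        # A key missing on one side compares its value against None,
        # scoring 0 unless both sides are None.
        return sum(json_score(expected.get(k), actual.get(k))
                   for k in keys) / len(keys)
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return 0.0  # simplistic length-mismatch penalty
        if not expected:
            return 1.0
        # Order-agnostic: best alignment over all permutations (small n only).
        best = max(sum(json_score(x, y) for x, y in zip(expected, perm))
                   for perm in permutations(actual))
        return best / len(expected)
    return 1.0 if expected == actual else 0.0

print(json_score({"tags": ["a", "b"], "n": 3},
                 {"tags": ["b", "a"], "n": 3}))  # 1.0 -- list order ignored
```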
I had struggled with this task in a few contexts and kept rewriting a utility like this, so I figured it might be helpful to others if encapsulated in a little library.
I'm curious to hear any feedback or suggestions!