Show HN: CriteriaBot – A Universal Customizable Classifier

https://criteriabot.io/

2•RoyalTnetennba•1h ago

I needed a classifier for nuanced, subjective buckets that fall outside typical ML use cases (e.g., "is this a spoiler?", "is this factually correct?", "is this user being mean?"). I ended up really happy with the architecture I built to solve it, so I rolled it out as a standalone API and service called CriteriaBot.

WHAT IT DOES:

You give it content and plain-English criteria. It gives you a true/false verdict on whether the content meets those criteria.

HOW IT WORKS:

In addition to a traditional classifier, the classification request is routed through a pool of small, open-weight LLMs to achieve a consensus verdict.

I built a pre-vote factorization machine that selects a sub-pool of LLMs optimized for signal strength based on the embedding of the subject/category. A second factorization machine then reads the votes and the embedding to arrive at a single verdict. That verdict is dynamically modified based on the user's history of agreement/disagreement with the models in semantically similar evaluations.
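For readers unfamiliar with factorization machines, here is a minimal sketch of the standard second-order FM scoring function applied to a vector of model votes plus an embedding. Everything specific here is an assumption for illustration, not CriteriaBot's actual code: the feature layout (votes as +1/-1 followed by embedding values), the made-up weights, the zero threshold, and the names `fm_score` and `fm_score_naive`.

```python
from itertools import combinations


def fm_score(x, w0, w, V):
    """Second-order factorization machine score for a feature vector x.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    V  : n rows of k latent factors (one row per feature)
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    # Pairwise interactions via the O(n*k) identity:
    # sum_{i<j} <V_i, V_j> x_i x_j
    #   = 0.5 * sum_f ((sum_i V_if x_i)^2 - sum_i (V_if x_i)^2)
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same score with the explicit O(n^2 * k) double sum, for checking."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(n), 2)
    )
    return w0 + linear + pairwise


# Hypothetical input: three model votes (+1 = true, -1 = false)
# followed by a two-dimensional embedding of the subject/category.
votes = [1.0, -1.0, 1.0]
embedding = [0.2, -0.5]
x = votes + embedding

# Made-up parameters; a real combiner would learn these from labeled data.
w0 = 0.1
w = [0.8, 0.3, 0.6, 0.0, 0.0]
V = [
    [0.5, -0.2],
    [0.1, 0.4],
    [-0.3, 0.2],
    [0.7, 0.1],
    [-0.4, 0.6],
]

# Assumed decision rule: a positive score means "meets the criteria".
verdict = fm_score(x, w0, w, V) > 0
```

The pairwise term is what lets a combiner like this weight a model's vote differently depending on the embedding, which a plain weighted vote cannot do.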

The models are also hooked up to Wikipedia and Wolfram to support edge cases requiring current information or mathematical grounding.

FINDINGS:

* With the same harness and sample set, Gemma 4 26B's accuracy is only ~1 percentage point below Opus 4.8.

* A pure oracle is theoretically very good: currently ~98% accuracy on my datasets. I'm using the second factorization machine as the combiner since it can theoretically push past oracle results, but the oracle is an interesting fallback.

* The single most useful LLM surprised me - LFM2 24B contributes the most to the consensus, despite being the worst individually (of the current pool of LLMs). It correlates the least with the other models (perhaps due to its unique architecture?) which makes it a useful signal for some of the problems.

* The legal obligations of handling user-submitted images are... involved. I've disabled image support for non-me users while I sort that out (in case you were hoping to try out "Hotdog, Not Hotdog").

* Rails singularizes "criteria" as "criterium" and I didn't realize that was incorrect until it was kind of a lot of work to fix.
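The oracle and low-correlation findings above can be illustrated with a toy vote matrix. This is only a sketch under stated assumptions: "oracle accuracy" is taken to mean the fraction of samples where at least one model in the pool votes correctly, the labels and votes are invented, and `model_a`, `model_b`, and `model_c` are placeholder names rather than the real pool.

```python
def accuracy(preds, labels):
    """Fraction of samples a single model gets right."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def oracle_accuracy(pool, labels):
    """Fraction of samples where at least one model in the pool is right.

    Assumed definition of "oracle": an ideal selector that always picks
    a correct model when one exists.
    """
    hits = sum(
        any(preds[i] == y for preds in pool.values())
        for i, y in enumerate(labels)
    )
    return hits / len(labels)


def agreement(a, b):
    """Fraction of samples on which two models cast the same vote."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


# Invented toy data: 1 = "meets criteria", 0 = "does not".
labels = [1, 1, 1, 1, 0, 0, 0, 0]
pool = {
    "model_a": [1, 1, 1, 0, 0, 0, 0, 1],  # wrong on samples 3 and 7
    "model_b": [1, 1, 1, 0, 0, 0, 1, 0],  # wrong on samples 3 and 6
    "model_c": [0, 0, 1, 1, 1, 0, 0, 0],  # wrong on samples 0, 1 and 4
}

# model_c is the weakest individually and agrees least with the others.
individual = {name: accuracy(p, labels) for name, p in pool.items()}
a_vs_b = agreement(pool["model_a"], pool["model_b"])
a_vs_c = agreement(pool["model_a"], pool["model_c"])

# Its mistakes fall on different samples, so adding it raises the ceiling.
without_c = {k: v for k, v in pool.items() if k != "model_c"}
oracle_without_c = oracle_accuracy(without_c, labels)
oracle_with_c = oracle_accuracy(pool, labels)
```

In this toy pool the weakest model is the only one that gets sample 3 right, so it lifts the oracle ceiling even though it has the lowest individual accuracy.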

WHY I'M POSTING: I’d been dealing with burnout for a while, and getting this running has been incredibly rewarding. The majority of people in my personal life are non-technical so it's been hard to get reactions to it beyond "what is it?".

Would be thrilled with whatever honest feedback you have.
