We open-sourced LLM Council: https://github.com/abhishekgandhi-neo/llm_council
It’s a small framework we built internally with Neo to run multiple LLMs on the same task, let them critique each other, and produce a structured final answer.
Useful for tasks like:
• Comparing local vs API models on your own dataset
• Validating RAG outputs
• Prompt regression testing
• Dataset labeling with model-as-judge
• Catching hallucinations in code or research summaries
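To make the model-as-judge use case concrete, here is a minimal sketch of a labeling loop. The `label_dataset` and `toy_judge` names are hypothetical stand-ins for illustration, not part of the repo's API; in practice `judge` would wrap an actual LLM call.

```python
# Hypothetical model-as-judge labeling loop; `judge` is a stand-in
# for any LLM call, not the repo's actual API.
def label_dataset(examples, judge):
    labels = []
    for ex in examples:
        verdict = judge(
            "Is this summary faithful to the source? "
            f"Answer yes/no.\n{ex}"
        )
        # Map the judge's free-text verdict to a label.
        ok = verdict.strip().lower().startswith("yes")
        labels.append("faithful" if ok else "flagged")
    return labels

# Toy judge that flags anything claiming a "guarantee".
def toy_judge(prompt):
    return "no" if "guarantee" in prompt else "yes"

labels = label_dataset(
    ["The paper guarantees 100% accuracy.",
     "The paper reports 87% accuracy."],
    toy_judge,
)
print(labels)  # → ['flagged', 'faithful']
```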
A few practical details:
• Async parallel calls so latency stays close to one model
• Structured outputs with each model’s answer and critiques
• Provider-agnostic configs for local + hosted models
• Built to plug into evaluation pipelines, not just demos
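The core loop above (parallel answers, a critique round, then aggregation) can be sketched in a few lines of asyncio. Everything here is a hypothetical illustration under stated assumptions: the toy models, `run_council`, and the majority-vote aggregator are stand-ins, not the repo's actual interface.

```python
import asyncio
from collections import Counter
from dataclasses import dataclass

# Hypothetical council round, not the repo's actual API:
# fan out one prompt to all models, have each model critique
# the others' answers, then aggregate a final answer.

@dataclass
class CouncilResult:
    answers: dict    # model name -> answer
    critiques: dict  # model name -> critique of the others
    final: str       # aggregated answer

async def ask(name, fn, prompt):
    # Calls run concurrently, so wall-clock latency stays
    # close to the slowest single model.
    return name, await fn(prompt)

async def run_council(models, prompt, aggregate):
    answers = dict(await asyncio.gather(
        *(ask(name, fn, prompt) for name, fn in models.items())
    ))
    # Second round: every model critiques the other answers.
    critique_prompts = {
        name: f"Critique these answers to '{prompt}': "
              + "; ".join(a for n, a in answers.items() if n != name)
        for name in models
    }
    critiques = dict(await asyncio.gather(
        *(ask(name, models[name], critique_prompts[name]) for name in models)
    ))
    return CouncilResult(answers, critiques, aggregate(answers))

# Toy "models" standing in for local or API-backed clients.
async def model_a(prompt): return "4"
async def model_b(prompt): return "4"
async def model_c(prompt): return "5"

def majority(answers):
    # Simplest aggregation rule: most common answer wins.
    return Counter(answers.values()).most_common(1)[0][0]

result = asyncio.run(run_council(
    {"a": model_a, "b": model_b, "c": model_c},
    "What is 2 + 2?",
    majority,
))
print(result.final)  # → 4
```

Because provider differences are hidden behind plain async callables, swapping a local model for a hosted one is just a config change in this scheme.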
We’ve been experimenting with council setups like this in Neo to catch silent failures in ML workflows, and this repo is a cleaned-up version of that idea.
If you’ve built multi-LLM evaluation pipelines, we’d love to hear what aggregation or critique strategies worked well for you.