
Code: https://github.com/ChandanKSahu/MiRAGE

Hi HN, we are the authors of MiRAGE.

We built this because standard RAG benchmarks (like Natural Questions) rely on text-only, Wikipedia-like data, which doesn't reflect the reality of enterprise RAG. In the real world, "truth" is often locked in a chart, a complex table, or a diagram deep inside a PDF.

MiRAGE is an open-source framework that uses a swarm of specialized agents to reverse-engineer evaluation datasets from your own documents.

How it works:

1. Ingest: It uses vision models to describe charts/tables and "semantically chunk" the PDF.

2. Generate: An agent swarm (Generator, Retriever, Persona-Injector) creates multi-hop questions.

3. Verify: An adversarial "Verifier Agent" fact-checks the answers against the source to prevent hallucinated ground truth.
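To make the generate-then-verify idea concrete, here is a minimal sketch of the shape of such a loop. It is not MiRAGE's actual API: `QAPair`, `chunk`, `is_supported`, and `build_dataset` are hypothetical names, blank-line splitting stands in for semantic chunking, and a strict token-overlap check stands in for the LLM-based Verifier Agent.

```python
import re
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str
    source_ids: list[int]  # indices of the chunks the answer claims to rely on


def tokens(text: str) -> list[str]:
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def chunk(text: str) -> list[str]:
    """Naive stand-in for semantic chunking: split on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def is_supported(pair: QAPair, chunks: list[str], threshold: float = 1.0) -> bool:
    """Toy verifier (assumption: a real one would be an LLM judge).

    Accepts the pair only if at least `threshold` of the answer's tokens
    appear in the chunks it cites.
    """
    evidence = set()
    for i in pair.source_ids:
        evidence.update(tokens(chunks[i]))
    answer_tokens = tokens(pair.answer)
    if not answer_tokens:
        return False
    hits = sum(tok in evidence for tok in answer_tokens)
    return hits / len(answer_tokens) >= threshold


def build_dataset(candidates: list[QAPair], chunks: list[str]) -> list[QAPair]:
    """Keep only candidate QA pairs that the verifier can ground in the source."""
    return [p for p in candidates if is_supported(p, chunks)]


if __name__ == "__main__":
    doc = "Revenue grew 12% in 2023.\n\nThe chart shows churn fell to 4% in Q4."
    chunks = chunk(doc)
    candidates = [
        QAPair("How much did revenue grow in 2023?", "Revenue grew 12%", [0]),
        QAPair("What was churn in Q4?", "Churn fell to 9%", [1]),  # hallucinated
    ]
    for pair in build_dataset(candidates, chunks):
        print(pair.question, "->", pair.answer)
```

The point of the sketch is the filter step: a candidate whose answer cannot be grounded in the chunks it cites never reaches the final dataset. A real verifier would also need to handle paraphrase and visual evidence, which token overlap cannot.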

Key Finding: In our ablation studies, removing the adversarial verifier dropped the faithfulness of the generated dataset from 97% to 74%. Synthetic data needs self-verification.

Resources:

- Paper (arXiv): https://arxiv.org/abs/2601.15487

- Install: pip install mirage-benchmark

- Demo: (See the terminal video in the repo)

We’d like your feedback, especially on the "Visual Grounding" challenge, which is still the hardest part of multimodal RAG. Happy to answer any questions!

MiRAGE: Open-source framework for multimodal RAG evaluation