Results:
RAW (naive prompt): 12.5% historically accurate
TRIAD (grounded prompt): 83.3% historically accurate
In 23 of 24 pairs, the grounded image was judged more accurate
In 0 of 24 pairs was the naive image judged better

The key insight for prompt engineers: image models silently drop historical terms they don't recognize. "dextrarum iunctio handshake" produces nothing useful. "Two men clasping right hands wrist-to-wrist, elbows raised" works. Visual translation, not historical terminology.
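The translation step can be sketched as a simple glossary lookup. This is a minimal illustration, not the benchmark's actual prompt table; the glossary entries and function names are hypothetical:

```python
# Sketch of "visual translation": swap historical terms the image model
# silently drops for plain visual descriptions it can actually render.
# The glossary below is illustrative only.
VISUAL_GLOSSARY = {
    "dextrarum iunctio handshake":
        "two men clasping right hands wrist-to-wrist, elbows raised",
}

def visually_translate(prompt: str) -> str:
    """Replace each known historical term with its visual description."""
    for term, description in VISUAL_GLOSSARY.items():
        prompt = prompt.replace(term, description)
    return prompt

print(visually_translate("marble relief of a dextrarum iunctio handshake"))
```

The point is that the model never sees the terminology at all, only the scene it describes.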
The full benchmark — all 48 images, prompts, evaluation data, and reproducible pipeline — is open source. You can re-run the blinded evaluation yourself with a free Gemini API key.
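The actual pipeline lives in the repo; as a rough sketch of what the blinding in a pairwise evaluation involves (function names and structure here are my own, not the repo's), the judge sees each pair in a randomized left/right order, and the verdict is mapped back afterwards:

```python
import random

def blinded_pairs(pairs, seed=0):
    """Randomize left/right order of each (naive, grounded) pair so the
    judge can't infer the condition from position. Returns the shown
    pair plus a flag recording whether it was swapped."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    blinded = []
    for naive, grounded in pairs:
        if rng.random() < 0.5:
            blinded.append(((naive, grounded), False))  # original order
        else:
            blinded.append(((grounded, naive), True))   # swapped
    return blinded

def unblind(choice, swapped):
    """Map the judge's 'left'/'right' verdict back to naive vs grounded."""
    if choice == "left":
        return "grounded" if swapped else "naive"
    return "naive" if swapped else "grounded"
```

The judge (human or a Gemini call) only ever sees left and right; the swap flag stays hidden until scoring.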
Repo: https://github.com/Mysticbirdie/image-cultural-accuracy-benc...
Paper: https://github.com/Mysticbirdie/image-cultural-accuracy-benc...