Show HN: I challenged 10 AI giants using one open-source PDF (with full results)

3•WFGY•7mo ago

Hey HN,

This started as a personal experiment: one person, one framework, ten AI models.

I built a semantic reasoning engine (WFGY: All Principles Return to One) and tested how well each model could handle abstract logic, conceptual shifts, and consistent inference—all using the same PDF.

The results are posted above. No fancy wrappers, no login walls—just raw data, an illustrated battle poster, and the full experiment.

Yes, it's a bit weird. But it's real. And honestly? I just hope someone out there sees the effort and the courage it took to do this solo.

Happy to answer questions. Would love your feedback, criticism, or even memes. Thanks for taking a look.

Comments

brown2000•7mo ago

Honestly, this has got to be one of the gutsiest one-man AI stunts I’ve seen.

Like—going up against 10 big models at once, making it look like some kung fu battle, and then just dropping all the data out in the open? That’s kinda nuts (in a good way).

So, which model surprised you the most? Did any of them totally flip your prompt in a way you didn’t see coming?

WFGY•7mo ago

Thanks for the kind words! Honestly? Claude messed with my head the most. Instead of answering, it reflected the question back at me like some kind of AI Zen master.

But Gemini pulled something even crazier: it rewrote my prompt into a corporate mission statement. I didn’t know whether to laugh or cry.

Each of them has its own “personality,” which is what made this challenge so wild. And yeah, dropping the data open-source was part courage, part madness, part… strategy.

Still curious which one you think held up the best?
