Show HN: Can you beat an AI at "being human" using one word?

https://turingduel.com/
1•jacob_indie•2h ago
I built TuringDuel, a Turing Test game where each move is just one word. It's based on a research paper called "A Minimal Turing Test". You play human vs AI until one side hits 4 points; an AI judge scores each round.

I’m collecting data to benchmark different models as both players and judges (OpenAI / Anthropic / Gemini / Mistral / DeepSeek), but I only have ~45 games so far and need way more before publishing comparisons. Each game picks one of 5 AI players and one of 4 judges at random, which gives 20 different game setups to evaluate.
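To make the 5 x 4 = 20 count concrete, here is a minimal sketch of how the setups could be enumerated and drawn at random. It is not the actual TuringDuel code: the player names are the providers listed above, while the judge labels and the function names are placeholders made up for illustration.

```python
from itertools import product
import random

# Provider names come from the post; the judge labels are placeholders,
# because the post does not say which four models act as judges.
PLAYERS = ["OpenAI", "Anthropic", "Gemini", "Mistral", "DeepSeek"]
JUDGES = ["judge_a", "judge_b", "judge_c", "judge_d"]


def all_setups():
    """Return every (player, judge) pairing as a list of tuples."""
    return list(product(PLAYERS, JUDGES))


def random_setup(rng=random):
    """Pick one player and one judge independently and uniformly (assumed)."""
    return rng.choice(PLAYERS), rng.choice(JUDGES)


if __name__ == "__main__":
    setups = all_setups()
    print(len(setups))      # 20
    print(random_setup())   # one random (player, judge) pair
```

If assignment really is uniform, ~45 games spread over 20 setups is only about two games per setup on average, which is why much more data is needed before comparing models.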

It's completely free (I pay for all the tokens), and the first game doesn't even require a signup: https://turingduel.com

Questions + criticism welcome! I will share aggregated results once there’s enough signal.

Comments

altmanaltman•1h ago
Kind of ironic "Poop" is the word that stands out the most. But having an AI judge it seems weird. To get a true benchmark, the judge must be a human who is susceptible to the 'irrational' cues (like 'Poop' or humor) that the original paper highlighted.
jacob_indie•1h ago
Thanks for the comment, I agree re the irony of having AI judges. Human judges would just not be feasible for now...

What is interesting, though, is how the different judges compare to each other (a first look at the data shows they behave differently).

Also, it is interesting to see how well the AI opponents and judges are picking up personality and clues based on round history. Some LLMs pick it up very well and counter humans, some are quite "dumb" and just submit random words.

The same goes for the AI judges.

I do store the reasoning of opponents and judges in the background but am not displaying it for the moment; maybe something interesting to add for later, but it would distort the data ;)

Show HN: TLA+ Workbench skill for coding agents (compat. with Vercel skills CLI)

https://github.com/younes-io/agent-skills/tree/main/skills/tlaplus-workbench
1•youio•3m ago•0 comments

Amazon Kiro took down AWS for 13 hours. Nine other AI agents did worse

https://blog.barrack.ai/amazon-ai-agents-deleting-production/
1•dhayabaran•3m ago•0 comments

In 92% of DeFi exploits AI security review flags underlying problem

https://www.coindesk.com/business/2026/02/20/specialized-ai-detects-92-of-real-world-defi-exploits
1•GustavHartz•4m ago•0 comments

New York Just Killed Its Robotaxi Plan. The Real Problem Isn't the Technology

https://www.phyware.io/blog/ny-robotaxi-trust-gap
1•chris_money202•6m ago•0 comments

Duration between rewards controls the rate of behavioral, dopaminergic learning

https://www.nature.com/articles/s41593-026-02206-2
1•PaulHoule•8m ago•0 comments

Method and system for determining illumination of models using an ambient cube

https://patents.google.com/patent/US7227548B2/en
1•throwaway2027•11m ago•0 comments

Show HN: Cryphos – no-code crypto signal bot with Telegram alerts

https://cryphos.com
1•duckducker•15m ago•0 comments

Ibex: A Typed DataFrame Language with C++ Code Generation

https://bobjansen.net/ibex-a-typed-dataframe-language-with-c-code-generation/
2•Bootvis•15m ago•0 comments

Show HN: RMirror Cloud – Open-Source OCR and Notion Sync for ReMarkable Tablets

https://rmirror.io
1•gottino•15m ago•0 comments

Crosspassion: The Magical Intersection When Interests Collide

https://micahblachman.beehiiv.com/p/crosspassion
2•subdomain•17m ago•0 comments

Ask HN: How do you track 2026 AI price wars? I built a tool to help

1•TokenCost•19m ago•0 comments

Salvius Joins Moltbook

https://blog.salvius.org/2026/02/salvius-joins-moltbook.html
2•gunthercox•20m ago•0 comments

Velocity Is Dead: AI-Generated Compilers and the Future of Software

https://www.openhands.dev/blog/20260219-velocity-is-dead
1•todsacerdoti•21m ago•0 comments

Local LLM Setup on Windows with Ollama and LM Studio (ThinkPad / RTX A3000 GPU)

https://github.com/gbro3n/local-ai/blob/main/docs/local-llm-setup-windows-ollama-lm-studio.md
1•appsoftware•22m ago•1 comments

Show HN: CanaryAI v0.2.5 – Security monitoring on Claude Code actions

https://github.com/jx887/homebrew-canaryai
1•jx887•24m ago•0 comments

The Risk of GCP Viewer Role: Cross-Project Disk Replication

https://aneviaro.eu/posts/the-hidden-risk-of-gcp-viewer-role-cross-project-disk-replication/
2•xrustalik•26m ago•0 comments

Volatility: The volatile memory forensic extraction framework

https://github.com/volatilityfoundation/volatility3
2•transpute•27m ago•0 comments

Show HN: PaiperSwipe – Crowdsourcing AI summaries for 250M+ research papers

https://paiperswipe.com
3•blakecoffee•29m ago•0 comments

Surreal Number

https://en.wikipedia.org/wiki/Surreal_number
1•the-mitr•29m ago•0 comments

We Built UltrafastSecp256k1 Up to 51% Faster ECC Across x86, ARM64, and RISC-V

1•shrecshrec•30m ago•0 comments

I built a local search CLI for my Claude Code history

https://github.com/madzarm/ccsearch
1•madzarm•30m ago•1 comments

Visualizing one system 11 ways

https://app.ilograph.com/demo.ilograph.Ilograph/__overview
1•billyp-rva•30m ago•0 comments

Alyph – Branch ChatGPT conversations visually

https://alyph.app
1•rrr_oh_man•31m ago•1 comments

Open-Source Bionic Reading Chrome Extension (MIT)

1•sdgnbs•33m ago•0 comments

Make an Ideal Man

https://enombic.com/make-an-ideal-man
1•vxxzy•35m ago•0 comments

Orvia – Spin up a real-time room, share files, leave – everything disappears

1•yc_surajkr•39m ago•1 comments

Four Astronauts, One Orbit – What Will They Find?

https://theguildenet.blogspot.com/2026/02/what-four-astronauts-could-find-as-they.html
1•AriaValrhazar•39m ago•0 comments

Why Moltbook Failed: The Lack of Identity in Autonomous AI Agents

1•benstarslett•40m ago•1 comments

SkillPad – GUI for Agent Skills

https://skillpad.dev
1•devxoul•40m ago•0 comments

Capitalle – Daily world capital guessing game (no back end, pure client-side)

https://www.capitalle.app
1•dingobp•43m ago•1 comments