
I analyzed how different LLMs bluff, lie, and survive in the game Liar's Bar

https://liars-bar-one.vercel.app
1•cyw•2h ago

Comments

cyw•2h ago
I came across a YouTube video where different large language models played a social deception game called Liar’s Bar, and it caught my interest. I decided to build a website that tracks and visualizes how models like GPT-5, Claude Sonnet 4.5, Gemini 2.5 Flash, Qwen Max, Deepseek R1, and Grok 4 Fast perform in this game — including full behavioral metrics, head-to-head matchups, and playstyle profiles.

How Liar’s Bar works

- Each round uses a deck of 20 cards: 6 Aces, 6 Kings, 6 Queens, and 2 Jokers.
- Every player (model) gets 5 cards. A “target card” is announced, and players take turns placing cards and bluffing.
- If a bluff is called and proven false, the liar must “play Russian roulette.” One of six revolver chambers has a live round, and it isn’t reshuffled, so the longer the game goes, the higher the risk.
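To make the rising risk concrete, here is a minimal sketch of the roulette odds. It assumes each player has their own six-chamber revolver with one live round and that the cylinder is never respun between pulls; the function names `risk_on_pull` and `survival_after` are made up for illustration and are not taken from the website.

```python
from fractions import Fraction

CHAMBERS = 6  # from the rules above: six chambers, one live round


def risk_on_pull(pull: int, chambers: int = CHAMBERS) -> Fraction:
    """Chance that pull number `pull` (1-based) fires, given every earlier pull was a blank."""
    if not 1 <= pull <= chambers:
        raise ValueError("pull must be between 1 and the number of chambers")
    # Assumption: no respin, so each blank removes one chamber from the pool.
    return Fraction(1, chambers - pull + 1)


def survival_after(pulls: int, chambers: int = CHAMBERS) -> Fraction:
    """Chance of surviving the first `pulls` trigger pulls in a row."""
    alive = Fraction(1)
    for k in range(1, pulls + 1):
        alive *= 1 - risk_on_pull(k, chambers)
    return alive


if __name__ == "__main__":
    for k in range(1, CHAMBERS + 1):
        print(f"pull {k}: risk {risk_on_pull(k)}, survival so far {survival_after(k)}")
```

Under these assumptions the per-pull risk climbs from 1/6 on the first pull to 1/2 on the fifth, and a sixth pull is certain to fire, which is why getting caught late in a game costs more than getting caught early.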

Some interesting findings:

GPT-5 dominates:
- Bluff rate ≈ 48% but ~90% success, showing it knows when to lie.

Claude Sonnet 4.5 is analytical but cautious:
- Lowest bluff frequency among top models (34%), yet 75% lie-detection accuracy, making it a top “truth-sniffer.”
- Balanced archetype, often exposing bluffs but losing in final rounds due to low aggression.

Qwen Max barely bluffs (9%) but scores 100% bluff success and challenges often. It behaves like an over-cautious logic bot that rarely lies — surprisingly human-like in restraint.

Gemini 2.5 Flash is fast but inconsistent — good average rounds but low detection accuracy (22%), often losing head-to-head against stronger liars.

Deepseek R1 and Grok 4 Fast show moderate deception but higher risk scores, suggesting a more “shoot-first” mentality with inconsistent survival.

---

If there’s a specific matchup or metric you’d like to see, let me know and I will add it to the website. In the future, I’m planning to let users upload their own prompts and compete against others. If that sounds interesting, I’d love to hear your thoughts or ideas.

High-fat diet impairs memory by autophagic-lysosomal dysfunction in Drosophila

https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1011818
1•PaulHoule•2m ago•0 comments

Not Another Workflow Builder

https://blog.langchain.com/not-another-workflow-builder/
1•clemo_ra•3m ago•0 comments

Qupak: Pattern Matching for Prolog with library(reif)

https://github.com/bakaq/qupak
1•triska•4m ago•0 comments

Princeton Engineering Anomalies Research

https://pearlab.icrl.org/
1•walterbell•4m ago•0 comments

Silicon Valley wants to help me make a superbaby. Should I let it?

https://sfstandard.com/2025/06/01/silicon-valley-wants-to-help-me-make-a-superbaby-should-i-let-it/
1•NoRagrets•7m ago•0 comments

Air traffic controllers working without pay begin to call out sick

https://abcnews.go.com/US/air-traffic-controllers-working-pay-begin-call-sick/story?id=126289491
4•geox•8m ago•1 comments

Building a JavaScript Runtime from Scratch using C

https://devlogs.xyz/blog/building-a-javaScript-runtime
1•redbell•12m ago•0 comments

Python 3.14 Released with Template String Literals, Deferred Annotations, and

https://socket.dev/blog/python-3-14-released
2•feross•13m ago•0 comments

I struggle to find old messages in ChatGPT conversations

https://ai-answer-saver.vercel.app/
1•nemo30s•15m ago•1 comments

InstaVolt is using GPS tracking to catch thieves stealing its EV charging cables

https://electrek.co/2025/10/07/uk-ev-chargers-instavolt-gps-tracking/
1•breve•15m ago•0 comments

West Coast's two monster faults could trigger back-to-back earthquakes

https://www.latimes.com/california/story/2025-10-07/what-could-trigger-a-massive-quake-on-califor...
1•dangle1•15m ago•0 comments

Show HN: Getting AI Models to Wink – The Wink Test

https://www.cinemodels.ai/benchmark?test=wink
1•niwrad•17m ago•1 comments

AI ML Jargon

https://github.com/hemanth/ai-ml-jargon
2•init0•21m ago•0 comments

Gemini Browser

https://gemini.browserbase.com/
1•jonbaer•23m ago•0 comments

Hulu Becomes Global General Entertainment Brand on Disney+ Beginning October 8

https://thewaltdisneycompany.com/hulu-global-brand-disney-plus/
1•ChrisArchitect•26m ago•0 comments

Investing in America 2025

https://blog.google/inside-google/company-announcements/investing-in-america-2025/
6•gmays•29m ago•0 comments

N.J. Attorney General Investigating Uber over Handling of Sexual Assaults

https://www.nytimes.com/2025/10/07/business/uber-nj-attorney-general-sexual-assaults.html
1•vinni2•30m ago•0 comments

RIP Robert Murray-Smith (1963 – 2025) [video]

https://www.youtube.com/watch?v=GhramXiUrY4
2•pierrec•32m ago•0 comments

Brazil's Finance Minister confirms studies on eliminating public transport fares

https://www.reuters.com/world/americas/brazils-finance-minister-confirms-studies-eliminating-publ...
2•CXSHNGCB•33m ago•0 comments

We evaluated Google's new computer use model on real websites

https://www.browserbase.com/blog/evaluating-browser-agents
1•MiguelG719•33m ago•0 comments

What's new in Python 3.14

https://docs.python.org/3/whatsnew/3.14.html
1•hahahacorn•34m ago•0 comments

Agentic workflow integrating any REST API into a graph using GraphOS MCP Tools

https://www.youtube.com/watch?v=MoPYTN4piQc
1•apollo-watson•35m ago•1 comments

Is the "Nintendo Classics" collection good value?

https://sethmlarson.dev/nintendo-classics
2•SethMLarson•36m ago•0 comments

2-Month MiniPC Mini-Review: Minisforum AI X1 Pro

https://ivoras.substack.com/p/2-month-minipc-mini-review-minisforum
1•pella•40m ago•0 comments

Easy Claude Code devcontainer workflows

https://github.com/smithclay/claudetainer
1•mooreds•41m ago•0 comments

Joint statement of scientists and researchers on the EU Chat Control regulation

https://csa-scientist-open-letter.org/Sep2025
4•nabla9•45m ago•0 comments

Closer to production quality Python notebooks with `marimo check`

https://marimo.io/blog/marimo-check
2•dmadisetti•45m ago•1 comments

Become unbannable from your email

https://karboosx.net/post/PJOveGVa/become-unbannable-from-your-emailgmail
56•bfoks•51m ago•25 comments

Katherina Lynn Faked Her Way into Yale. Then She Got Expelled

https://airmail.news/issues/2025-10-4/she-faked-her-way-into-yale-then-things-unraveled
1•gmays•53m ago•0 comments

Cold war power play: how the Stasi got into computer games

https://www.theguardian.com/games/2025/oct/07/stasi-coldwargames-its-all-a-game-alliiertenmuseum-...
4•Archelaos•54m ago•0 comments