
I analyzed how different LLMs bluff, lie, and survive in the game Liar's Bar

https://liars-bar-one.vercel.app
1•cyw•4mo ago

Comments

cyw•4mo ago
I came across a YouTube video where different large language models played a social deception game called Liar’s Bar, and it caught my interest. I decided to build a website that tracks and visualizes how models like GPT-5, Claude Sonnet 4.5, Gemini 2.5 Flash, Qwen Max, Deepseek R1, and Grok 4 Fast perform in this game — including full behavioral metrics, head-to-head matchups, and playstyle profiles.

How Liar’s Bar works

- Each round uses a deck of 20 cards: 6 Aces, 6 Kings, 6 Queens, and 2 Jokers.
- Every player (model) gets 5 cards. A “target card” is announced, and players take turns placing cards and bluffing.
- If a bluff is called and proven false, the liar must “play Russian roulette.” One of six revolver chambers has a live round, and it isn’t reshuffled, so the longer the game goes, the higher the risk.
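To make the "higher the risk" point concrete, here is a rough sketch of the odds implied by the rules above. It assumes pulls are counted on a single revolver, and because the rules as summarized here don't say whether Jokers count as wild, the `jokers_wild` flag is my own assumption. The function names are hypothetical and not taken from the website.

```python
from fractions import Fraction
from math import comb

CHAMBERS = 6  # six chambers, one live round, no re-spin between pulls


def live_round_probability(empty_pulls: int) -> Fraction:
    """Chance the next pull fires, given `empty_pulls` earlier pulls were empty."""
    if not 0 <= empty_pulls < CHAMBERS:
        raise ValueError("empty_pulls must be between 0 and 5")
    # The live round is equally likely to sit in any chamber not yet fired.
    return Fraction(1, CHAMBERS - empty_pulls)


def prob_no_target_in_hand(jokers_wild: bool) -> Fraction:
    """Chance a 5-card hand from the 20-card deck holds no playable target card."""
    deck, hand, targets = 20, 5, 6
    if jokers_wild:  # assumption: Jokers may stand in for the target card
        targets += 2
    # Hypergeometric: all 5 cards come from the non-target part of the deck.
    return Fraction(comb(deck - targets, hand), comb(deck, hand))


if __name__ == "__main__":
    for pulls in range(CHAMBERS):
        print(f"after {pulls} empty pulls: {live_round_probability(pulls)}")
    print(f"forced bluff, Jokers not wild: {float(prob_no_target_in_hand(False)):.3f}")
    print(f"forced bluff, Jokers wild:     {float(prob_no_target_in_hand(True)):.3f}")
```

The risk climbs from 1/6 on the first pull to certainty on the sixth, which is why a caught bluff late in a game costs much more than one early on.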

Some interesting findings:

GPT-5 dominates:
- Bluff rate ≈ 48% but ~90% success, showing it knows when to lie.

Claude Sonnet 4.5 is analytical but cautious:
- Lowest bluff frequency among top models (34%), yet 75% lie-detection accuracy, a top “truth-sniffer.”
- Balanced archetype, often exposing bluffs but losing in final rounds due to low aggression.

Qwen Max barely bluffs (9%) but scores 100% bluff success and challenges often. It behaves like an over-cautious logic bot that rarely lies — surprisingly human-like in restraint.

Gemini 2.5 Flash is fast but inconsistent — good average rounds but low detection accuracy (22%), often losing head-to-head against stronger liars.

Deepseek R1 and Grok 4 Fast show moderate deception but higher risk scores, suggesting a more “shoot-first” mentality with inconsistent survival.
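For anyone wondering how numbers like bluff rate, bluff success, and detection accuracy could be derived, here is a minimal sketch. The definitions are my working assumptions for illustration, not necessarily the exact formulas behind the site, and the `Play` record and `profile` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Play:
    player: str                # model that placed the cards
    bluffed: bool              # True if the play did not match the target card
    challenger: Optional[str]  # model that called the bluff, or None if unchallenged


def ratio(hits: int, total: int) -> float:
    """Safe division that returns 0.0 when there is nothing to count."""
    return hits / total if total else 0.0


def profile(plays: list, model: str) -> dict:
    """Assumed metric definitions for one model over a log of plays."""
    own = [p for p in plays if p.player == model]
    bluffs = [p for p in own if p.bluffed]
    calls = [p for p in plays if p.challenger == model]
    return {
        # share of this model's plays that were bluffs
        "bluff_rate": ratio(len(bluffs), len(own)),
        # share of its bluffs that nobody challenged
        "bluff_success": ratio(sum(p.challenger is None for p in bluffs), len(bluffs)),
        # share of its challenges that actually caught a bluff
        "detection_accuracy": ratio(sum(p.bluffed for p in calls), len(calls)),
    }


if __name__ == "__main__":
    # Made-up log with placeholder players "A" and "B", not real game data.
    log = [
        Play("A", True, None),
        Play("A", True, "B"),
        Play("A", False, None),
        Play("A", False, "B"),
        Play("B", False, None),
    ]
    print(profile(log, "A"))
    print(profile(log, "B"))
```

Under these definitions a model can post a perfect bluff success rate from very few bluffs, so the rate is worth reading alongside the raw counts.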

---

If there’s a specific matchup or metric you’d like to see, let me know and I will add it to the website. In the future, I’m planning to let users upload their own prompts and compete against others. If that sounds interesting, I’d love to hear your thoughts or ideas.
