
I analyzed how different LLMs bluff, lie, and survive in the game Liar's Bar

https://liars-bar-one.vercel.app
1•cyw•4mo ago

Comments

cyw•4mo ago
I came across a YouTube video where different large language models played a social deception game called Liar’s Bar, and it caught my interest. I decided to build a website that tracks and visualizes how models like GPT-5, Claude Sonnet 4.5, Gemini 2.5 Flash, Qwen Max, Deepseek R1, and Grok 4 Fast perform in this game — including full behavioral metrics, head-to-head matchups, and playstyle profiles.

How Liar’s Bar works

- Each round uses a deck of 20 cards: 6 Aces, 6 Kings, 6 Queens, and 2 Jokers.
- Every player (model) gets 5 cards. A “target card” is announced, and players take turns placing cards and bluffing.
- If a bluff is called and proven false, the liar must “play Russian roulette.” One of six revolver chambers has a live round, and it isn’t reshuffled, so the longer the game goes, the higher the risk.
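
To make the rules above concrete, here is a minimal Python sketch of the deck and of how the roulette risk climbs when the cylinder is not re-spun. It is only an illustration, not the code behind the website: the names `deal` and `fire_probability` are made up for this example, the four-player count is an assumption (the post only says each player gets 5 cards from a 20-card deck), and it tracks a single revolver without modeling whether each player has their own.

```python
import random
from fractions import Fraction

# Deck composition as described in the post: 6 Aces, 6 Kings, 6 Queens, 2 Jokers.
DECK = ["A"] * 6 + ["K"] * 6 + ["Q"] * 6 + ["Joker"] * 2
HAND_SIZE = 5
CHAMBERS = 6


def deal(num_players, rng):
    """Shuffle a fresh copy of the deck and deal HAND_SIZE cards to each player."""
    if num_players * HAND_SIZE > len(DECK):
        raise ValueError("not enough cards for that many players")
    cards = DECK[:]
    rng.shuffle(cards)
    return [cards[i * HAND_SIZE:(i + 1) * HAND_SIZE] for i in range(num_players)]


def fire_probability(pull_number):
    """Chance that pull number `pull_number` (1-based) fires, given that all
    earlier pulls on the same un-spun cylinder were empty."""
    if not 1 <= pull_number <= CHAMBERS:
        raise ValueError("pull_number must be between 1 and 6")
    return Fraction(1, CHAMBERS - pull_number + 1)


# Assumption: four players, which uses all 20 cards.
hands = deal(4, random.Random(0))
risks = [fire_probability(n) for n in range(1, CHAMBERS + 1)]
print(risks)
```

Under these assumptions the per-pull risk goes 1/6, 1/5, 1/4, 1/3, 1/2, 1, which is why getting caught late in a game is so much more costly than getting caught early.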

Some interesting findings:

GPT-5 dominates:
- Bluff rate ≈ 48% but ~90% success, showing it knows when to lie.

Claude Sonnet 4.5 is analytical but cautious:
- Lowest bluff frequency among top models (34%), yet 75% lie-detection accuracy, making it a top “truth-sniffer.”
- Balanced archetype, often exposing bluffs but losing in final rounds due to low aggression.

Qwen Max barely bluffs (9%) but scores 100% bluff success and challenges often. It behaves like an over-cautious logic bot that rarely lies — surprisingly human-like in restraint.

Gemini 2.5 Flash is fast but inconsistent — good average rounds but low detection accuracy (22%), often losing head-to-head against stronger liars.

Deepseek R1 and Grok 4 Fast show moderate deception but higher risk scores, suggesting a more “shoot-first” mentality with inconsistent survival.
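
The post does not spell out how these metrics are defined, so the following Python sketch is only one plausible reading, not the website's actual method. It assumes bluff rate is bluffed plays over all plays, bluff success is uncaught bluffs over all bluffs, and detection accuracy is challenges that hit a real bluff over all challenges. The `Play` and `Challenge` records and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Play:
    """One card play by a model (hypothetical log record)."""
    bluffed: bool  # the played cards did not match the target card
    caught: bool   # an opponent challenged and exposed this play


@dataclass
class Challenge:
    """One challenge issued by a model (hypothetical log record)."""
    target_was_bluffing: bool


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when there is nothing to count."""
    return numerator / denominator if denominator else None


def summarize(plays, challenges):
    """Compute the three metrics under the assumed definitions above."""
    bluffs = [p for p in plays if p.bluffed]
    return {
        "bluff_rate": ratio(len(bluffs), len(plays)),
        "bluff_success": ratio(sum(not p.caught for p in bluffs), len(bluffs)),
        "detection_accuracy": ratio(
            sum(c.target_was_bluffing for c in challenges), len(challenges)
        ),
    }


# Made-up sample data, purely to show the arithmetic.
plays = [Play(True, False), Play(True, True), Play(False, False), Play(False, False)]
challenges = [Challenge(True), Challenge(False), Challenge(True), Challenge(True)]
stats = summarize(plays, challenges)
print(stats)
```

With these definitions, a model like Qwen Max can pair a very low bluff rate with a perfect bluff success score: the second ratio only counts the few bluffs it actually attempted.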

---

If there’s a specific matchup or metric you’d like to see, let me know and I will add it to the website. In the future, I’m planning to let users upload their own prompts and compete against others. If that sounds interesting, I’d love to hear your thoughts or ideas.