
I analyzed how different LLMs bluff, lie, and survive in the game Liar's Bar

https://liars-bar-one.vercel.app
1•cyw•4mo ago

Comments

cyw•4mo ago
I came across a YouTube video where different large language models played a social deception game called Liar’s Bar, and it caught my interest. I decided to build a website that tracks and visualizes how models like GPT-5, Claude Sonnet 4.5, Gemini 2.5 Flash, Qwen Max, Deepseek R1, and Grok 4 Fast perform in this game — including full behavioral metrics, head-to-head matchups, and playstyle profiles.

How Liar’s Bar works

- Each round uses a deck of 20 cards: 6 Aces, 6 Kings, 6 Queens, and 2 Jokers.
- Every player (model) gets 5 cards. A “target card” is announced, and players take turns placing cards and bluffing.
- If a bluff is called and proven false, the liar must “play Russian roulette.” One of six revolver chambers has a live round, and it isn’t reshuffled, so the longer the game goes, the higher the risk.
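
To make the rising risk concrete, here is a minimal Python sketch of the setup described above. It is not the code behind the website. The deck composition, hand size, and six chambers come from the rules as stated; the function names, the four-player table (the most that 20 cards allow at 5 cards each), and the reading that a single revolver keeps its state between pulls are assumptions made for illustration.

```python
import random
from fractions import Fraction

# Deck composition as described in the post: 6 Aces, 6 Kings, 6 Queens, 2 Jokers.
DECK = ["A"] * 6 + ["K"] * 6 + ["Q"] * 6 + ["Joker"] * 2
HAND_SIZE = 5
CHAMBERS = 6


def deal(num_players, rng):
    """Shuffle a fresh 20-card deck and deal HAND_SIZE cards to each player."""
    if num_players * HAND_SIZE > len(DECK):
        raise ValueError("not enough cards for that many players")
    cards = DECK[:]
    rng.shuffle(cards)
    return [cards[i * HAND_SIZE:(i + 1) * HAND_SIZE] for i in range(num_players)]


def pull_risk(pulls_survived):
    """Chance that the next trigger pull is fatal, given the earlier pulls were blanks.

    Assumption: one live round, CHAMBERS chambers, and no re-spin between
    pulls, so every blank rules out one chamber.
    """
    remaining = CHAMBERS - pulls_survived
    if remaining <= 0:
        raise ValueError("the live round must already have fired")
    return Fraction(1, remaining)


if __name__ == "__main__":
    # Hypothetical four-player table; the post does not state the table size.
    hands = deal(4, random.Random(0))
    for seat, hand in enumerate(hands, start=1):
        print(f"player {seat}: {hand}")
    for survived in range(CHAMBERS):
        print(f"pull {survived + 1}: risk {pull_risk(survived)}")
```

Under these assumptions the conditional risk climbs from 1/6 on the first pull to certainty on the sixth, which is why a model that gets caught bluffing late in a game pays a much higher price than one caught early.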

Some interesting findings:

GPT-5 dominates:
- Bluff rate ≈ 48% but ~90% success, showing it knows when to lie.

Claude Sonnet 4.5 is analytical but cautious:
- Lowest bluff frequency among top models (34%), yet 75% lie-detection accuracy, making it a top “truth-sniffer.”
- Balanced archetype, often exposing bluffs but losing in final rounds due to low aggression.

Qwen Max barely bluffs (9%) but scores 100% bluff success and challenges often. It behaves like an over-cautious logic bot that rarely lies — surprisingly human-like in restraint.

Gemini 2.5 Flash is fast but inconsistent — good average rounds but low detection accuracy (22%), often losing head-to-head against stronger liars.

Deepseek R1 and Grok 4 Fast show moderate deception but higher risk scores, suggesting a more “shoot-first” mentality with inconsistent survival.

---

If there’s a specific matchup or metric you’d like to see, let me know and I will add it to the website. In the future, I’m planning to let users upload their own prompts and compete against others. If that sounds interesting, I’d love to hear your thoughts or ideas.