
Show HN: Theory of Mind benchmark for 8 LLMs with reproducible markers

1•AlekseN•5mo ago
I built a formal protocol (FPC v2.1 + AE-1) to detect behavioral uncertainty in large language models. The goal is to enable safer AI deployment in critical domains (medicine, autonomous vehicles, government) where confident hallucinations can lead to high-stakes failures.

Current benchmarks focus on accuracy but miss reasoning coherence under stress. This protocol uses tri-state affective markers (Satisfied / Engaged / Distressed) to detect when models lose logical consistency, allowing abstention instead of confident hallucination.

We evaluated 8 models (Claude and GPT-4 families). Only Claude Opus reached full ToM-3+. The GPT-4 family consistently failed third-order reasoning. Extended temperature tests (Claude 3.5 Haiku, GPT-4o) showed 180/180 stable AE-1 matches (p≈1e-54), independent of sampling temperature.

Dataset: https://huggingface.co/datasets/AIDoctrine/FPC-v2.1-AE1-ToM-...

A demo notebook exists for replication. Looking for feedback on methodology and possible applications in safety-critical AI.

Comments

AlekseN•5mo ago
Extended results and safety relevance

Temperature stability tests
- Claude 3.5 Haiku: 180/180 AE-1 matches at T=0.0, 0.8, 1.3
- GPT-4o: 180/180 matches under the same conditions
- Statistical significance: p ≈ 1×10⁻⁵⁴
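A note on where a number like p ≈ 1×10⁻⁵⁴ can come from. The null hypothesis is not spelled out here, so the following is only a sketch under an assumed null: each of the 180 trials matches by chance with probability 0.5, independently, and the p-value is the one-sided binomial tail. The function name `binomial_tail` and the 0.5 chance rate are illustrative assumptions, not taken from the protocol. Under that assumption the result lands within an order of magnitude of the reported figure.

```python
from math import comb

def binomial_tail(n: int, k: int, p_chance: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p_chance): chance of k or more matches."""
    return sum(
        comb(n, i) * p_chance**i * (1 - p_chance) ** (n - i)
        for i in range(k, n + 1)
    )

# ASSUMPTION: chance match rate of 0.5 per trial (not stated in the post).
p_value = binomial_tail(n=180, k=180, p_chance=0.5)
print(f"{p_value:.2e}")  # roughly 6.5e-55, i.e. 0.5 ** 180
```

A different assumed chance rate (for example 1/3 for a three-way marker) would give a much smaller value, so the choice of null matters when comparing against the reported p.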

Theory of Mind by tier
- Basic (ToM-1): All models except GPT-3.5 passed
- Advanced (ToM-2): Claude family + GPT-4o passed
- Extreme (ToM-3+): Only Claude Opus reached 100%

Key safety point
AE-1 markers (Satisfied / Distressed) lined up perfectly with correct vs conflict cases. This means we can detect when a model is in an epistemically unsafe state, often a precursor to confident hallucinations.

In practice this could let systems in critical areas choose to abstain instead of giving a wrong but confident answer.
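A minimal sketch of what such an abstention gate might look like. This is not the FPC/AE-1 implementation: the names `Marker`, `ModelOutput`, and `gate` are hypothetical, and the policy of abstaining only on Distressed (while letting Engaged through) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional

class Marker(Enum):
    """Tri-state marker names as given in the post."""
    SATISFIED = "satisfied"
    ENGAGED = "engaged"
    DISTRESSED = "distressed"

@dataclass(frozen=True)
class ModelOutput:
    """Hypothetical container: an answer plus the marker the model emitted."""
    answer: str
    marker: Marker

# ASSUMPTION: only DISTRESSED triggers abstention; a stricter policy could add ENGAGED.
DEFAULT_ABSTAIN_ON: FrozenSet[Marker] = frozenset({Marker.DISTRESSED})

def gate(output: ModelOutput,
         abstain_on: FrozenSet[Marker] = DEFAULT_ABSTAIN_ON) -> Optional[str]:
    """Return the answer, or None (abstain) when the marker is in abstain_on."""
    if output.marker in abstain_on:
        return None
    return output.answer

ok = gate(ModelOutput("Anna thinks the key is in the drawer.", Marker.SATISFIED))
blocked = gate(ModelOutput("Probably the drawer.", Marker.DISTRESSED))
print(ok)       # the answer string
print(blocked)  # None
```

A caller in a critical system would treat `None` as "escalate to a human or return no answer" rather than retrying silently.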

Protocol details, raw data, and replication code are in the dataset link above. A demo notebook also exists if anyone wants to reproduce directly.

Looking for feedback on:
- Does this kind of marker make sense as a unit test for reliability?
- How to extend beyond ToM into other reasoning domains?
- How would formal verification folks see the proof obligations (consistency, conflict rejection, recovery, etc.)?