Evaluating Probabilistic Reasoning in LLMs Through Language-Only Decision Tasks

https://arxiv.org/abs/2510.13878

1•PaulHoule•3mo ago

Comments

adamzwasserman•3mo ago

I see problems:

The paper claims that Qwen3-4B achieved 89.2% best-arm selection and attributes this to superior "probabilistic reasoning". But this is a 2-armed bandit, where random guessing should converge to ~50% over 500 runs of 25 iterations each. An 89% rate is suspiciously high and suggests to me that something else is happening (like prompt bias, or the model pattern-matching rather than reasoning).

When they increase from 2 to 5 arms, Qwen3-4B drops from 89% to 6.5% accuracy. That is well below the ~20% that uniform random guessing would give with 5 arms. I assert that if it truly had probabilistic reasoning capability, performance would degrade more gracefully.
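For reference, here is a minimal sketch of the chance baselines I have in mind. It assumes "best-arm selection" means the fraction of pulls that land on the best arm, and that a guesser picks uniformly at random; the paper may define the metric differently. The function name and defaults are mine, not the paper's.

```python
import random


def random_guess_best_arm_rate(n_arms, n_runs=500, n_iters=25, seed=0):
    """Fraction of pulls that hit the best arm under uniform random choice.

    Assumption: the metric is counted per pull, across all runs.
    """
    rng = random.Random(seed)
    best_arm = 0  # which index is "best" is arbitrary for a uniform guesser
    hits = 0
    for _ in range(n_runs):
        for _ in range(n_iters):
            if rng.randrange(n_arms) == best_arm:
                hits += 1
    return hits / (n_runs * n_iters)


if __name__ == "__main__":
    for k in (2, 5):
        rate = random_guess_best_arm_rate(k)
        print(f"{k} arms: simulated {rate:.3f}, expected {1 / k:.3f}")
```

Under these assumptions the simulated rates should sit close to 1/k, so roughly 50% for 2 arms and 20% for 5 arms. Those are the numbers to compare 89% and 6.5% against.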

The "overthinking" explanation is hand-wavy. I don't see evidence or a chain of reasoning to support it. This is just a post-hoc story to explain unexpected results.

No discussion of variance, confidence intervals, or statistical significance. With 500 runs, these should be straightforward to calculate.
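To show how little work this would be, here is a rough sketch of a 95% Wilson score interval for a proportion. It assumes the 89.2% is a proportion over 500 independent runs with a binary outcome per run (446 of 500); I don't know that the paper aggregates it that way, so treat the inputs as illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Assumed counts: 446 / 500 = 89.2%. Not taken from the paper's raw data.
    low, high = wilson_interval(446, 500)
    print(f"95% Wilson interval: [{low:.3f}, {high:.3f}]")
```

If the per-run outcomes really are independent, the interval is only a few percentage points wide on either side. The same function applied to the 5-arm result would show whether 6.5% is distinguishable from the 20% chance level.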

Does the claimed 89% accuracy in a binary choice task strike anyone else as implausibly high for what they're claiming?