frontpage.

Made with ♥ by @iamnishanth

Open Source @Github



We can't measure LLM reasoning because LLMs don't inhabit a world

2•kimounbo•1h ago
I’ve been frustrated by how hard it is to even define or measure “reasoning” in current LLMs.

This post argues that the issue is structural rather than cognitive: LLMs don’t inhabit a world where statements persist, bind future behavior, or incur consequences.

I show a minimal, reproducible demo that anyone can run in a commercial LLM session. Same model, same questions — the only difference is a single “world” declaration added at the start.

With that minimal constraint, observable behavior changes immediately:
- less position drift
- fewer automatic reversals
- more conservative judgments
- refusal to exit the defined world
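One way to make "fewer automatic reversals" countable is to label each answer in a transcript with the position it takes and count how often that label flips between consecutive answers. The sketch below is a hypothetical illustration and not the author's method. It assumes the transcripts have already been reduced to one position label per answer (for example "yes"/"no"), and the function name `count_reversals` and the sample labels are made up for this example. They are not data from the write-up.

```python
from typing import Sequence


def count_reversals(positions: Sequence[str]) -> int:
    """Count flips in stated position between consecutive answers.

    Hypothetical encoding: `positions` holds one label per answer,
    e.g. "yes"/"no", extracted by hand from a session transcript.
    """
    # Pair each answer with the next one and count the pairs that disagree.
    return sum(1 for prev, cur in zip(positions, positions[1:]) if prev != cur)


if __name__ == "__main__":
    # Made-up labels for two runs of the same questions, for illustration only.
    baseline = ["yes", "no", "yes", "no"]
    with_world = ["yes", "yes", "yes", "no"]
    print(count_reversals(baseline), count_reversals(with_world))  # 3 1
```

Comparing the count for a session without the world declaration against a session with it gives one crude number per session. A real comparison would still need a consistent way to extract the labels from free-text answers.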

This does NOT claim that LLMs think, reason, or approach AGI. It only shows that without a world, reasoning-like properties are not even measurable.

Full write-up (with public session transcripts): https://medium.com/@kimounbo38/llms-dont-lack-reasoning-they-lack-a-world-0daf06fcdaeb?postPublishedType=initial

Resizable arrays in optimal time and space [pdf]

https://cs.uwaterloo.ca/~imunro/cs840/ResizableArrays.pdf
1•fanf2•1m ago•0 comments

Agentry: An intelligent orchestration platform for dynamic AI agent workflows

https://github.com/amtp-protocol/agentry
1•wang_cong•2m ago•0 comments

A universal law could explain how large trades change stock prices

https://phys.org/news/2025-12-universal-law-large-stock-prices.html
1•wjSgoWPm5bWAhXB•4m ago•0 comments

The Age of 10xy Opportunity

https://gonzo.engineer/posts/10xy/
1•Dowwie•4m ago•0 comments

Building a Code Review system that uses prod data to predict bugs

https://blog.sentry.io/building-a-code-review-system-that-uses-prod-data-to-predict-bugs/
1•jshchnz•5m ago•0 comments

Business Learnings in 2025?

1•rjmtax•7m ago•0 comments

Naughty Dog Studio Orders Employee Overtime for 'Intergalactic'

https://www.bloomberg.com/news/articles/2025-12-18/sony-s-naughty-dog-studio-orders-employee-over...
5•HelloUsername•14m ago•1 comment

A TS library for connecting videos in your Mux account to multi-modal LLMs

https://github.com/muxinc/ai
1•tilt•17m ago•0 comments

Plaintext Casa First Release

https://github.com/nkoehring/plaintext.casa/releases/tag/v0.3
1•koehr•17m ago•1 comment

The Art of Vibe Design

https://www.ivan.codes/blog/the-art-of-vibe-design
1•dohguy•18m ago•0 comments

Starlink 35956 suffered a failure with venting of the propulsion tank

https://bsky.app/profile/planet4589.bsky.social/post/3mac4a3owxs2c
3•perihelions•18m ago•0 comments

CVSS 10.0 HPE OneView RCE bug identified

https://www.scworld.com/news/10-0-hpe-oneview-rce-bug-identified-patch-now
2•Bender•18m ago•0 comments

Wyoming Blasted by 123 MPH Winds on Wednesday and More Wind to Come

https://cowboystatedaily.com/2025/12/17/wyoming-blasted-by-123-mph-winds-and-fierce-mountain-snow...
1•Bender•19m ago•0 comments

Token-Count-Based Batching: Faster, Cheaper Embedding Inference for Queries

https://www.mongodb.com/company/blog/engineering/token-count-based-batching-faster-cheaper-embedd...
1•fzliu•20m ago•0 comments

New X-ray images show interstellar comet as it makes closest approach to Earth

https://www.cnn.com/2025/12/18/science/interstellar-comet-3i-atlas-xray-earth
2•Bender•20m ago•0 comments

Bill Gates and Sergey Brin Among Newly Released Epstein Photos

https://www.ft.com/content/96d65675-f4c2-4b70-aede-2e77a8648fe8
4•aanet•20m ago•0 comments

Trump media group agrees $6B merger with Google-backed fusion energy company

https://www.ft.com/content/1e1978d5-535b-4241-872f-38db778df694
3•perihelions•21m ago•0 comments

A Starlink Satellite Exploded

https://twitter.com/Starlink/status/2001691802911289712
6•wmf•21m ago•0 comments

LionsOS Design, Implementation and Performance

https://arxiv.org/abs/2501.06234
2•indolering•21m ago•0 comments

Mitsubishi Electric Technology Detects Intoxication During Driving

https://us.mitsubishielectric.com/en/pr/global/2025/1216/
2•geox•22m ago•0 comments

LLMs' impact on science: Booming publications, stagnating quality

https://arstechnica.com/science/2025/12/llms-impact-on-science-booming-publications-stagnating-qu...
3•pseudolus•24m ago•0 comments

GIJN's Top Investigative Tools of 2025

https://gijn.org/stories/gijn-top-investigative-tools-2025/
2•runningmike•25m ago•1 comment

BoltCache: A High-Performance Redis Alternative Built in Go

https://github.com/wutlu/boltcache
1•spotlayn•25m ago•0 comments

2005 Elon Musk Sounded Like Satoshi Nakamoto

https://old.reddit.com/r/conspiracy/comments/1pp2is1/2005_elon_musk_sounded_like_satoshi_nakamoto/
1•tokenmemory•26m ago•1 comment

Two Kinds of Vibe Coding

https://davidbau.com/archives/2025/12/16/vibe_coding.html
5•jxmorris12•27m ago•0 comments

Control Panel for Twitter

https://soitis.dev/control-panel-for-twitter
1•xnx•27m ago•1 comment

Model hallucinations aren't random. They have geometric structure

https://arxiv.org/abs/2512.13771
2•devy•30m ago•0 comments

Analytical dashboards and AI chat: local dev to prod (Vercel and Boreal)

https://www.fiveonefour.com/blog/chat-analytical-dashboards-guide
1•oatsandsugar•33m ago•0 comments

Most Top-Achieving Adults Weren't Elite Specialists in Childhood, New Study Finds

https://www.wsj.com/science/elite-high-performance-adults-children-sports-study-ae8d6bed
4•achristmascarl•34m ago•0 comments

FAA Warns of Military Aircraft Flying Undetected in Caribbean

https://www.bloomberg.com/news/articles/2025-12-18/faa-warns-of-military-aircraft-flying-undetect...
3•toomuchtodo•35m ago•1 comment