Can LLMs reason about math? The Subtraction Trick Test

https://haversine.substack.com/p/can-llms-reason-about-math-the-subtraction

4•MakeAJiraTicket•1h ago

Comments

mapontosevenths•1h ago

This is really clever, but I think Gemini got it on the first try. I kept my prompts close to yours, but didn't include the initial framing bit about how it was supposed to be an expert.

https://gemini.google.com/share/b66e0158ee29

MakeAJiraTicket•1h ago

Thank you! Gemini has consistently been the best performer I've tried, but it always requires the connection to be made explicit. The point of the test is that it is very low in complexity and narrowly targeted at what can be considered reasoning, and these models can't produce the connection without prodding.

In the ideal case of reasoning, you would simply present the methods and the models would bridge the gap independently once the methods are brought to the forefront of their context together, but that doesn't happen.

mapontosevenths•40m ago

ChatGPT got it with less prodding, but I had to set it to "Pro" thinking mode (ChatGPT's version of Deep Think, I suspect). I'm sure Deep Think could get it with even less prompting.

I think your conclusion that they aren't really thinking doesn't hold. They're already there; it just costs more money and time to get good results.

https://chatgpt.com/share/69a12666-64b0-8009-8dfe-59546ac400...

EDIT - Updated the link to include the full conversation. Note that I didn't switch to "Pro" mode until the end, and I eventually got tired of waiting and just told it "answer now."

Show HN: Nano Banana 2 – Sub-second AI image gen via Gemini 3.1 Flash

https://nano-banana2.me/

1•naxtsass•58s ago•0 comments

Show HN: Conduit – Automatic Port Forwarding for Docker Containers

https://github.com/Oranda-IO/Conduit

1•orandaio•1m ago•0 comments

RFC 9925: Unsigned X.509 Certificates

https://datatracker.ietf.org/doc/rfc9925/

1•raquuk•4m ago•0 comments

I used Claude AI to build this website that shows upcoming indie game festivals

https://festival-watch.vercel.app/

1•rotub•4m ago•1 comments

Chivalry Test

https://chivalryscore.com

1•onSmallMessage•4m ago•1 comments

We found 118 performance bugs across 2 PRs written with Claude Code

https://www.codeflash.ai/blog-posts/hidden-cost-of-coding-agents

3•misrasaurabh1•6m ago•1 comments

Vegetarians have 'substantially lower risk' of five types of cancer

https://www.theguardian.com/society/2026/feb/27/vegetarians-have-substantially-lower-risk-of-five...

1•plaguna•6m ago•0 comments

Man jailed after selling £7M of fake plane parts

https://www.bbc.com/news/articles/c78xz5j848vo

1•dataflow•6m ago•0 comments

Pplx-Embed: Embedding Models for Web-Scale Retrieval

https://research.perplexity.ai/articles/pplx-embed-state-of-the-art-embedding-models-for-web-scal...

1•jxmorris12•8m ago•0 comments

CoreWeave slides as surging capex, backlog risks overshadow small revenue beat

https://www.reuters.com/business/coreweave-beats-fourth-quarter-revenue-estimates-2026-02-26/

1•petethomas•14m ago•0 comments

Indian ISPs block Supabase due to a ministry order

https://twitter.com/supabase/status/2027249469545386102

1•alt-glitch•14m ago•0 comments

Google paid startup Form Energy $1B for its 30GWh, 100-hour battery

https://techcrunch.com/2026/02/26/google-paid-startup-form-energy-1b-for-its-massive-100-hour-bat...

1•epistasis•19m ago•0 comments

I stopped writing code. I only review AI-generated PRs now

https://alec.is/posts/how-i-went-from-code-reviewer-to-code-reviewer/

1•arm32•20m ago•0 comments

'Really Simple Licensing' (RSL) – Open Licensing Standard for AI Crawlers

https://en.wikipedia.org/wiki/Really_Simple_Licensing

1•evolve2k•38m ago•1 comments

The AI Transformation Framework

https://zapier.com/playbooks/ai-transformation-framework

1•swolpers•40m ago•0 comments

Shifting Security Left for AI Agents with GitGuardian MCP

https://blog.gitguardian.com/shifting-security-left-for-ai-agents-enforcing-ai-generated-code-sec...

1•umairnadeem123•40m ago•0 comments

Feature-Sliced Design

https://feature-sliced.design/

1•saikatsg•41m ago•0 comments

High Speed Rail by Country 2026

https://worldpopulationreview.com/country-rankings/high-speed-rail-by-country

2•thunderbong•43m ago•0 comments

Trend Is Concerning

https://techcrunch.com/2026/02/26/jack-dorsey-block-layoffs-4000-halved-employees-your-company-is...

1•melvinodsa•44m ago•0 comments

Show HN: I built a Chrome extension to record demo videos without editing

https://zoomflow.rovelin.com/

1•hritik7742•50m ago•0 comments

Judge says he will order Greenpeace to pay $345M in oil pipeline case

https://apnews.com/article/greenpeace-energy-transfer-dakota-access-pipeline-30bfb9939dea06f1e976...

4•e2e4•57m ago•1 comments

Model Collapse Ends AI Hype

https://www.youtube.com/watch?v=ShusuVq32hc

3•signa11•59m ago•0 comments

Research suggests mating direction bias between Neanderthals and humans

https://www.theguardian.com/science/2026/feb/26/male-neanderthals-human-females-mating-research-d...

3•uxhacker•59m ago•1 comments

Seeing Is Not Believing: Benchmarking AI Image Detectors

https://blog.succinct.xyz/ai-image-detection-benchmark/

1•ncb9094•1h ago•0 comments

Pakistan bombs targets in Afghan cities, minister calls it 'open war'

https://www.reuters.com/world/asia-pacific/pakistan-strikes-afghanistan-targets-clashes-intensify...

3•petethomas•1h ago•0 comments

How Stupid Would It Be to Put Data Centers in Space?

https://spectrum.ieee.org/orbital-data-centers

2•amaks•1h ago•1 comments

Upload 23andMe,myHeritage, Myancestry, get 1,200 GRS score and a longevity prot

1•HelixSequencing•1h ago•0 comments

Dear Time Lords: Freeze Computers in 1993

https://graydon2.dreamwidth.org/322461.html

7•zdw•1h ago•0 comments

Reduce Claude Token Usage by 50%

https://ham-pro.vercel.app/

2•Luseniik•1h ago•1 comments

Making Video Games in 2025 (without an engine)

https://www.noelberry.ca/posts/making_games_in_2025/

2•alvivar•1h ago•0 comments