
Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

https://www.cnbc.com/2026/02/06/anthropic-goldman-sachs-ai-model-accounting.html
1•myk-e•1m ago•0 comments

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

https://www.ft.com/content/83488628-8dfd-4060-a7b0-71b1bb012785
1•1vuio0pswjnm7•2m ago•1 comments

Big Tech's AI Push Is Costing More Than the Moon Landing

https://www.wsj.com/tech/ai/ai-spending-tech-companies-compared-02b90046
1•1vuio0pswjnm7•4m ago•0 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
1•1vuio0pswjnm7•5m ago•0 comments

Suno, AI Music, and the Bad Future [video]

https://www.youtube.com/watch?v=U8dcFhF0Dlk
1•askl•7m ago•0 comments

Ask HN: How are researchers using AlphaFold in 2026?

1•jocho12•10m ago•0 comments

Running the "Reflections on Trusting Trust" Compiler

https://spawn-queue.acm.org/doi/10.1145/3786614
1•devooops•15m ago•0 comments

Watermark API – $0.01/image, 10x cheaper than Cloudinary

https://api-production-caa8.up.railway.app/docs
1•lembergs•17m ago•1 comments

Now send your marketing campaigns directly from ChatGPT

https://www.mail-o-mail.com/
1•avallark•20m ago•1 comments

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

https://github.com/joelparkerhenderson/queueing-theory
1•jph•32m ago•0 comments

Show HN: Hibana – choreography-first protocol safety for Rust

https://hibanaworks.dev/
5•o8vm•34m ago•0 comments

Haniri: A live autonomous world where AI agents survive or collapse

https://www.haniri.com
1•donangrey•35m ago•1 comments

GPT-5.3-Codex System Card [pdf]

https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf
1•tosh•48m ago•0 comments

Atlas: Manage your database schema as code

https://github.com/ariga/atlas
1•quectophoton•51m ago•0 comments

Geist Pixel

https://vercel.com/blog/introducing-geist-pixel
2•helloplanets•53m ago•0 comments

Show HN: MCP to get latest dependency package and tool versions

https://github.com/MShekow/package-version-check-mcp
1•mshekow•1h ago•0 comments

The better you get at something, the harder it becomes to do

https://seekingtrust.substack.com/p/improving-at-writing-made-me-almost
2•FinnLobsien•1h ago•0 comments

Show HN: WP Float – Archive WordPress blogs to free static hosting

https://wpfloat.netlify.app/
1•zizoulegrande•1h ago•0 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
1•melvinzammit•1h ago•0 comments

Sony BMG copy protection rootkit scandal

https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootkit_scandal
2•basilikum•1h ago•0 comments

The Future of Systems

https://novlabs.ai/mission/
2•tekbog•1h ago•1 comments

NASA now allowing astronauts to bring their smartphones on space missions

https://twitter.com/NASAAdmin/status/2019259382962307393
2•gbugniot•1h ago•0 comments

Claude Code Is the Inflection Point

https://newsletter.semianalysis.com/p/claude-code-is-the-inflection-point
3•throwaw12•1h ago•2 comments

Show HN: MicroClaw – Agentic AI Assistant for Telegram, Built in Rust

https://github.com/microclaw/microclaw
1•everettjf•1h ago•2 comments

Show HN: Omni-BLAS – 4x faster matrix multiplication via Monte Carlo sampling

https://github.com/AleatorAI/OMNI-BLAS
1•LowSpecEng•1h ago•1 comments

The AI-Ready Software Developer: Conclusion – Same Game, Different Dice

https://codemanship.wordpress.com/2026/01/05/the-ai-ready-software-developer-conclusion-same-game...
1•lifeisstillgood•1h ago•0 comments

AI Agent Automates Google Stock Analysis from Financial Reports

https://pardusai.org/view/54c6646b9e273bbe103b76256a91a7f30da624062a8a6eeb16febfe403efd078
1•JasonHEIN•1h ago•0 comments

Voxtral Realtime 4B Pure C Implementation

https://github.com/antirez/voxtral.c
2•andreabat•1h ago•1 comments

I Was Trapped in Chinese Mafia Crypto Slavery [video]

https://www.youtube.com/watch?v=zOcNaWmmn0A
2•mgh2•1h ago•1 comments

U.S. CBP Reported Employee Arrests (FY2020 – FYTD)

https://www.cbp.gov/newsroom/stats/reported-employee-arrests
1•ludicrousdispla•1h ago•0 comments

AI Capabilities May Be Overhyped on Bogus Benchmarks, Study Finds

https://gizmodo.com/ai-capabilities-may-be-overhyped-on-bogus-benchmarks-study-finds-2000682577
43•Cynddl•3mo ago

Comments

lispisok•3mo ago
There is way too much money being thrown at AI for people not to game/cheat the benchmarks.
vivzkestrel•3mo ago
I am amazed that not a single pro-AI person on HN has anything to say, or will even speculate, about this. This is such a serious issue.
ulfw•3mo ago
Because the pro-AI people are busy trying to sell their whatever-they-have before the bubble bursts
simianwords•3mo ago
This is a very poor article. What I understood is that they take one particular benchmark that tests grade-school-level math. This benchmark apparently claims to test the ability to reason through math problems.

They agree that the benchmarks show that the LLMs can solve such questions and models are getting better. But their main point is that this does not prove that the model is reasoning.

But so what??? It may not reason the way humans do, but it is pretty damn close. The mechanics are the same: recursively generate a prompt that terminates in an answer-generating prompt.

They don’t like the implication that the model “reasons through” the problem. But that’s just semantics at this point. For me, and for most others, getting the final answer is what matters. And it largely accomplishes this task.

I don’t buy that the model can’t reason through a problem - have you ever asked a model for its explanation? It does genuinely explain how it got to the solution. At this point, who the hell cares what “reasoning” means if it

1. Gets me the right answer

2. Reasonably explains how it did it

YeGoblynQueenne•3mo ago
We care whether it's reasoning or not because the alternative is that it's guessing, and when guessing is measured on benchmarks that are supposed to measure reasoning, the results are likely to be misleading.

Why do we care if the benchmark results are misleading? The reason we have benchmarks in machine learning is that we can use the results on the benchmarks to predict the performance of a system in uncontrolled conditions, i.e. "in the real world". If the benchmarks don't measure what we think they measure then they can't be used to make that kind of prediction. If that's the case then we really have no idea how good or bad a system really is. Seen another way, if a benchmark is not measuring what we think it measures, all we learn from a system passing the benchmark is that the system passes the benchmark.

Still, you say what you care about is that it gets you the right answer. But exactly how do you know it's really getting you the right answer? Maybe you can tell when you already know the answer, but what about answers you genuinely don't know? And how often does it get you a wrong answer without you realising? You can't realistically test an AI system by interacting with it as thoroughly and as rigorously as you can with... a benchmark.

That's why we care about having accurate benchmarks that measure what they're supposed to be measuring.

P.S. Another issue of course is that guessing is limited while reasoning is... less limited. We care about reasoning because we ideally want to have systems that are better than the best guessing machine.

simianwords•3mo ago
“When researchers tested the same performance on a new set of benchmark questions, they noticed that models experienced ‘significant performance drops.’”

This is very misleading, because the generalisation ability of LLMs is very, very high. They don't just memorise problems - that's nonsense.

At high-school-level maths you genuinely can't get GPT-5 Thinking to make a single mistake. Not possible at all, unless you give it some convoluted, ambiguous prompt that no human could understand either. If you assume I'm correct, how is GPT memorising, then?

In fact, even undergraduate-level mathematics is quite simple for GPT-5 Thinking.

IMO gold was won.. by what? Memorising solutions?

I challenge people to find ONE example that GPT-5 Thinking gets wrong in high-school or undergrad-level maths. I could not manage it. You must allow all tools, though.
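The memorisation-vs-generalisation dispute in this subthread is mechanically testable: regenerate the same problem template with fresh numbers and see whether accuracy holds up. Below is a minimal sketch of that perturbation methodology; `make_variant` and both toy "models" are hypothetical stand-ins (a real experiment would put an actual LLM where the memoriser is):

```python
import random

def make_variant(rng):
    """Instantiate one grade-school word problem with random numbers."""
    a = rng.randint(2, 9)    # price per apple
    b = rng.randint(3, 12)   # apples bought
    c = rng.randint(1, 5)    # coupon value
    question = (f"Apples cost ${a} each. Sam buys {b} apples and "
                f"has a ${c}-off coupon. How much does Sam pay?")
    return question, a * b - c

def accuracy(model, problems):
    """Fraction of problems the model answers exactly right."""
    return sum(model(q) == ans for q, ans in problems) / len(problems)

rng = random.Random(0)
fresh = [make_variant(rng) for _ in range(200)]

# A "memoriser" that only ever saw one canonical version of the problem
# keeps emitting that version's answer; on perturbed variants it collapses.
_, canonical_answer = make_variant(random.Random(42))
memoriser = lambda q: canonical_answer

# An oracle that actually computes the answer is unaffected by perturbation.
answer_key = dict(fresh)
oracle = lambda q: answer_key[q]

print(f"memoriser: {accuracy(memoriser, fresh):.2f}")
print(f"oracle:    {accuracy(oracle, fresh):.2f}")
```

A gap between accuracy on canonical questions and on fresh-numbers variants is exactly the "significant performance drop" the quoted study describes; a model that genuinely computes the answer shows no such gap.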

callmesnek•3mo ago
"You must allow all tools though"
y0eswddl•3mo ago
"if you don't let chatgpt search the internet or use python code, it doesn't count..."

look at those goalposts go!

simianwords•3mo ago
Ok don’t allow search or python. Can you come up with an example? Probably not.
YeGoblynQueenne•3mo ago
The best performance on GSM8K currently stands at 0.973, so less than perfect [1]. Given that GSM8K is a grade-school math question data set, and that the leading LLMs still don't get all of its answers correct, it's safe to assume they won't get all high-school questions right either, since those are harder than grade-school questions. That means there has to be at least one example that GPT-5, like every other LLM, fails on [2].

If you don't think that's the case I think it's up to you to show that it's not.

___________________

[1] GSM8K leaderboard: https://llm-stats.com/benchmarks/gsm8k

[2] This is regardless of what GSM8K or any other benchmark is measuring.
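For scale, a quick back-of-the-envelope on that 0.973 figure, assuming GSM8K's standard 1,319-question test split (the split size is an assumption here, not stated in the thread):

```python
accuracy = 0.973          # best reported GSM8K score, per the leaderboard above
test_set_size = 1319      # GSM8K test split size (assumed)

# Expected number of test questions the top-scoring model still gets wrong.
expected_errors = round(test_set_size * (1 - accuracy))
print(expected_errors)    # → 36
```

So even the leaderboard-topping model is expected to miss a few dozen grade-school questions, which guarantees the "at least one example" claimed above.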

simianwords•3mo ago
Sure, I didn’t say it was perfect. But I’m questioning the essence of the article.
simianwords•3mo ago
“In many reasoning-heavy benchmarks, o1 rivals the performance of human experts. Recent frontier models do so well on MATH and GSM8K that these benchmarks are no longer effective at differentiating models. We evaluated math performance on AIME, an exam designed to challenge the brightest high school math students in America.”

https://openai.com/index/learning-to-reason-with-llms/

The benchmark was so saturated that they didn’t even bother running it on the newer models.

Which is interesting because it shows the rapid progress LLMs are making.

I’m also making a bigger claim - you can’t get GPT-5 Thinking to make a mistake in undergraduate-level maths. At the very least, its performance is comparable to a good student’s.

geoduck14•3mo ago
>At high school level maths you genuinely can’t get gpt-5 thinking to make a single mistake. Not possible at all.

If you give an LLM an incomplete question, it will guess at an answer. They don't know what they don't know, and they are trained to guess.

simianwords•3mo ago
Example?
autop0ietic•3mo ago
I would think GPT-5 is great at high-school math, but what high-school math problems are not in the training data?

I think the problem is that GPT-5 is not “memorising”, but conversely that doesn't automatically mean it is “reasoning”. These are human attributes that we are trying to ascribe to machines, and it just causes confusion.

simianwords•3mo ago
Make up one yourself and try it?
Khaine•3mo ago
I'm shocked, shocked, that AI is optimised to pass bogus benchmarks.

Just like how GPUs were optimised to pass synthetic benchmarks.