frontpage.

Hungary's oldest library is fighting to save books from a beetle infestation

https://www.npr.org/2025/07/14/nx-s1-5467062/hungary-library-books-beetles
50•smollett•3d ago•2 comments

Make Your Own Backup System – Part 1: Strategy Before Scripts

https://it-notes.dragas.net/2025/07/18/make-your-own-backup-system-part-1-strategy-before-scripts/
191•Bogdanp•8h ago•65 comments

I tried Vibe coding in BASIC and it didn't go well

https://www.goto10retro.com/p/vibe-coding-in-basic
37•ibobev•3d ago•23 comments

Local LLMs versus offline Wikipedia

https://evanhahn.com/local-llms-versus-offline-wikipedia/
186•EvanHahn•11h ago•96 comments

Nobody knows how to build with AI yet

https://worksonmymachine.substack.com/p/nobody-knows-how-to-build-with-ai
258•Stwerner•12h ago•213 comments

Mushroom learns to crawl after being given robot body (2024)

https://www.the-independent.com/tech/robot-mushroom-biohybrid-robotics-cornell-b2610411.html
79•Anon84•2d ago•13 comments

OpenAI claims gold-medal performance at IMO 2025

https://twitter.com/alexwei_/status/1946477742855532918
428•Davidzheng•18h ago•639 comments

Ring introducing new feature to allow police to live-stream access to cameras

https://www.eff.org/deeplinks/2025/07/amazon-ring-cashes-techno-authoritarianism-and-mass-surveillance
181•xoa•5h ago•84 comments

"Bypassing" Specialization in Rust or How I Learned to Stop Worrying and Love F

https://oakchris1955.eu/posts/bypassing_specialization/
11•todsacerdoti•2d ago•0 comments

Death by AI

https://davebarry.substack.com/p/death-by-ai
200•ano-ther•13h ago•63 comments

Rethinking CLI interfaces for AI

https://www.notcheckmark.com/2025/07/rethinking-cli-interfaces-for-ai/
142•Bogdanp•11h ago•67 comments

Babies made using three people's DNA are born free of mitochondrial disease

https://www.bbc.com/news/articles/cn8179z199vo
265•1659447091•3d ago•155 comments

What Were the Earliest Laws Like?

https://worldhistory.substack.com/p/what-were-the-earliest-laws-really
45•crescit_eundo•4d ago•10 comments

Erythritol linked to brain cell damage and stroke risk

https://www.sciencedaily.com/releases/2025/07/250718035156.htm
30•OutOfHere•2h ago•14 comments

Matterport walkthrough of the original Microsoft Building 3

https://my.matterport.com/show/?m=SZSV6vjcf4L
5•uticus•3d ago•0 comments

The curious case of the Unix workstation layout

https://thejpster.org.uk/blog/blog-2025-07-19/
76•ingve•11h ago•23 comments

The borrowchecker is what I like the least about Rust

https://viralinstruction.com/posts/borrowchecker/
166•jakobnissen•8h ago•223 comments

TSMC to start building four new plants with 1.4nm technology

https://www.taipeitimes.com/News/front/archives/2025/07/20/2003840583
150•giuliomagnifico•8h ago•102 comments

Zig Interface Revisited

https://williamw520.github.io/2025/07/13/zig-interface-revisited.html
86•ww520•3d ago•21 comments

Intel to boost gross margins – new products must deliver 50% gross profit

https://www.tomshardware.com/tech-industry/semiconductors/intel-draws-a-line-in-the-sand-to-boost-gross-margins-new-products-must-deliver-50-percent-to-get-the-green-light
43•walterbell•3h ago•28 comments

Trigon: Exploiting coprocessors for fun and for profit (part 2)

https://alfiecg.uk/2025/07/16/Trigon.html
31•Bogdanp•7h ago•1 comment

What the Fuck Python

https://colab.research.google.com/github/satwikkansal/wtfpython/blob/master/irrelevant/wtf.ipynb
137•sundarurfriend•9h ago•141 comments

Pimping My Casio: Part Deux

https://blog.jgc.org/2025/07/pimping-my-casio-part-deux.html
170•r4um•20h ago•53 comments

Show HN: Am-I-vibing, detect agentic coding environments

https://github.com/ascorbic/am-i-vibing
53•ascorbic•12h ago•24 comments

How we tracked down a Go 1.24 memory regression

https://www.datadoghq.com/blog/engineering/go-memory-regression/
130•gandem•2d ago•8 comments

Fstrings.wtf

https://fstrings.wtf/
386•darkamaul•17h ago•121 comments

Hyatt Hotels are using algorithmic Rest “smoking detectors”

https://twitter.com/_ZachGriff/status/1945959030851035223
759•RebeccaTheDev•1d ago•443 comments

Show HN: Display Photos on a World Map

https://worldsnap.surge.sh/
28•stagas•3d ago•2 comments

Bill Banning One-Person Train Operation Would Lock NY Transit in the Past

https://www.etany.org/statements/impeding-progress-costing-riders-opto
70•Ericson2314•2h ago•91 comments

N78 band 5G NR recordings

https://destevez.net/2025/07/n78-band-5g-nr-recordings/
72•Nokinside•2d ago•3 comments

Evaluating publicly available LLMs on IMO 2025

https://matharena.ai/imo/
73•hardmaru•13h ago

Comments

blendergeek•13h ago
Related: https://news.ycombinator.com/item?id=44613840
untitled2•13h ago
Exactly. Whom to believe?
changoplatanero•13h ago
Both are true. One spent $400 in compute and the other one spent a lot more.
masterjack•13h ago
Exactly. And presumably had a more sophisticated harness around the model, longer reasoning chains, best of N, self judging, etc
JohnKemeny•13h ago
The last time someone claimed a medal in an olympiad like this, it turned out they had manually translated the problem into Lean and then run a brute-force search algorithm to find a proof. For 60 hours. On a supercomputer.

Meanwhile high schoolers get a piece of paper and 4.5 hours.

wslh•12h ago
Even though chess is now effectively solved against human players, I still remember Kasparov's suspicion that one of Deep Blue's moves had a human touch. It was never proven or disproven, but I trust Kasparov's deep intuition, amplified by his request for access to Deep Blue's logs and IBM's refusal to share them in full. For more discussion see [1][2][3].

[1] https://chess.stackexchange.com/questions/9959/did-deep-blue...

[2] https://nautil.us/why-the-chess-computer-deep-blue-played-li...

[3] https://en.chessbase.com/post/deep-blue-s-cheating-move

throwawaymaths•12h ago
Kinda wild that an LLM can't translate to Lean?
kenjackson•13h ago
OpenAI achieved Gold on an unreleased model. GPT-5. Read the tweets and they explain what they did.
idiotsecant•13h ago
Actually, I did it a year ago but I just don't want to release my model.
senkora•13h ago
Where should I address the billion dollar check?
emp17344•13h ago
My buddy did it 5 years ago. You wouldn’t know him, he lives in Canada.
souldeux•13h ago
my model goes to a different school
esafak•12h ago
The dog ate mine. And the solution didn't fit in the margin, anyway.
e1g•13h ago
OpenAI explicitly said it’s not GPT-5 but another experimental research model https://x.com/alexwei_/status/1946477756738629827?s=46
kenjackson•12h ago
Thanks. I parsed that wrong. In either case, it's not the same model MathArena evaluated.
raincole•13h ago
Note that these are two different claims:

The OP claims the publicly available models all failed to get Bronze.

The OpenAI tweet claims there is an unreleased model that can get Gold.

dmitrygr•13h ago
My (unreleased) cat did even better than the OpenAI model. No you cannot see. Yes you have to trust me. Now gimme more money.
raincole•12h ago
I don't know the details (of course, it's unreleased), but note that MathArena evaluated "average of 4 attempts" and limited token usage to 64k.

OpenAI likely had unlimited tokens, and evaluated "best of N attempts."
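
For illustration, a toy sketch (not MathArena's actual scoring code; the per-attempt marks out of 7 are made up) of why the two protocols aren't comparable:

    # Hypothetical per-attempt marks out of 7 for a single problem.
    attempts = [7, 0, 1, 0]

    avg_of_4 = sum(attempts) / len(attempts)  # MathArena-style: 2.0 out of 7
    best_of_n = max(attempts)                 # best-of-N-style: 7 out of 7

    print(f"average of 4: {avg_of_4:.1f}/7, best of {len(attempts)}: {best_of_n}/7")

The same four attempts read as a near-miss under one protocol and a full solve under the other.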

amelius•12h ago
That's a claim that is far less plausible. OpenAI could have thrown more resources at the problem and I would be surprised if that didn't improve the results.
klabb3•12h ago
Wow, that’s incredible. Cats are progressing so fast; especially unreleased cats seem to be doing much better. My two orange kitties aren’t doing well on math problems, but obviously that’s because I’m not prompting the right way – any day now. If I ever get it to work, I’ll be sure to share the achievements on X, while carefully avoiding explaining how I did it or providing any data that could corroborate the claims.
sigmoid10•12h ago
I'd also be highly wary of the method they used because of statements like this:

>we note that the vast majority of its answers simply stated the final answer without additional justification

While the reasoning steps are obviously important for judging human participants' answers, none of the current big-name providers disclose their actual reasoning tokens. So unless they got direct internal access to these models from the big companies (which seems highly unlikely), this might be yet another flawed study (of which we have seen several in recent months, even by serious parties).

bgwalter•12h ago
The model did not fit in the margin.

We'll never know how many GPUs, or how much other assistance (like custom code paths), this model got.

chvid•13h ago
In a few months (weeks, days - maybe it has already happened) models will have much better performance on this test.

Not because of actually increased "intelligence", but because the test will be included in the models' training data - either directly, or indirectly as model developers "tune" their models to give better performance on this particular attention-driving test.

sorokod•13h ago
From the post: "Evaluation began immediately after the 2025 IMO problems were released to prevent contamination."

Does this address your concern?

os2warpman•13h ago
What they mean is that in a couple of weeks there are going to be stories titled "LLMS NOW BETTER THAN HUMANS AT 2025 INTERNATIONAL MATH OLYMPIAD" (stories published as thinly-veiled investment solicitations) but in reality they're still shitty-- they've just had the answers fed in to be spit back out.
sorokod•12h ago
Companies would game metrics whenever they have the opportunity. What else is new?
esafak•12h ago
I suppose what's new is that the models aren't as smart as their companies claimed.
chvid•13h ago
Not really.
yunwal•13h ago
Luckily there’s a new set of problems every year
chvid•13h ago
You can really only do a fair, reproducible test if the models are static and not sitting behind an API where you have no idea how they are updated or continuously tweaked.
chvid•13h ago
This particular test is heralded as some sort of breakthrough and the companies in this field are raising billions of dollars from investors and paying their star employees tens of millions.

The economic incentives to tweak, tune, or cheat are through the roof.

WD-42•13h ago
> Gemini 2.5 Pro achieved the highest score with an average of 31% (13 points). While this may seem low, especially considering the $400 spent on generating just 24 answers

What? That’s some serious cash for mostly wrong answers.

john-h-k•12h ago
The time investment a human has to make to get 31% on the IMO is worth far more than $400
WD-42•12h ago
The human still has to put in that time. How would you know which 31% is correct?
wiremine•13h ago
How quickly we shift our expectations. If you told me 5 years ago we'd have technology that can do this, I wouldn't have believed you.

This isn't to say we shouldn't think critically about the use and performance of models, but "Not Even Bronze..." turned me off to this critique.

raincole•13h ago
In 2024 AlphaProof got Silver level, so people rightfully expect a lot now.

(It's specifically trained on formalized math problems, unlike most LLMs, so it's not an apples-to-apples comparison.)

wat10000•12h ago
LLMs are really good with words and kind of crap at “thinking.” Humans are wired to see these two things as tightly connected. A machine that thinks poorly and talks great is inherently confusing. A lot of discussion and disputes around LLMs comes down to this.

It wasn’t that long ago that the Turing Test was seen as the gold standard of whether a machine was actually intelligent. LLMs blew past that benchmark a year or two ago and people barely noticed. This might be moving the goalposts, but I see it as a realization that thought and language are less inherently connected than we thought.

So yeah, the fact that they even do this well is pretty amazing, but they sound like they should be doing so much better.

thaumasiotes•12h ago
> LLMs are really good with words and kind of crap at “thinking.” Humans are wired to see these two things as tightly connected. A machine that thinks poorly and talks great is inherently confusing. A lot of discussion and disputes around LLMs comes down to this.

It's not an unfamiliar phenomenon in humans. Look at Malcolm Gladwell.

ipsin•13h ago
I was hoping to see the questions (which I can probably find online), but also the answers from models and the judge's scores! Am I missing a link? Without that I can't tell whether I should be impressed or not.
raincole•12h ago
https://matharena.ai/

On their website you can see the full answers the LLMs gave ("click cells to see...").

gcanyon•13h ago
99.99+% of all problems humans face do not require particularly original solutions. Determining whether LLMs can solve truly original (or at least obscure) problems is interesting, and a problem worth solving, but it ignores the vast majority of the impact they will have (near-term at least).
lottin•12h ago
15 years ago they were predicting that AI would turn everything upside down in 15 years time. It hasn't.
HEmanZ•12h ago
People who say this don’t understand the breakthrough we had in the last couple of years. 15 years ago I was laughing at people predicting AI would turn everything upside down soon. I’m not laughing anymore. I’ve been around long enough to see some AI hype cycles and this time it is different.

15 years ago I, working on AI systems at a FAANG, would have told you "real" AI probably wasn't coming in my lifetime. 15 years ago the only engineers I knew who thought AI was coming soon were dreamers and Silicon Valley koolaiders. The rest of us saw we needed a step-function breakthrough that might not even exist. But it did, and we got there, a couple of years ago.

Now I’m telling people it’s here. We’ve hit a completely different kind of technology, and it’s so clear to people working in the field. The earthquake has happened and the tsunami is coming.

csa•11h ago
Thank you for sharing your experience. It makes the impact of the recent advances palpable.
wat10000•12h ago
I really doubt a contest for high schoolers contains any truly original problems.
gcanyon•4h ago
"or at least obscure"
Barrin92•12h ago
The value of human beings isn't in their capacity to do routine tasks but in their capacity to respond with some common sense to all the critical issues in the 2% at the tail.

This is why original problems are important: they measure how sensible something is in an open-ended environment, and here the models are completely useless, not just because they fail but because of how they fail. The fact that these LLMs, according to the article, "invent non-existent math theorems" (i.e. produce gibberish instead of even knowing what they don't know) is an indication of how limited this still is.

wavemode•10h ago
To be frank, I take precisely the opposite view. Most people solve novel problems every day, mostly without thinking much about it. Our inability to perceive the immense complexity of the things we do every day is merely due to familiarity. In other words we're blind to the details because our brain handles them automatically, not because they don't exist.

Software engineers understand this better than most - describing a task in general terms, and doing it yourself, can be incredibly easy, even while writing the code to automate the task is difficult or impossible, because of all the devilish details we don't often think about.

gcanyon•4h ago
I work with developers every day. Between us we often give the AI directions like:

   * Write a query to link table X to table Y across this schema, returning all the unique entries related to X.id 1234
   * Write code to add an editable comment list to this UI
   * Give me a design to visually manage statuses for this list
   * Look at this UI and give me five ideas for improving it
Some of those work better than others, but none of them are guaranteed failures.
magicalhippo•13h ago
One interesting takeaway for me, a non-practitioner, was that the models appear to be fairly decent at judging their own output.

They used best-of-32 and had the same model judge a "tournament" to find the best answer. Seems like something that could be bolted on reasonably easily, e.g. in say WebUI.

edit: forgot to add that I'm curious if this translates to smaller models as well, or if it requires these huge models.
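
A minimal sketch of that knockout self-judging idea, assuming hypothetical generate() and judge() stand-ins for the two model calls (the real harness's prompts and API calls aren't published here):

    import random

    def generate(problem: str) -> str:
        # Stand-in for one sampled answer; a real harness would call the model's API.
        return f"candidate proof #{random.random():.4f} for {problem}"

    def judge(problem: str, a: str, b: str) -> str:
        # Stand-in for asking the same model which of two answers is better;
        # this toy picks at random, a real harness would prompt the model to compare.
        return random.choice([a, b])

    def best_of_n(problem: str, n: int = 32) -> str:
        # Knockout "tournament": pair up answers, keep each judged winner,
        # and repeat until a single answer remains.
        pool = [generate(problem) for _ in range(n)]
        while len(pool) > 1:
            winners = [judge(problem, pool[i], pool[i + 1])
                       for i in range(0, len(pool) - 1, 2)]
            if len(pool) % 2:
                winners.append(pool[-1])  # the odd answer out gets a bye
            pool = winners
        return pool[0]

    print(best_of_n("IMO 2025 Problem 3"))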

ysofunny•13h ago
this makes me really wonder: what is the underlying practical mathematical skill?

intuition????

samat•12h ago
plus a little skill
wrsh07•13h ago
> Each model was run with the recommended hyperparameters and a maximum token limit of 64,000. No models needs more than this number of tokens

I'm a little confused by this. My assumptions (possibly incorrect!): 64k tokens per prompt, they are claiming the model wouldn't need more tokens even for reasoning

Is that right? Would be helpful to see how many tokens the models actually used.

throwawaymaths•12h ago
they didn't even do a (non-ML) agentic descent? like have a quick API that requeries itself, generating new context?

"ok, here is my strategy, here are the five steps", then requery with a strategy or proof of step 1, 2, 3...

in a DFS
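
For what it's worth, a rough sketch of that requery-in-a-DFS idea, with ask() and verify() as hypothetical stand-ins for the model call and the per-step check (nothing here comes from the article):

    def ask(goal: str) -> list[str]:
        # Stand-in for a model call that proposes proof steps for a goal;
        # a real loop would prompt an LLM and parse its reply into steps.
        return []

    def verify(goal: str) -> bool:
        # Stand-in for checking a single step, e.g. by requerying the model
        # or handing the step to a proof checker.
        return False

    def dfs_prove(goal: str, depth: int = 0, max_depth: int = 4) -> bool:
        # Depth-first descent: accept a goal that verifies directly; otherwise
        # ask for a strategy, recurse on each step, and backtrack on failure.
        if verify(goal):
            return True
        if depth == max_depth:
            return False
        steps = ask(f"Give the steps of a strategy to prove: {goal}")
        return bool(steps) and all(dfs_prove(s, depth + 1, max_depth) for s in steps)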

akomtu•13h ago
Easy benchmark that's hard to fake: data compression. Intelligence is largely about creating compact predictive models and so is data compression. The output should be a program generating the sequence or the dataset, based on entry id or nearby data points. Typical LLM bullshit won't work here because the output isn't English prose that can fool a human.
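
One way such a benchmark could be scored (a toy sketch of the commenter's rules, not an existing benchmark): a submission is a program whose output must reproduce the data exactly, and its score is its own length relative to the raw size, with a generic compressor as a baseline:

    import zlib

    def score(program_src: str, data: bytes) -> float | None:
        # Run the submitted program (trusted toy setting only); it must define
        # run() and reproduce the data byte-for-byte to score at all.
        scope: dict = {}
        exec(program_src, scope)
        if scope["run"]() != data:
            return None
        return len(program_src) / len(data)  # smaller is better

    data = bytes(range(256)) * 64  # 16 KiB with obvious structure
    program = "def run():\n    return bytes(range(256)) * 64\n"
    print(score(program, data))                  # tiny program, large data
    print(len(zlib.compress(data)) / len(data))  # baseline: generic compressor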
esjeon•12h ago
> For Problem 5, models often identified the correct strategies but failed to prove them, which is, ironically, the easier part for an IMO participant. This contrast ... suggests that models could improve significantly in the near future if these relatively minor logical issues are addressed.

Interesting, but I'm not sure if this is really due to "minor logical issues". This sounds like a failure due to a lack of actual understanding (the world-model problem). Perhaps the actual answers from the AIs might hold some hints, but I can't find them.

(EDIT: ooops, found the output on the main page of their website. Didn't expect that.)

> Best-of-n is Important ... the models are surprisingly effective at identifying the relative quality of their own outputs during the best-of-n selection process and are able to look past coherence to check for accuracy.

Yes, it's always easier to be a backseat driver.

Lerc•5h ago
>Yes, it's always easier to be a backseat driver

Any model that can identify the correct answer reliably can arrive at the correct answer given enough time and stochasticity.

the8472•4h ago
NP
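
Reading "NP" as the verify-versus-search asymmetry, here is a toy illustration of the point above, with a made-up verifiable target in place of a model (rejection-sample random proposals until a reliable checker accepts one):

    import random

    def propose() -> int:
        # Stand-in for one stochastic model sample.
        return random.randrange(1_000_000)

    def verify(x: int) -> bool:
        # Cheap, reliable check: does x squared end in 444?
        return (x * x) % 1000 == 444

    # Given enough time and stochasticity, blind sampling plus a reliable
    # verifier eventually lands on a correct answer.
    while not verify(x := propose()):
        pass
    print(x)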
daedrdev•12h ago
Here are the IMO problems if you want to give them a try:

https://www.imo-official.org/year_info.aspx?year=2025 (download page)

They are very difficult.

strangescript•12h ago
"You know that really hard test thing that most humans on the planet can't do, or even understand, yeah, LLMs kind of suck at it too"

Meanwhile Noam "well aschtually..."

I love how people are still betting against AI, it's hilarious. Please write more 2000s-esque "the internet is a fad" articles.

boringg•12h ago
It's quite reasonable. We have yet to meet anything more intelligent than humans, so why do we think we can create something more intelligent than us when we don't fully understand the complexities of how we work?

AI still has a long way to go, though it has proven to be a useful tool at this point.

strangescript•12h ago
Who said anything about creating something more intelligent than us? These articles have the air of "why are we wasting our time on this stuff"; people like Gary Marcus link them, meanwhile the models get better week over week.
AndrewKemendo•12h ago
Can someone tell me where your average everyday human who’s walking around and has a regular job and kids and a mortgage would land on this leaderboard? That’s who we should be comparing against.

The fact that the only formal comparisons for AI systems that are ever done are explicitly based on the highest performing narrowly focused humans, tells me how unprepared society is for what’s happening.

Appreciate that: at the point at which there is an unambiguous demonstration of superhuman-level performance across all human tasks by a machine (and make no mistake, that *is the bar that this blog post and every other post about AI sets*), it’s completely over for the human race, unless someone figures out an entirely new economic system.

raincole•12h ago
> average everyday human

The average math major can't get Bronze.

pphysch•12h ago
Machines have always had superhuman capabilities in narrow domains. The LLM domain is quite broad, but it's still just an LLM, beholden to its training.

The average everyday human does not have the time to read all available math texts. LLMs do, but they still can't get bronze. What does that say about them?

zdragnar•12h ago
The average person is bad at literally almost everything.

If I want something done, I'll seek out someone with a skill set that matches the problem.

I don't want AI to be as good as an average person. I want AI to be better than the person I would go to for help. A person can talk with me, understand where I've misunderstood my own problem, can point out faulty assumptions, and may even tell me that the problem isn't even a problem that needs solving. A person can suggest a variety of options and let me decide what trade-offs I want to make.

If I don't trust the AI to do that, then I'm not sure why I'd use it for anything other than things that don't need to be done at all, unless I can justify the chance that maybe it'll be done right, and I can afford the time lost getting it done right without the AI afterwards.

SirFatty•12h ago
"The average person is bad at literally almost everything."

Wow... that's quite a generalization. And not my experience at all.

Retric•12h ago
The average person can’t play 99% of all musical instruments, speak 99% of all languages, do 99% of surgeries, recite 99% of all poems from memory etc.

We don’t ask the average person to do most things, either finding a specialist or providing training beforehand.

krapp•12h ago
One cannot be bad at the things one doesn't even do. None of this demonstrates that humans are bad at "literally almost everything."
Retric•12h ago
> One cannot be bad at the things one doesn't even do.

??? If you don’t know how to do something you’re really bad at it. I’m not sure what that sentence is even trying to convey.

krapp•12h ago
> Obviously you could train someone to recite The Raven from memory, but they can’t do it now.

That doesn't make them bad at reciting The Raven from memory. Being trained to recite The Raven from memory and still being unable to do so would be a proper application of the term. There is an obvious difference between the two states of being and conflating them is specious.

If you want to take seriously the premise that humans are bad at almost everything because most humans haven't been trained at doing almost everything humans can do, then you must apply the same rubric to LLMs, which are only capable of expressions within their specific dataset (and thus not the entire corpus of data on which they haven't been trained), and which even then tend to confabulate far more frequently than human beings do at even simple tasks.

edit: never mind, I guess you aren't willing to take this conversation on good faith.

mysterydip•11h ago
Didn't this start with "Can someone tell me where your average every day human that’s walking around and has a regular job and kids and a mortgage would land on this leaderboard? That’s who we should be comparing against."

And the average person would do poorly. Not because they couldn't be trained to do it, but because they haven't.

krapp•11h ago
It's obvious that the average person would do badly at the International Math Olympiad. Although I don't know why the qualifiers of "regular job and kids and a mortgage" are necessary, except as a weird classist signifier. I strongly suspect most people on HN, who consider themselves set apart from the average (some also having a regular job, kids and a mortgage), would not do well at the International Math Olympiad either.

But that isn't the claim I'm objecting to. The claim I'm objecting to is "The average person is bad at literally almost everything," which is not an equivalent claim to "people who aren't trained at math would be bad at math at a competitive level," because it implicitly includes everything that a person is trained in and is expected to be qualified to do.

It was just bad, cynical hyperbole. And it's weird that people are defending it so aggressively.

rahimnathwani•12h ago
It's obvious that 'bad at' in this context means 'incapable of doing well'.

Nitpicking language doesn't help move the conversation forward. One thing most humans are good at is understanding meaning even when the speaker wasn't absolutely precise.

gundmc•12h ago
You and the parent poster seem to be conflating the ideas of:

- Does not have the requisite skills and experiences to do X successfully

- Inherently does not have the capacity to do X

I think the former is a reasonable standard to apply in this context. I'd definitely say I would be bad if I tried to play the guitar, but I'm not inherently incapable of doing it. It's just not very useful to say "I could be good at it if I put 1000 hours of practice in."

zdragnar•11h ago
That's why there's the qualifier of "average person". If one learns to play the guitar well, they are no longer the average person in the context of guitar playing.
rahimnathwani•12h ago
More than 50% of people cannot write a 'hello world' program in any programming language.

More than 50% of people employed as software engineers cannot read an academic paper in a field like education, and explain whether the conclusions are sound, based on the experiment description and included data.

More than 50% of people cannot interpret an X-ray.

csa•11h ago
> More than 50% of people employed as software engineers cannot read an academic paper in a field like education, and explain whether the conclusions are sound, based on the experiment description and included data.

I know this was meant as a dig, but I’m actually guessing that software engineers score higher on this task than non-engineers who hold M.Ed. degrees.

rahimnathwani•11h ago
Agreed! Probably 3% of software engineers could do it, vs 1% for M.Ed holders.

The only reason I chose software engineers is because I was trying to show that people who can write 'hello world' programs (first example) are not good at all intellectual tasks.

AndrewKemendo•8h ago
Which proves my point precisely: unless you’re superhuman by this definition, you’re obsolete.

Nothing new really, but there’s nowhere left to go for human labor, and even that concept is being jeered at as a fantasy despite this attitude.

zdragnar•2h ago
I really don't think it does, because we disagree on what the upper bound of an LLM is capable of reasoning about.

An average human may not be suitable for a given task, but a person with specialized skills will be. More than that, I believe such people will continue to outperform LLMs on solving unbounded problems, i.e. those problems without an obvious, algorithmic solution.

Anything that requires brute force computation can be done by an LLM more quickly, assuming you have humans you trust to validate the output, but that's about the extent of what I'm expecting them to achieve.

baobabKoodaa•12h ago
The average human would score exactly 0 at the IMO.
bgwalter•12h ago
Average humans, no. Mathematicians with enough time and a well-indexed database of millions of similar problems, probably.

We don't allow chess players to access a Syzygy tablebase in a tournament.

pragmatic•10h ago
That’s not how modern societies/economies work.

We have specialists everywhere.

AndrewKemendo•10h ago
My literal last sentence addresses this
bgwalter•12h ago
So the gold medal claims in https://news.ycombinator.com/item?id=44613840 look exaggerated.

The whole competition is unfair anyway. An "AI" has access to millions of similar problems, stolen and encoded in the model. Humans would at least need access to a similar database; think of an open-database exam, a nuclear version of an open-book exam.