I've "solved" many math problems with LLMs, with LLMs giving full confidence in subtly or significantly incorrect solutions.
I'm very curious here. The OpenAI memory orders, and the claims about capacity limits restricting access to better models, are interesting too.
"Very nice! ... actually the thing that impresses me more than the proof method is the avoidance of errors, such as making mistakes with interchanges of limits or quantifiers (which is the main pitfall to avoid here). Previous generations of LLMs would almost certainly have fumbled these delicate issues.
...
I am going ahead and placing this result on the wiki as a Section 1 result (perhaps the most unambiguous instance of such, to date)"
The pace of change in math is going to be something to watch closely. Many minor theorems will fall. Next major milestone: Can LLMs generate useful abstractions?
"On following the references, it seems that the result in fact follows (after applying Rogers' theorem) from a 1936 paper of Davenport and Erdos (!), which proves the second result you mention. ... In the meantime, I am moving this problem to Section 2 on the wiki (though the new proof is still rather different from the literature proof)."
We are very close.
(by the way, I like writing code and I still do it for fun)
Isn't that a perfectly reasonable metric? The topic has been dominated by hype for at least the past 5 if not 10 years. So when you encounter the latest in a long line of "the future is here, the sky is falling" claims, where every past claim to date has been wrong, it's natural to try it for yourself, observe a poor result, and report back "nope, just more BS as usual".
If the hyped future does ever arrive then anyone trying for themselves will get a workable result. It will be trivially easy to demonstrate that naysayers are full of shit. That does not currently appear to be the case.
The ability to make money proves you found a good market; it doesn't prove that the new tools are useful to others.
I'm beginning to think the Bitter Lesson applies to organic intelligence as well: basic pattern matching can be implemented with very simple mathematical operations like multiply-and-accumulate, so it scales with massive parallelization of relatively simple building blocks.
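For what it's worth, here's a toy sketch (mine, not taken from any particular framework or paper) of what "pattern matching via multiply and accumulate" looks like in practice: matching is just a normalized dot product, which is exactly the operation that parallelizes cheaply in hardware.

```python
import numpy as np

# Toy "pattern matcher": a stored pattern and an input signal, both vectors.
# The whole match is multiply-and-accumulate (a dot product), which is why
# it scales so well with massive parallelism of simple building blocks.

def match_score(pattern: np.ndarray, signal: np.ndarray) -> float:
    """Cosine similarity: a normalized multiply-accumulate."""
    dot = float(np.dot(pattern, signal))                        # multiply + accumulate
    norm = float(np.linalg.norm(pattern) * np.linalg.norm(signal))
    return dot / norm if norm else 0.0

pattern = np.array([1.0, -1.0, 1.0, -1.0])   # a stored "feature"
close   = np.array([0.9, -1.1, 1.2, -0.8])   # noisy version of the feature
far     = np.array([1.0,  1.0, 1.0,  1.0])   # something unrelated

print(match_score(pattern, close))  # ~0.99: strong match
print(match_score(pattern, far))    # 0.0: no match
```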
But I think the trend line unmistakably points to a future where it can be MORE intelligent than a human in exactly the colloquial way we define "more intelligent".
The fact that one of the greatest mathematicians alive maintains a page for this and is seriously benchmarking it shows how likely he believes this is to happen.
The answer is yes. Assume, for the sake of contradiction, that there exists an \(\epsilon > 0\) such that for every \(k\), there exists a choice of congruence classes \(a_1^{(k)}, \dots, a_k^{(k)}\) for which the set of integers not covered by the first \(k\) congruences has density at least \(\epsilon\).
For each \(k\), let \(F_k\) be the set of all infinite sequences of residues \((a_i)_{i=1}^\infty\) such that the uncovered set from the first \(k\) congruences has density at least \(\epsilon\). Each \(F_k\) is nonempty (by assumption) and closed in the product topology (since it depends only on the first \(k\) coordinates). Moreover, \(F_{k+1} \subseteq F_k\) because adding a congruence can only reduce the uncovered set. By the compactness of the product of finite sets, \(\bigcap_{k \ge 1} F_k\) is nonempty.
Choose an infinite sequence \((a_i) \in \bigcap_{k \ge 1} F_k\). For this sequence, let \(U_k\) be the set of integers not covered by the first \(k\) congruences, and let \(d_k\) be the density of \(U_k\). Then \(d_k \ge \epsilon\) for all \(k\). Since \(U_{k+1} \subseteq U_k\), the sets \(U_k\) are decreasing and periodic, and their intersection \(U = \bigcap_{k \ge 1} U_k\) has density \(d = \lim_{k \to \infty} d_k \ge \epsilon\). However, by hypothesis, for any choice of residues, the uncovered set has density \(0\), a contradiction.
Therefore, for every \(\epsilon > 0\), there exists a \(k\) such that for every choice of congruence classes \(a_i\), the density of integers not covered by the first \(k\) congruences is less than \(\epsilon\).
\boxed{\text{Yes}}
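(Not part of the quoted proof, just a quick sanity check on the quantities it uses: each \(U_k\) is periodic with period \(\mathrm{lcm}(n_1,\dots,n_k)\), so the densities \(d_k\) can be computed exactly for any finite prefix. A minimal Python sketch, with the moduli and residues below chosen arbitrarily for illustration:)

```python
from math import lcm

def uncovered_density(moduli, residues):
    """Exact density of integers not covered by n ≡ a_i (mod n_i),
    counted over one full period P = lcm of the moduli."""
    P = lcm(*moduli)
    uncovered = sum(
        1 for n in range(P)
        if all(n % m != a % m for m, a in zip(moduli, residues))
    )
    return uncovered / P

# Illustrative example only: these moduli/residues are not from the problem.
moduli   = [2, 3, 4, 6]
residues = [0, 0, 1, 5]
for k in range(1, len(moduli) + 1):
    print(f"d_{k} = {uncovered_density(moduli[:k], residues[:k]):.4f}")
# d_1 = 0.5000, d_2 = 0.3333, d_3 = 0.1667, d_4 = 0.0833
```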
I believe the ones that are NOT studied are unstudied precisely because they are seen as uninteresting. Even if they were solved in an interesting way, if nobody reads the proofs because there are simply too many of them and they are still not considered valuable, then I don't see what is gained.
EDIT: After reading a link someone else posted to Terence Tao's wiki page, he has a paragraph that somewhat answers this question:
> Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools. Unfortunately, it is hard to tell in advance which category a given problem falls into, short of an expert literature review. (However, if an Erdős problem is only stated once in the literature, and there is scant record of any followup work on the problem, this suggests that the problem may be of the second category.)
from here: https://github.com/teorth/erdosproblems/wiki/AI-contribution...
This is no longer true; a prior solution has just been found[1], so the LLM proof has been moved to Section 2 of Terence Tao's wiki[2].
[1] - https://www.erdosproblems.com/forum/thread/281#post-3325
[2] - https://github.com/teorth/erdosproblems/wiki/AI-contribution...
This aligns nicely with the rest of the canon. LLMs are just stochastic parrots. Fancy autocomplete. A glorified Google search with worse footnotes. Any time they appear to do something novel, the correct explanation is that someone, somewhere, already did it, and the model merely vibes in that general direction. The fact that no human knew about it at the time is a coincidence best ignored.
The same logic applies to code. “Vibe coding” isn’t real programming. Real programming involves intuition, battle scars, and a sixth sense for bugs that can’t be articulated but somehow always validates whatever I already believe. When an LLM produces correct code, that’s not engineering, it’s cosplay. It didn’t understand the problem, because understanding is defined as something only humans possess, especially after the fact.
Naturally, only senior developers truly code. Juniors shuffle syntax. Seniors channel wisdom. Architecture decisions emerge from lived experience, not from reading millions of examples and compressing patterns into a model. If an LLM produces the same decisions, it’s obviously cargo-culting seniority without having earned the right to say “this feels wrong” in a code review.
Any success is easy to dismiss. Data leakage. Prompt hacking. Cherry-picking. Hidden humans in the loop. And if none of those apply, then it “won’t work on a real codebase,” where “real” is defined as the one place the model hasn’t touched yet. This definition will be updated as needed.
Hallucinations still settle everything. One wrong answer means the whole system is fundamentally broken. Human mistakes, meanwhile, are just learning moments, context switches, or coffee shortages. This is not a double standard. It’s experience.
Jobs are obviously safe too. Software engineering is mostly communication, domain expertise, and navigating ambiguity. If the model starts doing those things, that still doesn’t count, because it doesn’t sit in meetings, complain about product managers, or feel existential dread during sprint planning.
So yes, the Erdos situation is resolved. Nothing new happened. No reasoning occurred. Progress remains hype. The trendline is imaginary. And any discomfort you feel is probably just social media, not the ground shifting under your feet.
All of that is going away, so the best way to deal with it is to call it a stochastic parrot and deny reality.
The only possible explanation is people say things they don't believe out of FUD. Literally the only one.
(edit: fixed link)
I'm pretty sure it's like "can it run DOOM", and someone could make an LLM that passes this and runs on a pregnancy test.
It wasn't AI generated. But if it were, there is currently no way for anyone to tell the difference.
This is false. There are many human-legible signs, and there do exist fairly reliable AI detection services (like Pangram).
And even odder that the proof was by Erdos himself and yet he listed it as an open problem!
https://mehmetmars7.github.io/Erdosproblems-llm-hunter/probl...
https://chatgpt.com/share/696ac45b-70d8-8003-9ca4-320151e081...