frontpage.

Show HN: Another experiment with an Erdos problem and LLMs

1•ilitirit•1h ago

Background: I am a coder, not a mathematician, but I was quite entertained by this story:

https://news.ycombinator.com/item?id=47903126

I wondered how far I could get by just choosing a random open problem and throwing it at LLMs.

Disclosure: I have no idea what the problem even refers to, let alone whether or not the output is even remotely correct. My interest is purely for testing capabilities of various models, curiousity, and entertainment.

Problem: https://www.erdosproblems.com/691

  Given A\subseteq \mathbb{N} let M_A=\{ n \geq 1 : a\mid n\textrm{ for some }a\in A\} be the set of multiples of A. 
  Find a necessary and sufficient condition on A for M_A to have density 1.

My approach: I used DeepSeek in Expert mode, using the same prompt as in the linked HN submission. It thought for a very long time, but I was doing other things in the background so I didn't really time it. I pressed "Continue" twice over the space of maybe 60mins. The output says it thought for about 46mins.

Once it generated a proof, I asked Opus 4.7 to review it, and then entered the review into DeepSeek which made edits, corrections and refinements. This back-and-forth continued till Opus 4.7 was reasonably happy. At that point, I called in Gemini 3.1 Pro Preview, which raised issues which Opus missed. Opus acknowledged the feedback, and then I placed its feedback into Deepseek for a final round. Essentially, what Opus says Deepseek generated was a "clean exposition of a D[avenport]-E[rdos] corollary", not a new result. In all likelihood this result may already be known (Deepseek was not allowed to use the internet for this phase), or even wrong.

In "simple" terms:

  The argument actually proves a stronger fact for every set \( A \) of natural numbers:  
  The upper density of the set \( M_A \) equals the largest possible lower density you can get from finite subsets of \( A \), and that also equals the lower density of \( M_A \).
  When the upper density is 1, it forces the lower density to also be 1, so the natural (ordinary) density exists and equals 1 automatically, without needing any extra conditions.
  The only non-basic part of the proof is the Davenport–Erdős theorem; everything else is simple.

In any case, these were my takeaways:

- These new models seem to be surprisingly capable especially when used to in conjunction with each other, even with fairly simple prompts

- I am quite impressed by Deepseek. I'm going to review its coding ability, and may even switch completely from Anthropic

- This was a genuinely interesting exercise, even if I have no idea if any of it is correct or useful

Some other observations:

- Opus was really fast at reviewing Deepseek's output. Literally seconds

- Gemini had trouble figuring out what "Erdos 691" referred to

- The free version of ChatGPT of generated mostly useless output. I didn't include it.

Chat links below:

https://chat.deepseek.com/share/hpguvrhcxn226bi3hn

https://claude.ai/share/4f3ccad1-d862-4e37-8333-8a1ebd84b38f

https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...

Google's best practices document for designing AI products

Excited Delirium[audio]

North Korean IT workers are stealing remote jobs: Americans are helping them

New Bible TUI App Releases v1.0.0

Agent Harness Engineering

The Normal Work of Creating Reliability

Everything that went wrong with Claude

How to hire people who are better than you

Terraform is dead

Chernobyl disaster (April 26th, 1986)

But what is L0-L2 processing for satellite data?

Don't Confuse Computer Science with Coding

AI can cost more than human workers now

Conversations with Cosmos

I Am Doing This: The Origin Story of Project-AI

Lipovive Review: Effective Formula for Fitness

OpenAI boss 'deeply sorry' for not telling police of mass shooter's account

An AI driven WP theming workflow

Savings Hacks You Must Know

Claude 4.7 vs. ChatGPT 5.5

Claude Platform on AWS (Coming Soon)

Txtfold – summarize large files for LLMs

The Silencing Engine

Show HN: ChatForm – Create an AI chat form in 1 minute

Draft's knowledge graph engine – deterministic codebase understanding for AI

Why the Future Doesn't Need US

The Publishing Mystery That No One Wants to Talk About

AMD's Zen: Coming Back from the Dead

Coyote vs. Acme (1990)

Learning About FPGAs in Finance