Ask HN: The Proof or Bluff paper. Can "AI" do math?

1•henryjcee•7mo ago
Over the past 12 months I've seen lots of comments all over the place (here, X, legacy media, blogs etc.) making the case for "AI" performance on the IMO (Math Olympiad) being evidence for continued rapid increases in LLM performance. I've heard my friends who work in AI safety quote these results pretty often whenever they encounter scepticism about the coming AI singularity.

It seems to me that these comments stem from the DeepMind results from last summer[0] and February this year[1]. As I understand it, the models used for these tasks are highly specialised and only accept formal language as input (i.e. not the textual or visual representation that a large multi-modal model could use).

I was reading through the Proof or Bluff paper[2] this morning. While I don't think it has been reproduced yet, the authors found that none of the tested SOTA LLMs were able to make any meaningful progress (none scored over 5%) on the questions in their test set. This matches my limited experience of using LLMs for similar tasks. Needless to say, I've not heard a peep about this paper from my AI safety friends.

My question is: How should I interpret the above? Maybe it's too cynical but my current thesis is that the DeepMind results are convenient headline-grabbers for the AI safety crowd, who are conflating the performance of a task-specific model with more general LLMs in order to make an unsubstantiated claim about progress in generalisable AI. Is that reasonable? What am I missing?

If the authors of Proof or Bluff are in here I'd also like to say thanks for doing the work on this. I can imagine that work like this isn't the sexiest but it is so refreshing seeing people take the time and care to generate some hard data about how good these models actually are. As someone considering a career switch at the moment, data like this is really useful context when trying to evaluate what the next few decades might look like.

[0] https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level

[1] https://techcrunch.com/2025/02/07/deepmind-claims-its-ai-performs-better-than-international-mathematical-olympiad-gold-medalists

[2] https://arxiv.org/abs/2503.21934v1