frontpage.

18th-century mechanical volcano roars to life 250 years later

https://www.sciencedaily.com/releases/2026/05/260502015359.htm
1•samizdis•1m ago•0 comments

WeSearch

https://wesearch.press/
1•EGCstudy•2m ago•1 comment

Making 10 apps in 30 Days

https://bendansby.com/posts/10-apps-30-days.html
1•webwielder2•2m ago•0 comments

Iceland's Pools and Hot Tubs Now UNESCO-Recognized. Some Locals Aren't Thrilled.

https://www.nytimes.com/2026/04/30/world/europe/iceland-hot-tub-pools-tourism.html
1•bookofjoe•2m ago•1 comment

Show HN: Predicting the 2026 Kentucky Derby with 1T Monte Carlo Sims on Burla

https://burla-cloud.github.io/examples/kentucky-derby-demo/
1•Jack_at_Burla•4m ago•0 comments

AI talks draw backlash from Mass. state lawmakers

https://www.politico.com/news/2026/05/01/ai-backlash-massachusetts-lawmakers-00903440
1•1vuio0pswjnm7•6m ago•0 comments

Life update: Zig, AI, unemployment, and more [video]

https://www.youtube.com/watch?v=DhhPUrizZcw
1•rubenflamshep•7m ago•0 comments

How Oregon's Data Center Boom Is Supercharging a Water Crisis

https://waterwatch.org/how-oregons-data-center-boom-is-supercharging-a-water-crisis/
1•therobots927•8m ago•0 comments

Palantir Comes to Campus

https://nymag.com/intelligencer/article/palantir-yale-conference-ai.html
1•jbegley•9m ago•0 comments

Shitpostmodernism: Understanding the Slopgeneration

https://www.spikeartmagazine.com/articles/essay-shitpostmodernism
1•thinkingemote•9m ago•0 comments

AI Agents Are the Mass-Produced Cars of Software

https://telegraphic.substack.com/p/ai-agents-are-the-mass-produced-cars
1•telegrahi•9m ago•0 comments

Opioid maker Purdue Pharma shuts down as part of $7.4B deal

https://www.usatoday.com/story/news/nation/2026/05/01/purdue-pharma-shuts-down-opioid-crisis-oxyc...
1•geox•11m ago•0 comments

Disneyland Now Uses Face Recognition on Visitors

https://www.wired.com/story/security-news-this-week-disneyland-now-uses-face-recognition-on-visit...
2•Brajeshwar•15m ago•0 comments

Digital Ecosystems: Interactive Multi-Agent Neural Cellular Automata

https://pub.sakana.ai/digital-ecosystem/
1•jarmitage•17m ago•0 comments

How are Life-Size Figures Created at hololive production?

https://coveredge.cover-corp.com/en/list/4759
1•ai_slop_hater•17m ago•0 comments

Vibecoded my dream game, GeoGuesser for guns, now it's helping with student bills

https://gunguesser.com
4•salad_vr•17m ago•5 comments

The Railway and the Balloon

https://netwars.pelicancrossing.net/2026/05/01/the-railway-and-the-balloon/
1•ColinWright•21m ago•0 comments

Floating Armoury

https://en.wikipedia.org/wiki/Floating_armoury
1•jjmarr•24m ago•0 comments

Customizing Claude Code spinner verbs

https://www.augmentedswe.com/p/customizing-claude-code-spinner-verbs
1•wordsaboutcode•24m ago•0 comments

Backend-for-Frontend: The most secure architecture for browser-based apps

https://fusionauth.io/blog/backend-for-frontend-security-architecture
2•mooreds•29m ago•0 comments

Voyager and the Art of Graceful Degradation

https://www.flyingbarron.com/2026/04/voyager-and-art-of-graceful-degradation.html
1•mooreds•29m ago•0 comments

Did I photograph the Aurora or was it something else? (2016)

https://wp.lancs.ac.uk/aurorawatchuk/2016/03/16/did-i-photgraph-the-aurora-or-was-it-something-else/
1•susam•31m ago•0 comments

Upcoming Blender Development Fund and AI Policies

https://www.blender.org/news/upcoming-blender-development-fund-and-ai-policies/
2•sensanaty•32m ago•0 comments

The Annoying Usefulness of Emacs [video]

https://www.youtube.com/watch?v=DMbrNhx2zWQ
2•susam•32m ago•0 comments

The Sky Tonight

https://theskylive.com/guide
2•susam•34m ago•0 comments

New US phone network for Christians to block porn and gender-related content

https://www.technologyreview.com/2026/05/01/1136739/a-new-t-mobile-network-for-christians-aims-to...
8•thinkingemote•37m ago•2 comments

Making Your Writing Work Harder for You

https://training.kalzumeus.com/newsletters/archive/content-marketing-strategy
2•eigenBasis•39m ago•0 comments

Show HN: TradingAgents without the API bill – run multi agents in Claude Code

https://github.com/lucemia/trading-agents-plugin
1•lucemia51•43m ago•0 comments

Stop Supplying. Start Owning

https://allensthoughts.com/2026/05/01/stop-supplying-start-owning/
2•herbertl•44m ago•0 comments

Uber wants to turn its drivers into a sensor grid for AV companies

https://techcrunch.com/2026/05/01/uber-wants-to-turn-its-millions-of-drivers-into-a-sensor-grid-f...
6•nickvec•45m ago•1 comment

Chinese AI models are ~8 months behind and falling further behind

https://twitter.com/scaling01/status/2050395242663223751
2•enraged_camel•1h ago

Comments

giardini•1h ago
No problem: they're always at most one theft away from you! 8-)
jqpabc123•1h ago
Chinese models are cheaper and likely to remain so due to lower energy costs.
tokkkie•1h ago
Chinese models feel strong in Japan (kanji handling), but outside language tasks? Maybe Sonnet 4.5 level at most.

Do benchmarks reflect that gap in English-speaking regions?

allears•54m ago
Not everybody needs cutting-edge performance. Cost per token is turning out to be more important.
ilia-a•26m ago
That doesn't seem right, and it seems to miss GLM 5.1 and Kimi 2.6. Not to mention the whole cost/value argument for Chinese OSS models vs. GPT/Claude.
ollin•15m ago
The source here is "CAISI Evaluation of DeepSeek V4 Pro" [1]; the US NIST ran their own benchmarks (including several internal ones) and reported the following table:

    | Domain               | Benchmark              | OpenAI GPT-5.5 (xhigh) | OpenAI GPT-5.4 mini (xhigh) | Anthropic Opus 4.6 (max) | DeepSeek V4 Pro (max) |
    |----------------------|------------------------|------------------------|-----------------------------|--------------------------|-----------------------|
    | Cyber                | CTF-Archive-Diamond    | **71%**                | 32%                         | 46%                      | 32%                   |
    | Software Engineering | SWE-Bench Verified*    | **81%**                | 73%                         | 79%                      | 74%                   |
    |                      | PortBench              | **78%**                | 41%                         | 60%                      | 44%                   |
    | Natural Sciences     | FrontierScience        | **79%**                | 74%                         | 72%                      | 74%                   |
    |                      | GPQA-Diamond           | **96%**                | 87%                         | 91%                      | 90%                   |
    | Abstract Reasoning   | ARC-AGI-2 semi-private | **79%**                | –                           | 63%                      | 46%                   |
    | Mathematics          | OTIS-AIME-2025         | **100%**               | 90%                         | 92%                      | 97%                   |
    |                      | PUMaC 2024             | **96%**                | 93%                         | 95%                      | **96%**               |
    |                      | SMT 2025               | **99%**                | 92%                         | 94%                      | 96%                   |
    | IRT-Estimated Elo    |                        | **1260 ± 28**          | 749 ± 46                    | 999 ± 27                 | 800 ± 28              |
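For context on the Elo row: Elo ratings are conventionally mapped to head-to-head win probabilities via a logistic curve on a 400-point scale. A minimal sketch, assuming the report uses that conventional scale (the report itself may define its IRT fit differently):

```python
import math

def win_prob(d: float) -> float:
    """Win probability implied by an Elo gap d, under the
    standard logistic Elo model: P = 1 / (1 + 10^(-d/400))."""
    return 1 / (1 + 10 ** (-d / 400))

def elo_gap(p_win: float) -> float:
    """Inverse mapping: Elo gap implied by a win probability."""
    return 400 * math.log10(p_win / (1 - p_win))

# The table's 1260 vs. 800 gap is 460 Elo, which under this model
# implies roughly a 93% head-to-head win rate:
print(round(win_prob(460), 3))  # ≈ 0.934
```

This is only a rule-of-thumb conversion; the ± 28 error bars mean the implied win rate itself carries meaningful uncertainty.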
Notably, two of the benchmarks with the biggest capability gap (CTF-Archive-Diamond, PortBench) are CAISI-internal/private. I read this as "DeepSeek is well-tuned for public benchmarks and less generally capable than GPT-5.5 on held-out tasks", but a less charitable reading would be "the US government reports that US models do best on benchmarks only the US government can run".

Agent benchmarking is fraught with peril [2], and a partial benchmarker (one who disproportionately overlooks bugs/issues when evaluating certain models) can absolutely tilt the scales, so I would not be surprised if a PRC-led benchmarking of frontier models came to the opposite conclusion.
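On top of evaluation bias, finite eval sets add plain sampling noise. A back-of-envelope two-proportion z-test (my illustration, not from the report) on the table's SWE-Bench Verified gap, using the benchmark's 500 instances:

```python
import math

def two_prop_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """Two-proportion z-statistic: is the gap between two pass
    rates larger than sampling noise on eval sets of size n1, n2?"""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)           # pooled pass rate
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# GPT-5.5's 81% vs. DeepSeek's 74% on 500 instances each:
print(round(two_prop_z(0.81, 0.74, 500, 500), 2))  # ≈ 2.65
```

A z of ~2.65 clears the usual 1.96 threshold, but not by much: a few percentage points on a 500-item benchmark is close to the noise floor, which is one more reason single-table rankings deserve skepticism.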

[1] https://www.nist.gov/news-events/news/2026/05/caisi-evaluati...

[2] https://epoch.ai/gradient-updates/why-benchmarking-is-hard