
The AI Nerf Is Real

https://isitnerfed.org
4•rumble_poster•2h ago

Comments

rumble_poster•2h ago
Hello everyone, we’re working on a project called IsItNerfed, where we monitor LLMs in real time.

We run a variety of tests through Claude Code and the OpenAI API (using GPT-4.1 as a reference point for comparison). We also have a Vibe Check feature that lets users vote whenever they feel the quality of LLM answers has either improved or declined. Over the past few weeks of monitoring, we’ve noticed just how volatile Claude Code’s performance can be.

Chart is here: https://i.snipboard.io/RydmH7.jpg

1) Up until August 28, things were more or less stable.

2) On August 29, the system went off track — the failure rate doubled, then returned to normal by the end of the day.

3) The next day, August 30, it spiked again to 70%. It later dropped to around 50% on average, but remained highly volatile for nearly a week.

4) Starting September 4, the system settled into a more stable state again.
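For readers who want to try this kind of tracking on their own test runs, here is a minimal sketch of how a daily failure rate and a "rate doubled" check could be computed. It is not IsItNerfed's actual pipeline: the function names, the `(day, passed)` record shape, the 2x threshold, and the sample data are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def daily_failure_rates(results):
    """Map each day label to its failure rate in [0, 1].

    `results` is an iterable of (day, passed) pairs (hypothetical record shape).
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for day, passed in results:
        totals[day] += 1
        if not passed:
            failures[day] += 1
    return {day: failures[day] / totals[day] for day in sorted(totals)}


def flag_spikes(rates, baseline_days, factor=2.0):
    """Return days whose failure rate is at least `factor` times the baseline mean."""
    baseline = mean(rates[d] for d in baseline_days)
    return [
        day
        for day, rate in rates.items()
        if day not in baseline_days and rate >= factor * baseline
    ]


# Made-up sample: 10 test runs per day, with 2, 2, 5 and 7 failures.
sample = (
    [("d1", i >= 2) for i in range(10)]
    + [("d2", i >= 2) for i in range(10)]
    + [("d3", i >= 5) for i in range(10)]
    + [("d4", i >= 7) for i in range(10)]
)

rates = daily_failure_rates(sample)
spikes = flag_spikes(rates, baseline_days={"d1", "d2"})
print(rates)
print(spikes)
```

A fixed baseline window keeps the check easy to reason about; a rolling baseline would adapt to slow drift but could also hide a gradual decline.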

It’s no surprise that many users complain about LLM quality and get frustrated when, for example, an agent writes excellent code one day but struggles with a simple feature the next. This isn’t just anecdotal — our data clearly shows that answer quality fluctuates over time.

By contrast, our GPT-4.1 tests show numbers that stay consistent from day to day.

And that’s without even accounting for possible bugs or inaccuracies in the agent CLIs themselves (for example, Claude Code), which are updated with new versions almost every day.

What’s next: we plan to add more benchmarks and more models for testing. Share your suggestions and requests — we’ll be glad to include them and answer your questions.

https://isitnerfed.org

68000 – The CPU ahead of its time

https://www.youtube.com/watch?v=njGWWg69B4A
1•doener•51s ago•0 comments

You may not be interested in climate change, but it is interested in you

https://www.defenseone.com/ideas/2025/09/you-may-not-be-interested-climate-change-it-interested-y...
1•mooreds•1m ago•0 comments

I'm building a skill Match-3 game with Chess-style Elo ranking (Browser/Mobile)

https://guivo.io/?sid=ae033d81-a7a3-4a75-86e7-ef3c7259fa24
1•ivorcosta•4m ago•1 comments

An Inline Cache Isn't Just a Cache

https://www.mgaudet.ca/technical/2018/6/5/an-inline-cache-isnt-just-a-cache
1•achierius•4m ago•0 comments

Fraudulent Publishing in the Mathematical Sciences

https://arxiv.org/abs/2509.07257
2•bikenaga•8m ago•0 comments

FastComments is Now Globally Distributed (and more rusty)

https://blog.fastcomments.com/(9-10-2025)-fastcomments-is-now-globally-distributed.html
1•winrid•10m ago•1 comments

Dependabot Support for Vcpkg

https://devblogs.microsoft.com/cppblog/dependabot-support-for-vcpkg/
1•mariuz•11m ago•0 comments

Has Google ended support for plain HTML search?

https://www.google.com/httpservice/retry/enablejs
2•hackerb9•12m ago•1 comments

Android 16 QPR1 source code is nowhere to be found but Google swears it's coming

https://www.androidauthority.com/android-16-qpr1-source-code-delay-3596650/
1•cdesai•14m ago•1 comments

Vercel Updates Pro Pricing

https://vercel.com/blog/new-pro-pricing-plan
1•aosaigh•16m ago•1 comments

SourceForge Sunsets Developer Web Hosting

https://sourceforge.net/blog/sunsetting-developer-web-user-web/
1•henry_flower•17m ago•0 comments

Overview of the DiskANN Project (2018–present)

https://harsha-simhadri.org/diskann-overview.html
1•fzliu•17m ago•0 comments

ChatGPT 5 marginalizing Gelman's measurement error model in Stan

https://statmodeling.stat.columbia.edu/2025/09/09/show-dont-tell-chatgpt-5-marginalizing-gelmans-...
1•momeara•19m ago•0 comments

PgEdge Goes Open Source

https://www.pgedge.com/blog/pgedge-goes-open-source
2•atombender•20m ago•0 comments

Lessons from Hidden Satoshi Gold Book on Crypto and AI

https://satoshigoldbook.com/
1•jamnicabpbnik•20m ago•1 comments

HiTex: A spam factory for AI-generated books

https://laurent.le-brun.eu/blog/hitex-a-spam-factory-for-ai-generated-books
1•laurentlb•24m ago•0 comments

Is Apple's iPhone 17 launch a win for India?

https://restofworld.org/2025/is-apples-iphone-17-launch-a-win-for-india-we-asked-experts/
1•colinprince•25m ago•0 comments

Trial and Error Driven Development

https://www.stevenoxley.com/blog/2025/09/09/trial-and-error-driven-development/
1•xonev•26m ago•0 comments

Exploratorium Cookbook Set: Volumes I, II and III

https://www.exploratoriumstore.com/products/exploratorium-cookbook-set
1•mhb•27m ago•1 comments

NATO's Chemical, Biological, Radiological and Nuclear (CBRN) Defence Policy

https://www.nato.int/cps/en/natohq/official_texts_197768.htm
1•type0•29m ago•0 comments

Senator: FTC should investigate Microsoft for dangerous and insecure software

https://www.wyden.senate.gov/news/press-releases/wyden-calls-for-ftc-investigation-of-microsoft-f...
2•Improvement•29m ago•0 comments

'China Is the Engine' Driving Nations Away from Fossil Fuels, Report Says

https://www.nytimes.com/2025/09/08/climate/china-clean-energy-fossil-fuel-research.html
3•bookofjoe•30m ago•1 comments

Show HN: HumanAlarm – Real people knock on your door to wake you up

https://humanalarm.com
1•soelost•30m ago•0 comments

The rules behind Rust functions

https://blog.cuongle.dev/p/the-hidden-rules-behind-rust-functions
2•gidellav•31m ago•0 comments

Launching Bottlenecks Institute

https://www.bottlenecksinstitute.com/
1•parnibrk•33m ago•0 comments

In 1979 one of the best guitar solos recorded was cut for radio time

https://www.seekhifi.com/my-sharona-by-the-knack/
3•wmeredith•34m ago•1 comments

Lifetime Starlink Deal? Nope, It's Just a Scam Circulating on Facebook

https://www.pcmag.com/news/lifetime-starlink-deal-nope-its-just-a-scam-circulating-on-facebook
2•rolph•34m ago•0 comments

Understanding Motion and Relativity with Spacetime Diagrams

https://steuard.github.io/spacetime/intro.html
4•Steuard•35m ago•1 comments

Coffee naps might be the weirdest–and smartest–way to recharge

https://www.nationalgeographic.com/health/article/caffeine-nap-explained
2•manveerc•35m ago•1 comments

How do we decide if a tax is good or bad?

https://www.theguardian.com/australia-news/2025/aug/21/how-do-we-decide-if-a-tax-is-good-or-bad-a...
1•PaulHoule•40m ago•0 comments