frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Trying to make an Automated Ecologist: A first pass through the Biotime dataset

https://chillphysicsenjoyer.substack.com/p/trying-to-make-an-automated-ecologist
1•crescit_eundo•52s ago•0 comments

Watch Ukraine's Minigun-Firing, Drone-Hunting Turboprop in Action

https://www.twz.com/air/watch-ukraines-minigun-firing-drone-hunting-turboprop-in-action
1•breve•1m ago•0 comments

Free Trial: AI Interviewer

https://ai-interviewer.nuvoice.ai/
1•sijain2•1m ago•0 comments

FDA Intends to Take Action Against Non-FDA-Approved GLP-1 Drugs

https://www.fda.gov/news-events/press-announcements/fda-intends-take-action-against-non-fda-appro...
1•randycupertino•3m ago•0 comments

Supernote e-ink devices for writing like paper

https://supernote.eu/choose-your-product/
1•janandonly•5m ago•0 comments

We are QA Engineers now

https://serce.me/posts/2026-02-05-we-are-qa-engineers-now
1•SerCe•5m ago•0 comments

Show HN: Measuring how AI agent teams improve issue resolution on SWE-Verified

https://arxiv.org/abs/2602.01465
2•NBenkovich•5m ago•0 comments

Adversarial Reasoning: Multiagent World Models for Closing the Simulation Gap

https://www.latent.space/p/adversarial-reasoning
1•swyx•6m ago•0 comments

Show HN: Poddley.com – Follow people, not podcasts

https://poddley.com/guests/ana-kasparian/episodes
1•onesandofgrain•14m ago•0 comments

Layoffs Surge 118% in January – The Highest Since 2009

https://www.cnbc.com/2026/02/05/layoff-and-hiring-announcements-hit-their-worst-january-levels-si...
7•karakoram•14m ago•0 comments

Papyrus 114: Homer's Iliad

https://p114.homemade.systems/
1•mwenge•14m ago•1 comments

DicePit – Real-time multiplayer Knucklebones in the browser

https://dicepit.pages.dev/
1•r1z4•14m ago•1 comments

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

https://arxiv.org/abs/2601.14340
2•PaulHoule•16m ago•0 comments

Show HN: AI Agent Tool That Keeps You in the Loop

https://github.com/dshearer/misatay
2•dshearer•17m ago•0 comments

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

https://drmowinckels.io/blog/2026/sitrep-functions/
1•todsacerdoti•18m ago•0 comments

Achieving Ultra-Fast AI Chat Widgets

https://www.cjroth.com/blog/2026-02-06-chat-widgets
1•thoughtfulchris•19m ago•0 comments

Show HN: Runtime Fence – Kill switch for AI agents

https://github.com/RunTimeAdmin/ai-agent-killswitch
1•ccie14019•22m ago•1 comments

Researchers surprised by the brain benefits of cannabis usage in adults over 40

https://nypost.com/2026/02/07/health/cannabis-may-benefit-aging-brains-study-finds/
1•SirLJ•24m ago•0 comments

Peter Thiel warns the Antichrist, apocalypse linked to the 'end of modernity'

https://fortune.com/2026/02/04/peter-thiel-antichrist-greta-thunberg-end-of-modernity-billionaires/
3•randycupertino•24m ago•2 comments

USS Preble Used Helios Laser to Zap Four Drones in Expanding Testing

https://www.twz.com/sea/uss-preble-used-helios-laser-to-zap-four-drones-in-expanding-testing
3•breve•30m ago•0 comments

Show HN: Animated beach scene, made with CSS

https://ahmed-machine.github.io/beach-scene/
1•ahmedoo•30m ago•0 comments

An update on unredacting select Epstein files – DBC12.pdf liberated

https://neosmart.net/blog/efta00400459-has-been-cracked-dbc12-pdf-liberated/
3•ks2048•30m ago•0 comments

Was going to share my work

1•hiddenarchitect•34m ago•0 comments

Pitchfork: A devilishly good process manager for developers

https://pitchfork.jdx.dev/
1•ahamez•34m ago•0 comments

You Are Here

https://brooker.co.za/blog/2026/02/07/you-are-here.html
3•mltvc•38m ago•1 comments

Why social apps need to become proactive, not reactive

https://www.heyflare.app/blog/from-reactive-to-proactive-how-ai-agents-will-reshape-social-apps
1•JoanMDuarte•39m ago•1 comments

How patient are AI scrapers, anyway? – Random Thoughts

https://lars.ingebrigtsen.no/2026/02/07/how-patient-are-ai-scrapers-anyway/
1•samtrack2019•39m ago•0 comments

Vouch: A contributor trust management system

https://github.com/mitchellh/vouch
3•SchwKatze•40m ago•0 comments

I built a terminal monitoring app and custom firmware for a clock with Claude

https://duggan.ie/posts/i-built-a-terminal-monitoring-app-and-custom-firmware-for-a-desktop-clock...
1•duggan•40m ago•0 comments

Tiny C Compiler

https://bellard.org/tcc/
7•guerrilla•42m ago•1 comments
Open in hackernews

The AI Nerf Is Real

https://isitnerfed.org
8•rumble_poster•5mo ago

Comments

rumble_poster•5mo ago
Hello everyone, we’re working on a project called IsItNerfed, where we monitor LLMs in real time.

We run a variety of tests through Claude Code and the OpenAI API (using GPT-4.1 as a reference point for comparison). We also have a Vibe Check feature that lets users vote whenever they feel the quality of LLM answers has either improved or declined. Over the past few weeks of monitoring, we’ve noticed just how volatile Claude Code’s performance can be.

Chart is here: https://i.snipboard.io/RydmH7.jpg

1) Up until August 28, things were more or less stable.

2) On August 29, the system went off track — the failure rate doubled, then returned to normal by the end of the day.

3) The next day, August 30, it spiked again to 70%. It later dropped to around 50% on average, but remained highly volatile for nearly a week.

4) Starting September 4, the system settled into a more stable state again.

It’s no surprise that many users complain about LLM quality and get frustrated when, for example, an agent writes excellent code one day but struggles with a simple feature the next. This isn’t just anecdotal — our data clearly shows that answer quality fluctuates over time.

By contrast, our GPT-4.1 tests show numbers that stay consistent from day to day.

And that’s without even accounting for possible bugs or inaccuracies in the agent CLIs themselves (for example, Claude Code), which are updated with new versions almost every day.

What’s next: we plan to add more benchmarks and more models for testing. Share your suggestions and requests — we’ll be glad to include them and answer your questions.

https://isitnerfed.org

bartx•4mo ago
Hey, this looks cool! Are you planning to open-source the project by any chance? It would be useful to see or contribute to the code generation evaluations you use