frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

https://arxiv.org/abs/2601.14340
1•PaulHoule•1m ago•0 comments

Show HN: AI Agent Tool That Keeps You in the Loop

https://github.com/dshearer/misatay
1•dshearer•3m ago•0 comments

Why Every R Package Wrapping External Tools Needs a Sitrep() Function

https://drmowinckels.io/blog/2026/sitrep-functions/
1•todsacerdoti•3m ago•0 comments

Achieving Ultra-Fast AI Chat Widgets

https://www.cjroth.com/blog/2026-02-06-chat-widgets
1•thoughtfulchris•5m ago•0 comments

Show HN: Runtime Fence – Kill switch for AI agents

https://github.com/RunTimeAdmin/ai-agent-killswitch
1•ccie14019•7m ago•1 comments

Researchers surprised by the brain benefits of cannabis usage in adults over 40

https://nypost.com/2026/02/07/health/cannabis-may-benefit-aging-brains-study-finds/
1•SirLJ•9m ago•0 comments

Peter Thiel warns the Antichrist, apocalypse linked to the 'end of modernity'

https://fortune.com/2026/02/04/peter-thiel-antichrist-greta-thunberg-end-of-modernity-billionaires/
1•randycupertino•10m ago•2 comments

USS Preble Used Helios Laser to Zap Four Drones in Expanding Testing

https://www.twz.com/sea/uss-preble-used-helios-laser-to-zap-four-drones-in-expanding-testing
2•breve•15m ago•0 comments

Show HN: Animated beach scene, made with CSS

https://ahmed-machine.github.io/beach-scene/
1•ahmedoo•16m ago•0 comments

An update on unredacting select Epstein files – DBC12.pdf liberated

https://neosmart.net/blog/efta00400459-has-been-cracked-dbc12-pdf-liberated/
1•ks2048•16m ago•0 comments

Was going to share my work

1•hiddenarchitect•19m ago•0 comments

Pitchfork: A devilishly good process manager for developers

https://pitchfork.jdx.dev/
1•ahamez•19m ago•0 comments

You Are Here

https://brooker.co.za/blog/2026/02/07/you-are-here.html
3•mltvc•24m ago•1 comments

Why social apps need to become proactive, not reactive

https://www.heyflare.app/blog/from-reactive-to-proactive-how-ai-agents-will-reshape-social-apps
1•JoanMDuarte•24m ago•1 comments

How patient are AI scrapers, anyway? – Random Thoughts

https://lars.ingebrigtsen.no/2026/02/07/how-patient-are-ai-scrapers-anyway/
1•samtrack2019•25m ago•0 comments

Vouch: A contributor trust management system

https://github.com/mitchellh/vouch
2•SchwKatze•25m ago•0 comments

I built a terminal monitoring app and custom firmware for a clock with Claude

https://duggan.ie/posts/i-built-a-terminal-monitoring-app-and-custom-firmware-for-a-desktop-clock...
1•duggan•26m ago•0 comments

Tiny C Compiler

https://bellard.org/tcc/
1•guerrilla•27m ago•0 comments

Y Combinator Founder Organizes 'March for Billionaires'

https://mlq.ai/news/ai-startup-founder-organizes-march-for-billionaires-protest-against-californi...
1•hidden80•28m ago•2 comments

Ask HN: Need feedback on the idea I'm working on

1•Yogender78•28m ago•0 comments

OpenClaw Addresses Security Risks

https://thebiggish.com/news/openclaw-s-security-flaws-expose-enterprise-risk-22-of-deployments-un...
2•vedantnair•29m ago•0 comments

Apple finalizes Gemini / Siri deal

https://www.engadget.com/ai/apple-reportedly-plans-to-reveal-its-gemini-powered-siri-in-february-...
1•vedantnair•29m ago•0 comments

Italy Railways Sabotaged

https://www.bbc.co.uk/news/articles/czr4rx04xjpo
8•vedantnair•30m ago•2 comments

Emacs-tramp-RPC: high-performance TRAMP back end using MsgPack-RPC

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•fanf2•31m ago•0 comments

Nintendo Wii Themed Portfolio

https://akiraux.vercel.app/
2•s4074433•35m ago•2 comments

"There must be something like the opposite of suicide "

https://post.substack.com/p/there-must-be-something-like-the
1•rbanffy•37m ago•1 comments

Ask HN: Why doesn't Netflix add a “Theater Mode” that recreates the worst parts?

2•amichail•38m ago•0 comments

Show HN: Engineering Perception with Combinatorial Memetics

1•alan_sass•44m ago•2 comments

Show HN: Steam Daily – A Wordle-like daily puzzle game for Steam fans

https://steamdaily.xyz
1•itshellboy•46m ago•0 comments

The Anthropic Hive Mind

https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b
2•spenvo•46m ago•0 comments
Open in hackernews

The AI Nerf Is Real

https://isitnerfed.org
8•rumble_poster•5mo ago

Comments

rumble_poster•5mo ago
Hello everyone, we’re working on a project called IsItNerfed, where we monitor LLMs in real time.

We run a variety of tests through Claude Code and the OpenAI API (using GPT-4.1 as a reference point for comparison). We also have a Vibe Check feature that lets users vote whenever they feel the quality of LLM answers has either improved or declined. Over the past few weeks of monitoring, we’ve noticed just how volatile Claude Code’s performance can be.

Chart is here: https://i.snipboard.io/RydmH7.jpg

1) Up until August 28, things were more or less stable.

2) On August 29, the system went off track — the failure rate doubled, then returned to normal by the end of the day.

3) The next day, August 30, it spiked again to 70%. It later dropped to around 50% on average, but remained highly volatile for nearly a week.

4) Starting September 4, the system settled into a more stable state again.

It’s no surprise that many users complain about LLM quality and get frustrated when, for example, an agent writes excellent code one day but struggles with a simple feature the next. This isn’t just anecdotal — our data clearly shows that answer quality fluctuates over time.

By contrast, our GPT-4.1 tests show numbers that stay consistent from day to day.

And that’s without even accounting for possible bugs or inaccuracies in the agent CLIs themselves (for example, Claude Code), which are updated with new versions almost every day.

What’s next: we plan to add more benchmarks and more models for testing. Share your suggestions and requests — we’ll be glad to include them and answer your questions.

https://isitnerfed.org

bartx•4mo ago
Hey, this looks cool! Are you planning to open-source the project by any chance? It would be useful to see or contribute to the code generation evaluations you use