
The End of the Train-Test Split

https://folio.benguzovsky.com/train-test
35•gmays•2mo ago

Comments

elpakal•2mo ago
> Since the data will always be flawed and the test set won't be blind, the machine learning engineer's priority should be spent working with policy teams to improve the data.

It's interesting to watch this dynamic shift from dataset-size measuring contests to quality and representativeness. In "A small number of samples can poison LLMs of any size", Anthropic hits on the same shift, though their position is more about security considerations than quality.

https://www.anthropic.com/research/small-samples-poison

henning•2mo ago
> Two months later, you've cracked it

Hehe.

roadside_picnic•2mo ago
> You make an LLM decision tree, one LLM call per policy section, and aggregate the results.

I can never understand why people jump to these weird direct calls to the LLM rather than working with embeddings for classification tasks.

I have a hard time believing that

- the context text embedding

- the image vector representation

- the policy text embedding(s)

cannot be combined to create a classification model that is likely several orders of magnitude faster than chaining calls to an LLM, and I wouldn't be remotely surprised to see it perform notably better on the task described.
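A minimal sketch of the embedding approach described above, under loud assumptions: the three embeddings are random synthetic vectors standing in for real encoder outputs, the labels come from a made-up linear rule, and plain concatenation followed by logistic regression is just the simplest way to combine them, not something the article or the comment prescribes. All function names are hypothetical.

```python
import numpy as np

def combine(context_emb, image_emb, policy_emb):
    """Concatenate the three per-item embeddings into one feature vector per row."""
    return np.concatenate([context_emb, image_emb, policy_emb], axis=1)

def sigmoid(z):
    # Clip logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_logreg(X, y, lr=0.5, steps=500):
    """Fit a logistic regression classifier with batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Synthetic stand-ins for real embeddings (assumption: 8 dimensions each).
rng = np.random.default_rng(0)
n, dim = 1200, 8
context, image, policy = (rng.normal(size=(n, dim)) for _ in range(3))
X = combine(context, image, policy)

# Made-up labelling rule so the toy problem is learnable.
true_w = rng.normal(size=X.shape[1])
y = (X @ true_w > 0).astype(int)

X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

w, b = train_logreg(X_train, y_train)
accuracy = float(np.mean(predict(X_test, w, b) == y_test))
print(f"held-out accuracy: {accuracy:.3f}")
```

Inference here is a single matrix-vector product per item, which is where the speed advantage over chained LLM calls would come from. Whether such a model matches LLM quality on a real moderation policy is an empirical question this toy data cannot answer.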

I have used LLMs as classifiers, and it does make sense in cases of extremely limited data (though they rarely work well enough). But if you're going to be calling the LLM in such complex ways, it's better to stop thinking of this as a classic ML problem and instead think of it as an agentic content moderator.

In this case you can ignore the train/test split in favor of evals which you would create as you would for any other LLM agent workflow.
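As a rough illustration of what "evals" could look like in place of a train/test split, here is a small sketch. Everything in it is hypothetical: `toy_moderator` is a keyword rule standing in for a real LLM agent call, and the cases and labels are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class EvalCase:
    name: str
    content: str
    expected: str  # "allow" or "remove"

def toy_moderator(content: str) -> str:
    """Hypothetical stand-in for an agentic moderator; a real one would call an LLM."""
    return "remove" if "spam" in content.lower() else "allow"

def run_evals(moderator: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, object]:
    """Run every case through the moderator and collect the names of failing cases."""
    failures = [c.name for c in cases if moderator(c.content) != c.expected]
    return {
        "total": len(cases),
        "passed": len(cases) - len(failures),
        "failures": failures,
    }

# Hand-written cases, each tied to a behaviour the policy should guarantee.
cases = [
    EvalCase("obvious_spam", "Buy SPAM pills now", "remove"),
    EvalCase("benign_photo", "Nice photo of a sunset", "allow"),
    EvalCase("obfuscated_promo", "Limited offer, click here", "remove"),
]

report = run_evals(toy_moderator, cases)
print(report)
```

The design difference from a held-out test set is that each case is named and curated, so a failure points at a specific behaviour to fix instead of shifting an aggregate metric.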

stephantul•2mo ago
I don’t really believe this is a paradigm shift with regard to train/test splits.

Before LLMs you would do a lot of these things; it’s just become a lot easier to get started without training. What the author describes is very similar to the standard ML product loop in companies, including how difficult it is to “beat” the incumbent model, because it has been overfit on the test set that is used to compare the incumbent to your own model.