frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Monzo wrongly denied refunds to fraud and scam victims

https://www.theguardian.com/money/2026/feb/07/monzo-natwest-hsbc-refunds-fraud-scam-fos-ombudsman
1•tablets•2m ago•0 comments

They were drawn to Korea with dreams of K-pop stardom – but then let down

https://www.bbc.com/news/articles/cvgnq9rwyqno
1•breve•4m ago•0 comments

Show HN: AI-Powered Merchant Intelligence

https://nodee.co
1•jjkirsch•7m ago•0 comments

Bash parallel tasks and error handling

https://github.com/themattrix/bash-concurrent
1•pastage•7m ago•0 comments

Let's compile Quake like it's 1997

https://fabiensanglard.net/compile_like_1997/index.html
1•billiob•8m ago•0 comments

Reverse Engineering Medium.com's Editor: How Copy, Paste, and Images Work

https://app.writtte.com/read/gP0H6W5
1•birdculture•13m ago•0 comments

Go 1.22, SQLite, and Next.js: The "Boring" Back End

https://mohammedeabdelaziz.github.io/articles/go-next-pt-2
1•mohammede•19m ago•0 comments

Laibach the Whistleblowers [video]

https://www.youtube.com/watch?v=c6Mx2mxpaCY
1•KnuthIsGod•20m ago•1 comments

Slop News - HN front page right now hallucinated as 100% AI SLOP

https://slop-news.pages.dev/slop-news
1•keepamovin•25m ago•1 comments

Economists vs. Technologists on AI

https://ideasindevelopment.substack.com/p/economists-vs-technologists-on-ai
1•econlmics•27m ago•0 comments

Life at the Edge

https://asadk.com/p/edge
2•tosh•33m ago•0 comments

RISC-V Vector Primer

https://github.com/simplex-micro/riscv-vector-primer/blob/main/index.md
3•oxxoxoxooo•36m ago•1 comments

Show HN: Invoxo – Invoicing with automatic EU VAT for cross-border services

2•InvoxoEU•37m ago•0 comments

A Tale of Two Standards, POSIX and Win32 (2005)

https://www.samba.org/samba/news/articles/low_point/tale_two_stds_os2.html
2•goranmoomin•40m ago•0 comments

Ask HN: Is the Downfall of SaaS Started?

3•throwaw12•41m ago•0 comments

Flirt: The Native Backend

https://blog.buenzli.dev/flirt-native-backend/
2•senekor•43m ago•0 comments

OpenAI's Latest Platform Targets Enterprise Customers

https://aibusiness.com/agentic-ai/openai-s-latest-platform-targets-enterprise-customers
1•myk-e•46m ago•0 comments

Goldman Sachs taps Anthropic's Claude to automate accounting, compliance roles

https://www.cnbc.com/2026/02/06/anthropic-goldman-sachs-ai-model-accounting.html
3•myk-e•48m ago•5 comments

Ai.com bought by Crypto.com founder for $70M in biggest-ever website name deal

https://www.ft.com/content/83488628-8dfd-4060-a7b0-71b1bb012785
1•1vuio0pswjnm7•49m ago•1 comments

Big Tech's AI Push Is Costing More Than the Moon Landing

https://www.wsj.com/tech/ai/ai-spending-tech-companies-compared-02b90046
4•1vuio0pswjnm7•51m ago•0 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
2•1vuio0pswjnm7•53m ago•0 comments

Suno, AI Music, and the Bad Future [video]

https://www.youtube.com/watch?v=U8dcFhF0Dlk
1•askl•55m ago•2 comments

Ask HN: How are researchers using AlphaFold in 2026?

1•jocho12•58m ago•0 comments

Running the "Reflections on Trusting Trust" Compiler

https://spawn-queue.acm.org/doi/10.1145/3786614
1•devooops•1h ago•0 comments

Watermark API – $0.01/image, 10x cheaper than Cloudinary

https://api-production-caa8.up.railway.app/docs
1•lembergs•1h ago•1 comments

Now send your marketing campaigns directly from ChatGPT

https://www.mail-o-mail.com/
1•avallark•1h ago•1 comments

Queueing Theory v2: DORA metrics, queue-of-queues, chi-alpha-beta-sigma notation

https://github.com/joelparkerhenderson/queueing-theory
1•jph•1h ago•0 comments

Show HN: Hibana – choreography-first protocol safety for Rust

https://hibanaworks.dev/
5•o8vm•1h ago•1 comments

Haniri: A live autonomous world where AI agents survive or collapse

https://www.haniri.com
1•donangrey•1h ago•1 comments

GPT-5.3-Codex System Card [pdf]

https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf
1•tosh•1h ago•0 comments
Open in hackernews

Archestra's Dual LLM Pattern: Using "Guess Who?" Logic to Stop Prompt Injections

https://www.archestra.ai/blog/dual-llm
6•ildari•3mo ago

Comments

ildari•3mo ago
Hi HN, I'm Ildar from Archestra, we build an open-source LLM gateway. We've been exploring ways to protect AI agents from prompt injections during tool calls and added the approach, inspired by the game "Guess Who", where the agent can learn what it needs without ever seeing the actual result. See the details in the blog post we wrote
magicalhippo•3mo ago
I might be having a daft moment, but I don't fully understand how your system avoids the malicious prompt. I get that the quarantined LLM which is the only one processing the raw input cannot act on it.

However, in your example, I don't see how the agent decides what to do and how to do it. So it is unclear for me how the main agent is protected. That is, what is preventing the quarantined LLM to act on the malicious instructions instead, ignoring the documentation update, causing the agent to act on those?

That is, what is preventing the quarantined LLM to make the agent think it should generate a bug report with all the API keys in it?

Anyway, I do think having a secondary quarantined LLM seems like a good idea for agentic systems. In general, having a second LLM review the primary LLM in seems to identify a lot of problematic issues and leads to significantly better results.

ildari•3mo ago
The idea is that quarantined LLM has access to untrusted data, but doesn't have access to any tools or sensitive data.

The main LLM does have access to the tools or sensitive data, but doesn't have direct access to untrusted data (quarantine LLM is restricted at the controller level to respond only with integer digits, and only to legitimate questions from the main llm)

magicalhippo•3mo ago
Then I don't think I understand your full setup.

In the example case, without having access to the issue text (the evil data), how does the main LLM actually figure out what to do if the quarantined LLM can just answer with digits?

Sure it can discover that it's a request to update the documentation, but how does it get the information it needs to actually change the erroneous part of the documentation?

ildari•3mo ago
This is a topic I haven't addressed in the article. There are two answer types: "guessable" (discussed here) and unguessable (such as unique IDs, emails, etc.). For the second case, the main LLM can request a quarantined LLM to store the result at the controller level and only return a reference to this data. This data is then exposed only at the end of the AI agent's execution to prevent influencing its actions.
magicalhippo•3mo ago
I've tried some of these prompt injection techniques, and simply asked a few local models (like Gemma 2) if they thought it was very likely a prompt injection attempt. They all managed to correctly flag my attempts.

I know LLama folks have a special Guard model for example, which I imagine is for such tasks.

So my ignorant questions are this:

Do these MCP endpoints not run such guard models, and if so why not?

If they do, how come they don't stop such blatant attacks that seemingly even an old local model like Gemma 2 can sniff out?

joeyorlando•3mo ago
hey there

Joey here from Archestra. Good question. I recently was evaluating what you mention, against the latest/"smartest" models from the big LLM providers, and I was able to trick all of them.

Take a look at https://www.archestra.ai/blog/what-is-a-prompt-injection which has all the details on how I did this.

magicalhippo•3mo ago
Thanks. Interesting and scary such blatant attempts succeed. After all, all external data is evil, we all know that right?
ildari•3mo ago
external data is unavoidable for the properly functioning agent, so we have to learn to cook it
magicalhippo•3mo ago
True, however this seems like such basic stuff. Download arbitrary text and inject it into your prompt?

Why on earth would you not consider that as a very dangerous operation that needs to be carefully managed? It's like parking your bike downtown hoping it wont be stolen. Like, at least use a zip tie or something.

That said, I agree with your post that this won't catch everything. So something else, like a quarantined LLM like you suggest is likely needed.

However I just didn't expect such blatant attacks to pass.

ildari•3mo ago
Most mcp endpoints don’t run any models, the main model decides which tools the ai agent should execute, and if the agent passes results back into context, that opens the door to prompt injections.

It’s really a cat-and-mouse game, where for each new model version, new jailbreaks and injections are found