frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Ask HN: What are you using to mitigate prompt injection?

5•ramoz•9h ago
If anything at all.

Comments

oliver_dr•9h ago
We've been dealing with this at multiple layers. Here's what actually works in production:

Input-side (preventing injection):

- Strict input sanitization with role-boundary enforcement in the system prompt. Sounds basic, but most people skip it.

- Separate "user content" from "system instructions" at the API level. Don't concatenate untrusted input into your system prompt. Use the dedicated `user` role in the messages array.

- For tool-calling agents, validate that tool arguments match expected schemas before execution. An LLM-as-judge approach for tool call safety is expensive but effective for high-stakes actions.

Output-side (catching when injection succeeds):

This is the part most people underinvest in. Even with perfect input filtering, you still need output guardrails:

- Run the LLM output through evaluation metrics that score for factual correctness, instruction adherence, and safety before it reaches the user.

- For RAG systems specifically, verify that the generated answer is actually grounded in the retrieved context, not fabricated or influenced by injected instructions.

The "defense in depth" framing matters here. Input filtering alone has a ceiling because adversarial prompts evolve faster than regex rules. Output evaluation catches the failures that slip through. We use DeepRails' Defend API for this layer - it scores outputs on correctness, completeness, and safety, then auto-remediates failures before they reach end users. But the principle applies regardless of tooling: treat output verification as a first-class concern, not an afterthought.

Simon Willison's work on dual-LLM patterns is also worth reading if you haven't: https://simonwillison.net/2023/Apr/25/dual-llm-pattern/

raw_anon_1111•2h ago
Absolutely nothing.

Most of my use cases for using LLMs in production is call centers

https://news.ycombinator.com/item?id=47241412

Where basically it’s: accepting user input -> LLM to figure out which tool to call (user’s intent) -> JSON -> call API with strict security boundaries just like with a web app.

What’s the worse that could happen that couldn’t happen with a web app if I have bad security around the underlying API?

I’m sure that if you did successfully break my systems they could get it to say inappropriate things back to them for the lulz, but who cares?

X is selling existing users' handles

158•hac•11h ago•77 comments

Tell HN: Apple development certificate server seems down?

107•strongpigeon•1d ago•39 comments

Ask HN: Is Claude down again?

84•coderbants•17h ago•71 comments

Ask HN: What Are You Working On? (March 2026)

285•david927•3d ago•1098 comments

Ask HN: How are people doing AI evals these days?

30•yelmahallawy•2d ago•33 comments

Ask HN: How do we build a new Human First online community in the LLM age?

7•bluefirebrand•8h ago•7 comments

Ask HN: Remember Fidonet?

119•ukkare•1d ago•67 comments

Ask HN: What are you using to mitigate prompt injection?

5•ramoz•9h ago•2 comments

Ask HN: How to be alone?

680•sillysaurusx•3d ago•556 comments

Ask HN: What on this "List of Unsolved Problems in Physics" Has Your Attention?

3•ghastmaster•6h ago•1 comments

Ask HN: Please restrict new accounts from posting

711•Oras•3d ago•505 comments

Ask HN: Most beautiful personal blog UI you have ever seen?

145•ms7892•3d ago•54 comments

Happy Birthday YC/HN

6•ellis0n•7h ago•4 comments

Ask HN: Can I repurpose a Bluetooth voice remote as input device for a PC?

15•albert_e•3d ago•20 comments

Tell HN: I'm 60 years old. Claude Code has re-ignited a passion

1069•shannoncc•5d ago•978 comments

IdeaRank – Startup Analysis Engine

2•TMDev•18h ago•0 comments

Maybe we can keep on coding? pseudo code project

8•EmptyDrum•1d ago•13 comments

Ask HN: How do you review gen-AI created code?

6•captainkrtek•1d ago•10 comments

Ask HN: How are people forecasting AI API costs for agent workflows?

5•Barathkanna•1d ago•20 comments

Why is GPT-5.4 obsessed with Goblins?

16•pants2•2d ago•11 comments

Ask HN: Is GitHub getting less reliable, or is it just me?

13•_pdp_•2d ago•9 comments

Tell HN: Vertical tabs has arrived (behind a flag) in Chrome stable

6•crummy•1d ago•3 comments

Ask HN: Is Starlink still being jammed in Iran?

3•Jblx2•1d ago•1 comments

Ask HN: How to "make it" as a newlygrad/junior?

4•kartoffelsaft•1d ago•11 comments

Ask HN: Finding a purpose after tech layoffs

8•fud101•16h ago•8 comments

Ask HN: Does automatic multilingual support make sense for a launch platform?

2•LeanVibe•1d ago•3 comments

Unlocked SaaS, file source as truth?

2•abmmgb•1d ago•1 comments

The Architecture of an Exit Scam: A Technical Audit of Zszrun

5•cappyfjao•1d ago•0 comments

Ask HN: Since a week HN keeps logging me off every few days, why?

5•epolanski•1d ago•2 comments

Ask HN: What AI content automation stack are you using in 2026?

3•jackcofounder•1d ago•3 comments