Ask HN: What are you using to mitigate prompt injection?

5•ramoz•9h ago

If anything at all.

Comments

oliver_dr•9h ago

We've been dealing with this at multiple layers. Here's what actually works in production:

Input-side (preventing injection):

- Strict input sanitization with role-boundary enforcement in the system prompt. Sounds basic, but most people skip it.

- Separate "user content" from "system instructions" at the API level. Don't concatenate untrusted input into your system prompt. Use the dedicated `user` role in the messages array.

- For tool-calling agents, validate that tool arguments match expected schemas before execution. An LLM-as-judge approach for tool call safety is expensive but effective for high-stakes actions.

Output-side (catching when injection succeeds):

This is the part most people underinvest in. Even with perfect input filtering, you still need output guardrails:

- Run the LLM output through evaluation metrics that score for factual correctness, instruction adherence, and safety before it reaches the user.

- For RAG systems specifically, verify that the generated answer is actually grounded in the retrieved context, not fabricated or influenced by injected instructions.

The "defense in depth" framing matters here. Input filtering alone has a ceiling because adversarial prompts evolve faster than regex rules. Output evaluation catches the failures that slip through. We use DeepRails' Defend API for this layer - it scores outputs on correctness, completeness, and safety, then auto-remediates failures before they reach end users. But the principle applies regardless of tooling: treat output verification as a first-class concern, not an afterthought.

Simon Willison's work on dual-LLM patterns is also worth reading if you haven't: https://simonwillison.net/2023/Apr/25/dual-llm-pattern/

raw_anon_1111•2h ago

Absolutely nothing.

Most of my use cases for using LLMs in production is call centers

https://news.ycombinator.com/item?id=47241412

Where basically it’s: accepting user input -> LLM to figure out which tool to call (user’s intent) -> JSON -> call API with strict security boundaries just like with a web app.

What’s the worse that could happen that couldn’t happen with a web app if I have bad security around the underlying API?

I’m sure that if you did successfully break my systems they could get it to say inappropriate things back to them for the lulz, but who cares?

X is selling existing users' handles

Tell HN: Apple development certificate server seems down?

Ask HN: Is Claude down again?

Ask HN: What Are You Working On? (March 2026)

Ask HN: How are people doing AI evals these days?

Ask HN: How do we build a new Human First online community in the LLM age?

Ask HN: Remember Fidonet?

Ask HN: What are you using to mitigate prompt injection?

Ask HN: How to be alone?

Ask HN: What on this "List of Unsolved Problems in Physics" Has Your Attention?

Ask HN: Please restrict new accounts from posting

Ask HN: Most beautiful personal blog UI you have ever seen?

Happy Birthday YC/HN

Ask HN: Can I repurpose a Bluetooth voice remote as input device for a PC?

Tell HN: I'm 60 years old. Claude Code has re-ignited a passion

IdeaRank – Startup Analysis Engine

Maybe we can keep on coding? pseudo code project

Ask HN: How do you review gen-AI created code?

Ask HN: How are people forecasting AI API costs for agent workflows?

Why is GPT-5.4 obsessed with Goblins?

Ask HN: Is GitHub getting less reliable, or is it just me?

Tell HN: Vertical tabs has arrived (behind a flag) in Chrome stable

Ask HN: Is Starlink still being jammed in Iran?

Ask HN: How to "make it" as a newlygrad/junior?

Ask HN: Finding a purpose after tech layoffs

Ask HN: Does automatic multilingual support make sense for a launch platform?

Unlocked SaaS, file source as truth?

The Architecture of an Exit Scam: A Technical Audit of Zszrun

Ask HN: Since a week HN keeps logging me off every few days, why?

Ask HN: What AI content automation stack are you using in 2026?

Ask HN: What are you using to mitigate prompt injection?

Comments

X is selling existing users' handles

Tell HN: Apple development certificate server seems down?

Ask HN: Is Claude down again?

Ask HN: What Are You Working On? (March 2026)

Ask HN: How are people doing AI evals these days?

Ask HN: How do we build a new Human First online community in the LLM age?

Ask HN: Remember Fidonet?

Ask HN: What are you using to mitigate prompt injection?

Ask HN: How to be alone?

Ask HN: What on this "List of Unsolved Problems in Physics" Has Your Attention?

Ask HN: Please restrict new accounts from posting

Ask HN: Most beautiful personal blog UI you have ever seen?

Happy Birthday YC/HN

Ask HN: Can I repurpose a Bluetooth voice remote as input device for a PC?

Tell HN: I'm 60 years old. Claude Code has re-ignited a passion

IdeaRank – Startup Analysis Engine

Maybe we can keep on coding? pseudo code project

Ask HN: How do you review gen-AI created code?

Ask HN: How are people forecasting AI API costs for agent workflows?

Why is GPT-5.4 obsessed with Goblins?

Ask HN: Is GitHub getting less reliable, or is it just me?

Tell HN: Vertical tabs has arrived (behind a flag) in Chrome stable

Ask HN: Is Starlink still being jammed in Iran?

Ask HN: How to "make it" as a newlygrad/junior?

Ask HN: Finding a purpose after tech layoffs

Ask HN: Does automatic multilingual support make sense for a launch platform?

Unlocked SaaS, file source as truth?

The Architecture of an Exit Scam: A Technical Audit of Zszrun

Ask HN: Since a week HN keeps logging me off every few days, why?

Ask HN: What AI content automation stack are you using in 2026?