frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: Wolf Defender, a open-weight prompt-injection detection model

https://huggingface.co/patronus-studio/wolf-defender-prompt-injection
2•patronusprotect•2h ago
Hi HN,

We’ve been working on Patronus Protect, an on-device security layer for AI systems that aims to detect prompt injections and prevent sensitive data from leaving the device.

As part of that work we trained a prompt-injection detection model and decided to release a smaller version of it publicly.

Wolf Defender is a lightweight BERT-style model trained on roughly 5% of our full internal dataset. Despite the reduced training set it already performs competitively with several existing open-source prompt-injection detectors.

One issue we observed with many detectors is that they overfit to obvious trigger phrases like “Ignore previous instructions”. Many real attacks avoid these patterns through obfuscation.

To address this, the training data includes heavy augmentation designed to cover different prompt-injection styles, including:

- unicode and homoglyph perturbations - encoded payloads (e.g. base64) - HTML and code comment injections - structural wrappers like “User:” or “System:” - spacing and casing perturbations

The idea is to train the model to recognize structural characteristics of prompt-injection attacks rather than memorizing specific prompts.

Internally we use a larger version of this model as part of Patronus Protect. Wolf Defender is trained on a much smaller subset of the data and released to make prompt-injection research more accessible.

Curious to hear feedback from people working on LLM security.

We have more privacy controls yet less privacy

https://www.bbc.com/news/articles/c4gj39zk1k0o
1•1vuio0pswjnm7•52s ago•0 comments

MacBook Neo: Commenting from Privilege?

https://twitter.com/mufasaYC/status/2030908794180633010
1•tosh•1m ago•0 comments

Zuckerberg is done with Alexandr Wang

https://old.reddit.com/r/ArtificialInteligence/comments/1rl65kj/mark_zuckerberg_is_done_with_the_...
1•Insanity•1m ago•0 comments

Leading Frontier Firm Transformation with Microsoft 365 E7

https://partner.microsoft.com/en-us/blog/article/agent-365-announcement
1•mindracer•2m ago•0 comments

The Cost of Indirection in Rust

https://blog.sebastiansastre.co/posts/cost-of-indirection-in-rust/
1•sebastianconcpt•3m ago•0 comments

Startup Wants to Launch a Space Mirror

https://www.nytimes.com/2026/03/09/climate/space-mirror-satellite-solar.html
1•cyunker•3m ago•0 comments

Ask HN: Is Cloudflare Down Again?

2•pocksuppet•3m ago•0 comments

Show HN: ROLV – 20x faster MoE FFN inference on Llama 4 Maverick vs. cuBLAS

https://rolv.ai
1•heggenhougen•4m ago•1 comments

Show HN: IceCubes – speaker-attributed meeting transcripts without a bot

https://icecubes.app
1•Nandita_Arora•4m ago•0 comments

Approximately 40% of prepaid value is never used

https://www.nber.org/papers/w34918
1•neehao•4m ago•0 comments

Wegovy and Ozempic owner dealt blow as next drug is branded 'obsolete'

https://www.theguardian.com/business/2026/feb/23/wegovy-ozempic-weight-loss-drug-novo-nordisk-cag...
2•PaulHoule•5m ago•0 comments

How I Built Brickonomics: Smart Algorithms to Save Money on Lego

https://thebrickblogger.com/2026/03/how-i-built-brickonomics-smart-algorithms-to-save-money-on-lego/
1•abnercoimbre•5m ago•0 comments

Iran Air and Missile War – Ballistic, Interceptors and Munition Stockpiles [video]

https://www.youtube.com/watch?v=mP_rr859r8w
1•cwillu•6m ago•0 comments

GNU, and the AI Reimplementations

https://antirez.com/news/162
2•antirez•7m ago•0 comments

AI agents now help attackers, including North Korea, manage their drudge work

https://www.theregister.com/2026/03/08/deploy_and_manage_attack_infrastructure/
2•johnshades•8m ago•0 comments

Show HN: Monetize APIs for agentic commerce without accounts using Stripe

https://github.com/stripe402/stripe402
2•whatl3y•9m ago•0 comments

Florida Judge Rules Red Light Camera Tickets Are Unconstitutional

https://cbs12.com/news/local/florida-news-judge-rules-red-light-camera-tickets-unconstitutional
2•1970-01-01•11m ago•0 comments

$100 Oil Now Means Bigger Buybacks with Fewer Jobs and Babies Than Ever Before

https://www.governance.fyi/p/wall-street-killed-the-wildcatters
2•toomuchtodo•11m ago•1 comments

Test Data Management with Greenmask and OpenEverest

https://www.greenmask.io/blog/greenmask-openeverest-automating-safe-production-data
1•woyten•11m ago•0 comments

Where to See Cherry Blossoms in the Bay Area This Spring

https://www.kqed.org/science/2000203/where-to-see-cherry-blossoms-2026-san-francisco-bay-area-map
1•zuhayeer•13m ago•0 comments

Aaron Levie: Building for trillions of agents

https://twitter.com/levie/status/2030714592238956960
1•elsewhen•14m ago•0 comments

Learn about Steam

https://www.spiraxsarco.com/learn-about-steam?sc_lang=en-GB
1•flowingfocus•15m ago•0 comments

Indo-European Explorer: A 6k-Year Journey

https://indo-european-explorer.com/
1•gmays•15m ago•1 comments

AI Assistants Are Moving the Security Goalposts

https://krebsonsecurity.com/2026/03/how-ai-assistants-are-moving-the-security-goalposts/
1•GTP•16m ago•0 comments

Anthropic sues Trump administration after clash over AI use

https://abcnews.com/Business/anthropic-sues-trump-administration-after-clash-ai/story?id=130905672
2•thm•18m ago•1 comments

A Dev's Checklist for MCP Security and Compliance

https://composio.dev/blog/mcp-vulnerabilities-every-developer-should-know
1•alokDT•19m ago•0 comments

Vibe Coding and the Death of Craftsmanship (Personal Essay)

https://www.umangsinha.in/blog/vibe-coding-and-the-death-of-craftsmanship
1•umang-sinha•19m ago•5 comments

Show HN: I built an analytics engine for my OpenClaw usage

https://clawhub.ai/AjmeraParth132/agnost-ai
2•prrthh132•20m ago•0 comments

Reflections on Vibe Coding an iOS App

https://taylor.fausak.me/2026/03/09/vibing/
1•taylorfausak•21m ago•0 comments

A neural signature of adaptive mentalization

https://www.nature.com/articles/s41593-026-02219-x
3•bookofjoe•21m ago•0 comments