
OpenAI Privacy Filter

https://openai.com/index/introducing-openai-privacy-filter/
114•tanelpoder•3d ago

Comments

hiAndrewQuinn•3d ago
I'm surprised nobody else has commented on this. This is a very straightforward and useful thing for a small locally runnable model to do.
ashwindharne•2d ago
Same here, this is an incredibly useful thing to have in the toolkit
apothegm•2d ago
And also something that it’s dangerous to try to do stochastically.
hiAndrewQuinn•2d ago
It's going to be stochastic in some sense whether you want it to be or not; human error never reaches zero percent. I would bet you a penny you'd get better results doing one two-second automated pass + your usual PII redaction than your PII redaction alone.
cyanydeez•2d ago
I think the problem is most secrets aren't stochastic; they're deterministic. When the user types in the wrong password, it should be blocked. Using a probabilistic model suggests an attacker now only needs to be really close, but not correct.

Sure, there's some math that says the difference between being really close and being exact isn't a big deal; but then you're also saying your secrets don't need to be exact when decoding them and they absolutely do atm.

Sure looks like a weird privacy veil that sorta might work for some things, like frosted glass, but think of a toilet stall with all frosted glass, are you still comfortable going to the bathroom in there?

CityOfThrowaway•13m ago
I dunno what use case you're thinking this is for.

The use case for this is that many enterprise customers want SaaS products to strip PII from ingested content, and there's no non-model way to do it.

Think of ingesting call transcripts where those calls may include credit card numbers or private data. The call transcripts are very useful for various things, but for obvious reasons we don't want to ingest the PII.

moralestapia•2d ago
The alternative being?
7777777phil•2d ago
> The model is available today under the Apache 2.0 license on Hugging Face and Github.

Bringing back the Open to OpenAI..

stratos123•2d ago
There are some interesting technical details in this release:

> Privacy Filter is a bidirectional token-classification model with span decoding. It begins from an autoregressive pretrained checkpoint and is then adapted into a token classifier over a fixed taxonomy of privacy labels. Instead of generating text token by token, it labels an input sequence in one pass and then decodes coherent spans with a constrained Viterbi procedure.

> The released model has 1.5B total parameters with 50M active parameters.

> [To build it] we converted a pretrained language model into a bidirectional token classifier by replacing the language modeling head with a token-classification head and post-training it with a supervised classification objective.
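
For anyone unfamiliar with the "constrained Viterbi" part, here is a minimal sketch of the general idea. It assumes a BIO tag scheme and a single constraint (an `I-X` tag may only follow `B-X` or `I-X`); the release notes quoted above don't spell out the actual label scheme or constraints, so the tag names and scores below are purely illustrative.

```python
NEG_INF = float("-inf")

def allowed(prev, cur):
    # Assumed constraint: I-X may only continue a span opened by B-X or I-X.
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True

def viterbi(scores, tags):
    # scores[t][k] is the log-score of tags[k] at token t.
    if not scores:
        return []
    k = len(tags)
    # A sequence may not start in the middle of a span.
    best = [NEG_INF if tags[j].startswith("I-") else scores[0][j] for j in range(k)]
    back = []
    for t in range(1, len(scores)):
        nxt, ptr = [], []
        for j in range(k):
            # Best allowed predecessor for tag j at this token.
            score, i = max((best[i], i) for i in range(k) if allowed(tags[i], tags[j]))
            nxt.append(score + scores[t][j])
            ptr.append(i)
        best = nxt
        back.append(ptr)
    # Follow back-pointers from the best final tag.
    j = max(range(k), key=lambda idx: best[idx])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [tags[j] for j in reversed(path)]

tags = ["O", "B-NAME", "I-NAME"]
scores = [
    [-0.1, -3.0, -5.0],  # token 0: clearly O
    [-2.0, -1.5, -0.2],  # token 1: per-token argmax is I-NAME, invalid after O
    [-3.0, -4.0, -0.1],  # token 2: I-NAME
]
print(viterbi(scores, tags))
```

Taking the argmax per token here would give `O, I-NAME, I-NAME`, which is not a coherent span; the constrained decode picks the best-scoring sequence that is.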

LatencyKills•55m ago
Couldn't this be used to locate private data in unstructured text without having to rely on other means of PII detection?

1. Pass the raw text through the filter to obtain the spans.

2. Map all the spans back to the original text.

Now you have all the PII information.
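
The two steps above can be sketched as follows. The span format, `(start, end, label)` with character offsets and an exclusive end, is an assumption for illustration and not the filter's documented output; the same offsets serve for extraction and for redaction.

```python
def extract_spans(text, spans):
    # spans: hypothetical (start, end, label) character offsets, end exclusive.
    return [(label, text[start:end]) for start, end, label in spans]

def redact(text, spans):
    # Replace each span with its label, walking the text left to right.
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

text = "Call Ana at 555-0100."
spans = [(5, 8, "NAME"), (12, 20, "PHONE")]
print(extract_spans(text, spans))
print(redact(text, spans))
```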

Everdred2dx•13m ago
Yep, and already has been done.

https://github.com/chiefautism/privacy-parser

Havoc•2d ago
50M effective parameters is impressively light. Is there a similarly light model on the prompt injection side? Most of the mainstream ones seem heavier
ndom91•2d ago
Where's the gguf from Unsloth and co?
mplanchard•2d ago
It would be nice if their examples weren’t mostly things that are easy to catch with regex, but it’s cool to see it released as an open, local model.
JLO64•58m ago
For my customers I use regexes to block them from potentially publishing personal emails/phone numbers to their websites but I really wouldn't mind running this in addition just for the extra peace of mind. I don't have a GPU on our server, but I hope this is light enough of a model to handle CPU only inference on less than 2k tokens at a time.
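
A regex pre-pass of the kind described here might look roughly like the sketch below. The two patterns are deliberately simple illustrations, not the ones actually in use, and will miss plenty of real-world formats; that gap is where a model pass would add value.

```python
import re

# Illustrative patterns only; real email and phone formats vary far more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def find_contact_details(text):
    # Return (label, start, end) hits in document order.
    hits = [("EMAIL", m.start(), m.end()) for m in EMAIL.finditer(text)]
    hits += [("PHONE", m.start(), m.end()) for m in PHONE.finditer(text)]
    return sorted(hits, key=lambda h: h[1])

sample = "Mail jo@example.com or call +1 555 010 0199."
for label, start, end in find_contact_details(sample):
    print(label, sample[start:end])
```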
aubinkure•2d ago
Exciting! I took a look through the code and found what appear to be the entity types for future releases - this release (V2 config) supports 8 entity types, but the V4 and V7 taxonomies have >20, mostly more personal ID types. Given this is a preview release, I imagine they'll release these.

Details in my review article here: https://piieraser.ai/blog/openai-privacy-filter. Disclaimer: I also build PII detection systems.

mentalgear•1d ago
SuperagentLM made available on-edge PII redaction models already a few years ago in sizes 20B, 3B, 200M. They still seem to be available via their legacy API - well worth checking out to compare against this one. https://docs.superagent.sh/legacy/llms/superagent-lm-redact-...
freakynit•5m ago
Can someone explain how I can reconstruct the original entities if there is, for example, more than one person's name?
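
One common approach, sketched below under assumptions, is to replace each distinct entity with a numbered placeholder and keep the mapping on your side so it can be reversed later. The `(start, end, label)` span format and the `[LABEL_n]` placeholder style are hypothetical choices for this example, not something the released model provides.

```python
def pseudonymize(text, spans):
    # spans: hypothetical (start, end, label) character offsets, end exclusive.
    mapping, counters, ids = {}, {}, {}
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        value = text[start:end]
        key = (label, value)
        if key not in ids:
            # First time this entity appears: give it the next number for its label.
            counters[label] = counters.get(label, 0) + 1
            ids[key] = f"[{label}_{counters[label]}]"
            mapping[ids[key]] = value
        out.append(text[cursor:start])
        out.append(ids[key])
        cursor = end
    out.append(text[cursor:])
    return "".join(out), mapping

def restore(text, mapping):
    # Swap each placeholder back for the original value.
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text

original = "Ana emailed Bob, then Bob called Ana."
name_spans = [(0, 3, "NAME"), (12, 15, "NAME"), (22, 25, "NAME"), (33, 36, "NAME")]
masked, mapping = pseudonymize(original, name_spans)
print(masked)
print(restore(masked, mapping))
```

Because repeated mentions of the same name share one placeholder, whatever consumes the masked text can still tell the two people apart, and `restore` recovers the original as long as the placeholders come back unchanged.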
