GLiNER2-PII: 0.3B open-source PII model outperforms OpenAI's Privacy Filter

https://pioneer.ai/research/gliner2-pii-a-multilingual-model-for-personally-identifiable-information-extraction
2•neon_share1•1h ago

Comments

neon_share1•1h ago
Hi HackerNews,

We’re Ash and George from Fastino Labs, and today we’re releasing GLiNER2-PII, a 0.3B-parameter open-source encoder model for PII detection.

Removing personally identifiable information (PII) from documentation and data sources remains a challenge. Because PII looks different depending on the country, context, and document type, most models struggle to keep up.

GLiNER2-PII addresses this with a compact 0.3B-parameter encoder architecture that outperforms OpenAI's Privacy Filter and all existing GLiNER PII variants.

In addition to supporting zero-shot extraction of unseen entity types, it was also fine-tuned on 42 fine-grained entity types across seven semantic categories:

- API keys, Passwords and Credentials
- Person & Identity
- Contact & Location
- Government & Tax Identifiers
- Banking & Payment
- Digital Identity
- Sensitive Dates
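To make the entity categories concrete, here is a minimal sketch of how detected spans could be turned into redacted text. It does not call the model: the `Span` type, the `redact` helper, the label names, and the character offsets are all hypothetical and stand in for whatever a PII detector returns.

```python
from typing import NamedTuple


class Span(NamedTuple):
    # Hypothetical detector output; not the model's actual return type.
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # e.g. "person", "email"


def redact(text: str, spans: list[Span]) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            # Skip a span that overlaps one we already redacted.
            continue
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label.upper()}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Made-up example input and spans.
text = "Contact Jane Doe at jane@example.com."
spans = [Span(8, 16, "person"), Span(20, 36, "email")]
print(redact(text, spans))  # prints "Contact [PERSON] at [EMAIL]."
```

Keeping the label in the placeholder, rather than a generic mask, preserves some context for downstream readers of the redacted text.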

On the SPY benchmark, GLiNER2-PII achieves the highest span-level F1 (0.471) across legal and medical documents, outperforming OpenAI's Privacy Filter and all existing GLiNER PII variants. Notably, it maintains high recall (0.722 legal / 0.681 medical) while preserving competitive precision.
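For readers unfamiliar with span-level scoring, the sketch below shows one common way precision, recall, and F1 are computed over entity spans. It assumes exact matching on (start, end, label); the matching rule SPY actually uses is not stated here, and the gold and predicted spans are made up for illustration.

```python
def span_prf(predicted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 over exactly matching spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical spans as (start, end, label) tuples.
gold = {(8, 16, "person"), (20, 36, "email"), (50, 60, "phone")}
predicted = {
    (8, 16, "person"),
    (20, 36, "email"),
    (40, 45, "person"),  # false positive
    (70, 80, "date"),    # false positive
}

precision, recall, f1 = span_prf(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case two of four predictions are correct and two of three gold spans are found, so recall is higher than precision and F1 sits between them.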

Training data was generated synthetically using our Pioneer Agent framework, producing multilingual annotated examples across document types, locales, and entity distributions.

GLiNER2-PII is part of the GLiNER family of models for named entity recognition, text classification, and structured extraction.

We are happy to release GLiNER2-PII to the open source community under the Apache 2.0 license.

Model weights are available now on Hugging Face.

Model: https://huggingface.co/fastino/gliner2-privacy-filter-PII-mu...

Read the blog: https://pioneer.ai/research/gliner2-pii-a-multilingual-model...

GitHub Copilot's new desktop app

https://github.com/github/app
1•prosim•36s ago•1 comments

Bun's Rust rewrite has been merged

https://old.reddit.com/r/rust/comments/1tcrmjs/rewrite_bun_in_rust_has_been_merged/
1•ale•1m ago•0 comments

The founder's playbook: Building an AI-native startup – Claude

https://claude.com/blog/the-founders-playbook
1•salkahfi•2m ago•0 comments

AI, open code and vulnerability risk in the public sector (UK)

https://www.gov.uk/guidance/ai-open-code-and-vulnerability-risk-in-the-public-sector
1•RobinL•3m ago•0 comments

How the Bird Eye Was Pushed to an Evolutionary Extreme

https://www.quantamagazine.org/how-the-bird-eye-was-pushed-to-an-evolutionary-extreme-20260513/
2•Brajeshwar•3m ago•0 comments

Why Do We Interface?

https://whydoweinterface.com/
2•structuredPizza•4m ago•0 comments

Jane Street Interview Simulator

https://janestreet.gg/
1•Jeanbu•5m ago•0 comments

A Single Infusion Could Suppress HIV for Years

https://www.nytimes.com/2026/05/11/health/hiv-infusion-immunotherapy.html
1•gmays•5m ago•0 comments

Discover Crosspad the best finger drumming web app

https://crosspad.app/
1•Brosper•9m ago•0 comments

Physics Guarantees the Datasphere Keeps Expanding (and What It Means for Agents)

https://twitter.com/i/status/2054961517767061668
1•dataranger•11m ago•0 comments

Show HN: BlitzGraph – Supabase for graphs, designed for LLM agents

https://blitzgraph.com
1•lveillard•11m ago•0 comments

Ambient Intents

https://xcancel.com/timourxyz/status/2054589504934273373
1•yurivish•12m ago•0 comments

Cannabis and driving? Studies reveal big risks

https://news.cuanschutz.edu/news-stories/cannabis-and-driving-studies-reveal-big-risks
1•PaulHoule•12m ago•0 comments

AI models are being used to predict conflict

https://www.economist.com/science-and-technology/2026/05/13/ai-models-are-being-used-to-predict-c...
1•Brajeshwar•13m ago•0 comments

Entire - How We Improved Agentic Search

https://entire.io/blog/improving-agentic-search-in-coding-agents
1•tanishqkanc•14m ago•0 comments

Claude Code cost observability to prevent tokenmaxxing

https://github.com/delta-hq/cc-ledger
1•tsv650•14m ago•1 comments

Which programming language is fastest?

https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html
1•tosh•14m ago•0 comments

Synthetic evaluation datasets for testing AI agents before production deployment

https://paixblox.github.io/learned/
1•cemillxchange•15m ago•0 comments

What's in a GGUF, besides the weights – and what's still missing?

https://nobodywho.ooo/posts/whats-in-a-gguf/
2•bashbjorn•17m ago•0 comments

The coming AI jobs-pocalypse

https://katecarruthers.com/ai-jobs-future/
1•speckx•18m ago•2 comments

Show HN: Pokémon SVG Generation LLM Benchmark

https://svg-bench.fenx.work/
2•haxfenx•18m ago•0 comments

New Nginx Exploit

https://github.com/DepthFirstDisclosures/Nginx-Rift
26•hetsaraiya•21m ago•8 comments

Gemini Android App User Hostile Behavior

1•morpheos137•21m ago•0 comments

Neanderthals Mastered Dentistry

https://nautil.us/how-neanderthals-mastered-dentistry-1280722
1•Brajeshwar•22m ago•1 comments

SED_Model – Observation <-> Theory Machine

https://github.com/nialljmiller/SED_Model
1•nialljmiller•22m ago•0 comments

Catch Flakes on Main

https://matklad.github.io/2026/05/14/catch-flakes-on-main.html
2•surprisetalk•23m ago•0 comments

One engine, many tools – Introducing Rubydex

https://railsatscale.com/2026-05-12-one-engine-many-tools/
1•ksec•26m ago•0 comments

Software Engineers Are Obsolete

https://idiallo.com/blog/everyone-is-better-than-you
1•speckx•26m ago•2 comments

Google says it disrupted an AI-driven effort to exploit a software bug

https://apnews.com/article/google-ai-cybersecurity-exploitation-mythos-926aea7f7dc5e0e61adce3273c...
1•gmays•29m ago•0 comments

I used acoustic physics and Whisper to automate video editing

https://github.com/DeegoFronk/Auto-Vod-Trimmer
1•DeegoFronk•29m ago•0 comments