frontpage.
Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
289•theblazehen•2d ago•97 comments

Software Engineering Is Back

https://blog.alaindichiappari.dev/p/software-engineering-is-back
21•alainrk•1h ago•11 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
35•AlexeyBrin•1h ago•5 comments

Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2504.12501
15•onurkanbkrc•1h ago•1 comment

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
717•klaussilveira•16h ago•218 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
978•xnx•21h ago•562 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
94•jesperordrup•6h ago•35 comments

France's homegrown open source online office suite

https://github.com/suitenumerique
4•nar001•35m ago•2 comments

Making geo joins faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
138•matheusalmeida•2d ago•36 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
74•videotopia•4d ago•11 comments

Ga68, a GNU Algol 68 Compiler

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
16•matt_d•3d ago•4 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
46•helloplanets•4d ago•46 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
242•isitcontent•16h ago•27 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
242•dmpetrov•16h ago•128 comments

Cross-Region MSK Replication: K2K vs. MirrorMaker2

https://medium.com/lensesio/cross-region-msk-replication-a-comprehensive-performance-comparison-o...
4•andmarios•4d ago•1 comment

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
344•vecti•18h ago•153 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
510•todsacerdoti•1d ago•248 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
393•ostacke•22h ago•101 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
309•eljojo•19h ago•192 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
361•aktau•22h ago•187 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
437•lstoll•22h ago•286 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
33•1vuio0pswjnm7•2h ago•31 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
73•kmm•5d ago•11 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
26•bikenaga•3d ago•13 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
98•quibono•4d ago•22 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
278•i5heu•19h ago•227 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
43•gmays•11h ago•15 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1088•cdrnsf•1d ago•469 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
312•surprisetalk•3d ago•45 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
36•romes•4d ago•3 comments

Show HN: Local Privacy Firewall – blocks PII and secrets before ChatGPT sees them

https://github.com/privacyshield-ai/privacy-firewall
111•arnabkarsarkar•1mo ago
OP here.

I built this because I recently caught myself almost pasting a block of logs containing AWS keys into Claude.

The Problem: I need the reasoning capabilities of cloud models (GPT/Claude/Gemini), but I can't trust myself not to accidentally leak PII or secrets.

The Solution: A Chrome extension that acts as a local middleware. It intercepts the prompt and runs a local BERT model (via a Python FastAPI backend) to scrub names, emails, and keys before the request leaves the browser.

A few notes up front (to set expectations clearly):

Everything runs 100% locally. Regex detection happens in the extension itself. Advanced detection (NER) uses a small transformer model running on localhost via FastAPI.

No data is ever sent to a server. You can verify this in the code + DevTools network panel.

This is an early prototype. There will be rough edges. I’m looking for feedback on UX, detection quality, and whether the local-agent approach makes sense.
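
For flavor, the fast regex path can be as simple as a handful of compiled patterns. This is a hedged sketch only; the pattern names and placeholder format here are invented for illustration and are not taken from the extension's actual rule set:

```python
import re

# Illustrative patterns and placeholder format -- assumptions for this
# sketch, not the extension's actual detection rules.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scrub(text: str) -> str:
    """Replace each match with a typed placeholder, e.g. [AWS_ACCESS_KEY]."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text
```

The NER pass then catches what fixed patterns cannot, such as person and organization names.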

Tech Stack: Manifest V3 Chrome extension; Python FastAPI (localhost); HuggingFace dslim/bert-base-NER.

Roadmap / Request for Feedback: Right now, the Python backend adds some friction. I received feedback on Reddit yesterday suggesting I port the inference to Transformers.js so it runs entirely in-browser via WASM.

I decided to ship v1 with the Python backend for stability, but I'm actively looking into the ONNX/WASM route for v2 to remove the local server dependency. If anyone has experience running NER models via Transformers.js in a Service Worker, I'd love to hear how the performance compares to native Python.

Repo is MIT licensed.

Very open to ideas, suggestions, or alternative approaches.

Comments

itopaloglu83•1mo ago
It wasn't very clear in the video: does it trigger on the paste event, or when the page is activated?

There are a lot of websites that scan the clipboard to improve the user experience, but they also pose a great risk to users' privacy.

cjonas•1mo ago
Curious how much latency this adds (per input token)? Obviously it depends on your computer, but is it ~10s or ~1s?

Also, how does this handle queries where a piece of PII is important to the task itself? I assume you just have to turn it off?

willwade•1mo ago
Can I have this between my machine and git, please? It's twice now that I've committed .env* and it totally passed me by (usually because it's to a private repo). Then later on we/someone clears down the files and forgets to rewrite git history before pushing live. It should never have got there in the first place. (I wish GitHub did a scan before making a repo public.)
acheong08•1mo ago
GitHub does warn you when you have API keys in your repo. Alternatively, there are CLI tools such as TruffleHog that you can put in pre-commit hooks to run automatically before each commit.
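
The pre-commit idea can be sketched in a few lines. This is a toy illustration only; tools like TruffleHog and GitGuardian cover far more secret types plus entropy-based detection, and the patterns here are assumptions for the sketch:

```python
#!/usr/bin/env python3
"""Toy pre-commit secret check -- a sketch, not a replacement for
dedicated scanners like TruffleHog or GitGuardian."""
import re
import subprocess
import sys

SECRET_RE = re.compile(
    r"AKIA[0-9A-Z]{16}"                      # AWS access key ID
    r"|-----BEGIN [A-Z ]*PRIVATE KEY-----"   # PEM private key header
)

def find_secrets(text):
    """Return all suspicious substrings found in the given text."""
    return SECRET_RE.findall(text)

if __name__ == "__main__":
    try:
        # Scan only what is staged for this commit.
        staged = subprocess.run(
            ["git", "diff", "--cached", "-U0"],
            capture_output=True, text=True,
        ).stdout
    except OSError:
        sys.exit(0)  # git not available; don't block
    hits = find_secrets(staged)
    if hits:
        print("Refusing to commit; possible secrets staged:", hits)
        sys.exit(1)
```

Saved as `.git/hooks/pre-commit` and made executable, a non-zero exit aborts the commit.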
cwinq•1mo ago
You can try GitGuardian; it is very powerful and free for individual developers and small teams. It has pre-commit hooks, detection in the IDE, and more.
hombre_fatal•1mo ago
At least you can put .env in the global gitignore. I haven't committed a .DS_Store in 15 years because of it - its secrets will die with me.
willwade•1mo ago
sorry.. global gitignore.. what have i been doing..
mh-•1mo ago
You can use git hooks. Pre-commit specifically.

https://git-scm.com/docs/githooks

ComputerGuru•1mo ago
Already mentioned it in another reply, but .env and passing secrets as environment variables are a tragedy. Take a look at how SecureStore stores secrets encrypted at rest, and you’re even advised to commit them to git!

https://github.com/neosmart/securestore-rs

willwade•1mo ago
I wonder if this would have been useful: https://github.com/microsoft/presidio - it's heavy but looks really good. There is a lite version.
threecheese•1mo ago
Looks like it uses Google's LangExtract, which relies only on LLMs for NLP, while OP is using a small NER model that runs locally.
shaoz•1mo ago
I've used it; lots of false positives out of the box. You need to do a ton of tuning or pair it with a transformer/BERT model, but at that point it's basically the same thing as the OP's project.
postalcoder•1mo ago
Very neat, but recently I've tried my best to reduce my extension usage across all apps (browsers/ide).

I do something similar locally by manually specifying all the things I want scrubbed/replaced and having Keyboard Maestro run a script whenever I do a paste operation, mapped to `hyperkey + v`. The plus side of this is that the paste is instant. The latency introduced by even the littlest bit of inference is enough friction to make you want to ditch the process entirely.

Another plus of the non-extension solution is that it's application agnostic.

informal007•1mo ago
Smart idea! Thanks for sharing.

If we move the detection and modification from the paste event to the copy operation, that would reduce in-use latency.

postalcoder•1mo ago
That's a great idea. My original excuse for not doing that was that I copy so many things, but, duh, I could just key the sanitizing copy to `hyperkey + c`.
fmkamchatka•1mo ago
Could this run at the network level (like TripMode)? So it would catch usage from web based apps but also the ChatGPT app, Codex CLI etc?
p_ing•1mo ago
Deploy a TLS interceptor (forward proxy). There are many out there, both free and paid; there are also agent-based endpoint solutions like Netskope that do this so you don't have to route traffic through an internal device.
robertinom•1mo ago
That would be a great way to get some revenue from "enterprise" customers!
dwa3592•1mo ago
Neat - I built something similar - https://github.com/deepanwadhwa/zink?tab=readme-ov-file#3-sh...
sailfast•1mo ago
How do you prevent these models from reading secrets in your repos locally?

It's one thing for the ENVs to be user-pasted, but typically you're also giving the bots access to your file system to interrogate and understand them, right? Does this also block that access for ENVs by detecting them and applying granular permissions?

woodrowbarlow•1mo ago
by putting secrets in your environment instead of in your files, and running AI tools in a dedicated environment that has its own set of limited and revocable secrets.
sailfast•1mo ago
Yes - separate secrets always - but you've still got local or dev secrets. Seems like the above permissions are the right way to go in the end. Thanks.
SparkyMcUnicorn•1mo ago
I configure permission settings within projects.

https://code.claude.com/docs/en/settings#permission-settings
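
For instance, a project-level settings file can deny reads on sensitive paths. A minimal sketch; the exact schema and rule syntax are in the linked docs, and the paths here are illustrative:

```json
{
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./.env.*)",
      "Read(./secrets/**)"
    ]
  }
}
```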

sailfast•1mo ago
Ah yes - this is the way. Thanks.
woodrowbarlow•1mo ago
This prevents Claude from directly reading certain files, but doesn't prevent Claude from running a command that dumps the file to stdout and then reading stdout... Claude will just try to "cat" the file if it decides it wants to see it.
sailfast•1mo ago
Yeah - that’s kinda what I was thinking. Unless you’re doing quite granular approvals it gets tricky.
jedisct1•1mo ago
LLMs don't need your secret tokens (but MCP servers hand them over anyway): https://00f.net/2025/06/16/leaky-mcp-servers/

Encrypting sensitive data can be more useful than blocking entire requests, as LLMs can reason about that data even without seeing it in plain text.

The ipcrypt-pfx and uricrypt prefix-preserving schemes have been designed for that purpose.

greenbeans12•1mo ago
This is pretty cool. I barely use the web UIs for LLMs anymore. Any way you could make a wrapper for Claude Code/Cursor/Gemini CLI? Ideally it would work like GitHub push protection in GitHub Advanced Security.
sciencesama•1mo ago
Develop a pihole style adblock
accrual•1mo ago
I feel it's not really applicable here. Pihole has the advantage of funneling all DNS traffic (typically UDP/53) to a single endpoint and making decisions about the request.

A user talking to an LLM is probably connecting directly to the service inside a TLS connection (TCP/443), so there's not a lot of room to inspect the prompt at the layer a Pihole operates at (unless you MITM yourself).

I think OP has the right idea to approach this from the application layer in the browser where the contents of the page are available. But to me it feels like a stopgap, something that fixes a specific scenario (copy/pasted private data into a web browser form), and not a proper service-level solution some have proposed (swap PII at the endpoint, or have a client that pre-filters).

throwaway613745•1mo ago
Maybe you should fix your logging to not output secrets in plaintext? Every single modern logging utility has this ability.
lurking_swe•1mo ago
so what happens if you are running an agent locally and it helpfully tries to write a script that prints the environment variables, for debugging purposes?
throwaway613745•1mo ago
You run your agent in a container and you only give it access to agent-specific secrets that can be rotated easily.
ttul•1mo ago
This should be a native feature of the chat apps from all major LLM providers. There's no reason why PII can't be masked before it reaches the API endpoint and then restored in responses from the LLM: "Mary Smith" becomes "Samantha Robertson" on the way out, and back to "Mary Smith" on the way in. A small local model (such as the BERT model in this project) detects the PII.

Something like this would greatly increase end user confidence. PII in the input could be highlighted so the user knows what is being hidden from the LLM.
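
The round trip described above can be sketched in a few lines. Hedged: the placeholder format and function names are invented for illustration, and a real implementation would feed `detections` from a local NER model like the one in this project:

```python
def mask(text, detections):
    """detections: (surface_string, kind) pairs, e.g. from a local NER
    pass. Returns masked text plus the mapping needed to restore it."""
    mapping = {}
    for i, (surface, kind) in enumerate(detections):
        placeholder = f"[{kind}_{i}]"
        mapping[placeholder] = surface
        text = text.replace(surface, placeholder)
    return text, mapping

def unmask(text, mapping):
    """Restore the original PII in the model's response."""
    for placeholder, surface in mapping.items():
        text = text.replace(placeholder, surface)
    return text
```

The mapping never leaves the machine, so the provider only ever sees placeholders (or decoy names, as in the comment above).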

mentalgear•1mo ago
Neat!

There's also:

- https://github.com/superagent-ai/superagent

- https://github.com/superagent-ai/vibekit

NJL3000•1mo ago
This is a great idea, using a BERT model for DLP at the door. Have you thought about integrating this into a semantic router as an option, leaving out the look-ahead? Maybe a smaller code base?
gnarlouse•1mo ago
I'd like to see this as a Windsurf plugin.
idiotsecant•1mo ago
This is a concept that I firmly believe will be a fundamental feature of the medium-term future. Personal memetic firewalls.

As AI gets better and cheaper there will absolutely be influence campaigns conducted at the individual level for every possible thing anyone with money might want, and those campaigns will be so precisely targeted and calibrated by autonomous influencer AI that know so much about you that they will convince you to do the thing they want, whether by emotional manipulation, subtle blackmail, whatever.

It will also be extraordinarily easy to emit subliminal or unconscious signals that will encode a great deal more of our internal state than we want them to.

It will be necessary to have a 'memetic firewall' that reduces our unintentional outgoing informational cross section, while also preventing contamination by the torrent of ideas trying to worm their way into our heads. This firewall would also need to be autonomous, but by exploiting the inherent information asymmetry (your firewall would know you very well) it need not be as powerful as the AI that are trying to exploit you.

upghost•1mo ago
Ok what I would really love is something like this but for the damn terminal. No, I don't store credentials in plaintext, but when they get pulled into memory after being decrypted you really gotta watch $TERMINAL_AGENT or it WILL read your creds eventually and it's ever so much fun explaining why you need to rotate a key.

Sure, go ahead and roast me, but please include the foolproof method you use to make sure that never happens while still allowing you to use credentials for developing applications in the normal way.

ComputerGuru•1mo ago
If you store passwords encrypted at rest à la my SecureStore, this isn’t an issue.

https://github.com/neosmart/securestore-rs

smaughk•1mo ago
I've had similar thoughts! I just put together a document with four sections (original, sanitized, output, unsanitized) and built a little command-line tool to automatically filter and copy content between them. For now, my tool uses simple regex and specific keywords, but I really like the approach you're taking! This is definitely an interesting problem that needs a good solution. I'm excited to see your WASM implementation!
password-app•1mo ago
This is a great approach. We took a similar philosophy building password automation - the AI agent never sees actual passwords.

Credentials are injected through a separate secure channel while the agent only sees placeholders like "[PASSWORD]". The AI handles navigation and form detection, but sensitive data flows through an isolated path.

For anyone building AI tools that touch PII: separating the "thinking" layer from the "data" layer is essential. Your LLM should never need to see the actual sensitive values to do its job.