
VaultGemma: The most capable differentially private LLM

https://research.google/blog/vaultgemma-the-worlds-most-capable-differentially-private-llm/
125•meetpateltech•4mo ago

Comments

ForHackernews•4mo ago
Can someone explain what this actually means? I assume this still runs on Google's cloud so it's not 'private' in any meaningful sense.
stephantul•4mo ago
It does not run on Google’s cloud. You can download the model and host it yourself, locally or using a provider you trust.
ForHackernews•4mo ago
That's actually great. I didn't realize Google had any models that could be self-hosted.
pkaye•4mo ago
The Gemma models are available for self-hosting. I've used the ones on the Ollama website myself.

https://ollama.com/library/gemma3

porridgeraisin•4mo ago
Differentially private means that:

training_algorithm(training data with a row that has "ForHackernews blood test report...") is hard to distinguish from training_algorithm(training data without that row), up to a factor of e^epsilon. They explain this further in the article itself, with concrete values for epsilon.
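
To make "hard to distinguish" concrete, here is a toy sketch using the Laplace mechanism on a simple count query. This is not how an LLM is trained; the counts, the epsilon value, and the function name are all made up for illustration. It only checks that the output densities with and without one row never differ by more than a factor of e^epsilon.

```python
import math

def laplace_pdf(x, mean, scale):
    """Density of a Laplace distribution centred on `mean`."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)

epsilon = 1.0                  # illustrative privacy budget
sensitivity = 1.0              # adding or removing one row changes a count by at most 1
scale = sensitivity / epsilon  # Laplace noise scale for this query

count_with = 42.0     # hypothetical count with the row present
count_without = 41.0  # same query with the row removed

# Compare the two output densities on a grid of possible noisy answers.
worst_ratio = max(
    laplace_pdf(x / 10.0, count_with, scale) / laplace_pdf(x / 10.0, count_without, scale)
    for x in range(0, 1000)
)

# The ratio never exceeds e^epsilon, which is the DP guarantee for this mechanism.
print(worst_ratio <= math.exp(epsilon) + 1e-9)
```

The same inequality is what the training algorithm has to satisfy, with "noisy answer" replaced by "trained model".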

drdaeman•4mo ago
I got that from the article, but I'm not getting what it means in practice. What's the use case?
porridgeraisin•4mo ago
It is very difficult for someone to coax the model into regurgitating a sequence from the training data. So as you can imagine, the first use case is going to be Google training on your Gmail inbox without me being able to prompt your emails out of it.

User-level DP on the other hand, which the article alludes to near the end, would mean that it's very difficult to make the model regurgitate a particular user's data.

Since this is a theoretical guarantee, you can do whatever prompt engineering you like, it will be really difficult all the same.

How difficult it is depends on a bunch of quantitative factors. Mostly, the value of epsilon.

You might think this would be useful for copyright protection as well, but there is a subtle difference. It's been a while and I'm hazy on the details, so I'll refer you to the Near Access Freeness paper which discusses it in detail and proposes another framework for that.

Workaccount2•4mo ago
If I am understanding this correctly, this is pretty damn cool. I've only got 15 minutes of research on it, but there's no better way to get corrected than to be wrong on the internet.

Essentially it seems that they can use statistical magic to "fuzz" the training set in such a way that it becomes very difficult for the model to leak information from the training set, while still providing the same output whether or not that exact info was in the training set. So I suppose the goal would be something like the ability to train on medical data, while making it so the model won't be able to complete the prompt "Workaccount2 has a serious medical condition called ______" and would give the same response regardless of whether or not I was present in the database.
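
A rough sketch of where the "fuzzing" usually happens, assuming the training uses DP-SGD (the standard approach for DP training of neural networks; the thread itself does not spell out the mechanism). The noise goes on the gradients rather than on the data: each example's gradient is clipped to a fixed norm, then Gaussian noise scaled to that norm is added before the update. The function names and numbers below are hypothetical.

```python
import math
import random

def clip(grad, clip_norm):
    """Scale a gradient vector down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [g * scale for g in grad]

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average."""
    clipped = [clip(g, clip_norm) for g in per_example_grads]
    batch = len(clipped)
    sigma = noise_multiplier * clip_norm  # noise is calibrated to the clipping bound
    new_params = []
    for i, p in enumerate(params):
        total = sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)
        new_params.append(p - lr * total / batch)
    return new_params

grads = [
    [3.0, 4.0, 0.0],  # norm 5, gets clipped down to norm 1
    [0.1, 0.0, 0.0],  # norm 0.1, left unchanged
]
new_params = dp_sgd_step([0.0, 0.0, 0.0], grads, lr=0.1, clip_norm=1.0,
                         noise_multiplier=1.1, rng=random.Random(0))
print(new_params)
```

Clipping bounds how much any single example can move the model, and the noise hides whatever influence remains. The epsilon you end up with depends on the noise multiplier, batch size, and number of steps, which this sketch does not compute.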

porridgeraisin•4mo ago
Yes.

prob(training_process(data)("Work account 2 has a serious medical condition called") = "anaemia") <= e^epsilon * prob(training_process(data without that piece of information)("Work account 2 has a serious medical condition called") = "anaemia") + delta

Here epsilon = 2, and delta is small. Basically, there is a theoretical guarantee that if it had trained on that sentence, it would be no more than about 7x (e^2) as likely, plus delta, to output that in response to any prompt, compared to when it hadn't trained on that sentence at all. A sentence here is defined to be 1024 tokens long[1].

You might think 7x is not that big of a deal, but note that this is a theoretical guarantee (and with some mathematics it's possible to get an even tighter bound; see Rényi DP). In practice, actually getting private data out of a DP-trained model is difficult even for epsilon = 8 (which corresponds to roughly 3000x as likely!).

Edit: [1] This can be problematic: if a piece of information longer than 1024 tokens gets split into two sentences, then there is no theoretical guarantee across sequences. However, this is an implementation detail of this model. I've yet to see the effect of increasing this number to a more reasonable value.
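
The multipliers in this comment are just e^epsilon, so they are easy to check. A minimal sketch; the helper name, the one-in-a-million baseline, and the delta value are made up for illustration.

```python
import math

def dp_probability_bound(p_without, epsilon, delta=0.0):
    """Upper bound on P(output) after training WITH a record,
    given P(output) after training WITHOUT it."""
    return min(1.0, math.exp(epsilon) * p_without + delta)

for eps in (2.0, 8.0):
    print(f"epsilon={eps}: e^epsilon = {math.exp(eps):.1f}")
# epsilon=2.0: e^epsilon = 7.4
# epsilon=8.0: e^epsilon = 2981.0

# Hypothetical baseline: a 1-in-a-million chance of the output without the record.
print(dp_probability_bound(1e-6, epsilon=2.0, delta=1e-10))
```

Note that the bound is multiplicative, so it is only as reassuring as the baseline probability is small.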

freedomben•4mo ago
Thanks, that's quite exciting, because personally the thing I'm most excited about with AI is the medical and scientific research capabilities. Exciting times!
diggan•4mo ago
The actual weights: https://huggingface.co/google/vaultgemma-1b

> VaultGemma is a variant of the Gemma family of lightweight, state-of-the-art open models from Google. It is pre-trained from the ground up using Differential Privacy (DP). This provides strong, mathematically-backed privacy guarantees for its training data, limiting the extent to which the model's outputs can reveal information about any single training example.

> VaultGemma was trained using Tensor Processing Unit (TPU) hardware TPUv6e. Training large language models with the significant computational overhead of differential privacy requires specialized hardware. TPUs are designed to handle the massive computations involved, offering the performance, memory, and scalability necessary to train models like VaultGemma efficiently and sustainably.

Seems like it requires TPUs to run, as DP has a huge performance impact, so we're unlikely to see this in homelabs and similar environments, as far as I understand.

Edit: On second read, the TPUs were only used for training, and there's no mention of any specific hardware being needed for inference, so I'm assuming it's fine with a regular GPU?

Mond_•4mo ago
So far Gemma models have been capable of running on ordinary GPUs or CPUs, and I think it's safe to assume that this trend is continuing here.
HenryMulligan•4mo ago
Ignoring what this model architecture could do and just considering what this model does do, why would I (or anyone) want to run this model (locally) to do <insert use-case>? Is it entirely a proof-of-concept for future training on medical data? Are they looking to use this to attempt to ethically justify training on (free-tier) users' personal data via the application of noise to the training data?
floridianfisher•4mo ago
The purpose is research
porridgeraisin•4mo ago
It's the last option.

The whole framing of DP is:

The probability that you reveal private info is nearly the same whether or not you train on a particular user's data.

It is useful in many cases, but Google, the product company, is specifically going to use it for ads.

malfist•4mo ago
You can hide that you pirated content for training
astrange•4mo ago
You can't hide that. You can't use technical measures to hide from discovery.

I think an entire book is a little too large to mask with this method and still end up learning anything.

faangguyindia•4mo ago
Using this approach, you can avoid the kind of book publisher lawsuit that Anthropic is dealing with.
adt•4mo ago
https://lifearchitect.ai/models-table/
woah•4mo ago
This could be very good for scaling data while avoiding copyright claims, since the copyright argument is a lot weaker (at least to the layman) if no memorization is happening. It may even open the door to Snow Crash-like distributed training, where people feed the model continuous streams of data from their computer use or even their daily lives without worrying about PII leakage.
Ossi61•4mo ago
Yes
Testor007•4mo ago
Will it leak data if you fine-tune with DP logic?