Pgbackrest is no longer being maintained

https://github.com/pgbackrest/pgbackrest
247•c0l0•3h ago•110 comments

Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

https://github.com/dirac-run/dirac
96•GodelNumbering•2h ago•30 comments

4TB of voice samples just stolen from 40k AI contractors at Mercor

https://app.oravys.com/blog/mercor-breach-2026
117•Oravys•4h ago•30 comments

Fully Featured Audio DSP Firmware for the Raspberry Pi Pico

https://github.com/WeebLabs/DSPi
149•BoingBoomTschak•2d ago•27 comments

Men Who Stare at Walls

https://www.alexselimov.com/posts/men_who_stare_at_walls/
112•aselimov3•3h ago•56 comments

Flipdiscs

https://flipdisc.io
406•skogstokig•4d ago•68 comments

FDA Approves First-Ever Gene Therapy for Treatment of Genetic Hearing Loss

https://www.fda.gov/news-events/press-announcements/fda-approves-first-ever-gene-therapy-treatmen...
62•JeanKage•4h ago•20 comments

Running Local LLMs Offline on a Ten-Hour Flight

https://deploy.live/blog/running-local-llms-offline-on-a-ten-hour-flight/
9•darccio•1h ago•2 comments

Tendril – a self-extending agent that builds and registers its own tools

https://github.com/serverless-dna/tendril
9•walmsles•1h ago•2 comments

I bought Friendster for $30k – Here's what I'm doing with it

https://ca98am79.medium.com/i-bought-friendster-for-30k-heres-what-i-m-doing-with-it-d5e8ddb3991d
962•ca98am79•18h ago•487 comments

Microsoft to Stop Sharing Revenue with Main AI Partner OpenAI

https://www.bloomberg.com/news/articles/2026-04-27/microsoft-to-stop-sharing-revenue-with-main-ai...
66•helsinkiandrew•1h ago•9 comments

Understanding the short circuit in solid-state batteries

https://www.mpie.de/5151287/short-circuit-solid-state-batteries
12•hhs•1d ago•0 comments

AI should elevate your thinking, not replace it

https://www.koshyjohn.com/blog/ai-should-elevate-your-thinking-not-replace-it/
686•koshyjohn•18h ago•487 comments

Quarkdown – Markdown with Superpowers

https://quarkdown.com/
80•amai•5h ago•12 comments

TurboQuant: A first-principles walkthrough

https://arkaung.github.io/interactive-turboquant/
236•kweezar•12h ago•52 comments

Show HN: A terminal spreadsheet editor with Vim keybindings

https://github.com/garritfra/cell
32•garritfra•3h ago•13 comments

Self-updating screenshots

https://interblah.net/self-updating-screenshots
388•bjhess•1d ago•62 comments

The Prompt API

https://developer.chrome.com/docs/ai/prompt-api
206•gslin•12h ago•111 comments

Getting my daily news from a dot matrix printer (2024)

https://aschmelyun.com/blog/getting-my-daily-news-from-a-dot-matrix-printer/
38•xupybd•2d ago•5 comments

Windows 11's second-chance setup dialogs hurt IT, drain productivity

https://www.theregister.com/2026/04/26/windows_second_chance_setup/
59•geekinchief•1h ago•32 comments

Managing the Unmanaged Switch

https://watchmysys.com/blog/2026/03/managing-the-unmanaged-switch/
3•luu•2d ago•0 comments

Branimir Lambov from IBM on Cassandra

https://theconsensus.dev/p/2026/04/26/branimir-lambov-from-ibm-on-cassandra.html
33•eatonphil•1d ago•2 comments

It's OK to abandon your side-project (2024)

https://robbowen.digital/wrote-about/abandoned-side-projects/
145•hisamafahri•6h ago•68 comments

France's Mistral Built a $14B AI Empire by Not Being American

https://www.forbes.com/sites/iainmartin/2026/04/16/how-frances-mistral-built-a-14-billion-ai-empi...
141•rzk•4h ago•87 comments

Fast16: High-precision software sabotage 5 years before Stuxnet

https://www.sentinelone.com/labs/fast16-mystery-shadowbrokers-reference-reveals-high-precision-so...
300•dd23•18h ago•75 comments

Electrostatics and High Voltage Links

http://amasci.com/static/electrostatic1.html
25•ludicrousdispla•3d ago•3 comments

A Guide to CubeSat Mission and Bus Design

https://pressbooks-dev.oer.hawaii.edu/epet302/
60•o4c•1d ago•3 comments

Three constraints before I build anything

https://jordanlord.co.uk/blog/3-constraints/
279•nervous_north•1d ago•44 comments

SWE-bench Verified no longer measures frontier coding capabilities

https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
332•kmdupree•1d ago•172 comments

Box to save memory in Rust

https://dystroy.org/blog/box-to-save-memory/
158•emschwartz•3d ago•47 comments

4TB of voice samples just stolen from 40k AI contractors at Mercor

https://app.oravys.com/blog/mercor-breach-2026
112•Oravys•4h ago

Comments

Oravys•4h ago
Author here. Wrote this after watching Lapsus$ post the Mercor archive on their leak site earlier this month. The thing that struck me is the combination: voice samples paired with ID document scans. Most breaches leak one or the other. This one ships a deepfake-ready kit. Tried to keep the writeup practical: what an attacker can actually do with this combo (banking voiceprint bypass, Arup-style video calls, insurance fraud), and a 5-step checklist for the contractors who were in the dump.

Happy to discuss the forensic detection side: AudioSeal watermarks, AASIST anti-spoofing, and how the detection landscape changes once voice biometrics start leaking at scale.
davsti4•16m ago
Interesting - thanks for the rabbit hole today. ;)

Mercor hasn't released many public statements about the incident. Social media posts aren't necessarily public, but I did find this breach notification sample filed with CA: https://oag.ca.gov/ecrime/databreach/reports/sb24-621099. I guess we'll see if our legislators finally take data privacy seriously.

eqvinox•1h ago
The only data that cannot be stolen or leaked is data that doesn't exist. Hard lesson for both users and companies.

Germans (because of course) have a word for this: "Datensparsamkeit". Being frugal with your data.

wlesieutre•1h ago
I miss the pre-LLM days when you could make a decent argument that having any unnecessary data was just a liability. Now all anybody thinks is “more data for the AI!”
CincinnatiMan•1h ago
Were you not around for the Big Data heyday a decade ago?
varispeed•53m ago
Once thumb drives became large enough to fit most datasets, it stopped being Big Data. Just normal data.
citrin_ru•50m ago
Data hoarding predates LLMs. There were other machine learning methods which also needed data for training.
Forgeties79•37m ago
“Before LLMs there was _____”

I see this whenever an LLM’s impact is assessed. We know. The issue is scale and the ability for smaller and smaller groups (down to individuals) to execute at scale.

Fake news always existed. Now one dude in India can flood multiple sock puppet media accounts with right wing content/images (actual example) at a scale previously unimaginable.

dpoloncsak•16m ago
Do LLMs require that much more data than the traditional ML approaches we've seen over the years?
sigmoid10•6m ago
Yes. This is pretty well established. Neural networks in general are considerably less sample-efficient than traditional ML methods. The reason they became so successful is that they scale better as you increase training data and model size. But only with modern compute power did they become useful outside of academic toy model examples.
b00ty4breakfast•7m ago
I really hate this when it's something negative that humans also do. It's like, yeah, people do do that, but why are we automating {negativeTrait}?
jacquesm•1h ago
You could have seen this coming a mile away. So far I have gotten away with never uploading my ID and/or interacting with one of those companies (though one idiot working for some VC thought it was ok to sign a document on my behalf by uploading my signature!!, never mind a bit of fraud) but it is getting harder and harder. Banks and in some cases even governments forcing you to send data to these operators is a very bad idea. But hey, who ever got hurt by some security theater?

I had to open a bank account for a company here a few years ago, and that was right on the bubble of this happening: they still had an option to come by in person with the proper documentation, which I did. Now it is all outsourced.

These companies are the fattest targets and they're run by incompetents. You should assume that anything you give them will eventually be part of some hack.

Schlagbohrer•8m ago
Tell us more about that fraud story! Was the person your attorney or accountant? Or just some "smart" person who decided to wisely save time by doing fraud?
josefritzishere•1h ago
This kind of event is the best argument against needless data hoarding. But it would help if the law better provided for some kind of consequences for negligence.
VladVladikoff•1h ago
Man that’s pretty shitty that Mercor tricked 40k contractors, and then did a poor job of securing their data. There should be stronger consequences for stuff like this.
Havoc•1h ago
I love how the check if you're affected involves giving a voice sample to whatever the fuck that website is
throw0101c•1h ago
"My voice is my passport. Verify me."

:)

java-man•18m ago
HSBC did that. I could never understand that - the exact phrase was in the movie!
NitpickLawyer•10m ago
Someone probably did it for an internal demo, as a joke. Then people pushed it upwards, until someone clueless approved it.
amarcheschi•56m ago
I've been doing similar things on a different platform because as a uni student the pay is kinda nice, but I limit myself to tasks without voice/video and just input from mouse/keyboard to do reinforcement learning/data tagging. No way I'm trusting these companies or the companies they contract the work with
embedding-shape•41m ago
I wonder how many of the current text-to-speech ML models have large parts of leaked or "stolen" data in their training data? Almost none of the TTS releases seem to talk about exactly where they get their training data from, for some reason. I also wonder if we'll see an explosion in SOTA TTS in ~6 months from now.
hirako2000•34m ago
It's already there. And keeps moving.

Even have a nice UI on top.

https://voicebox.sh/

jubilanti•3m ago
Not really, Mozilla Common Voice (the ImageNet of speech) is larger than this. Their English database has 3814 hours, 1.6 million sentences, from 100k speakers.

https://commonvoice.mozilla.org/en/languages

john_strinlai•38m ago
>Set up a verbal codeword with family and finance contacts. Pick a phrase that has never been spoken on a recording and never typed in chat. Brief the people who handle money on your behalf. If a call ever asks for a transfer, the codeword is mandatory.

good luck with this. most finance people deal with hundreds to thousands of clients. they obviously cant remember everyones code word. commonly used finance systems arent setup to securely store these codewords. they dont have processes or policies in place to implement or adhere to any sort of codeword verification.

>Rotate where voiceprints are still in use. [...] Do that now, ideally from a new recording in a different acoustic environment than the leaked sample.

would this even have an effect? i have never heard of "rotating" a voice print. isnt the whole point of a voice print that you cant really change it? if simply switching your environment completely changes your voice print, that would make voice prints utterly useless to begin with.

iterateoften•21m ago
Yeah, seems like nonsense advice. Have a code word that was never recorded? I don’t see how that would protect anything. Like, the point of these systems is they can convincingly say stuff you never said
wongarsu•18m ago
Someone who has hundreds or thousands of clients presumably couldn't remember every client's voice either, so no meaningful security is lost. They are approximately as secure or insecure as before
john_strinlai•17m ago
>presumably couldn't remember every client's voice either, so no meaningful security is lost

there are automated systems for this already. my bank, isp, etc. use them when you call in to skip the traditional verification steps. this fact is also highlighted in the article.

the problem is that there isnt typically a system in place for setting up or validating code words, so the advice given is not practical to implement.
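
For illustration only (nobody in the thread proposed this, and it does not describe any real finance product): a minimal Python sketch of what "setting up and validating" a codeword could look like if a system stored only a salted hash rather than the phrase itself. The function names `enroll` and `verify`, the normalization rule, and the iteration count are all assumptions made for the example. It also does nothing about the process and policy gaps raised above.

```python
import hashlib
import hmac
import os

# Assumed work factor for the example; a real deployment would tune this.
ITERATIONS = 200_000


def normalize(codeword: str) -> bytes:
    # A spoken codeword gets typed in by whoever takes the call, so this
    # sketch ignores letter case and repeated whitespace (an assumption).
    return " ".join(codeword.lower().split()).encode("utf-8")


def enroll(codeword: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the codeword itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", normalize(codeword), salt, ITERATIONS)
    return salt, digest


def verify(attempt: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash for the attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", normalize(attempt), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    salt, digest = enroll("purple heron lamp")
    print(verify("Purple  Heron lamp", salt, digest))
    print(verify("purple heron", salt, digest))
```

Storing a slow, salted hash means a leaked database does not directly reveal the phrase, and `hmac.compare_digest` avoids leaking information through comparison timing.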

barrenko•5m ago
It looks more like the purpose of such a company was to steal such data.