What's the best way to build trust in digital insurance?

https://e-mai.ma/a-propos-de-mai/
1•MAI_inssurance•2m ago•0 comments

Good vibrations: Scientists use imaging technology to visualize heat

https://techxplore.com/news/2025-07-good-vibrations-scientists-imaging-technology.html
1•PaulHoule•2m ago•0 comments

Drop visual annotations for your coding agent

https://github.com/RaphaelRegnier/vibe-annotations
1•RaphR•3m ago•0 comments

California Central Valley keeps sinking and it's taking home values down with it

https://www.sfgate.com/realestate/article/californias-central-valley-sinks-home-values-20809200.php
2•Stratoscope•4m ago•0 comments

Meteorite that punched through Georgia roof may be older than Earth itself

https://www.space.com/stargazing/meteorite-that-punched-a-hole-through-georgia-roof-may-be-older-than-earth-itself
1•Stratoscope•5m ago•0 comments

Mark Zuckerberg angers locals in Silicon Valley enclave over 11-home compound

https://nypost.com/2025/08/11/real-estate/mark-zuckerberg-angers-silicon-valley-locals-over-11-home-110m-compound/
1•Stratoscope•6m ago•0 comments

Bird signs and cycles, February, 2024

https://subject.space/projects-static/winter-bird-cycles/
1•sjmulder•8m ago•0 comments

D-cysteine impairs tumour growth by inhibiting cysteine desulfurase NFS1

https://www.nature.com/articles/s42255-025-01339-1
1•bookofjoe•8m ago•0 comments

Relying on AI in Colonoscopies May Erode Clinicians' Skills

https://www.medpagetoday.com/gastroenterology/coloncancer/116968
1•jtbayly•10m ago•0 comments

His psychosis was a mystery–until doctors learned about ChatGPT's health advice

https://www.psypost.org/his-psychosis-was-a-mystery-until-doctors-learned-about-chatgpts-health-advice/
2•01-_-•11m ago•1 comments

Free Online Markdown to PDF Converter – Live Preview and Export

https://www.ftmi.info/en/markdown-to-pdf.html
1•york_ren•12m ago•0 comments

Linus Torvalds blasts kernel dev for making the world worse with garbage patches

https://www.zdnet.com/article/linus-torvalds-blasts-kernel-dev-for-making-the-world-worse-with-garbage-patches/
1•isaacfrond•13m ago•0 comments

Localhost: Omar and Andrés on the Folk Computer Gadget [video]

https://www.youtube.com/watch?v=hrXEtG3JILo
1•surprisetalk•17m ago•0 comments

Help improve federal mass transit policy

https://www.slowboring.com/p/help-improve-federal-mass-transit
1•surprisetalk•17m ago•0 comments

Payload Fraction

https://en.wikipedia.org/wiki/Payload_fraction
2•surprisetalk•17m ago•0 comments

Two brothers' archive of 1990s Star Wars images made on MS Paintbrush (2014)

https://www.itsnicethat.com/articles/star-wars
2•Michelangelo11•17m ago•0 comments

Wplace – Paint the World

https://wplace.live
1•surprisetalk•17m ago•0 comments

Secret Messengers: Disseminating SIGINT in the Second World War [pdf]

https://media.defense.gov/2025/Jul/25/2003761271/-1/-1/0/SECRET_MESSENGERS.PDF
1•almost-exactly•20m ago•0 comments

MCP to Play your favorite Spotify tracks as Claude Code completion notifications

https://github.com/denar90/suzu-mcp
1•denar90•21m ago•0 comments

New glasses will supercharge hearing with AI

https://www.hw.ac.uk/news/2025/new-glasses-will-supercharge-hearing-with-ai
1•geox•22m ago•1 comments

UK expands police facial recognition rollout with 10 new facial recognition vans

https://www.theregister.com/2025/08/13/uk_expands_police_facial_recognition/
2•rntn•23m ago•0 comments

Is Perplexity's $34B offer to buy Chrome real or a marketing stunt?

https://www.computerworld.com/article/4038675/is-perplexitys-34-billion-offer-to-buy-chrome-real-or-a-marketing-stunt.html
1•dotcoma•24m ago•0 comments

Technoblogy – A NeoPixel Driver Using AVR Hardware

http://www.technoblogy.com/show?5BGM
2•chrisjj•30m ago•0 comments

Show HN: Play a game and help us better understand how people perceive color

https://www.survey-xact.dk/LinkCollector?key=AW815UMQL23J
2•AndreasM•31m ago•1 comments

Suetopia: Generative AI is a lawsuit waiting to happen to your business

https://www.theregister.com/2025/08/12/genai_lawsuit/
5•chrisjj•31m ago•0 comments

Show HN: Simple and Easy-to-Use Local API Testing Tool

https://github.com/dage212/fire-doc
3•dage212•33m ago•0 comments

Nocturne: New firmware for Spotify's Car Thing

https://usenocturne.com
2•fdb•33m ago•0 comments

We empower communities and nations around the world to map the electrical grid

https://MapYourGrid.org/
3•protontypes•35m ago•0 comments

The World of Quantum Advantage

https://arxiv.org/abs/2508.05720
2•jonbaer•36m ago•0 comments

Technological Folie à Deux:Feedback Loops Between AI Chatbots and Mental Illness

https://arxiv.org/abs/2507.19218
3•pera•37m ago•0 comments

FFmpeg 8.0 adds Whisper support

https://code.ffmpeg.org/FFmpeg/FFmpeg/commit/13ce36fef98a3f4e6d8360c24d6b8434cbb8869b
180•rilawa•2h ago

Comments

ggap•1h ago
Very interesting to see this!
zzsshh•1h ago
Does this finally enable dynamically generating subtitles for movies with AI?
diggan•1h ago
Finally? I think VLC demo'd this a while ago at some conference where they had a table, if I remember correctly.
SSLy•1h ago
VLC and ffmpeg are unrelated projects
jeroenhd•1h ago
Docs say:

    If set, the transcription output will be sent to the specified file or URL
    (use one of the FFmpeg AVIO protocols); otherwise, the output will be logged as info messages.
    The output will also be set in the "lavfi.whisper.text" frame metadata.
    If the destination is a file and it already exists, it will be overwritten.

    @item format
    The destination format string; it could be "text" (only the transcribed text will be sent to the destination), "srt" (subtitle format) or "json".
    Default value: @code{"text"}
I don't know if this can embed the subtitles, but it does support generating accompanying srt files.

Of course, you could already do that by just manually calling whisper on files, but now you don't need to export parts or transformed media files to feed into whisper.
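
A minimal sketch of what generating an accompanying srt file could look like, written as a Python helper that only builds the ffmpeg argument list. The `destination`, `format` and `queue` options come from the docs quoted above; the `model` and `language` options, the file names, and the helper name `build_whisper_cmd` are assumptions for illustration, and the filter is only present if ffmpeg was built with whisper support.

```python
import shlex


def build_whisper_cmd(input_path, model_path, destination,
                      fmt="srt", queue=3, language="en"):
    """Build an ffmpeg argument list that runs the whisper audio filter.

    Hypothetical helper: the file names passed in are placeholders.
    """
    if fmt not in ("text", "srt", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    # Filter options are joined with ':' inside the filter description.
    # Paths containing ':' or other filtergraph special characters would
    # need escaping, which this sketch does not attempt.
    filter_desc = (
        f"whisper=model={model_path}"
        f":language={language}"
        f":queue={queue}"
        f":destination={destination}"
        f":format={fmt}"
    )
    # -vn drops the video stream; "-f null -" discards the audio output,
    # so the only file written is the transcription destination.
    return ["ffmpeg", "-i", input_path, "-vn", "-af", filter_desc,
            "-f", "null", "-"]


cmd = build_whisper_cmd("input.mp4", "ggml-base.en.bin", "output.srt")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)`. Per the quoted docs, an existing destination file is overwritten.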

regularfry•1h ago
If you have enough processing power. Without a GPU it's going to lag.
KeplerBoy•1h ago
Whisper is pretty fast.
boutell•1h ago
Shut off the broken bot filter so we can read it please
diggan•1h ago
Took my iPhone 12 Mini a whole 0.1 seconds to pass it. What hardware/OS are you using?
johnisgood•1h ago
Took me 8 seconds on my shitty desktop.
londons_explore•1h ago
Took about 30 secs for me (5 yr old intel cpu). Looked like there was a progress bar, but it didn't progress. Maybe the difficulty varies depending on IP address?
jeroenhd•1h ago
Anubis has config for that: https://anubis.techaro.lol/docs/admin/policies#request-weigh...

It's up to the site admin to configure it that way, but it's possible some IP ranges/user agents are more often used by bots and therefore have an increased weight.

For old browsers there's also an option to use meta refresh instead of JS (https://anubis.techaro.lol/docs/admin/configuration/challeng...) but that's quite a recent addition and not enabled by default.

diggan•39m ago
> Maybe the difficulty varies depending on IP address?

I'm currently roaming in Finland with a Spanish SIM so would have expected the opposite in that case.

politelemon•1h ago
Took me zero seconds to be blocked with invalid response
miloignis•46m ago
It also instantly blocks me on GrapheneOS, both Firefox and Vanadium. Very odd, as I've never had an issue with Anubis before.
blahyawnblah•9m ago
The stock chrome browser Google news uses
jeroenhd•1h ago
Check out commit 13ce36fef98a3f4e6d8360c24d6b8434cbb8869b from https://git.ffmpeg.org/ffmpeg.git if your web browser doesn't support Javascript. The linked page is just a git viewer for that specific commit.
yorwba•1h ago
Or read the documentation for the new whisper filter: https://ffmpeg.org/ffmpeg-filters.html#whisper-1
jeroenhd•1h ago
That also works, I assumed the ffmpeg website would also be behind Anubis if the git server is, but it doesn't actually seem to be.
majewsky•1h ago
Anubis is not all that useful for static websites since serving them does not generate high load (unlike when a bot traverses a Git server UI).
QuantumNomad_•1h ago
Archived snapshots of the linked page:

https://web.archive.org/web/20250813104007/https://code.ffmp...

https://archive.is/dmj17

You can read it on one of these without having to pass that specific bot check

majewsky•1h ago
From experience, these bot filters are usually installed because the site would be down entirely without rejecting AI scrapers, so the argument to shut it off to improve usability is rather silly.
kwar13•1h ago
Fantastic! I am working on a speech-to-text GNOME extension that would immensely benefit from this.

https://github.com/kavehtehrani/gnome-speech2text

lawik•1h ago
I wonder if they'll be satisfied there or add a chunk of others now that they've started. Parakeet is supposed to be good?

Should they add Voice Activity Detection? Are these separate filters or just making the whisper filter more fancy?

shrx•1h ago
Voice Activity Detection support is already included.
voxadam•1h ago
Am I correct in understanding that Whisper is a speech recognition AI model originally created by OpenAI?

https://en.wikipedia.org/wiki/Whisper_(speech_recognition_sy...

acidburnNSA•1h ago
Yes, according to the comments in the patch, you are correct.
kwar13•1h ago
yes.
johnisgood•1h ago
Yes.

From the documentation:

> It runs automatic speech recognition using the OpenAI's Whisper model.

voxadam•1h ago
Thanks, I was being tripped up by DDoS protection on code.ffmpeg.org for a minute and couldn't read the patch. The combo of Firefox and the fact that Quantum/Lumen/CenturyLink seems to get off by rotating my dynamic IP for no reason occasionally triggers various DDoS protection schemes.
Maxious•1h ago
yep, there's a c++ implementation to run it https://github.com/ggml-org/whisper.cpp
oezi•1h ago
Isn't WhisperX the canonical choice for running Whisper?
sampullman•50m ago
Maybe for running locally? whisper.cpp is nice because you can embed it pretty easily in apps for various targets like iOS, OSX, Android, wasm, etc.
0points•48m ago
While whisper and whisperx are Python implementations, whisper.cpp wins the benchmarks.
AlienRobot•1h ago
I think so, if I remember correctly PotPlayer also supports it for automatic subtitling.
cess11•1h ago
Kind of, it's a family of audio transcription models.

https://huggingface.co/search/full-text?q=whisper

londons_explore•1h ago
Does this have the ability to edit historic words as more info becomes available?

Eg. If I say "I scream", it sounds phonetically identical to "Ice cream".

Yet the transcription of "I scream is the best dessert" makes a lot less sense than "Ice cream is the best dessert".

Doing this seems necessary to have both low latency and high accuracy, and things like transcription on android do that and you can see the adjusting guesses as you talk.

ph4evers•1h ago
Whisper works on 30 second chunks. So yes it can do that and that’s also why it can hallucinate quite a bit.
jeroenhd•1h ago
The ffmpeg code seems to default to three second chunks (https://ffmpeg.org/ffmpeg-filters.html#whisper-1):

    queue
    
         The maximum size that will be queued into the filter before processing the audio with whisper. Using a small value the audio stream will be processed more often, but the transcription quality will be lower and the required processing power will be higher. Using a large value (e.g. 10-20s) will produce more accurate results using less CPU (as using the whisper-cli tool), but the transcription latency will be higher, thus not useful to process real-time streams. Consider using the vad_model option associated with a large queue value. Default value: "3"
londons_explore•1h ago
so if "I scream" is in one chunk, and "is the best dessert" is in the next, then there is no way to edit the first chunk to correct the mistake? That seems... suboptimal!

I don't think other streaming transcription services have this issue since, whilst they do chunk up the input, past chunks can still be edited. They tend to use "best of N" decoding, so there are always N possible outputs, each with a probability assigned, and as soon as one word is the same in all N outputs then it becomes fixed.

The internal state of the decoder needs to be duplicated N times, but that typically isn't more than a few kilobytes of state so N can be hundreds to cover many combinations of ambiguities many words back.
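
A rough sketch of the "becomes fixed once all N outputs agree" idea, assuming each of the N hypotheses is just a list of words. It simplifies the description above to a longest common prefix (words are committed left to right), and the function name `committed_prefix` is hypothetical, not taken from any transcription service.

```python
def committed_prefix(hypotheses):
    """Return the leading words on which every hypothesis agrees.

    hypotheses: list of word lists, one per candidate transcription.
    Words after the first disagreement stay editable.
    """
    if not hypotheses:
        return []
    prefix = []
    # zip stops at the shortest hypothesis, so only positions that
    # exist in every candidate are compared.
    for words in zip(*hypotheses):
        if all(w == words[0] for w in words):
            prefix.append(words[0])
        else:
            break
    return prefix


# Early on, the candidates disagree from the first word: nothing is fixed.
early = [["I", "scream"], ["ice", "cream"]]
# Later, the unlikely candidates have been pruned and a prefix stabilises.
later = [["ice", "cream", "is", "the", "best"],
         ["ice", "cream", "is", "the", "beast"]]

print(committed_prefix(early))
print(committed_prefix(later))
```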

miki123211•1h ago
The right way to do this would be to use longer, overlapping chunks.

E.g. do transcription every 3 seconds, but transcribe the most recent 15s of audio (or less if it's the beginning of the recording).

This would increase processing requirements significantly, though. You could probably get around some of that with clever use of caching, but I don't think any (open) implementation actually does that.
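
The windowing described above can be sketched as follows. This only computes which time ranges each transcription pass would cover (3 s step, 15 s window, as in the example); running a model on each window and merging the overlapping results is left out, and the name `sliding_windows` is made up for illustration.

```python
def sliding_windows(total_s, step_s=3.0, window_s=15.0):
    """Yield (start, end) times in seconds for each transcription pass.

    A pass runs every step_s seconds and covers at most the most
    recent window_s seconds of audio.
    """
    if step_s <= 0 or window_s <= 0:
        raise ValueError("step_s and window_s must be positive")
    end = 0.0
    while end < total_s:
        # The last pass is clipped to the end of the recording.
        end = min(end + step_s, total_s)
        # Near the beginning the window is shorter than window_s.
        yield (max(0.0, end - window_s), end)


for start, end in sliding_windows(20.0):
    print(f"transcribe {start:>4.1f}s .. {end:>4.1f}s")
```

With these defaults every second of audio is processed up to five times, which is the extra processing cost mentioned above.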

superluserdo•19m ago
I basically implemented exactly this on top of whisper since I couldn't find any implementation that allowed for live transcription.

https://tomwh.uk/git/whisper-chunk.git/

I need to get around to cleaning it up but you can essentially alter the number of simultaneous overlapping whisper processes, the chunk length, and the chunk overlap fraction. I found that the `tiny.en` model is good enough with multiple simultaneous listeners to be able to have highly accurate live English transcription with 2-3s latency on a mid-range modern consumer CPU.

llarsson•47m ago
Attention is all you need, as the transformative paper (pun definitely intended) put it.

Unfortunately, you're only getting attention in 3 second chunks.

0points•51m ago
So, yes, and also no.
shaunpud•1h ago
I Scream in the Sun https://carmageddon.fandom.com/wiki/I_Scream_in_the_Sun
DiogenesKynikos•1h ago
This is what your brain does when it processes language.

I find that in languages I don't speak well, my ability to understand degrades much more quickly as the audio quality goes down. But in my native language, even with piss poor audio quality, my brain fills in the garbled words with its prior expectation of what those words should be, based on context.

mockingloris•52m ago
A slight segue from this: I was made aware of the phenomenon that the language you think in sets the limits on how expansively your brain can think and parse information.

I think in English, fortunately, and it's an ever-evolving language, so it expands as the world does. That is compared to the majority of people where I'm from; English was a second language they had to learn, and the people that taught them weren't well equipped with the resources to do a good job.

│

└── Dey well; Be well

lgessler•46m ago
I recommend having a look at 16.3 onward here if you're curious about this: https://web.stanford.edu/~jurafsky/slp3/16.pdf

I'm not familiar with Whisper in particular, but typically what happens in an ASR model is that the decoder, speaking loosely, sees "the future" (i.e. the audio after the chunk it's trying to decode) in a sentence like this, and also has the benefit of a language model guiding its decoding so that grammatical productions like "I like ice cream" are favored over "I like I scream".

re•1h ago
I've been playing with whisper to try to do local transcription of long videos, but one issue I've found is that long (>15 seconds) spans without any speech tend to send it into hallucination loops that it often can't recover from. I wonder if, with direct integration into ffmpeg, they will be able to configure it in a way that can improve that situation.
42lux•1h ago
You usually delete silence before using something like whisper.
re•1h ago
I've heard that, but that doesn't sound like a useful approach for videos where (1) non-speech segments can have plenty of other sound (music, noise) and (2) you want timestamps to match up with the original video, like for subtitles. But maybe there are known mitigations for both of those issues that I'm not aware of. And if they do exist maybe they can be included in the ffmpeg whisper integration.
miki123211•1h ago
By "delete", people mostly mean "detect", so that you can avoid processing such segments through Whisper. There's no reason to actually cut the silence out from the original audio file.
hnlmorg•1h ago
This is designed for real time use too. And in such cases, you couldn’t delete the silence before use.
42lux•41m ago
The ffmpeg implementation might be; the example was not.
franga2000•1h ago
Whisper is supposed to be used with voice activity detection and all production implementations that I've seen do that. The raw model is known to make up nonsense for silence because, as I understand it, it was never trained not to do that, assuming everyone will use VAD
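
To illustrate "detect, don't delete": a toy sketch that finds speech-like spans from per-frame energies and reports them in the original timeline, so subtitle timestamps still line up. This is a crude energy gate, not the VAD model that whisper.cpp or the ffmpeg filter uses; the function name, frame length, threshold and gap tolerance are all made-up values for illustration.

```python
def speech_segments(energies, frame_s=0.5, threshold=0.1, max_gap_frames=1):
    """Return (start, end) times in seconds of spans worth transcribing.

    energies: one energy value per fixed-length audio frame.
    Quiet gaps of up to max_gap_frames frames are bridged so that short
    pauses do not split a sentence.
    """
    segments = []
    start = None      # index of the first frame of the current segment
    last_loud = None  # index of the most recent frame above threshold
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud > max_gap_frames:
            # The pause is too long: close the segment at the last loud frame.
            segments.append((start * frame_s, (last_loud + 1) * frame_s))
            start = None
    if start is not None:
        segments.append((start * frame_s, (last_loud + 1) * frame_s))
    return segments


energies = [0, 0, 0.5, 0.6, 0, 0.4, 0, 0, 0, 0.3]
print(speech_segments(energies))
```

Only the returned spans would be passed to the model; everything else is skipped rather than cut out of the file. An energy gate like this would still let music and noise through, which is why a trained VAD model is the usual choice.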
bondarchuk•1h ago
Can whisper do multilingual yet? Last time I tried it on some mixed dutch/english text it would spit out english translations for some of the dutch text. Strange bug/feature since from all appearances it had understood the dutch text perfectly fine.
ph4evers•1h ago
Whisper-v3 works well for multi-lingual. I tried it with Dutch, German and English
jeroenhd•1h ago
I found that it works quite well for Dutch+English as long as you use one of the larger models. But that may just be luck, I imagine mixing Italian and Swedish will have very different results.
guilamu•1h ago
Whisper has been multilingual for 5 years at least.
bondarchuk•1h ago
I know it is ostensibly multilingual, it's less than a year since I tried, but it does this thing where it then translates everything (or only some things) into a single language regardless with no way to turn it off.
kwar13•1h ago
Best for English, but I've found it pretty decent for Spanish.
clarionbell•1h ago
I think the Dutch/English is probably the worst combination for this. Languages are rather close.
bondarchuk•57m ago
I don't understand how this would happen, though. It's not like it will mishear a dutch sentence as if it's english; it will correctly pick up the dutch sentence, but (since the language is auto-detected as english at the start of the segment), seemingly auto-translate that (correct and correctly heard) dutch text to english. All we need is a way to get the dutch text that's surely somewhere in there, before the translation happens.

Unless it was trained end-to-end on dutch-subtitled english text?? Which might make the translation a somewhat inextricable part of the model..? Does anyone know?

numpad0•1h ago
Isn't that a bit much for ASR models? Humans can't handle simultaneous multilingual dictation task either, I have to stop and reinitialize ears before switching languages between English and my primary one.
bondarchuk•15m ago
Seems like it already has the capability somewhere in the model though - see my reply to clarionbell.
yewenjie•1h ago
I have recently found that parakeet from NVIDIA is way faster and pretty much as correct as Whisper, but it only works with English.
instagraham•1h ago
Does this mean that any software which uses ffmpeg can now add a transcription option? Audacity, Chrome, OBS etc
ks2048•1h ago
If they want to support it out-of-the box, they'll still have to embed a model file (roughly 500 MB - 3GB, varying size and quality)
Lio•1h ago
Once local transcription is in more places, hopefully we can persuade content creators not to burn bouncing subtitles into their videos.

I've seen professionally produced recordings on dry and technical subjects with good sound quality where they've decided to use distracting sub-titles with no way to disable them.

It seems so unnecessary if you're not making novelty videos about cats.

Also local transcription allows for automatic translation and again overlaying subtitles on top of an existing burnt in set is a really poor reading experience.

HPsquared•1h ago
The other problem with burned-in subtitles is you can't change the language.
rkomorn•1h ago
True, but (as someone who not infrequently has to rewind content on just about all streaming apps because it decided one particular subtitle only needed to be displayed for less than 200ms this time around) sometimes burned-in seems like a good idea.

I don't understand why the problem seems so pervasive (I've seen it on Netflix, Viki, and Apple TV, at least) and so transient.

preisschild•56m ago
They could also just upload those transcriptions as normal closed-captioning srt subtitles...
ambicapter•15m ago
They do that because it increases “engagement”, not because they care about the user’s experience with the subtitles.
zoobab•1h ago
Not sure it will be packaged in Debian, with an external binary model god knows how it was produced...
majewsky•1h ago
It looks like the model file needs to be supplied at invocation time, so the binary blob would not be required for packaging.
martzoukos•56m ago
I guess that there is no streaming option for sending generated tokens to, say, an LLM service to process the text in real-time.
nomad_horse•30m ago
Whisper has the encoder-decoder architecture, so it's hard to run streaming efficiently, though whisper-streaming is a thing.

https://kyutai.org/next/stt is natively streaming STT.

donatj•50m ago
I know nothing about Whisper, is this usable for automated translation?

I own a couple very old and as far as I'm aware never translated Japanese movies. I don't speak Japanese but I'd love to watch them.

A couple years ago I had been negotiating with a guy on Fiverr to translate them. At his usual rate-per-minute of footage it would have cost thousands of dollars, but I'd negotiated him down to a couple hundred before he presumably got sick of me and ghosted me.

poglet•46m ago
Yep, whisper can do that. You can also try whisperx (https://github.com/m-bain/whisperX) for a possibly better experience with aligning of subtitles to spoken words.
_def•38m ago
May I ask which movies? I'm just curious
trenchpilgrim•29m ago
Whisper has quite bad issues with hallucination. It will inject sentences that were never said in the audio.

It's decent for classification but poor at transcription.

prmoustache•25m ago
My personal experience trying to transcribe (not translate) was a complete failure. The thing would invent stuff. It would also be completely lost when more than one language is used.

It also doesn't understand context, so it makes a lot of the errors you see in automatic translations of videos on YouTube, for example.

mockingloris•46m ago
How could one, in theory, use this to train on a new language? Say for a hobby project; I have recordings of some old folks' stories in my local dialect.

│

└── Dey well; Be well