I think building some processing off of Vulkan 1.3 was the right move. (Aside, I also just noticed yesterday that Asahi Linux on Mac supports that standard as well.)
FFmpeg arguments, the original prompt engineering
One would use gemini-cli (or claude-cli),
- and give a natural language prompt to gemini (or claude) on what processing needs to be done,
- with the correct paths to FFmpeg and the media file,
- and g-cli (or c-cli) would take it from there.
Is this correct?
ffmpeg right after
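For illustration, a prompt like "shrink this to 720p for the web" might get translated into something along these lines (file names and settings here are placeholders, not from the thread):
# downscale to 720p, encode for web playback
ffmpeg -i input.mov -vf "scale=-2:720" -c:v libx264 -crf 23 -c:a aac -movflags +faststart output.mp4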
The only options you ever need are tar -x, tar -c (x for extract and c for create). tar -l if you wanna list, l for list.
That's really it, -v for verbose just like every other tool if you wish.
Examples:
tar -c project | gzip > backup.tar.gz
cat backup.tar.gz | gunzip | tar -l
cat backup.tar.gz | gunzip | tar -x
You never need anything else for the 99% case.
Surely you mean -t if you wanna list, t for lisT.
l is for check-Links.
-l, --check-links
(c and r modes only) Issue a warning message unless all links to each file are archived.
And you don't need to uncompress separately. tar will detect the correct compression algorithm and decompress on its own. No need for that gunzip intermediate step.
Whoops, lol.
> on its own
Yes.. I'm aware, but that's more options, unnecessary too, just compose tools.
Principle of least surprise and all that.
I don't use tape, so I don't need a tape archive format.
Gzip only compresses a single file, so .tar.gz lets you bundle multiple files. You can do the same thing with zip, of course, but...
Zip compresses individual files separately in the container, ignoring redundancies between files. But .tar.gz (and .tar.zip, though I've rarely seen that combination) bundles the files together and then compresses them, so can get better compression than .zip alone.
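A rough way to see the difference yourself on a directory of many similar files (exact sizes will vary; this is just a sketch):
tar -c project | gzip -9 > project.tar.gz   # bundle first, then compress the whole stream
zip -9 -r -q project.zip project            # zip compresses each file on its own
ls -l project.tar.gz project.zip            # the tar.gz is usually smaller when files share content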
I wasn't expecting the downvotes for an xkcd reference
Examples:
tar -cf archive.tar.gz foo bar # Create archive.tar.gz from files foo and bar.
tar -tvf archive.tar.gz # List all files in archive.tar.gz verbosely.
tar -xf archive.tar.gz # Extract all files from archive.tar.gz
This will create an uncompressed .tar with the wrong name. You need a z option to specify gzip.
gzip -dc backup.tar.gz | tar -x
You can skip a step in your pipeline.
fwiw, `tar xzf foobar.tgz` = "_x_tract _z_e _f_iles!" has been burned into my brain. It's "extract the files" spoken in a Dr. Strangelove German accent
Better still, I recently discovered `dtrx` (https://github.com/dtrx-py/dtrx) and it's great if you have the ability to install it on the host. It calls the right commands and also always extracts into a subdir, so no more tar-bombs.
If you want to create a tar, I'm sorry but you're on your own.
"also always extracts into a subdir" sounds like a nice feature though, thanks for sharing another alternative!
You don't need the z, as xf will detect which compression was used, if any.
Creating is no harder, just use c for create instead, and specify z for gzip compression:
tar czf archive.tar.gz [filename(s)]
Same with listing contents, with t for tell: tar tf archive.tar.gz
It’s really the dream UI/UX from science fiction movies: “take all images from this folder and crop 100px away except on top, saturate a bit and save them as uncompressed tiffs in this new folder, also assemble them in a video loop, encode for web”.
If you don't care enough about potential side effects to read the manual, that's fine, but a dream UX it is not; I'd argue correctness is part of the dream.
A prompt to ChatGPT and a command later and all were nicely cropped in a second.
The dread of doing it by hand and then having it magically there a minute later is absolutely mind blowing. Even just 5 years ago, I would have just done it manually, as it would have definitely taken longer to write the code for this task.
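For that kind of prompt, the commands it hands back look roughly like this (folder names, crop amounts, saturation and frame rate are assumptions, not what the poster actually ran):
# crop 100px off left/right/bottom (keep the top), bump saturation, write uncompressed TIFFs
ffmpeg -pattern_type glob -i "in/*.png" -vf "crop=iw-200:ih-100:100:0,eq=saturation=1.3" -compression_algo raw out/%04d.tiff
# assemble the processed frames into a web-friendly loop
ffmpeg -framerate 24 -i out/%04d.tiff -c:v libx264 -crf 23 -pix_fmt yuv420p -movflags +faststart loop.mp4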
This seemed to be interesting to users of this site. tl;dr: they added support for Whisper, an OpenAI model for speech-to-text, which should allow autogeneration of captions via ffmpeg.
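For the curious, the new filter gets invoked roughly like this (the model path is an assumption and option names are taken from the 8.0 release notes, so check ffmpeg -h filter=whisper on your build):
ffmpeg -i input.mp4 -vn -af "whisper=model=ggml-base.en.bin:language=en:destination=captions.srt:format=srt" -f null -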
yep, finally the deaf will be able to read what people are saying in a porno!
This could streamline things
1. Just copy them over from the Bluray. This lacks support in most client players, so you'll either need to download a player that does, or use something like Plex/Jellyfin, which will run FFMpeg to transcode and burn the picture subtitles in before sending it to the client.
2. Run OCR on the Bluray subtitles. Not perfect.
3. Steal subtitles from a streaming service release (or multiple) if it exists.
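For option 1, the burn-in step that Plex/Jellyfin effectively run amounts to something like this (stream indexes and quality settings are assumptions):
# overlay the first bitmap subtitle stream onto the video and re-encode
ffmpeg -i movie.mkv -filter_complex "[0:v][0:s:0]overlay" -c:v libx264 -crf 18 -c:a copy movie_hardsub.mkv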
[0] - https://xkcd.com/2347/
Linux doesn't really have a system codec API though, so any Linux video software you see (e.g. VLC, Handbrake) is almost certainly using ffmpeg under the hood (or its foundation, libavcodec).
That being said, if you put down a pie chart of media frameworks (especially for transcoding or muxing), ffmpeg would have a significant share of that pie.
It also was originally authored by the same person who did lzexe, tcc, qemu, and the current leader for the large text compression benchmark.
Oh, and for most of the 2010's there was a fork due to interpersonal issues on the team.
It’s exceedingly good software though, and to be fair I think it’s gotten a fair bit of sponsorship and corporate support.
Could be an interesting data source to explore that opinion.
Then it stopped working until I updated youtube-dl and then that stopped working once I lost the incantation :<
It's a great tool. Little long in the tooth these days, but gets the job done.
Past that, I'm on the command line haha
Handbrake and LosslessCut are great too. But in addition to donating to FFmpeg, I pay for ffWorks because it really does offer a lot of value to me. I don’t think there is anything close to its polish on other platforms, unfortunately.
If it were priced at 1-5€ I would just buy it, I guess. But not at this price.
Someone else mentioned the LosslessCut program, which is pretty good. It has a merge feature with a compatibility checker that can detect a few issues. But I find transcoding the separate videos to MPEG-TS before joining them can get around many problems. If you fire up a RAM-Disk, it's a fast task.
ffmpeg -i video1.mp4 -c copy -start_at_zero -fflags +genpts R:\video1.ts;
ffmpeg -i video2.mp4 -c copy -start_at_zero -fflags +genpts R:\video2.ts;
ffmpeg -i "concat:R:\video1.ts|R:\video2.ts" -c copy -movflags +faststart R:\merged.mp4
> Only codecs specifically designed for parallelised decoding can be implemented in such a way, with more mainstream codecs not being planned for support.
It makes sense that most video codecs aren't amenable to compute shader decoding. You need tens of thousands of threads to keep a GPU busy, and you'll struggle to get that much parallelism when you have data dependencies between frames and between tiles in the same frame.
I wonder whether encoders might have more flexibility than decoders. Using compute shaders to encode something like VP9 (https://blogs.gnome.org/rbultje/2016/12/13/overview-of-the-v...) would be an interesting challenge.
This is great news. I remember being laughed at when I initially asked whether the Vulkan enc/dec were generic because at the time it was all just standardising interfaces for the in-silicon acceleration.
Having these sorts of improvements available for legacy hardware is brilliant, and hopefully a first route that we can use to introduce new codecs and improve everyone's QOL.
When the resulting frame is already in a GPU texture, displaying it has fairly low overhead.
My question is: how wrong am I?
Motion vectors can be large (for example, 256 pixels for VP8), so you wouldn't get much extra parallelism by decoding multiple frames together.
However, even if the worst-case performance is bad, you might see good performance in the average case. For example, you might be able to decode all of a frame's inter blocks in parallel, and that might unlock better parallel processing for intra blocks. It looks like deblocking might be highly parallel. VP9, H.265 and AV1 can optionally split each frame into independently-coded tiles, although I don't know how common that is in practice.
The ProRes bitstream spec was given to SMPTE [1], but I never managed to find any information on ProRes RAW, so it's exciting to see software and compute implementations here. Has this been reverse-engineered by the FFMPEG wizards? At first glance of the code, it does look fairly similar to the regular ProRes.
[1] https://pub.smpte.org/doc/rdd36/20220909-pub/rdd36-2022.pdf
I'm curious wrt how a WebGPU implementation would differ from Vulkan. Here's mine if you're interested: https://github.com/averne/FFmpeg/tree/vk-proresdec
Initially this was just a vehicle for me to get stuck in and learn some WebGPU, so no doubt I'm missing lots of opportunities for optimisation - but it's been fun as much as frustrating. I leaned heavily on the SMPTE specification document and the FFMPEG proresdec.c implementation to understand and debug.
The old RV40 had some small advantages over H264. At low bitrates, RV40 always seemed to blur instead of block, so it got used a lot for anime content. CPU-only decoding was also more lightweight than even the most optimized H264 decoder (CoreAVC with the inloop deblocking disabled to save even more CPU).
If there's anything that needs audio/video automation, I've always turned to FFmpeg; it's such a crucial and indispensable tool, and so many online video tools use it and are generally a UI wrapper around this wonderful tool. TIL there's FFmpeg.Wasm also [0].
In Jan 2024, I used it to extract frames of a 1993 anime movie in 15-minute video segments, upscaled them using Real-ESRGAN-ncnn-vulkan [1], then recombined the output frames for the final 4K upscaled anime [2]. FWIW, if I had built a UI on this workflow it could've become a tool similar to Topaz AI, which is quite popular these days.
[0]: https://github.com/ffmpegwasm/ffmpeg.wasm
[1]: https://github.com/xinntao/Real-ESRGAN-ncnn-vulkan
[2]: https://files.horizon.pics/3f6a47d0-429f-4024-a5e0-e85ceb0f6...
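A rough sketch of that extract → upscale → recombine loop (segment bounds, frame rate and model name are my assumptions, not the exact commands used):
# dump one 15-minute segment as PNG frames
ffmpeg -ss 00:00:00 -t 00:15:00 -i movie.mkv frames/%06d.png
# upscale the frames (Real-ESRGAN-ncnn-vulkan: -i input dir, -o output dir, -n model, -s scale)
./realesrgan-ncnn-vulkan -i frames -o upscaled -n realesr-animevideov3 -s 4
# reassemble the upscaled frames into the segment's video
ffmpeg -framerate 23.976 -i upscaled/%06d.png -c:v libx264 -crf 18 -pix_fmt yuv420p segment_4k.mkv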
Video2X-x86_64.AppImage -i "$f" \
-c libvpx-vp9 -e crf=34 -o "${f/480p/480p_upscale2x}" \
-p realcugan -s 2 --noise-level 1
To find the best arguments for upscaling (last line from above), I first used ffmpeg to extract a short scene that I encoded with various parameter sets. Then I used ffmpeg to capture still images so that I could find the best set.
They wouldn't let us look into the actual codecs or compression, they just wanted us to build a front-end for it.
I got to digging and realized they were just re-encoding the video through FFMpeg with a certain set of flags and options. I was able to replicate their results by just running FFMpeg.
They stopped talking to us.
A new chatbot? Another ChatGPT wrapper. A new Linux distro? Another Arch with a preinstalled desktop environment. A new video downloader? It's yt-dlp with a GUI.
If they were just honest from the get-go, it'd be fine, but some people aren't.
I am curious about adoption and features that would make a big difference to users :)
Are they using wavefront/subgroup operations to parallelize the range decoder across multiple symbols simultaneously? Or exploiting the slice-level parallelism with each workgroup handling independent slices? The arithmetic coding dependency chain has traditionally been the bottleneck for GPU acceleration of these codecs.
I'd love to hear from anyone who's profiled the compute shader implementation - particularly interested in the occupancy vs. bandwidth tradeoff they've chosen for the entropy decoding stage.
When I later wound up managing video post-production workflows, my CMD line or terminal use dropped a few jaws.
I've since been relying on LLM's to make FFMPEG commands so I don't even think about it.
But I've found it easier to brute force with LLMs because, like, every time I had to do video work it'd be something different. Prompts like 'I need to remove this and this and change the resolution from this to that', 'I need it to be this fps or that', or even 'I want this file to weigh this much', or 'I need to split these two' or 'combine those three'. It'll usually get you a chunk of the way there. Another prompt or two of double-checking, copy paste into CMD line or terminal, and either it goes brr or you copy paste the error back asking what it means. 3 minutes later it's doing the thing you wanted, and you're more or less understanding what it's giving you.
But I keep an Obsidian file with a bunch of commands that made me happy before. Dumping that into the context window helps.
Another one has been multi-camera, multi-screen recordings with OBS. I discovered it was easier to do the math, make a big canvas, and record all the feeds onto it so I don't have to think about syncing anything later. Then brr an FFMPEG command to output that 1920x1080 and that 3840x2160.
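The canvas → deliverables step is basically one split plus two scaled outputs, something like this (resolutions, codecs and the presence of an audio track are assumptions):
ffmpeg -i canvas.mkv -filter_complex "[0:v]split=2[a][b];[a]scale=1920:1080[hd];[b]scale=3840:2160[uhd]" \
  -map "[hd]"  -map 0:a -c:v libx264 -crf 20 -c:a aac out_1080p.mp4 \
  -map "[uhd]" -map 0:a -c:v libx264 -crf 20 -c:a aac out_2160p.mp4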
Whisper is great with that too - raw recording, output just the audio, 'give me a whisper command to get this as srt', then 'now render subtitles onto this video'.
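Stitched together, that pipeline tends to look like this (file names and model are placeholders; whisper.cpp's CLI has been renamed over time, so yours may be called main or whisper-cli):
# whisper.cpp wants 16 kHz mono WAV
ffmpeg -i raw.mkv -vn -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
# transcribe to SRT
./whisper-cli -m models/ggml-base.en.bin -f audio.wav -osrt -of captions
# render the subtitles onto the video
ffmpeg -i raw.mkv -vf "subtitles=captions.srt" -c:a copy raw_subbed.mkv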
There was an experiment I tried that kinda almost worked where I had this boring recording of some conversation but needed to extract scattered bits. Used whisper to get a transcript, put that into an LLM, used that to zero in on the actual bits that were important, then got it to spit out the timecodes. Then cobbled together this janky script that cut out those bits and stitched them together. That was faster than taking the time to do it with a GUI and listening to it all the way through.
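The janky script part can be as small as a loop over the LLM's timecodes; everything here (file names, the times-file format) is an assumption about the general shape, with the caveat that -c copy cuts snap to keyframes:
# cuts.txt holds one "start end" pair per line, e.g. 00:03:12 00:03:58
i=0
while read -r start end; do
  i=$((i+1))
  ffmpeg -ss "$start" -to "$end" -i raw.mp4 -c copy "part$i.mp4"
  echo "file 'part$i.mp4'" >> parts.txt
done < cuts.txt
# stitch the keepers back together
ffmpeg -f concat -safe 0 -i parts.txt -c copy highlights.mp4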
Of course there are tools like opus clip that spit that out for you now so...
Although to be honest, when the stakes are high and you're doing something serious that requires quality, you do it slow.
The point at which I was doing this most was when I was doing video UX/UI research on a hardware/software product. We would set up multi-cams, set and forget so we could talk to subjects and not think about what's being captured.
Dozens of hours of footage, little clips that would end up as insights on the Product Discovery Jira for the thing. So quality wasn't really important.
Has anyone found a bulletproof recipe for calling ffmpeg with many args (filters) from python? Use r-strings? Heredocs?
Secondly, just curious: any insiders here?
What changed? I see the infrastructure has been upgraded, this seems like a big release, etc. I guess there was a recent influx of contributors? A corporate donation? Something else?
https://news.ycombinator.com/item?id=44886647 ("FFmpeg 8.0 adds Whisper support (ffmpeg.org)"—9 days ago, 331 comments)