https://github.com/scottvr/wtffmpeg/blob/main/wtffmpeg.py#L9...
>You are an expert at writing commands for the `ffmpeg` multimedia framework.
>Respond ONLY with the `ffmpeg` command. Do not add any explanations, introductory text, or markdown formatting.
Its fragility aside, and the fact that more of the prompt is spent trying to force it to do less, this is basically the gist of the whole thing.
- User: "convert input.mov to a web-friendly mp4" - Assistant: ffmpeg -i input.mov -c:v libx264 -preset medium -crf 23 -c:a aac -b:a 128k output.mp4
That isn't exactly a web-friendly mp4. The faststart option is not used, which means the moov atom is at the end of the file instead of the head. The entire file then has to be read/scanned to get to the metadata when the browser requests it, which means a long delay depending on the size of the file.
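For reference, a minimal sketch of the same command with faststart added. The helper name `web_mp4_command` is made up for illustration; `-movflags +faststart` is the ffmpeg MP4 muxer option that moves the moov atom to the front of the file after encoding.

```python
import shlex
from typing import List


def web_mp4_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg argv: H.264/AAC re-encode with the moov atom moved to the front."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-preset", "medium", "-crf", "23",
        "-c:a", "aac", "-b:a", "128k",
        "-movflags", "+faststart",  # relocate the moov atom to the start of the file
        dst,
    ]


if __name__ == "__main__":
    # Print the command as a shell line so it can be reviewed before running it.
    print(shlex.join(web_mp4_command("input.mov", "output.mp4")))
```

This only builds and prints the command; running it is left to the reader.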
- User: "create a 10-second clip from my_movie.mkv starting at the 1 minute 30 second mark" - Assistant: ffmpeg -i my_movie.mkv -ss 00:01:30 -t 10 -c copy clip.mkv
This is another poor example, as it is again the slowest option: the -ss comes after the -i. Placing the -ss before the -i makes the command faster.
Not really sure who is training this system on how to use ffmpeg, but it doesn't fill me with confidence that simple things like this are being missed. After this example, I just stopped looking.
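A rough sketch of the reordered command, with a made-up helper name (`clip_command`). With `-ss` before `-i`, ffmpeg seeks in the input instead of decoding and discarding everything up to the start point. One caveat worth stating: with `-c copy` the cut snaps to a keyframe, so the clip may not start at exactly the requested time.

```python
from typing import List


def clip_command(src: str, dst: str, start: str, duration: int) -> List[str]:
    """Build an ffmpeg argv that seeks on the input side (-ss before -i), then stream-copies."""
    return [
        "ffmpeg",
        "-ss", start,         # input option: seek before opening/decoding the input
        "-i", src,
        "-t", str(duration),  # output option: stop after this many seconds
        "-c", "copy",         # no re-encode
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(clip_command("my_movie.mkv", "clip.mkv", "00:01:30", 10)))
```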
Not really. Unless your server doesn’t support range requests, browsers are smart enough to request the end of the file, where a non-faststart moov atom typically lives. But yes, you should use faststart.
You’re right that this appears to be the work of someone who’s not very adept at ffmpeg. Which shouldn’t be surprising. As a power user, maybe even an expert, at ffmpeg, consulting an LLM will just slow me down unless I need to write a complex filter graph; people like me have no need for this.
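If you want to check where the moov atom actually sits in a given file, here is a small sketch that walks the top-level MP4 boxes (4-byte big-endian size, 4-byte type, with size 1 meaning a 64-bit size follows and size 0 meaning "runs to end of file"). The function names are hypothetical, and it assumes a plain, non-fragmented MP4 with a single moov and mdat.

```python
import struct
from typing import BinaryIO, List


def top_level_boxes(f: BinaryIO) -> List[str]:
    """Return the four-character types of the top-level boxes, in file order."""
    types: List[str] = []
    while True:
        header = f.read(8)
        if len(header) < 8:
            break  # end of file (or truncated header)
        size, box_type = struct.unpack(">I4s", header)
        types.append(box_type.decode("latin-1"))
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            ext = f.read(8)
            if len(ext) < 8:
                break
            payload = struct.unpack(">Q", ext)[0] - 16
        elif size == 0:
            break  # this box extends to the end of the file
        else:
            payload = size - 8
        if payload < 0:
            break  # malformed size; stop rather than loop forever
        f.seek(payload, 1)  # skip the box body
    return types


def is_faststart(f: BinaryIO) -> bool:
    """True if a moov box appears before the mdat box at the top level."""
    types = top_level_boxes(f)
    return "moov" in types and "mdat" in types and types.index("moov") < types.index("mdat")
```

Usage would be something like `is_faststart(open("output.mp4", "rb"))`.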
This way the user could directly review whether it is suggesting something they want to go ahead with.
- argument syntax autocorrect
- natural language arguments should be accepted in place of the actual ones
- whenever there's an error executing, instead of just erroring out, the error should go through an LLM and output a proper explanation plus suggested fix
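The last bullet could look roughly like this. It is only a sketch: `run_with_explanation` and the `explain` callback are made-up names, and `explain` stands in for whatever LLM call you would plug in. Nothing here is taken from the wtffmpeg code.

```python
import subprocess
from typing import Callable, List, Optional


def run_with_explanation(argv: List[str], explain: Callable[[str], str]) -> Optional[str]:
    """Run argv; on a non-zero exit, return explain(stderr). Return None on success."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    # Hypothetical hook: hand the raw error text to an LLM (or anything else)
    # and get back a readable explanation plus a suggested fix.
    return explain(result.stderr)
```

A call would look like `run_with_explanation(["ffmpeg", "-i", "in.avi", "out.mp4"], my_llm_explain)`, where `my_llm_explain` is whatever function wraps your model.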
Doing it command by command seems the wrong way to go about it though.
There are plenty of terminal apps with this functionality, e.g. https://www.warp.dev/
If you really want to LLM everything, I'd rather have a dedicated flag that provides correction/explanation of args while doing a dry-run. And another to analyze error messages.
This whole repo is a single 300 LoC Python file, over half of which is the system prompt and comments. It's not even a fine-tuned model or something, it's literally just a wrapper around llama-cpp with a very basic prompt tacked on.
I'm sure it's potentially useful and maybe even works, but I'm really sick of seeing these extremely low-effort projects posted and upvoted over and over.
I've seen LLMs do this in other languages as well but didn't realize there was a term for it. Wrapping entire function bodies in try/catch: at the very least, please just wrap the caller so you don't have to indent the entire body for no reason. Not to mention that a lot of the statements inside can't even throw.
{money-mouth face emoji}
I think the fact that all the options are shorthand doesn't help. No matter how many ffmpeg commands you copy and paste in your life, unless you put the effort in you're not going to remember what -an means in the sea of all the other two-letter switches. And the wording of the output and error messages makes it very hard to tell what's going wrong for someone who hasn't used it in a long time.
Not saying it should all be super wordy, just that it's difficult to pick things up through osmosis when the commands look like this: -ss -t -rc:v. Respect to anyone who actually learnt how this works so they could type it without sitting there with the documentation and hitting a wall for an hour.
Will say though, the raw tech inside ffmpeg has always meant that figuring out how to get it to do the thing was worth it, because it's insanely powerful.
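To illustrate the "what does -an even mean" problem, here is a tiny sketch of a flag explainer. The table is hand-written and deliberately incomplete, and `explain_flags` is a hypothetical name, not something ffmpeg provides. Note that `-crf` and `-preset` are encoder-specific options (e.g. for libx264), not global ffmpeg flags.

```python
from typing import List, Tuple

# Hand-written cheat sheet for a few common options (far from complete).
FLAG_HELP = {
    "-i": "input file",
    "-ss": "seek to this position (start time)",
    "-t": "limit the duration",
    "-an": "no audio (drop audio streams)",
    "-vn": "no video (drop video streams)",
    "-c": "codec for all streams ('copy' means no re-encode)",
    "-c:v": "video codec",
    "-c:a": "audio codec",
    "-b:a": "audio bitrate",
    "-vf": "video filter chain",
    "-crf": "quality target for encoders such as libx264 (lower is better)",
    "-preset": "encoder speed vs. compression trade-off",
}


def explain_flags(argv: List[str]) -> List[Tuple[str, str]]:
    """Return (flag, description) pairs for every known flag in argv, in order."""
    return [(arg, FLAG_HELP[arg]) for arg in argv if arg in FLAG_HELP]


if __name__ == "__main__":
    for flag, meaning in explain_flags(["ffmpeg", "-i", "my_video.avi", "-an", "out.mp4"]):
        print(f"{flag:8} {meaning}")
```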
You accurately described many "AI apps" of this era.
https://github.com/alfg/ffmpeg-commander
Haven't updated in a while, but it's a simplified web UI with a few example presets.
Edit: no, unrelated. Got confused with https://ffmpeg.app .
English is not my mother tongue but I think the model should correct the user that it should be: "convert my_video.avi to mp4 without sound"
But for the vast majority of folks who only occasionally use ffmpeg to do something, the complexity of it is so outrageous it feels like a parody. Literally (I mean literally) THOUSANDS of options/flags. It's just too much for a human to navigate. Of course we're going to "cheat" or just google up something similar to what we want. If an LLM can handle it, even better.
But the more you familiarize yourself with a/v streaming and transcoding, the sooner you realize why you need that amount of control.
I mean, with ffmpeg I can easily combine 3 audio clips, 5 subtitles and a separate video, cut away the first 25 seconds and the last 5 minutes of the resulting clip, resize it and change the aspect ratio, reduce the audio to mono and specify output codecs for audio and video.
And this is still a pretty simple example of what one could want to do.
Ffmpeg has countless other amazing features, demanding more arguments.
How about for example camera stabilization? (-vf deshake)
How would one even start to explain all of this to an app without thousands of command line arguments?
The whole subject is incredibly complex and ffmpeg is by far the most amazing project in this space.
Without ffmpeg, there would be no YouTube in 2005, no Plex at all, and really the whole of the modern social web would probably have happened later if Fabrice weren't such a fantastic guy :-)
https://ffmpeg.org/ffmpeg.html
Let's be honest, it kinda sucks. The commands are barely explained; it feels like working out what's being communicated is left as a puzzle-solving exercise for the reader.
Honestly, if these processing chain diagrams just had a rollover, where hovering over part of the command or a block in the chain highlighted the other part along with a description of what the switch actually does, then a lot more people would be able to understand this, especially if real-world before and after examples of outputs were included.
Instead it's <diagram of the chain> <raw string of the command> "Note: one caveat about something"
>Disclaimer This was largely made to amuse myself; consider it a piece of humorous performance art but it so borders on being actually useful, I went to the trouble to document all of this. YMMV. Use at your own risk. The author is not responsible for any damage or data loss that may occur from using this tool. Always review generated commands before executing them, especially when working with important files.
I'm on the fence on this one, as the HN community thrives on novel, interesting and sometimes humorous content, yet you got the ire of most. Burying the line about it being a useful-yet-parody project at the very end doesn't help anyone understand exactly that point: this was sold as the latest and greatest and then, oh, disclaimer, it really isn't.
Don't stop the work, but please remember to keep your intentions clear and your audience will understand.
Best of luck on your future projects!
Edit: Typo
The example itself shows a naive conversion that ends up transcoding with default H.264 params. This should have been -c:v copy, which copies the input packets as-is.
I know this from RTFM. I don't ask an LLM to second-guess me.
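A sketch of the stream-copy variant for the "avi to mp4 without sound" example, with a made-up helper name. One assumption to flag: `-c:v copy` only works if the video codec inside the AVI is one the MP4 container accepts; otherwise ffmpeg will refuse and you need to re-encode after all.

```python
from typing import List


def strip_audio_copy_video(src: str, dst: str) -> List[str]:
    """Build an ffmpeg argv that drops audio (-an) and copies video packets unchanged."""
    return ["ffmpeg", "-i", src, "-an", "-c:v", "copy", dst]


if __name__ == "__main__":
    print(" ".join(strip_audio_copy_video("my_video.avi", "my_video.mp4")))
```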
Also: I understand the privacy concerns, but basically any LLM that's large enough can act as a conversational UI to ffmpeg nowadays. Why would I want to add a specific one to do that?
PS: Yeah, ffmpeg is not an easy tool to use.