† Or an equivalent special-purpose language, but WUFFS is right there
I'm not a codec developer; I'm only coming at this from an outside, intuitive perspective. Performance-conscious parties generally want to minimize heap allocations, so I'm interested in how this applies to codec architecture. Codecs seem so complex to me, with so much inscrutable shit going on, but then heap allocations aren't optimized out? There has to be a very good reason for this.
But to do that they have to keep state and do computations on that state. If frame 47 is a P-frame, you need frame 46 to decode it correctly. Or frame 47 might be a B-frame, in which case you need frame 46 and possibly also frame 48 - which means you have to unpack frames "ahead" of yourself and then keep them around for the next decode.
I think that all counts as "dynamic state"?
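A minimal sketch of that bookkeeping, with hypothetical types (no real codec's API looks this simple): the decoder has to hold decoded frames past their display time because later frames predict from them.

    // Hypothetical sketch: frames retained as prediction sources = "dynamic state".
    struct Frame {
        pixels: Vec<u8>, // real decoders store separate Y/U/V planes, not one blob
    }

    struct Decoder {
        // Kept alive past display time because future frames reference them.
        reference_frames: Vec<Frame>,
    }

    impl Decoder {
        fn decode_p_frame(&mut self, payload: &[u8]) -> Frame {
            // A P-frame starts from the previous frame as its prediction...
            let prev = self.reference_frames.last().expect("P-frame needs a reference");
            let frame = Frame { pixels: prev.pixels.clone() };
            // ...then motion compensation + residual from `payload` would apply here.
            let _ = payload;
            // The new frame may itself become a reference for the next frame.
            self.reference_frames.push(Frame { pixels: frame.pixels.clone() });
            frame
        }
    }

B-frames make it worse: the decoder also has to hold the future reference, so decode order and display order diverge.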
I suspect the future of video compression will also include frame generation, like what is currently being done for video games. Essentially you have, say, 12 fps video, but your video card fills in the intermediate frames via what is basically generative AI, so you get 120 fps output with smooth motion. I imagine that will never be something WUFFS is best suited for.
All of these things are bounded for actual codecs. AV1 allows storing at most 8 reference frames. The sequence header will specify a maximum allowable resolution for any frame. The number of motion vectors is fixed once you know the resolution. Film grain requires only a single additional buffer. There are "levels" specified which ensure interoperability at common operating points (e.g., 4k) without even relying on the sequence header (you just reject sequences that fall outside the limits). Those are mostly intended for hardware, but there is no reason a software decoder could not take advantage of them. As long as codecs are designed to be implemented in hardware, this will be possible.
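To make the "bounded" point concrete, here's a rough sketch (field names invented; the 8 comes from AV1's reference frame buffer, and 3/2 assumes 8-bit 4:2:0) of allocating everything once up front from sequence-header limits, after which decoding needs no further heap allocation:

    // Sketch: size all buffers once from sequence-header limits (hypothetical types).
    struct SequenceHeader {
        max_width: usize,
        max_height: usize,
    }

    const REF_FRAME_SLOTS: usize = 8; // AV1 stores at most 8 reference frames

    fn preallocate(seq: &SequenceHeader) -> Vec<Vec<u8>> {
        // 3/2 accounts for 4:2:0 chroma; real decoders also pad for alignment.
        let frame_bytes = seq.max_width * seq.max_height * 3 / 2;
        // One buffer per reference slot, allocated once, then reused forever.
        (0..REF_FRAME_SLOTS).map(|_| vec![0u8; frame_bytes]).collect()
    }

Levels cap max_width/max_height before you even parse the header, which is exactly what lets hardware bake these sizes in.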
In any case I get what you're saying and I understand why codecs are going to be dynamically allocating memory, so thanks for that.
That's how most video codecs work already. They try to "guess" what the next frame will be, based on past (for P-frames) and future (for B-frames) frames. The difference is that the codec also encodes some metadata to help with the process, plus the residual between the predicted frame and the real frame.
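In other words, reconstruction is roughly "prediction plus transmitted correction". A toy sketch of the idea (ignoring motion compensation, transforms, and everything else that makes codecs hard):

    // Toy reconstruction: predicted frame + decoded residual = output frame.
    // Real codecs do this per block, not per whole frame.
    fn reconstruct(prediction: &[u8], residual: &[i16]) -> Vec<u8> {
        prediction
            .iter()
            .zip(residual)
            .map(|(&p, &r)| (p as i16 + r).clamp(0, 255) as u8)
            .collect()
    }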
As for using AI techniques to improve prediction, that's not a new thing at all. Many algorithms optimized for compression ratio use neural nets, but these tend to be too computationally expensive for general use. In fact, the Hutter Prize treats text compression as an AI/AGI problem.
With the bitrate set to 100 MB/s it happily encodes 2160p or even 3240p, the maximum resolution available when using Virtual Super Resolution (which renders above native resolution and downsamples; it's awesome for titles without resolution scaling when you don't want to use TAA).
I don't know Instagram, but I would expect any provider to be able to handle almost any container/codec/resolution combination going (they likely use ffmpeg underneath) and generate their different output formats at different bitrates for different playback devices.
Either Instagram won't accept AV1 (seems unlikely) or they just haven't processed it yet, as you surmise.
I'd love to know why your comment is greyed out.
AV1 hardware decoders are still rare, so your device was probably falling back to software decoding, which is not ideal.
They shifted to h.264 successfully, but I haven't heard of any more conferences to move forward in over a decade.
Currently "The Last of US S02E06" only has one AV1 - https://thepiratebay.org/search.php?q=The+Last+of+Us+S02E06 same THMT - https://thepiratebay.org/search.php?q=The+Handmaids+Tale+S06... These are low quality at only ~600MB, not really early adopter sizes.
AV1 beats h.265 but not h.266 - https://www.preprints.org/manuscript/202402.0869/v1 - though people dispute that paper's use of default encoder settings.
Things like getting hardware to The Scene for encoding might help, but I'm not sure what the bottleneck is; it might be bureaucratic, educational, or cultural.
[edit] "Common Side Effects S01E04" AV1 is the strongest torrent, that's cool - https://thepiratebay.org/search.php?q=Common+Side+Effects+S0...
There is one large exception, but I don't know the current scene well enough to know if it matters: grainy sources. I have some DVDs and Blu-rays with high grain content, and AV1 can work wonders with those thanks to its in-loop grain filter and synthesis -- we are talking half the size for a high-quality encode. If I were to encode them for AVC at any reasonable bitrate, I would probably run a grain-removal filter, which is very finicky if you don't want to end up with something overly blurry.
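The trick, as I understand it: the encoder denoises the source and signals only a few grain parameters, and the decoder re-synthesizes grain that looks statistically right instead of spending bits coding every speck. A toy sketch of the decode side - AV1's real model is an autoregressive filter, so this only conveys the flavor:

    // Toy grain synthesis: regenerate pseudo-random grain from a seed/strength
    // instead of transmitting it. Not AV1's actual model.
    fn add_grain(frame: &mut [u8], strength: u64, seed: u64) {
        let mut state = seed | 1; // keep the xorshift state nonzero
        for px in frame.iter_mut() {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            // Noise uniformly in [-strength, strength].
            let noise = (state % (2 * strength + 1)) as i16 - strength as i16;
            *px = (*px as i16 + noise).clamp(0, 255) as u8;
        }
    }

Grain is high-entropy noise, which is exactly what transform coding is worst at, so regenerating it rather than compressing it is where the halved sizes come from.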
Nicholas Nethercote's "How to speed up the Rust compiler" writings[1] fall into this same category for me.
Any others?
Real is about the only other codec I see that could be a name, but nobody uses that anymore.
I've been trying to find that article ever since, but I'm not able to. Does anyone know the article I'm talking about?
I mean, sure, max performance is great if you control every part of your pipeline, but if you're accepting untrusted data from users at large, ffmpeg has at least a half-dozen remotely exploitable CVEs a year. Better make sure your sandbox is tight.
https://ffmpeg.org/security.html
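For anyone wiring this up: even before reaching for seccomp/namespaces/jails, the bare minimum is keeping the decode in a separate process with a kill switch. A hedged sketch (the 60-second budget is arbitrary, and this is process isolation only, not a real sandbox):

    use std::process::{Command, Stdio};
    use std::time::{Duration, Instant};

    // Run ffmpeg on untrusted input in a child process, so a decoder crash or
    // exploit doesn't take down the parent, and kill it if it hangs.
    fn transcode_untrusted(input: &str, output: &str) -> std::io::Result<bool> {
        let mut child = Command::new("ffmpeg")
            .args(["-nostdin", "-i", input, output])
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .spawn()?;
        let deadline = Instant::now() + Duration::from_secs(60);
        loop {
            if let Some(status) = child.try_wait()? {
                return Ok(status.success());
            }
            if Instant::now() >= deadline {
                child.kill()?; // runaway decode loop: assume hostile input
                return Ok(false);
            }
            std::thread::sleep(Duration::from_millis(100));
        }
    }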
I feel like there's a middle ground where everyone works towards a secure and fast solution, rather than whatever position they've staked out here.
† If you're a human. If you're an ostrich this is not impressive, but on the whole ostriches aren't competing in the Olympic 100 metre sprint.
SVT-AV1-PSY is particularly interesting to read up on as well.
Edit: If I had read the next paragraph, I'd have learned about [1] before commenting.
That leads me to the conclusion that Rust is a dubious choice for highly optimized SIMD code.
Have you tried manually defining alignment of Rust struct?
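For anyone who hasn't seen it: Rust lets you pin a struct's alignment with #[repr(align(N))], which can matter for SIMD load/store paths. A quick sketch:

    // Force 64-byte alignment (one cache line; also enough for AVX-512 loads).
    #[repr(align(64))]
    struct AlignedBlock {
        data: [u8; 64],
    }

    fn main() {
        let block = AlignedBlock { data: [0; 64] };
        // The address is now guaranteed to be a multiple of 64.
        assert_eq!(&block as *const _ as usize % 64, 0);
        println!("aligned at {:p}", &block);
    }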