Very cool work explained well.
Take something like Rocket League for example. Definitely doesn't have velocity buffers.
Yes even Rocket League has it
How did you reach this conclusion? Rocket League looks like a game that definitely has velocity buffers to me. (Many fast-moving scenarios + motion blur)
Actually I just checked and it does have a motion blur setting... maybe I just turned it off years ago and forgot or something.
Though my understanding is that it helps hide shakier framerates in console land. Which sounds like it could be a thing...
Your vision has motion blur. Staring at your screen from a fixed distance with no movement is highly unrealistic and allows you to see crisp 4k images no matter the content. This results in a cartoonish experience because it mimics nothing in real life.
Now you do have the normal problem that the designers of the game/movie can't know for sure what part of the image you are focusing on (my pet peeve with 3D movies) since that affects where and how you would perceive the blur.
There's also the problem of overuse, or of it being used to mask other issues, or applied purely as an artistic choice.
But it makes total sense to invest in a high refresh display with quick pixel transitions to reduce blur, and then selectively add motion blur back artificially.
Turning it off is akin to cranking up the brightness to 400% because otherwise you can't make out details in the dark parts of the game ... that's the point.
But if you prefer it off then go ahead, games are meant to be enjoyed!
Doesn't work for translucency and shader animation. The latter can be made to work if the shader can also calculate motion vectors.
- In a 3d game, a motion vector is the difference between the position of an object in 3d space from the previous to the current frame
- In H.264, the 'motion vector' is basically saying - copy this rectangular chunk of pixels from some point from some arbitrary previous frame and then encode the difference between the reference pixels and the copy with JPEG-like techniques (DCT et al)
This block copying is why H.264 video devolves into a mess of squares once the bandwidth craps out.
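To make the game-side definition concrete, here's a minimal sketch (the helper name and parameters are hypothetical, not any engine's actual API): the screen-space motion vector of a surface point is just where it projects under this frame's camera minus where it projected under last frame's.

    import numpy as np

    def screen_space_motion_vector(world_pos, prev_view_proj, curr_view_proj, viewport):
        """Hypothetical helper: the per-pixel motion vector is where a surface
        point projects this frame minus where it projected last frame."""
        w, h = viewport

        def project(view_proj):
            clip = view_proj @ np.append(world_pos, 1.0)    # homogeneous clip space
            ndc = clip[:2] / clip[3]                        # perspective divide -> [-1, 1]
            return (ndc * 0.5 + 0.5) * np.array([w, h])     # to pixel coordinates

        return project(curr_view_proj) - project(prev_view_proj)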
In typical video encoding motion compensation of course isn't derived from real 3D motion vectors, it's merely a heuristic based on optical flow and a bag of tricks, but in principle the actual game's motion vectors could be used to guide video's motion compensation. This is especially true when we're talking about a custom codec, and not reusing the H.264 bitstream format.
Referencing previous frames doesn't add latency, and limiting motion to just displacement of the previous frame would be computationally relatively simple. You'd need some keyframes or gradual refresh to avoid "datamoshing" look persisting on packet loss.
However, the challenge is in encoding the motion precisely enough to make it useful. If it's not aligned with sub-pixel precision it may make textures blurrier and make movement look wobbly almost like PS1 games. It's hard to fix that by encoding the diff, because the diff ends up having high frequencies that don't survive compression. Motion compensation also should be encoded with sharp boundaries between objects, as otherwise it causes shimmering around edges.
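As a minimal sketch of the decoder-side idea, "warp the previous frame by the motion field, then add the residual" (the function name and the bilinear sampler are assumptions, not taken from any real codec) — how carefully this sub-pixel sampling is done is exactly where the blurriness and wobble mentioned above come from:

    import numpy as np

    def motion_compensate(prev_frame, motion_x, motion_y):
        """Warp the previous HxWx3 frame by a per-pixel motion field given in pixels
        (using the 'current minus previous' convention), with bilinear sampling.
        The decoder would add the transmitted residual to this prediction."""
        h, w = prev_frame.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
        src_x = np.clip(xs - motion_x, 0, w - 1.001)
        src_y = np.clip(ys - motion_y, 0, h - 1.001)
        x0, y0 = src_x.astype(int), src_y.astype(int)
        fx = (src_x - x0)[..., None]
        fy = (src_y - y0)[..., None]
        p = prev_frame.astype(np.float32)
        top = p[y0, x0] * (1 - fx) + p[y0, x0 + 1] * fx
        bot = p[y0 + 1, x0] * (1 - fx) + p[y0 + 1, x0 + 1] * fx
        return top * (1 - fy) + bot * fy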
3D motion vectors always get projected to 2D anyway. They also aren't used for moving blocks of pixels around, they are floating point values that get used along with a depth map to re-rasterize an image with motion blur.
Implementation details are quite different, but for reasons unrelated to motion vectors — the video codecs that are established now were designed decades ago, when use of neural networks was in infancy, and the hardware acceleration for NNs was way outside of the budget of HW video decoders.
I haven't in a while, but I used to use https://parsec.app/ on a cheap Intel MacBook Air to do my STO dailies on vacation. It sends inputs, but gets a compressed stream. I'm curious whether there's any open-source equivalent.
It was always shocking to me that Stadia was literally making their own games in house and somehow the end result was still just a streamed video, with the latency gains supposed to come from edge-deployed GPUs and a Wi-Fi-connected controller.
Then again, maybe they tried some of this stuff and the gains weren't worth it relative to battle-tested video codecs.
VR games already do something like this, so that when a game runs at below the maximum FPS of the VR headset, it can still respond to your head movements. It's not perfect because there's no parallax and it can't show anything for the region that was previously outside of your field of view, but it still makes a huge difference. (Of course, it's more important for VR because without doing this, any lag spike in a game would instantly induce motion sickness in the player. And if they wanted to, parallax could be faked using a depth map)
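A bare-bones sketch of that rotation-only reprojection math (often called timewarp), under a pinhole camera model; this is only the underlying homography, not any particular runtime's API, and it ignores translation/parallax exactly as noted above:

    import numpy as np

    def timewarp_homography(K, R_render, R_display):
        """Rotation-only reprojection ("timewarp"). K is the 3x3 intrinsics matrix;
        R_render and R_display are 3x3 rotations mapping world directions into
        camera coordinates at render time and at display time. Parallax is ignored."""
        # p_render ~ K @ R_render @ d   =>   d = R_render.T @ inv(K) @ p_render
        # p_display ~ K @ R_display @ d
        return K @ R_display @ R_render.T @ np.linalg.inv(K)

    def reproject_pixel(H, x, y):
        """Map a rendered-frame pixel to its display-frame position (homogeneous divide)."""
        p = H @ np.array([x, y, 1.0])
        return p[0] / p[2], p[1] / p[2]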
Also, a simple 2D side-scrolling game could give very accurate motion vectors for the background and for large, linearly moving foreground objects.
I'm surprised at the number of people disagreeing with your idea here. I think HN has a lot of "if I can't see how it can be done then it can't be done" people.
Edit: Also any 2d graphical overlays like HUDs, maps, scores, subtitles, menus, etc could be sent as 2d compressed data, which could enable better compression for that data - for example much sharper pixel perfect encoding for simple shapes.
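As a toy illustration of why a separately sent, losslessly compressed HUD layer is cheap (the layer contents and the zlib choice here are just assumptions for the sketch): flat UI with large transparent regions compresses extremely well losslessly, and compositing it client-side keeps text and shapes pixel-perfect instead of smearing them through the video codec.

    import zlib
    import numpy as np

    # Hypothetical 1080p RGBA HUD layer: mostly transparent, a few opaque elements.
    hud = np.zeros((1080, 1920, 4), dtype=np.uint8)
    hud[40:120, 40:600] = (255, 255, 255, 255)     # e.g. a score bar
    hud[980:1060, 1500:1880] = (200, 0, 0, 255)    # e.g. a minimap frame

    raw = hud.tobytes()
    packed = zlib.compress(raw, level=6)
    print(f"raw: {len(raw) / 1e6:.1f} MB, lossless zlib: {len(packed) / 1e3:.0f} kB")
    # The client alpha-blends this layer over the decoded video frame, so UI text
    # never goes through the lossy video path.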
I mean you could also ship the textures ahead of time so that the compressor could look up if something looks like a distorted texture. You could send the geometry of what's being rendered, that would give a lot of info to the decompressor. You could send the HUD separately. And so on.
But here you want something that's high level and works with any game engine, any hardware. The main issue being latency rather than bandwidth, you really don't want to add calculation cycles.
That's a misconception. All modern video codecs (i.e. H.264/AVC, H.265/HEVC, AV1) have explicit, first-class tools, profiles, and reference modes aimed at both low- and high-resolution low‑latency and/or low‑complexity use.
AV1: Improving RTC Video Quality at Scale: https://atscaleconference.com/av1-improving-rtc-video-qualit...
Objective metrics and tools for video encoding and source signal quality: netflix/vmaf, easyVmaf, psy-ex/metrics, ffmpeg-quality-metrics.
Ffmpeg settings for low-latency encoding:
# h264, h265
-preset ultrafast
-tune zerolatency
# AV1
-c:v libsvtav1
-preset 8
-svtav1-params tune=0:latency-mode=1
-g 60
It's possible to follow along with ffmpeg encoding for visual inspection, without waiting for the whole job to complete, using the tee muxer and ffplay. GPU Screen Recorder and the Sunshine server expose some encoder options in GUI forms, but parameter optimization is still manual; nothing runs easyVmaf with thumbnails of each rendering parameter set, with (IDK) auto-identification of encoding artifacts.
Ardour has a "Loudness Analyzer & Normalizer" with profiles for specific streaming services.
What are good target bitrates for low-latency livestreaming 4k with h264, h265 (HDR), and AV1?
> The go-to solution here is GPU accelerated video compression
Isn't the solution usually hardware encoding?
> I think this is an order of magnitude faster than even dedicated hardware codecs on GPUs.
Is there an actual benchmark though?
I would have assumed that built-in hardware encoding would always be faster. Plus, I'd assume your game is already saturating your GPU, so the last thing you want to do is use it for simultaneous video encoding. But I'm not an expert in either of these, so curious to know if/how I'm wrong here. Like, are hardware encoders designed to be real-time, but intentionally trade off latency for higher compression? And is the proposed video encoding really so lightweight that it can easily share the GPU without affecting game performance?
Generally, you're right that these hardware blocks favor low latency over quality. One example of this is motion estimation (one of the most expensive operations during encoding). The NVENC engine on Nvidia GPUs will only use fairly basic detection loops, but can optionally be fed motion hints from an external source. I know that Nvidia has a CUDA-based motion estimator (called CEA) for this purpose. On recent GPUs there is also the optical flow engine (another separate block), which might be able to do higher-quality detection.
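For a feel of what a "fairly basic detection loop" means, here's a toy full-search block matcher (purely illustrative; this is not NVENC's actual algorithm, and the function name and parameters are made up for the sketch):

    import numpy as np

    def block_motion_estimate(prev, curr, bx, by, block=16, search=8):
        """Toy full-search block matching: find the offset within +/-search pixels
        that minimizes SAD (sum of absolute differences) for one block."""
        h, w = prev.shape[:2]
        target = curr[by:by + block, bx:bx + block].astype(np.int32)
        best_sad, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + block > h or x + block > w:
                    continue
                cand = prev[y:y + block, x:x + block].astype(np.int32)
                sad = int(np.abs(target - cand).sum())
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dx, dy)
        return best_mv, best_sad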
And on a similar note, NvFBC helps a ton with latency, but it's disabled at the driver level for consumer cards.
They are. That patch doesn't do what you think it does.
It'd be interesting to see benchmarks against H.264/AVC (see example "zero‑latency" ffmpeg settings below) and JPEG XS.
-c:v libx264 -preset ultrafast -tune zerolatency \
-x264-params "keyint=1:min-keyint=1:scenecut=0:rc-lookahead=0" \
-bf 0 -b:v 8M -maxrate 8M -bufsize 1M
FWIW, there's also the non-free JPEG-XS standard [1] which also claims very low latency [2] and might be a safer choice for commercial projects, given that there is a patent pool around it.
https://www.filmlight.ltd.uk/store/press_releases/filmlight-...
We currently use the IntoPIX CUDA encoder/decoder implementation, and SRT for the low-level transport.
You can definitely achieve end-to-end latencies <16ms over decent networks.
We have customers deploying their machines in data centres and using them in their post-production facilities in the centre of town, usually over a 10GbE link. But I've had others using 1GbE links between countries, running at higher compression ratios.
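Back-of-the-envelope arithmetic (a sketch, assuming 4K60 at 24-bit RGB) for why a 10GbE link only needs a mild compression ratio while a 1GbE link needs a much higher one:

    # Rough link budget for 4K60 (assumed 3840x2160, 24-bit RGB).
    width, height, bits_per_pixel, fps = 3840, 2160, 24, 60
    raw_bps = width * height * bits_per_pixel * fps      # ~11.9 Gbit/s uncompressed
    for link_name, link_bps in [("10GbE", 10e9), ("1GbE", 1e9)]:
        ratio = raw_bps / link_bps
        print(f"{link_name}: needs ~{ratio:.1f}:1 compression just to fit the wire "
              f"(before protocol overhead)")
    # -> roughly 1.2:1 on 10GbE vs 12:1 on 1GbE, hence the higher compression
    #    ratios on the slower links.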
While I have no personal experience on that topic, I'd assume that a codec with a patent pool is a safer bet for a commercial project. Key aspects being protected by patents makes it less likely that some random patent troll or competitor extorts you with some nonsense patent. Also, using e.g., JPEG XS instead of e.g., pyrowave also ensures that you won't be extorted by the JPEG XS patent holders.
One may call this a protection racket - but under the current system, it may make economic sense to pay for a license instead of risking expensive lawsuits.
Does it? How? Patents can overlap, for example. Unless there's some indemnity or insurance for fighting patent lawsuits as part of the pool, it's protection only against those patent holders, not other trolls.
Also, modern game textures are a lot of data.
I suppose the issue would be media. Faster to load locally than push it out. Could be semi solved with typical web caching approaches.
Very much a one-trick pony, but probably considerably less bandwidth-intensive than even the original resolution (320x224) under nearly any acceptable bitrate.
You're doing something wrong if NVENC is any slower; the llhp (low-latency high-performance) preset should be all you need.
There is zero reason to not be willing to do so on a local network.
> If the clients can even manage it.
They can. I use significantly more than that already.
https://streaminglearningcenter.com/codecs/an-interview-with...
Ultra low latency for streaming.
The two things that increase latency are more advanced processing algorithms, giving the encoder more stuff to do, and schemes that require waiting multiple frames. If you go disable those, the encoder can pretty much start working on your frame the nanosecond the GPU stops rendering to it, and have it encoded in <10ms.
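A toy latency model for that point (all numbers here are assumed, just to illustrate why "waiting multiple frames" dominates): lookahead/B-frame style tools add whole frame-times to encoder latency, while the per-frame processing cost stays small.

    frame_time_ms = 1000 / 60        # 60 fps source
    encode_work_ms = 5               # assumed per-frame processing cost
    for lookahead_frames in (0, 2, 8):
        latency_ms = lookahead_frames * frame_time_ms + encode_work_ms
        print(f"lookahead={lookahead_frames}: ~{latency_ms:.0f} ms before the frame can leave the encoder")
    # lookahead=0 -> ~5 ms (start the moment rendering finishes);
    # lookahead=8 -> ~138 ms, which is why zero-latency tunes disable these tools.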
For context, OP achieved 0.13 ms with his codec.
"interesting data point is that transferring a 4K RGBA8 image over the PCI-e bus is far slower than compressing it on the GPU like this, followed by copying over the compressed payload."
"200mbit/s at 60 fps"
It's certainly a very different set of tradeoffs, using a lot more bandwidth.
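What that bandwidth tradeoff looks like in numbers (a sketch assuming 4K RGBA8 frames as in the quote above, using the 8 Mbit/s figure from the x264 example earlier only as a reference point):

    # Putting the quoted numbers side by side (4K RGBA8 assumed as 3840x2160x4 bytes).
    frame_bytes_raw = 3840 * 2160 * 4            # ~33.2 MB per uncompressed frame
    stream_bps = 200e6                           # "200mbit/s at 60 fps"
    fps = 60
    frame_bytes_on_wire = stream_bps / 8 / fps   # ~0.42 MB per frame
    print(f"~{frame_bytes_raw / frame_bytes_on_wire:.0f}:1 vs raw")
    print(f"~{stream_bps / 8e6:.0f}x the bitrate of the 8 Mbit/s x264 example above")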
Wasn't that the point?
> These use cases demand very, very low latency. Every millisecond counts here
> When game streaming, the expectation is that we have a lot of bandwidth available. Streaming locally on a LAN in particular, bandwidth is basically free. Gigabit ethernet is ancient technology and hundreds of megabits over WiFi is no problem either. This shifts priorities a little bit for me at least.
There's a tradeoff between quality and encoding time - for example, if you want your motion vector reference to go back 4 frames, instead of 2, then the encoder will take longer to run, and you get better quality at no extra bitrate, but more runtime.
If your key-to-screen latency has an irreducible 50-60ms part of rendering, processing, data transfer, decoding and display, then the extra 10ms is just ~15% more latency, but you have to find the correct tradeoff for yourself.
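The roughly-15% figure falls out of the numbers in that sentence:

    # 10 ms of extra encode time on top of a 50-60 ms irreducible pipeline.
    extra_ms = 10
    for base_ms in (50, 60):
        total_ms = base_ms + extra_ms
        print(f"{base_ms} ms pipeline: encode adds {extra_ms / total_ms:.0%} of {total_ms} ms total")
    # -> 17% and 14%, i.e. about 15% of end-to-end latency.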
Do not shame this dojo.
One thing to note when designing a new video codec is to carpet-bomb around the idea with research projects to stake a claim to any possible feature enhancements.
Anything can have an improvement patent filed against it, no matter the license.
I think the main advantage is perhaps the robustness against packet drops is better.
In practice, no.
Network latency is the least problematic part of the stack. I consistently get <3ms. It's largely encode/decode time, which in my setup sits at around 20ms, meaning any work in this area would actually have a HUGE impact.
Maybe I should try for that next weekend.
You are standing in an open field west of a white house, with a boarded front door. There is a small mailbox here.
Frame 2:
A few blades of grass sway gently in the breeze. The camera begins to drift slightly, as if under player control — a faint ambient sound begins: wind and birds.
If the author is reading this, it would be very interesting to read about the differences between this method and HTJ2K.
For those interested in the ultra low latency space (where you’re willing to trade a bit of bandwidth to gain quality and minimise latency), VSF have a pretty good wrap-up of other common options and what they each optimise for: https://static.vsf.tv/download/technical_recommendations/VSF...
For example, Microsoft's DXT codec lacks most modern features (no entropy coding, motion comp, deblocking, etc.), but delivers roughly 4x to 8x compression and is hardware decodable (saving on decoding and presentation latency).
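The 4x-8x figure falls straight out of DXT's fixed-rate 4x4 blocks; a quick sketch of the arithmetic (using the common DXT1/DXT5, aka BC1/BC3, variants):

    # Why DXT-style fixed-rate block compression lands at roughly 4x-8x:
    # every 4x4 pixel block encodes to a fixed number of bytes.
    bytes_per_pixel = {
        "RGBA8 (uncompressed)": 4.0,
        "DXT1/BC1": 8 / 16,    # 8 bytes per 4x4 block  = 0.5 B/px
        "DXT5/BC3": 16 / 16,   # 16 bytes per 4x4 block = 1.0 B/px
    }
    raw = bytes_per_pixel["RGBA8 (uncompressed)"]
    for name, bpp in bytes_per_pixel.items():
        print(f"{name:22s} {bpp:4.2f} B/px  ({raw / bpp:.0f}x vs RGBA8)")
    # DXT1 -> 8x, DXT5 -> 4x. Fixed-rate means no entropy coding and easy random
    # access, which is why the GPU can sample the compressed data directly.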
Of course, once you've tuned the end to end capture-encode-transmit-decode-display loop to sub 10 ms, you then have to contend with the 30-100 ms of video processing latency introduced by the display :-)
Because GeForce Now has the best streaming quality in the business, and Boosteroid has various problems like stuttering etc.
Can't wait until one day this gets into Moonlight or something like it.