This is getting unreal. They're becoming fast and high fidelity. Once we get better editing capabilities and can shape the Gaussian fields, this will become the prevailing means of creating and distributing media.
Turning any source into something 4D and volumetric that you can easily mold like clay, relight, and reshape. A fully interactable and playable 4D canvas.
Imagine if the work being done with diffusion models could read and write from Gaussian fields instead of just pixels. It could look like anything: real life, Ghibli, Pixar, whatever.
I can't imagine where this tech will be in five years.
100%. And style-transfer it into steampunk or H.R. Giger or cartoons or anime. Or dream up new fantasy worlds instantaneously. Explore them, play them, shape them like Minecraft-becomes-holodeck. With physics and tactile responses.
I'm so excited for everything happening in graphics right now.
Keep it up! You're at the forefront!
Could you or someone else wise in the ways of graphics give me a layperson's rundown of how this works, why it's considered so important, and what the technical challenges are given that an RGB+D(epth?) stream is the input?
Usually creating a Gaussian splat representation takes a long time and uses an iterative gradient-based optimization procedure. Using RGBD helps me sidestep this optimization, as much of the geometry is already present in the depth channel and so it enables the real-time aspect of my technique.
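To make the "geometry is already in the depth channel" point concrete, here's a rough sketch (not my actual code, just the standard pinhole back-projection) of how an RGBD frame can directly seed splat centers and colors without any per-scene optimization:

```python
import numpy as np

def seed_splats_from_rgbd(depth, rgb, fx, fy, cx, cy):
    """Toy illustration (not the real pipeline): lift an RGBD frame into 3D
    points that can directly seed Gaussian centers and colors, skipping the
    usual iterative optimization. Assumes known pinhole intrinsics fx, fy, cx, cy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float32)
    x = (u - cx) * z / fx                  # standard pinhole back-projection
    y = (v - cy) * z / fy
    centers = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    valid = centers[:, 2] > 0              # drop pixels with no depth reading
    return centers[valid], colors[valid]
```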
When you say "big deal", I imagine you are also asking about business or societal implications. I can't really speak on those, but I'm open to licensing this IP to any companies which know about big business applications :)
I'm not aware of other live RGBD visualizations except for direct pointcloud rendering. Compared to pointclouds, splats are better able to render textures, view-dependent effects, and occlusions.
The depth is helpful to properly handle the parallaxing of the scene as the view angle changes. The system should then ideally "in-paint" the areas that are occluded from the input.
You can either guess the input depth from matching multiple RGB inputs or just use depth inputs along with RGB inputs if you have them. It's not fundamental to the process of building the splats either way.
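As a toy illustration of the parallax point above (assuming known intrinsics and a small sideways camera move), warping a depth-backed pixel grid into a new view immediately shows where disocclusion holes appear, i.e. the regions that would need in-painting:

```python
import numpy as np

def disocclusion_mask(depth, fx, fy, cx, cy, t_x=0.1):
    """Toy parallax demo (illustrative only, pinhole intrinsics assumed):
    back-project each pixel with its depth, slide the camera sideways by t_x
    meters, and re-project. Pixels in the new view that no source pixel lands
    on are the disoccluded regions that would need in-painting."""
    h, w = depth.shape
    z = np.maximum(depth.astype(np.float32), 1e-6)   # guard against zero depth
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # nearby points shift more in the image than far ones -> parallax
    u_new = np.round(fx * (x - t_x) / z + cx).astype(int)
    v_new = np.round(fy * y / z + cy).astype(int)
    covered = np.zeros((h, w), dtype=bool)
    ok = (u_new >= 0) & (u_new < w) & (v_new >= 0) & (v_new < h)
    covered[v_new[ok], u_new[ok]] = True
    return ~covered    # True where the translated view sees unobserved content
```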
That being said, afaict OP's method is 1000x faster, at 33ms.
I'm also following this work https://guanjunwu.github.io/4dgs/ which produces temporal Gaussian splats but takes at least half an hour to learn the scene.
Is there some temporal accumulation?
Supervised learning actually does work. Suppose you have four cameras. You feed three of them into the net and use the fourth as the ground truth. The live video aspect just emerges from re-running the neural net every frame.
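Roughly, the training signal looks like this (a schematic sketch with hypothetical names, not my exact code):

```python
import torch
import torch.nn.functional as F

def heldout_view_loss(splat_net, render_fn, input_views, target_rgb, target_pose):
    """Sketch of the held-out-camera supervision described above (names are
    illustrative, not the actual API).

    splat_net:   network mapping a few RGBD views to Gaussian parameters
    render_fn:   differentiable splat rasterizer (gaussians, camera pose) -> image
    target_*:    the fourth camera, held out as ground truth"""
    gaussians = splat_net(input_views)             # predict splats from 3 cameras
    pred = render_fn(gaussians, target_pose)       # render from the 4th camera's pose
    return F.l1_loss(pred, target_rgb)             # photometric supervision
```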
I've considered publishing the source, but the code depends on some proprietary utility libraries from my bigger project and it's hard to fully disentangle. I'm also not sure whether this project has business applications, so I'd like to keep that door open at this time.
I wonder if one can go the opposite route and use Gaussian splatting or (more likely) some other method to generate 3D/4D scenes from cartoons. Cartoons are famously hard to emulate in 3D, even entirely by hand; just as Gaussian splats take a fundamentally different approach from traditional realistic rendering (polygons, shaders, lighting, post-processing), maybe cartoons need a fundamentally different approach too.
However, I have not baked the size or orientation into the system. Those are "chosen" by the neural net based on the input RGBD frames. The view-dependent effects are also "chosen" by the neural net, but not through an explicit radiance field. If you run the application and zoom in, you will be able to see splats of different sizes pointing in different directions. The system has limited ability to re-adjust the positions and sizes due to the compute budget, which leads to the pixelated effect.
I actually started with pointclouds for my VR teleoperation system, but I hated how ugly it looked. You end up seeing through objects, and objects become unparseable if you get too close. Textures present in the RGB frame also become very hard to make out because everything gets "pointillized". In the linked video you can make out the wood grain direction in the splat rendering, but not in the pointcloud rendering.
With framerate, there are two different frame rates that matter. One is the splat construction framerate, which is the speed at which an entirely new set of Gaussians can be constructed. LiveSplat can usually maintain 30fps in this case.
The second is the splat rendering framerate. In VR this is important for preventing motion sickness: even if you have a static set of splats, you need the rendering to react to the user's minor head movements at around 90fps for the best in-headset experience.
All these figures are on my setup with a 4090, but I have gotten comparable results with a 3080 (maybe 70fps splat rendering instead of 90fps).
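Conceptually the two rates are decoupled something like this (a simplified sketch with made-up function names, not the actual LiveSplat code):

```python
import threading, time

latest_splats = None          # most recent splat set, swapped under a lock
lock = threading.Lock()

def construction_loop(rgbd_stream, build_splats):
    """Runs at the camera/construction rate (~30 fps): build a fresh splat set
    for each incoming RGBD frame and publish it."""
    global latest_splats
    for frame in rgbd_stream:
        splats = build_splats(frame)      # hypothetical builder, e.g. the neural net
        with lock:
            latest_splats = splats

def render_loop(get_head_pose, render_splats):
    """Runs at the display rate (~90 fps in VR): always re-render the newest
    splats against the user's current head pose, even if no new splats arrived."""
    while True:
        with lock:
            splats = latest_splats
        if splats is not None:
            render_splats(splats, get_head_pose())
        time.sleep(1 / 90)                # placeholder for vsync / compositor pacing
```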