* (Mute it if you don’t like the music, just like the rest of us will if you complain about the music)
fascinating
I wouldn't have normally read this and watched the video, but my Claude sessions were already executing a plan
the tl;dr is that all the actors were scanned into a 3D point cloud system and then "NeRF"'d, which here means extrapolating any missing data about their captured 3D models
this was then more easily placed into the video than trying to composite and place 2D actors layer by layer
Not sure if it's you or the original article but that's a slightly misleading summary of NeRFs.
The way TV/movie production is going (record 100s of hours of footage from multiple angles and edit it all in post) I wonder if this is the end state. Gaussian splatting for the humans and green screens for the rest?
That said, the technology is rapidly advancing and this type of volumetric capture is definitely sticking around.
The quality can also be really good, especially for static environments: https://www.linkedin.com/posts/christoph-schindelar-79515351....
"That data was then brought into Houdini, where the post production team used CG Nomads GSOPs for manipulation and sequencing, and OTOY’s OctaneRender for final rendering. Thanks to this combination, the production team was also able to relight the splats."

The gist is that Gaussian splats can replicate reality quite effectively with many 3D ellipsoids (stored as a type of point cloud). Houdini is software that excels at manipulating vast numbers of points, and renderers (such as Octane) can now leverage this type of data to integrate with traditional computer graphics primitives, lights, and techniques.
I am vaguely aware of stuff like Gaussian blur on Photoshop. But I never really knew what it does.
Gaussian splatting is a bit like photogrammetry. That is, you can record video or take photos of an object or environment from many angles and reproduce it in 3D. Gaussians have the capability to "fade" their opacity based on a Gaussian distribution. This allows them to blend together in a seamless fashion.
The splatting process is achieved by using gradient descent from each camera/image pair to optimize these ellipsoids (Gaussians) such that they reproduce the original inputs as closely as possible. Given enough imagery and sufficient camera alignment, performed using Structure from Motion, you can faithfully reproduce the entire space.
Read more here: https://towardsdatascience.com/a-comprehensive-overview-of-g....
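A toy 1D analogue of that optimization loop, to make the idea concrete (hypothetical sketch; the real pipeline optimizes millions of anisotropic 3D Gaussians with a differentiable rasterizer and analytic gradients, not finite differences): fit a handful of Gaussians to a target signal by gradient descent.

```python
import numpy as np

def render(params, x):
    # params: flat array of (amplitude, mean, sigma) triples
    out = np.zeros_like(x)
    for a, mu, s in params.reshape(-1, 3):
        out += a * np.exp(-0.5 * ((x - mu) / s) ** 2)
    return out

def loss(params, x, target):
    # mean squared error between rendered mixture and target
    return np.mean((render(params, x) - target) ** 2)

# target "image row": two narrow Gaussians
x = np.linspace(0.0, 1.0, 200)
target = (np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)
          + 0.5 * np.exp(-0.5 * ((x - 0.7) / 0.10) ** 2))

params = np.array([0.5, 0.4, 0.2, 0.5, 0.6, 0.2])  # rough initial guess
lr, eps = 0.05, 1e-5
initial = loss(params, x, target)

for _ in range(500):
    base = loss(params, x, target)
    grad = np.zeros_like(params)
    for i in range(len(params)):            # finite-difference gradient
        p = params.copy()
        p[i] += eps
        grad[i] = (loss(p, x, target) - base) / eps
    params -= lr * grad                     # gradient-descent step

print(initial, loss(params, x, target))     # loss drops as the fit improves
```

Real 3DGS additionally densifies (splits/clones) and prunes Gaussians during training, which this toy skips.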
If you’re curious start with the Wikipedia article and use an LLM to help you understand the parts that don’t make sense. Or just ask the LLM to provide a summary at the desired level of detail.
tl;dr eli5: Instead of capturing spots of color as they would appear to a camera, they capture spots of color and where they exist in the world. By combining multiple cameras doing this, you can make a 3D world from footage that you can then move a virtual camera around.
I'm not up on how things have changed recently
For example, the camera orbits around the performers in this music video are difficult to imagine in real space. Even if you could pull it off using robotic motion control arms, it would require that the entire choreography is fixed in place before filming. This video clearly takes advantage of being able to direct whatever camera motion the artist wanted in the 3d virtual space of the final composed scene.
To do this, the representation needs to estimate the radiance field, i.e. the amount and color of light visible at every point in your 3d volume, viewed from every angle. It's not possible to do this at high resolution by breaking that space up into voxels; voxel grids scale badly, O(n^3) in resolution. You could attempt to guess at some mesh geometry and paint textures onto it compatible with the camera views, but that's difficult to automate.
Gaussian splatting estimates these radiance fields by assuming that the radiance is built from millions of fuzzy, colored balls positioned, stretched, and rotated in space. These are the Gaussian splats.
Once you have that representation, constructing a novel camera angle is as simple as positioning and angling your virtual camera and then recording the colors and positions of all the splats that are visible.
It turns out that this approach is pretty amenable to techniques similar to modern deep learning. You basically train the positions/shapes/rotations of the splats via gradient descent. It's mostly been explored in research labs but lately production-oriented tools have been built for popular 3d motion graphics tools like Houdini, making it more available.
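The novel-view step described above can be sketched in a few lines (assumptions mine: isotropic splats, a simple pinhole camera, front-to-back alpha blending; real 3DGS uses anisotropic screen-space covariances and tile-based rasterization):

```python
import numpy as np

rng = np.random.default_rng(0)
N, W, H, f = 200, 64, 64, 64.0              # splat count, image size, focal length (px)
centers = rng.uniform(-1, 1, size=(N, 3)) + np.array([0.0, 0.0, 4.0])  # in front of camera
colors = rng.uniform(0, 1, size=(N, 3))
opacity = np.full(N, 0.6)
radius = 0.05                               # world-space splat radius

image = np.zeros((H, W, 3))
transmittance = np.ones((H, W))             # how much light still passes each pixel
ys, xs = np.mgrid[0:H, 0:W]

for i in np.argsort(centers[:, 2]):         # nearest splat first (front-to-back)
    x, y, z = centers[i]
    u = f * x / z + W / 2                   # pinhole projection to pixel coords
    v = f * y / z + H / 2
    sigma = f * radius / z                  # screen-space footprint shrinks with depth
    g = np.exp(-0.5 * ((xs - u) ** 2 + (ys - v) ** 2) / sigma ** 2)
    alpha = opacity[i] * g                  # per-pixel opacity of this splat
    image += (transmittance * alpha)[..., None] * colors[i]
    transmittance *= 1.0 - alpha            # occlude splats behind this one

print(image.shape)                          # (64, 64, 3)
```

Moving the "camera" here just means transforming `centers` into a new camera frame before projecting; nothing about the splats themselves changes.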
You generate the point clouds from multiple images of a scene or an object and some machine learning magic
Did the Gaussian splatting actually make it any cheaper? Especially considering that it needed 50+ fixed camera angles to splat properly, and extensive post-processing work both computationally and human labour, a camera drone just seems easier.
This tech is moving along at breakneck pace and now we're all talking about it. A drone video wouldn't have done that.
There’s no proof of your claim and this video is proof of the opposite.
Volumetric capture like this allows you to decide on the camera angles in post-production
This approach is 100% flexible, and I'm sure at least part of the magic came from the process of play and experimentation in post.
This is a “Dropbox is just ftp and rsync” level comment. There’s a shot in there where Rocky is sitting on top of the spinning blades of a helicopter and the camera smoothly transitions from flying around the room to solidly rotating along with the blades, so it’s fixed relative to Rocky. Not only would programming a camera drone to follow this path be extremely difficult (and wouldn’t look as good), but just setting up the stunt would be cost prohibitive.
This is just one example of the hundreds you could come up with.
No, it’s simply the framerate.
I'm David Rhodes, Co-founder of CG Nomads, developer of GSOPs (Gaussian Splatting Operators) for SideFX Houdini. GSOPs was used in combination with OTOY OctaneRender to produce this music video.
If you're interested in the technology and its capabilities, learn more at https://www.cgnomads.com/ or AMA.
Try GSOPs yourself: https://github.com/cgnomads/GSOPs (example content included).
>Evercoast deployed a 56 camera RGB-D array
Do you know which depth cameras they used?
I recommend asking https://www.linkedin.com/in/benschwartzxr/ for accuracy.
So likely RealSense D455.
I'm curious about the relighting pipeline with Octane...are you deriving surface normals from the splat covariance/densities to drive a standard BRDF or are you mathematically manipulating the spherical harmonics coefficients directly to "fake" the lighting changes?
Also, given the massive 1TB footprint mentioned, how heavy is the attribute overhead when passing those PLY sequences through the solver?
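For background on the second option: in the original 3DGS formulation, each splat's view-dependent color is stored as low-order spherical harmonics coefficients, so "faking" lighting changes means rewriting those per-splat values. A minimal degree-0/1 evaluation, using the reference implementation's constants and 0.5 offset (the function signature here is illustrative, not from GSOPs):

```python
import numpy as np

SH_C0 = 0.28209479177387814   # real SH basis constant for Y_0^0
SH_C1 = 0.4886025119029199    # real SH basis constant for the Y_1^m terms

def sh_to_color(coeffs, view_dir):
    """coeffs: (4, 3) degree-0/1 SH coefficients per RGB channel.
    view_dir: unit vector from the splat toward the camera."""
    x, y, z = view_dir
    # degree-0 term is view-independent; degree-1 terms vary with direction
    basis = np.array([SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x])
    return np.clip(basis @ coeffs + 0.5, 0.0, 1.0)

# with all coefficients zero, every view direction yields mid-gray (0.5)
print(sh_to_color(np.zeros((4, 3)), np.array([0.0, 0.0, 1.0])))
```

Scaling or rotating these coefficients changes apparent shading without touching geometry, whereas a BRDF-based relight would need normals estimated from the splat covariances; which route the production took is exactly the open question above.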
And it's not always giving in to those voices, sometimes it's going in the opposite direction specifically to subvert those voices and expectations even if that ends up going against your initial instincts as an artist.
With someone like A$AP Rocky, there is a lot of money on the line wrt the record execs but even small indie artists playing to only a hundred people a night have to contend with audience expectation and how that can exert an influence on their creativity.