> The code is being prepared for public release; pretrained weights and full training/inference pipelines are planned.
Any ideas on how it would be different from, and better than, "traditional" PCG? It seems like it'd give you higher resource consumption, worse results, and less control, none of which seems like a benefit.
> We tackle the challenge of generating the infinitely extendable 3D world — large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges: 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, limiting their applicability to scene-level generation. Our key insight is leveraging strong generation priors from pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves SOTA performance in geometry reconstruction, while uniquely supporting infinite scene generation with photorealistic and structurally consistent outputs. These results highlight its capability for constructing large-scale virtual environments and potential for building future world models.
It's about generating interesting virtual space!
Maybe the idea is to create environments for AI robotics training.
I've dreamed of a NeRF-powered backrooms walking simulator for quite a while now. This approach is "worse" because the mesh seems explicit rather than just the world becoming what you look at, but that's arguably better for real-world use cases of course.
jackdoe•1h ago
speedgoose•14m ago