GPU-Driven Clustered Forward Renderer

73•logdahl•7h ago

Comments

zeristor•6h ago

Apostrophe as a number separator?

Where’s that from?

dahart•6h ago

Switzerland and Italy for two. https://en.wikipedia.org/wiki/Decimal_separator#

Also note C++14 introduced the apostrophe in numeric literals! https://en.cppreference.com/w/cpp/language/integer_literal

lacoolj•6h ago

Learn somethin new every day.

And I would never have known this existed without hackernews

logdahl•6h ago

Interesting that Sweden explicitly do NOT use it... Not sure where i picked it up! :-)

qingcharles•5h ago

I've started using the underscore in my code since that is becoming the (non-localized) standard and trendy:

https://en.wikipedia.org/wiki/Integer_literal#Digit_separato...

m-schuetz•25m ago

Apostrophe are nice because they are not ambiguous. Started using them myself after getting used to them from C++ and learning that they are used in switzerland.

unclad5968•6h ago

This is awesome! At the end you mention the 27k dragons and 10k lights just barely fits in 16ms. Do you see any paths to improve performance? I've seen some demos on with tens/hundreds of thousands of moving lights, but hard to tell if they're legit or highly constrained. I'm not a graphics programmer by trade.

I need a renderer for a personal project and after some research decided I'll implement a forward clustered renderer as well.

logdahl•6h ago

Well, the core issue is still drawing. I took another look at some profiles again and seems like its not the renderer limiting this to 27k! I still had some stupid scene-graph traversal... But clustering and culling is 53us and 33us respectively, but the draw is 7ms. So a frame (on the GPU-side) is like 7ms, and some 100-200 us on the CPU side.

Should really dive deeper and update the measurements for final results...

gmueckl•6h ago

This seems fairly well optimized. There's probably room to squeeze out some more perf, but not dramatic improvements. Maybe preventing overdraw of shaded pixels by doing a depth prepass would help.

Without digging into the detailed breakdown, I would assume that the sheer amount of teeny tiny triangles is the main bottleneck in this benchmark scene. When triangles become smaller than about 4x4 pixels, GPU utilization for raterization starts to diminish. And with the scaled down dragons, there's a lot of then in the frame.

spookie•3h ago

This is by far the biggest culprit OP, look into this.

You can try to come up with imposters representing these far away dragons, or simple LoD levels. Some games do use particles to represent far away and repeated "meshes" (Ghost of Tsushima does these for soldiers far away).

Lot's of techniques in this area ranging from simple to bananas. LoD levels alone can get you pretty far! Of course, this is at the cost of having more different draw calls, so it is a balancing game.

Think about the topology too, hope these old gems helps getting a grasp on the cost of this:

https://www.humus.name/index.php?page=Comments&ID=228

https://www.g-truc.net/post-0662.html

logdahl•1h ago

Yeah, I use LODs already but as you say, even my lowest lod far away is too many vertices. Imposter rendering seems very interesting but also completely bonkers (viewing angle, lighting)!

zokier•4h ago

Worth noting that the GTX 1070 is nearly 10 year old "mainstream" GPU. I'd imagine a 5090 or something could push the numbers fair bit more higher.

fabiensanglard•6h ago

This website has a beautiful layout ;) !

logdahl•6h ago

Fun to see you ;) Love your site!

rezmason•5h ago

Ten thousand lights! Your utility bill must be enormous

Flex247A•5h ago

Lights in games use real electricity :)

amelius•5h ago

Even the stars use real electricity.

cluckindan•2h ago

Not really, nuclear fusion doesn’t run on electrons.

Veo 3 and Imagen 4, and a new tool for filmmaking called Flow

Litestream: Revamped

Gemma 3n preview: Mobile-first AI

The NSA Selector

Deep Learning Is Applied Topology

Red Programming Language

Semantic search engine for ArXiv, biorxiv and medrxiv

Show HN: 90s.dev – Game maker that runs on the web

Robin: A multi-agent system for automating scientific discovery

"ZLinq", a Zero-Allocation LINQ Library for .NET

Show HN: A Tiling Window Manager for Windows, Written in Janet

Show HN: A Simple Server to Match Long/Lat to a TimeZone

My favourite fonts to use with LaTeX (2022)

A disk is a bunch of bits (2023)

The Dawn of Nvidia's Technology

Show HN: Juvio – UV Kernel for Jupyter

AI's energy footprint

Ashby (YC W19) Is Hiring Engineering Managers

Google AI Ultra

The emoji problem (2022)

Magic of software; what makes a good engineer also makes a good engineering org

Launch HN: Opusense (YC X25) – AI assistant for construction inspectors on site

GPU-Driven Clustered Forward Renderer

The Last Letter

Gail Wellington, former Commodore executive, has died

Google is giving Amazon a leg up in digital book sales

A simple search engine from scratch

Reports of Deno's Demise Have Been Greatly Exaggerated

The Lisp in the Cellar: Dependent types that live upstairs [pdf]

The Fractured Entangled Representation Hypothesis

Veo 3 and Imagen 4, and a new tool for filmmaking called Flow

Litestream: Revamped

Gemma 3n preview: Mobile-first AI

The NSA Selector

Deep Learning Is Applied Topology

Red Programming Language

Semantic search engine for ArXiv, biorxiv and medrxiv

Show HN: 90s.dev – Game maker that runs on the web

Robin: A multi-agent system for automating scientific discovery

"ZLinq", a Zero-Allocation LINQ Library for .NET

Show HN: A Tiling Window Manager for Windows, Written in Janet

Show HN: A Simple Server to Match Long/Lat to a TimeZone

My favourite fonts to use with LaTeX (2022)

A disk is a bunch of bits (2023)

The Dawn of Nvidia's Technology

Show HN: Juvio – UV Kernel for Jupyter

AI's energy footprint

Ashby (YC W19) Is Hiring Engineering Managers

Google AI Ultra

The emoji problem (2022)

Magic of software; what makes a good engineer also makes a good engineering org

Launch HN: Opusense (YC X25) – AI assistant for construction inspectors on site

GPU-Driven Clustered Forward Renderer

The Last Letter

Gail Wellington, former Commodore executive, has died

Google is giving Amazon a leg up in digital book sales

A simple search engine from scratch

Reports of Deno's Demise Have Been Greatly Exaggerated

The Lisp in the Cellar: Dependent types that live upstairs [pdf]

The Fractured Entangled Representation Hypothesis

GPU-Driven Clustered Forward Renderer

Comments