Vector graphics on GPU

https://gasiulis.name/vector-graphics-on-gpu/

170•gsf_emergency_6•1mo ago

Comments

larodi•1mo ago

Really, inst there anything which comes Slug-level of capabilities and is not super expensive?

coffeeaddict1•1mo ago

Vello [0] might suit you although it's not production grade yet.

[0] https://github.com/linebender/vello

miguel_martin•1mo ago

Just use blend2d - it is CPU only but it is plenty fast enough. Cache the rasterization to a texture if needed. Alternatively, see blaze by the same author as this article: https://gasiulis.name/parallel-rasterization-on-cpu/

reallynattu•1mo ago

ThorVG might be worth a look - open source (MIT), ~150KB core, GPU backends (WebGPU, OpenGL).

We are using it as official dotLottie runtimes, now a Linux Foundation project. Handles SVG, Lottie, fonts, effects.

https://github.com/thorvg/thorvg/

coffeeaddict1•4w ago

In terms of performance, it's quite far from something like Blend2D or Vello though.

hermet•4w ago

Blend2D is a CPU-only rendering engine, so I don't think it's a fair comparison to ThorVG. If we're talking about CPU rendering, ThorVG is faster than Skia. (no idea about Blend2d) But at high resolutions, CPU rendering has serious limitations anyway. Blend2D is still more of an experimental project that JIT kills the compatiblity and Vello is not yet production-ready and webgpu only. No point of arguing fast today if it's not usable in real-world scenarios.

Asm2D•4w ago

How JIT kills compatibility if it's only enabled on x86 and aaarch64? You can compile Blend2D without it and it would just work.

So no, it doesn't kill any compatibility - it only shows a different approach.

BTW GPU-only renderers suck, and many renderers that have GPU and CPU engines suck when GPU is not available or have bugs. Strong CPU rendering performance is just necessary for any kind of library if you want true compatibility across various platforms.

I have seen many many times broken rendering on GPU without any ability to switch to CPU. And the biggest problem is that more exotic HW you run it on, less chance that somebody would be able to fix it (talking about GPUs).

hermet•4w ago

Check this out: https://thorvg-perf-test.vercel.app/ https://www.youtube.com/watch?v=jdnnzmtHy9k

badlibrarian•1mo ago

Author uses a lot of odd, confusing terminology and brings CPU baggage to the GPU creating the worst of both worlds. Shader hacks and CPU-bound partitioning and choosing the Greek letter alpha to be your accumulator in a graphics article? Oh my.

NV_path_rendering solved this in 2011. https://developer.nvidia.com/nv-path-rendering

It never became a standard but was a compile-time option in Skia for a long time. Skia of course solved this the right way.

https://skia.org/

bsder•1mo ago

While the author doesn't seem to be aware of state of the art in the field, vector rendering is absolute NOT a solved problem whether on CPU or GPU.

Vello by Raph Levien seems to be a nice combination of what is required to pull this off on GPUs. https://www.youtube.com/watch?v=_sv8K190Zps

lukan•1mo ago

Yeah, I have high hopes for Vello to take off. I could throw away lots of hacks and caching and whatnot if I could do fast vector rendering reliable on the GPU.

I think Rive also does vector rendering on the GPU

https://rive.app/renderer

But it is not really meant (yet?) as a general graphics libary, but just a renderer for the rive design tools.

pier25•1mo ago

AFAIK you can use the Rive renderer in your C++ app.

http://github.com/rive-app/rive-runtime

bean469•1mo ago

> While the author doesn't seem to be aware of state of the art in the field

The blog post is from 2022, though

sirwhinesalot•1mo ago

So what is the right way that Skia uses? Why is there still discussion on how to do vector graphics on the GPU right if Skia's approach is good enough?

Not being sarcastic, genuinely curious.

cyberax•1mo ago

The major unsolved problem is real-time high-quality text rendering on GPU. Skia just renders fonts on the CPU with all kinds of hacks ( https://skia.org/docs/dev/design/raster_tragedy/ ). It then renders them as textures.

Ideally, we want to have as much stuff rendered on the GPU as possible. Ideally with support for glyph layout. This is not at all trivial, especially for complex languages like Devanagari.

In the perfect world, we want to be able to create a 3D cube and just have the renderer put the text on one of its facets. And have it rendered perfectly as you rotate the cube.

exDM69•1mo ago

> NV_path_rendering solved this in 2011.

By no means is this a solved problem.

NV_path_rendering is an implementation of "stencil then cover" method with a lot of CPU preprocessing.

It's also only available on OpenGL, not on any other graphics API.

The STC method scales very badly with increasing resolutions as it is using a lot of fill rate and memory bandwidth.

It's mostly using GPU fixed function units (rasterizer and stencil test), leaving the "shader cores" practically idle.

There's a lot of room for improvement to get more performance and better GPU utilization.

Asm2D•1mo ago

You know nothing.

Skia is definitely not a good example at all. Skia started as a CPU renderer, and added GPU rendering later, which heavily relies on caching. Vello, for example, takes a completely different approach compared to Skia.

NV path rendering is a joke. nVidia though that ALL graphics would be rendered on GPU within 2 years after making the presentation, and it took 2 decades and 2D CPU renderers still shine.

nicoburns•1mo ago

I believe Skia's new Graphite architecture is much more similar to Vello

badlibrarian•1mo ago

Right. The question is does Skia grows its broad and useful toolkit with an eye toward further GPU optimization? Or does Vello (broadened and perhaps burdened by Rust and the shader-obsessive crowd) grow a broad and useful API?

There's also the issue of just how many billions of line segments you really need to draw every 1/120th of a second at 8K resolution, but I'll leave those discussions to dark-gray Discord forums rendered by Skia in a browser.

coffeeaddict1•1mo ago

> There's also the issue of just how many billions of line segments you really need to draw every 1/120th of a second at 8K resolution

IMO, one of biggest benefit of a high performance renderer would be power savings (very important for laptops and phones). If I can run the same work but use half the power, then by all means I'd be happy to deal with the complications that the GPU brings. AFAIK though, no one really cares about that and even efforts like Vello are just targeting fps gains, which do correlate with reduced power consumption but only indirectly.

badlibrarian•1mo ago

It's an argument you can make in any performance effort. But I think the "let's save power using GPUs" ship sailed even before Microsoft started buying nuclear reactors to power them.

Asm2D•1mo ago

Adding a power draw into the mix is pretty interesting. Just because a GPU can render something 2x faster in a particular test doesn't mean you have consumed 50% less power, especially when we talk about dedicated GPUs that can have power draw in hundreds of watts.

Historically 2D rendering on CPU was pretty much single-threaded. Skia is single-threaded, Cairo too, Qt mostly (they offload gradient rendering to threads, but it's painfully slow for small gradients, worse than single-threaded), AGG is single-threaded, etc...

In the end only Blend2D, Blaze, and now Vello can use multiple threads on CPU, so finally CPU vs GPU comparisons can be made more fairy - and power draw is definitely a nice property of a benchmark. BTW Blend2D was probably the first library to offer multi-threaded rendering on CPU (just an option to pass to the rendering context, same API).

As far as I know - nobody did a good benchmarking between CPU and GPU 2D renderers - it's very hard to do completely unbiased comparison, and you would be surprised how good the CPU is in this mix. Modern CPU cores consume maybe few watts and you can render to a 4K framebuffer with that single CPU core. Put rendering text to the mix and the numbers would start to be very interesting. Also GPU memory allocation should be included, because rendering fonts on GPU means to pre-process them as well, etc...

2D is just very hard, on both CPU and GPU you would be solving a little bit different problems, but doing it right is insane amount of work, research, and experimentation.

nicoburns•1mo ago

It's not a formal benchmark, but my Browser Engine / Webview (https://github.com/DioxusLabs/blitz/) has pluggable rendering backends (via https://github.com/DioxusLabs/anyrender) with Vello (GPU), Vello CPU, Skia (various backends incl. Vulkan, Metal, OpenGL, and CPU) currently implemented

On my Apple M1 Pro, the Vello CPU renderer is competitive with the GPU renderers on simple scenes, but falls behind on more complex ones. And especially seems to struggle with large raster images. This is also without a glyph cache (so re-rasterizing every glyph every time, although there is a hinting cache) which isn't implemented yet. This is dependent on multi-threading being enabled and can consume largish portions of all-core CPU while it runs. Skia raster (CPU) gets similarish numbers, which is quite impressive if that is single-threaded.

Asm2D•4w ago

I think Vello CPU would always struggle with raster images, because it does a bounds check for every pixel fetched from a source image. They have at least described this behavior somewhere in Vello PRs.

The obsession for memory safety just doesn't pay off in some cases - if you can batch 64 pixels at once with SIMD it just cannot be compared to a per-pixel processor that has a branch in a path.

virtualritz•1mo ago

Unless I miss something I think that this describes box filtering.

It should probably mention that that this is only sufficient for some use cases but not for high quality ones.

E.g. if you were to use this e.g. for rendering font glyphs into something like a static image (or a slow rolling title/credits) you probably want a higher quality filter.

jstimpfle•1mo ago

What type of filter do you mean? Unless I'm misunderstanding/missing something, the approach described doesn't go into the details of how coverage is computed. If the input image is only simple lines whose coverage can be correctly computed (don't know how to do this for curves?) then what's missing?

I'd be interested how feasible complete 2D UIs using dynamically GPU rendered vector graphics are. I've played with vector rendering in the past, using a pixel shader that more or less implemented the method described in the OP. Could render the ghost script tiger at good speeds (like 1-digit milliseconds at 4K IIRC), but there is always an overhead to generating vector paths, sampling them into line segments, dispatching them etc... Building a 2D UI based on optimized primitives instead, like axis-aligned rects and rounded rects, mostly will always be faster, obviously.

Text rendering typically adds pixel snapping, possibly using byte code interpreter, and often adds sub-pixel rendering.

jlokier•1mo ago

> If the input image is only simple lines whose coverage can be correctly computed (don't know how to do this for curves?) then what's missing?

Computing pixel coverage accurately isn't enough for the best results. Using it as the alpha channel for blending forground over background colour is the same thing as sampling a box filter applied to the underlying continuous vector image.

But often a box filter isn't ideal.

Pixels on the physical screen have a shape and non-uniform intensity across their surface.

RGB sub-pixels (or other colour basis) are often at different positions, and the perceptual luminance differs between sub-pixels in addition to the non-uniform intensity.

If you don't want to tune rendering for a particular display, there are sometimes still improvements from using a non-box filter

An alternative is to compute the 2D integral of a filter kernel over the coverage area for each pixel. If the kernel has separate R, G, B components, to account for sub-pixel geometry, then you may require another function to optimise perceptual luminance while minimising colour fringing on detailed geometries.

Gamma correction helps, and fortunately that's easily combined with coverage. For example, slow rolling tile/credits will shimmer less at the edges if gamme is applied correctly.

However, these days with Retina/HiDPI-style displays, these issues are reduced.

For example, MacOS removed sub-pixel anti-aliasing from text rendering in recent years, because they expect you to use a Retina display, and they've decided regular whole-pixel coverage anti-aliasing is good enough on those.

dahart•1mo ago

> What type of filter do you mean? […] the approach described doesn’t go into the details of how coverage is computed

This article does clip against a square pixel’s edges, and sums the area of what’s inside without weighting, which is equivalent to a box filter. (A box filter is also what you get if you super-sample the pixel with an infinite number of samples and then use the average value of all the samples.) The problem is that there are cases where this approach can result in visible aliasing, even though it’s an analytic method.

When you want high quality anti-aliasing, you need to model pixels as soft leaky overlapping blobs, not little squares. Instead of clipping at the pixel edges, you need to clip further away, and weight the middle of the region more than the outer edges. There’s no analytic method and no perfect filter, there are just tradeoffs that you have to balance. Often people use filters like Triangle, Lanczos, Mitchell, Gaussian, etc.. These all provide better anti-aliasing properties than clipping against a square.

masswerk•1mo ago

May require "(2022)" in the title.

xattt•1mo ago

Tangential, but was this not the goal of Quartz 2D? The idea of everyday things running on the GPU seemed very attractive.

There is some context in this 13-year-old discussion: https://news.ycombinator.com/item?id=5345905#5346541

I am curious if the equation of CPU-determined graphics being faster than being done on the GPU has changed in the last decade.

Did Quartz 2D ever become enabled on macOS?

kllrnohj•1mo ago

When things like this (or Vello or piet-gpu or etc...) talk about "vector graphics on GPU" they are near exclusively talking only about essentially a full solve solution. A generic solution that handles fonts and svgs and arbitrarily complex paths with strokes and fills and the whole shebang.

These are great goals, but also largely inconsequential with nearly all UI designs. The majority of systems today (like skia) are hybrids. Things like simple shapes (eg, round rects) have analytical shaders on the GPU and complex paths (like fronts) are just done on the CPU once and cached on the GPU in a texture. It's a very robust, fast approach to the wholistic problem, at the cost of not being as "clean" of a solution like a pure GPU renderer would be.

jacobp100•1mo ago

> I am curious if the equation of CPU-determined graphics being faster than being done on the GPU has changed in the last decade

If you look at Blend2D (a CPU rasterizer), they seem to outperform every other rasterizer including GPU-based ones - according to their own benchmarks at least

Asm2D•1mo ago

Blend2D doesn't benchmark against GPU renderers - the benchmarking page compares CPU renderers. I have seen comparisons in the past, but it's pretty difficult to do a good CPU vs GPU benchmarking.

miguel_martin•1mo ago

Blaze outperforms Blend2D - by the same author as the article: https://gasiulis.name/parallel-rasterization-on-cpu/ - but to be fair, Blend2D is really fast.

Asm2D•1mo ago

You need to rerun the benchmarks if you want fresh numbers. The post was written when Blend2D didn't have JIT for AArch64, which penalized it a bit. Also on X86_64 the numbers are really good for Blend2D, which beats Blaze in some tests. So it's not black&white.

And please keep in mind that Blend2D is not really in development anymore - it has no funding so the project is basically done.

coffeeaddict1•1mo ago

> And please keep in mind that Blend2D is not really in development anymore - it has no funding so the project is basically done.

That's such a shame. Thanks a lot for Blend2D! I wish companies were less greedy and would fund amazing projects like yours. Unfortunately, I do think that everyone is a bit obsessed with GPUs nowadays. For 2D rendering the CPU is great, especially if you want predictable results and avoid having to deal with the countless driver bugs that plague every GPU vendor.

miguel_martin•4w ago

That is fair - sorry for spreading mis-information! That's unfortunate to hear about Blend2D.

samiv•1mo ago

The issue is not performance the issue is that pixel precise operations are difficult on the GPU using graphics features such as shaders.

You don't normally work with pixels but you work with polygonal geometry (triangles) and the GPU does the pixel (fragment) rasterization.

zozbot234•4w ago

Surely you could at least draw arbitrary rectilinear polygons and expect that they're going to be pixel perfect? After all the GPU is routinely used for compositing rectangular surfaces (desktop windows) with pixel-perfect results.

pjmlp•1mo ago

Not sure what you mean, it can make use of accelerated graphics,

https://developer.apple.com/library/archive/documentation/Gr...

xattt•4w ago

I’ve explored it for a few years, but all I could tell that it was never actually fully enabled. You can enable it through debugging tools, but it was never on by default for all software.

willtemperley•1mo ago

Quartz 2D is now CoreGraphics. It's hard to find information about the backend, presumably for commercial reasons. I do know it uses the GPU for some operations like magnifyEffect.

Today I was smoothly panning and zooming 30K vertex polygons with SwiftUI Canvas and it was barely touching the CPU so I suspect it uses the GPU heavily. Either way it's getting very good. There's barely any need to use render caches.

nubskr•1mo ago

Turns out the best GPU optimization is just being too scared of graphics drivers to do the fancy stuff, 10-15x faster and you can actually debug it.

jayd16•1mo ago

So without blowing up the traditional shader pipeline, why is it not trivial to add a path stage as an alternative to the vertex stage? It seems like GPUs and shader language could implement a standard way to turn vector paths into fragments and keep the rest of the pipeline.

In fact, you could likely use the geometry stage to create arbitrarily dense vertices based on path data passed to the shader without needing any new GPU features.

Why is this not done? Is the CPU render still faster than these options?

exDM69•1mo ago

> why is it not trivial to add a path stage as an alternative to the vertex stage?

Because paths, unlike triangles are not fixed size or have screen space locality. Paths consist of multiple contours of segments, typically cubic bezier curves and a winding rule.

You can't draw one segment out of a contour on the screen and continue to the next one, let alone do them in parallel. A vertical line segment on the left hand side going bottom to top of your screen will make every pixel to the right of it "inside" the path, but if there's another line segment going top to bottom somewhere the pixel and it's outside again.

You need to evaluate the winding rule for every curve segment on every pixel and sum it up.

By contrast, all the pixels inside the triangle are also inside the bounding box of the triangle and the inside/outside test for a pixel is trivially simple.

There are at least four popular approaches to GPU vector graphics:

1) Loop-Blinn: Use CPU to tessellate the path to triangles on the inside and on the edges of the paths. Use a special shader with some tricks to evaluate a bezier curve for the triangles on the edges.

2) Stencil then cover: For each line segment in a tessellated curve, draw a rectangle that extends to the left edge of the contour and use two sided stencil function to add +1 or -1 to the stencil buffer. Draw another rectangle on top of the whole path and set the stencil test to draw only where the stencil buffer is non-zero (or even/odd) according to the winding rule.

3) Draw a rectangle with a special shader that evaluates all the curves in a path, and use a spatial data structure to skip some. Useful for fonts and quadratic bezier curves, not full vector graphics. Much faster than the other methods for simple and small (pixel size) filled paths. Example: Lengyel's method / Slug library.

4) Compute based methods such as the one in this article or Raph Levien's work: use a grid based system with tessellated line segments to limit the number of curves that have to be evaluated per pixel.

Now this is only filling paths, which is the easy part. Stroking paths is much more difficult. Full SVG support has both and much more.

> In fact, you could likely use the geometry stage to create arbitrarily dense vertices based on path data passed to the shader without needing any new GPU features.

Geometry shaders are commonly used with stencil-then-cover to avoid a CPU preprocessing step.

But none of the GPU geometry stages (geometry, tessellation or mesh shaders) are powerful enough to deal with all the corner cases of tessellating vector graphics paths, self intersections, cusps, holes, degenerate curves etc. It's not a very parallel friendly problem.

> Why is this not done?

As I've described here: all of these ideas have been done with varying degrees of success.

> Is the CPU render still faster than these options?

No, the fastest methods are a combination of CPU preprocessing for the difficult geometry problems and GPU for blasting out the pixels.

Dwedit•1mo ago

According to the page here: https://www.humus.name/index.php?page=News&ID=228

The best way to draw a circle on a GPU is to start with a large triangle, and keep adding additional triangles on the edges until you've reached the point where you do not need to add any more triangles (smaller than a pixel)

jesse__•1mo ago

I'd put money on that the best way is actually to draw a quad, or single triangle, and draw the circle as a SDF in the fragment shader

dahart•4w ago

Fwiw, that’s not the best way to draw a circle in general, the test shows it’s the fastest way to tessellate a circle, among the methods the author tried, and using a specific GPU. You don’t have to use triangles to draw a circle, and the author didn’t try all possible tessellations, and the author there didn’t compare perf against any other method (a shader, for example), and the also didn’t investigate accuracy. Their fast method might have numerical accuracy issues with thin sliver triangles at some point.

Lichtso•1mo ago

> but [analytic anti-aliasing (aaa)] also has much better quality than what can be practically achieved with supersampling

What this statement is missing is that aaa coverage is immediately resolved, while msaa coverage is resolved later in a separate step with extra data being buffered in between. This is important because msaa is unbiased while aaa is biased towards too much coverage once two paths partially cover the same pixel. In other words aaa becomes incorrect once you draw overlapping or self-intersecting paths.

Think about drawing the same path over and over at the same place: aaa will become darker with every iteration, msaa is idempotent and will not change further after the first iteration.

Unfortunately, this is a little known fact even in the exquisite circles of 2D vector graphics people, often presenting aaa as the silver bullet, which it is not.

jesse__•1mo ago

Interestingly they do not cite calculating a signed distance to the surface of the shape as an approach to doing AA, as described in the Valve paper [1]. I suppose this is more targeted at offline baking, but given they're suggesting iterating every curve at every pixel, I'm not sure why you wouldn't.

[1] https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007...

reallynattu•1mo ago

For anyone looking at this space: ThorVG is worth checking out.

Open-source vector engine with GPU backends (WebGPU, OpenGL), runs on microcontrollers to browsers. Now a Linux Foundation project.

https://github.com/thorvg/thorvg

(Disclosure: CTO at LottieFiles, we build and maintain ThorVG in-house, with community contributions from individuals and companies like Canva)

erichocean•4w ago

How does ThorVG's GPU implementation compare to Impeller (Flutter's new-ish GPU rendering backend)?

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

The Waymo World Model

How we made geo joins 400× faster with H3 indexes

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Show HN: I spent 4 years building a UI design tool with only the features I use

Dark Alley Mathematics

Microsoft open-sources LiteBox, a security-focused library OS

Sheldon Brown's Bicycle Technical Info

Hackers (1995) Animated Experience

Show HN: If you lose your memory, how to regain access to your computer?

An Update on Heroku

PC Floppy Copy Protection: Vault Prolok

Show HN: ARM64 Android Dev Kit

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Delimited Continuations vs. Lwt for Threads

Why I Joined OpenAI

How to effectively write quality code with AI

Female Asian Elephant Calf Born at the Smithsonian National Zoo

Introducing the Developer Knowledge API and MCP Server

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

Learning from context is harder than we thought

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

Understanding Neural Network, Visually

I now assume that all ads on Apple news are scams

FORTH? Really!?

WebView performance significantly slower than PWA

I'm going to cure my girlfriend's brain tumor

Evaluating and mitigating the growing risk of LLM-discovered 0-days

Show HN: Smooth CLI – Token-efficient browser for AI agents

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

The Waymo World Model

How we made geo joins 400× faster with H3 indexes

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

Monty: A minimal, secure Python interpreter written in Rust for use by AI

Show HN: I spent 4 years building a UI design tool with only the features I use

Dark Alley Mathematics

Microsoft open-sources LiteBox, a security-focused library OS

Sheldon Brown's Bicycle Technical Info

Hackers (1995) Animated Experience

Show HN: If you lose your memory, how to regain access to your computer?

An Update on Heroku

PC Floppy Copy Protection: Vault Prolok

Show HN: ARM64 Android Dev Kit

Unseen Footage of Atari Battlezone Arcade Cabinet Production

Delimited Continuations vs. Lwt for Threads

Why I Joined OpenAI

How to effectively write quality code with AI

Female Asian Elephant Calf Born at the Smithsonian National Zoo

Introducing the Developer Knowledge API and MCP Server

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

Learning from context is harder than we thought

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

Understanding Neural Network, Visually

I now assume that all ads on Apple news are scams

FORTH? Really!?

WebView performance significantly slower than PWA

I'm going to cure my girlfriend's brain tumor

Evaluating and mitigating the growing risk of LLM-discovered 0-days

Show HN: Smooth CLI – Token-efficient browser for AI agents

Vector graphics on GPU

Comments