Game engines and other large codebases with graphics logic are commonly written in C++, and only having to learn and write a single language is great.
Right now, shaders -- if you're not working with an off-the-shelf graphics abstraction -- are kind of annoying to work with. Cross-compiling to GLSL, HLSL and Metal Shading Language is cumbersome. Almost all game engines create their own shading language and code-generate or compile it into the respective shading languages for specific platforms.
This situation could be improved if GPUs were more standardized and didn't have proprietary instruction sets, similar to how CPUs mainly have x86_64 and ARM64 as the dominant instruction sets.
The alleged goal here is to match the syntax of the other parts of the program, and those tend to be written in C++.
Can you please explain or link some sources about this?
Btw, is the C++ standard library really bloated? There are a lot of languages that cram much more stuff into their std, e.g. Python. A lot of people complain about the lack of many library features -- networking, reflection; <expected> and <optional> were added too late, and so on.
Java and C# only did so thanks to tooling, the unavoidable presence on Android (previously J2ME), and the market success of Minecraft, XNA and Unity.
Anything else that wants to take on C and C++ in those industries has to come up with similarly unavoidable tooling.
Anyway, there's a good chance that I'm missing something here, because there seems to be a lot of interest in writing shaders in CPU-centric languages.
If the macrostructure of the operations can be represented appropriately, automatic platform-specific optimization is more approachable.
CUDA is a polyglot development stack for compute, with first-party support for C, C++, Fortran, a Python JIT DSL, and anything that targets PTX. The hardware semantics nowadays follow the C++ memory model, although it wasn't originally designed that way.
As NVidia-blessed extensions, there is also Haskell, .NET, Java and Julia tooling with compiler backends targeting PTX.
For whatever reason, all of that keeps being forgotten and only C or C++ gets a mention, which is the same mistake Intel and AMD keep making with their CUDA porting kits.
Metal Shading Language for example uses a subset of C++, and HLSL and GLSL are C-like languages.
In my view, it is nice to have an equivalent syntax and language for both CPU and GPU code, even though you still want to write simple code for GPU compute kernels and shaders.
The difference is that shader languages have their own specific set of semantics, while C and C++ on the GPU still have to worry about ISO standard semantics, coupled with vendor extensions and the broken expectations that arise when the code runs under different execution semantics than a regular C or C++ developer would expect.
What does a "GPU centric language" look like?
The most commonly used GPU languages:
- CUDA: C++ like
- OpenCL: C like
- HLSL/GLSL: C like
OpenCL and GLSL might as well be dead given the vast difference in development resources between them and HLSL/Slang. Slang is effectively HLSL++.
Metal is the main odd man out, but is C++-like.
The module system, generics and operator definitions.
Java is "C like" and uses garbage collection for dynamic memory management. It doesn't have determistic destructors. The major idiom is inheritance and overriding virtual methods.
GLSL is "C like" and doesn’t even support dynamic memory allocation, manual or otherwise. The major idiom is an implicit fixed function pipeline that executes around your code - you don't write the whole program.
So what does "C like" actually mean? IMHO it refers to superficial syntax elements like curly braces, return type before the function name, prefix and postfix increment operators, etc. It tells you almost nothing about the semantics, which is the part that determines how code in that language will map to CPU machine code vs. a GPU IR like SPIR-V. For example, CUDA is based on C++ but it has to introduce a new memory model to match the realities of GPU silicon.
But one thing I miss in C++ compared to shaders is all the vector swizzling, like v.yxyx. I couldn't really see how they handle vectors, but I might have missed it.
I really wanted something that's compatible with shaders and fast, so we could quickly swap between CPU and GPU, because it was time-consuming to port the code.
I've been down this road before. If you aren't doing SIMD it's pretty easy to implement, but it relies on UB that happens to work on all the compilers I tried (C++ properties would make this better and portable). I got something working with SIMD that unfortunately doesn't compile correctly on Clang!
They're using the same "proxy object" method I was using for their swizzling, which I'm pretty sure won't work with SIMD types, but I would love to be proven wrong!
I haven't deep dived into the library as I'm no longer doing this kind of code.
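For context, here is a rough sketch of that union-based proxy-object swizzling (member names and layout are illustrative, not any particular library's API). As noted above, reading through an inactive union member is technically UB in ISO C++, even though mainstream compilers accept the pattern.

```cpp
#include <cstdio>

struct vec2 { float x, y; };

// Proxy that aliases the parent vec4's storage so v.yx reads like a shader
// swizzle. Reading an inactive union member is technically UB in ISO C++,
// but the mainstream compilers tolerate this pattern.
template <int A, int B>
struct swizzle2 {
    float data[4];                          // overlays vec4's storage
    operator vec2() const { return {data[A], data[B]}; }
};

struct vec4 {
    union {
        struct { float x, y, z, w; };       // anonymous struct: a common compiler extension
        float data[4];
        swizzle2<1, 0> yx;                  // v.yx -> vec2{y, x}
        swizzle2<0, 1> xy;
        swizzle2<2, 2> zz;
    };
};

int main() {
    vec4 v{};
    v.x = 1.0f; v.y = 2.0f; v.z = 3.0f; v.w = 4.0f;
    vec2 a = v.yx;                          // swizzle read through the proxy
    printf("%f %f\n", a.x, a.y);            // prints 2.0 1.0
    return 0;
}
```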
Many years ago (approx 2011-2012) my own introduction to CUDA came by way of a neat .NET library Cudafy that allowed you to annotate certain methods in your C# code for GPU execution. Obviously the subset of C# that could be supported was quite small, but it was "the same" code you could use elsewhere, so you could test (slowly) the nominal correctness of your code on CPU first. Even now the GPU tooling/debugging is not as good, and back then it was way worse, so being able to debug/test nearly identical code on CPU first was a big help. Of course sometimes the abstraction broke down and you ended up having to look at the generated CUDA source, but that was pretty rare.
For folks who don't know: Unity.Mathematics is a package that ships a low-level math library whose types (`float2`, `float3`, `float4`, `int4x4`, etc.) are a 1-to-1 mirror of HLSL's built-in vector and matrix types. Because the syntax, swizzling, and operators are identical, any pure-math function you write in C# compiles under Burst to SIMD-friendly machine code on the CPU and can be dropped into a `.hlsl` file with almost zero edits for the GPU.
That's already been worked out to some extent with libraries such as Aparapi, although you still need to know what you're doing, and to actually need it.
Aparapi allows Java developers to take advantage of the compute power of GPU and APU devices by executing data parallel code fragments on the GPU rather than being confined to the local CPU. It does this by converting Java bytecode to OpenCL at runtime and executing on the GPU, if for any reason Aparapi can't execute on the GPU it will execute in a Java thread pool.
C++ DevEx is significantly better than ISF's despite them looking very similar, and it seems like less of a hurdle to get C++ to spit out an ISF-compatible file than it is to build all the tools for ISF (and GLSL, HLSL, WGSL).
With IV code, that goes out the way.
[0] Examples -- Matrix 3D shader: https://www.shadertoy.com/view/4t3BWl -- Very fast procedural ocean: https://www.shadertoy.com/view/4dSBDt
How would you use shared/local memory in GLSL? What if you want to implement Kahan summation, is that possible? How's the out-of-core and multi-GPU support in GLSL?
> People don't understand
Careful pointing that finger, 4 fingers might point back... Shadertoy isn't some obscure thing no one has heard of; some of us have been in the demoscene for over 20 years :)
> some of us have been in the demoscene for over 20 years :)
The demoscene is different, though what I'm imagining with Shadertoy and what it could be hasn't really been implemented. GLSL shaders are completely obscure outside of dev circles, and that's a bummer.
In compute shaders the `shared` keyword is for this.
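For comparison, the same idea in CUDA, where `__shared__` plays the role of GLSL's `shared` and `__syncthreads()` that of `barrier()`. This sketch also folds in a per-thread Kahan-compensated accumulation (purely illustrative, not tuned -- and note that fast-math style compiler flags, or a GLSL port without the `precise` qualifier, may optimize the compensation away).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Per-block sum: each thread accumulates a strided slice with Kahan
// compensation, partial results go into __shared__ memory (the CUDA
// counterpart of GLSL's `shared`), and a tree reduction combines them.
// Launch with 256 threads per block to match the scratchpad size.
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float partial[256];             // workgroup-local scratchpad

    float sum = 0.0f, c = 0.0f;                // Kahan running sum + compensation
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        float y = in[i] - c;                   // compensated add
        float t = sum + y;
        c = (t - sum) - y;
        sum = t;
    }

    partial[threadIdx.x] = sum;
    __syncthreads();                           // like barrier() in GLSL

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = partial[0];
}

int main() {
    const int n = 1 << 20, threads = 256, blocks = 64;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f / (i + 1);

    block_sum<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += out[b];   // final combine on the CPU
    printf("sum = %f\n", total);
    cudaFree(in); cudaFree(out);
    return 0;
}
```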
Hence why most companies are either using HLSL, even outside the games industry, or adopting the new kid on the block, Slang, which NVidia offered to Khronos as a GLSL replacement.
So GLSL remains for OpenGL and WebGL and that is about it.
Sounds like ispc fits the bill: https://ispc.github.io/ispc.html#gang-convergence-guarantees
That is not entirely true, you can use physical pointers with the "Buffer device address" feature. (https://docs.vulkan.org/samples/latest/samples/extensions/bu...) It was an extension, but it is now part of core Vulkan. It is widely available on most GPUs.
This only works for buffers though, not for images or local arrays.
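For the host side, a rough sketch (assuming a Vulkan 1.2+ device with the bufferDeviceAddress feature enabled; device, buffer and memory creation are omitted) of how the 64-bit address gets fetched and handed to the shader, which then dereferences it via GL_EXT_buffer_reference:

```cpp
#include <vulkan/vulkan.h>

// Sketch: fetch the GPU-visible address of an existing VkBuffer so a shader
// can treat it as a physical pointer via GL_EXT_buffer_reference.
// Assumes the buffer was created with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT
// and its memory allocated with VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT.
VkDeviceAddress getBufferAddress(VkDevice device, VkBuffer buffer) {
    VkBufferDeviceAddressInfo info{};
    info.sType  = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO;
    info.buffer = buffer;
    return vkGetBufferDeviceAddress(device, &info);   // core in Vulkan 1.2
}

// The address is typically passed to the shader in a push constant, e.g.:
//
//   VkDeviceAddress addr = getBufferAddress(device, buffer);
//   vkCmdPushConstants(cmd, pipelineLayout, VK_SHADER_STAGE_COMPUTE_BIT,
//                      0, sizeof(addr), &addr);
//
// and the GLSL side declares a buffer_reference block to read through it.
```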
Or did you mean some specific feature? I haven't used it on mobile.
There is a reason why there are some Vulkanised 2025 talks about improving the state of Vulkan affairs on Android.