Ultimately, I want Klartraum to be an extensive neural network and rendering inference engine (no training, backprop, or autograd) that runs with high performance on embedded devices and VR headsets. My major obstacle right now is that writing GLSL compute kernels is tedious compared to CUDA ...
What do you think should be added to make the library useful to others?