frontpage.

Google and Microsoft Paying Creators $500K+ to Promote AI Tools

https://www.cnbc.com/2026/02/06/google-microsoft-pay-creators-500000-and-more-to-promote-ai.html
1•belter•1m ago•0 comments

New filtration technology could be game-changer in removal of PFAS

https://www.theguardian.com/environment/2026/jan/23/pfas-forever-chemicals-filtration
1•PaulHoule•2m ago•0 comments

Show HN: I saw this cool navigation reveal, so I made a simple HTML+CSS version

https://github.com/Momciloo/fun-with-clip-path
1•momciloo•3m ago•0 comments

Kinda Surprised by Seadance2's Moderation

https://seedanceai.me/
1•ri-vai•3m ago•1 comment

I Write Games in C (yes, C)

https://jonathanwhiting.com/writing/blog/games_in_c/
1•valyala•3m ago•0 comments

Django scales. Stop blaming the framework (part 1 of 3)

https://medium.com/@tk512/django-scales-stop-blaming-the-framework-part-1-of-3-a2b5b0ff811f
1•sgt•3m ago•0 comments

Malwarebytes Is Now in ChatGPT

https://www.malwarebytes.com/blog/product/2026/02/scam-checking-just-got-easier-malwarebytes-is-n...
1•m-hodges•4m ago•0 comments

Thoughts on the job market in the age of LLMs

https://www.interconnects.ai/p/thoughts-on-the-hiring-market-in
1•gmays•4m ago•0 comments

Show HN: Stacky – certain block game clone

https://www.susmel.com/stacky/
2•Keyframe•7m ago•0 comments

AIII: A public benchmark for AI narrative and political independence

https://github.com/GRMPZQUIDOS/AIII
1•GRMPZ23•7m ago•0 comments

SectorC: A C Compiler in 512 bytes

https://xorvoid.com/sectorc.html
1•valyala•9m ago•0 comments

The API Is a Dead End; Machines Need a Labor Economy

1•bot_uid_life•10m ago•0 comments

Digital Iris [video]

https://www.youtube.com/watch?v=Kg_2MAgS_pE
1•Jyaif•11m ago•0 comments

New wave of GLP-1 drugs is coming–and they're stronger than Wegovy and Zepbound

https://www.scientificamerican.com/article/new-glp-1-weight-loss-drugs-are-coming-and-theyre-stro...
4•randycupertino•12m ago•0 comments

Convert tempo (BPM) to millisecond durations for musical note subdivisions

https://brylie.music/apps/bpm-calculator/
1•brylie•14m ago•0 comments

Show HN: Tasty A.F.

https://tastyaf.recipes/about
1•adammfrank•15m ago•0 comments

The Contagious Taste of Cancer

https://www.historytoday.com/archive/history-matters/contagious-taste-cancer
1•Thevet•17m ago•0 comments

U.S. Jobs Disappear at Fastest January Pace Since Great Recession

https://www.forbes.com/sites/mikestunson/2026/02/05/us-jobs-disappear-at-fastest-january-pace-sin...
1•alephnerd•17m ago•1 comment

Bithumb mistakenly hands out $195M in Bitcoin to users in 'Random Box' giveaway

https://koreajoongangdaily.joins.com/news/2026-02-07/business/finance/Crypto-exchange-Bithumb-mis...
1•giuliomagnifico•17m ago•0 comments

Beyond Agentic Coding

https://haskellforall.com/2026/02/beyond-agentic-coding
3•todsacerdoti•18m ago•0 comments

OpenClaw ClawHub Broken Windows Theory – If basic sorting isn't working what is?

https://www.loom.com/embed/e26a750c0c754312b032e2290630853d
1•kaicianflone•20m ago•0 comments

OpenBSD Copyright Policy

https://www.openbsd.org/policy.html
1•Panino•21m ago•0 comments

OpenClaw Creator: Why 80% of Apps Will Disappear

https://www.youtube.com/watch?v=4uzGDAoNOZc
2•schwentkerr•25m ago•0 comments

What Happens When Technical Debt Vanishes?

https://ieeexplore.ieee.org/document/11316905
2•blenderob•26m ago•0 comments

AI Is Finally Eating Software's Total Market: Here's What's Next

https://vinvashishta.substack.com/p/ai-is-finally-eating-softwares-total
3•gmays•27m ago•0 comments

Computer Science from the Bottom Up

https://www.bottomupcs.com/
2•gurjeet•27m ago•0 comments

Show HN: A toy compiler I built in high school (runs in browser)

https://vire-lang.web.app
1•xeouz•29m ago•1 comment

You don't need Mac mini to run OpenClaw

https://runclaw.sh
1•rutagandasalim•29m ago•0 comments

Learning to Reason in 13 Parameters

https://arxiv.org/abs/2602.04118
2•nicholascarolan•31m ago•0 comments

Convergent Discovery of Critical Phenomena Mathematics Across Disciplines

https://arxiv.org/abs/2601.22389
1•energyscholar•32m ago•1 comment

VectorWare – from creators of `rust-GPU` and `rust-CUDA`

https://www.vectorware.com/blog/announcing-vectorware/
82•ashvardanian•3mo ago

Comments

billconan•3mo ago
After reading this page, I still don't know what GPU-native software they want to work on.

> If you look at existing GPU applications, their software implementations aren't truly GPU-native. Instead, they are architected as traditional CPU software with a GPU add-on.

I feel that this is due to the current hardware architecture, not the fault of software.

LegNeato•3mo ago
We have some demos coming in the next couple weeks. The hardware is there, the software isn't!
zozbot234•3mo ago
What does this mean for the rust-gpu and rust-cuda projects themselves? Will they go unmaintained now that the creators are running a business?

(Don't miss the "Pedantic mode" switch on the linked page, it adds relevant and detailed footnotes to the blog post.)

LegNeato•3mo ago
We are investing in them, and they form the basis of what we are doing. That being said, we are also exploring other technical avenues with different tradeoffs; we don't want to assume a solution merely because it's the one we're familiar with.
wrs•3mo ago
Be sure to turn on "pedantic mode" to get the footnotes that make this post make more sense. Some examples of what it means by "applications" would help. I don't think the prediction here is that Excel's main event loop is going to run on the GPU, but I can see that its calculation engine might.
LegNeato•3mo ago
More software than you think can run fully on the GPU, especially with datacenter cards. We'll be sharing some demos in the coming weeks.
simonask•3mo ago
With current GPU architectures, this seems unlikely. Like, you would need a ton of cells with almost perfectly aligned inputs before even the DMA bus roundtrip is worth it.

We’re talking at least hundreds of thousands of cells, depending on the calculation, or at least a number that will make the UI very sad long before you’ll see a slowdown from calculation.

Databases, on the other hand…
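
To make the break-even intuition concrete, here is a back-of-envelope sketch in Rust. Every constant in it (PCIe bandwidth, launch overhead, per-cell cost) is a rough assumption rather than a measurement:

  // Rough break-even estimate for offloading a spreadsheet
  // recalculation to a discrete GPU. All constants are ballpark
  // assumptions, not measurements.
  fn main() {
      let pcie_bw_bytes_per_s = 16e9; // ~PCIe 4.0 x16 effective bandwidth
      let kernel_launch_overhead_s = 20e-6; // host-side launch latency
      let bytes_per_cell = 8.0; // one f64 in, result written in place
      let cpu_ns_per_cell = 5.0; // simple formula on a warm cache

      // GPU cost ~= fixed launch overhead + two transfers (in + out);
      // on-device compute time is assumed negligible by comparison.
      for n in [1_000u64, 100_000, 1_000_000, 10_000_000] {
          let gpu_s = kernel_launch_overhead_s
              + 2.0 * (n as f64 * bytes_per_cell) / pcie_bw_bytes_per_s;
          let cpu_s = n as f64 * cpu_ns_per_cell * 1e-9;
          println!("{n:>9} cells: CPU {:.3} ms, GPU (transfer-bound) {:.3} ms",
                   cpu_s * 1e3, gpu_s * 1e3);
      }
  }

With these toy constants the crossover lands around a few thousand cells; real dependency chains, gather/scatter access, and readback patterns push it far higher, which is the parent's point.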

zozbot234•3mo ago
There isn't always a DMA roundtrip; unified memory is a thing. But programming for the GPU is very awkward at a systems level. Even with unified memory, there is generally no real equivalent to virtual memory or mmap(), so you have to shuffle your working set in and out of VRAM by hand anyway (i.e. backing and residency are managed explicitly, even with "sparse" allocation APIs that might otherwise be expected to ease some of the work).

Better GPU drivers may be enough to mitigate this, along with broad-based standardization of some current vendor-specific extensions (it's not clear that real HW changes are needed), but this creates a very real limitation on the scale of software (including the AI kind) you can realistically run on any given device.
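
For readers who haven't done this by hand, below is a toy, CPU-only Rust model of that explicit residency bookkeeping. The `VramPool` type and its chunk IDs are illustrative inventions, not a real driver API; the two marked lines stand in for the DMA copies a real implementation would issue.

  use std::collections::{HashMap, VecDeque};

  // Toy model of manual residency management: with no mmap()-style
  // paging on the device, the host decides which chunks of the
  // working set occupy "VRAM" at any moment.
  struct VramPool {
      capacity_chunks: usize,
      resident: HashMap<u64, Vec<u8>>, // chunk id -> staged bytes
      lru: VecDeque<u64>,              // least recently used at the front
  }

  impl VramPool {
      fn new(capacity_chunks: usize) -> Self {
          Self { capacity_chunks, resident: HashMap::new(), lru: VecDeque::new() }
      }

      // Ensure `chunk` is resident, evicting the LRU chunk if full.
      fn make_resident(&mut self, chunk: u64, backing: &HashMap<u64, Vec<u8>>) {
          if self.resident.contains_key(&chunk) {
              self.lru.retain(|c| *c != chunk);
          } else {
              if self.resident.len() == self.capacity_chunks {
                  if let Some(victim) = self.lru.pop_front() {
                      self.resident.remove(&victim); // stands in for device-to-host writeback
                  }
              }
              self.resident.insert(chunk, backing[&chunk].clone()); // stands in for host-to-device copy
          }
          self.lru.push_back(chunk);
      }
  }

  fn main() {
      let backing: HashMap<u64, Vec<u8>> = (0..8).map(|i| (i, vec![i as u8; 4])).collect();
      let mut pool = VramPool::new(2);
      for chunk in [0u64, 1, 0, 2, 3, 0] {
          pool.make_resident(chunk, &backing);
          println!("touch {chunk}: resident = {:?}", pool.lru);
      }
  }
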
the__alchemist•3mo ago
`rust-GPU` and `rust-CUDA` fall, for me, into the category of "Rust is great, let's build the X ecosystem in Rust". Meanwhile, they've been in a broken and dormant state for years. There was a leadership/dev change recently (are the creators of VectorWare the creators of Rust-CUDA, or the new leaders?), and more activity since. I haven't tried them again yet.

If you have a Rust application or library and want to use the GPU, these approaches are comparatively smooth:

  - WGPU: Great for 3D graphics
  - Ash and other Vulkan bindings: Low-level graphics bindings
  - Cudarc: Nice API for running CUDA kernels.
I am using WGPU and Cudarc for structural biology + molecular dynamics computations, and they work well.

Rust-CUDA feels like lots of PR but not as good a toolkit as these quieter alternatives. What would be cool for them to deliver, and I think it's in their objectives: cross-API abstractions, so you could, for example, write code that runs on Vulkan Compute in addition to CUDA.

Something else that would be cool: high-level bindings to cuFFT and vkFFT. You can FFI to them currently, but that's not ideal. (Not too bad to impl though, if you're familiar with FFI syntax and the `cc` crate.)
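
For the curious, here is a minimal hand-rolled FFI sketch for cuFFT in that spirit. The signatures follow the cuFFT C API as best I recall them (verify against cufft.h), it only links if libcufft is on the linker search path, and the data pointer must be a device pointer obtained elsewhere (e.g. via cudarc):

  use std::os::raw::c_int;

  #[repr(C)]
  #[derive(Clone, Copy)]
  pub struct CufftComplex {
      pub x: f32, // real part
      pub y: f32, // imaginary part
  }

  // Constants from cufft.h (from memory; double-check before use).
  pub const CUFFT_C2C: c_int = 0x29;
  pub const CUFFT_FORWARD: c_int = -1;

  #[link(name = "cufft")]
  extern "C" {
      // cufftHandle is an integer handle in the C API.
      fn cufftPlan1d(plan: *mut c_int, nx: c_int, type_: c_int, batch: c_int) -> c_int;
      fn cufftExecC2C(plan: c_int, idata: *mut CufftComplex, odata: *mut CufftComplex, direction: c_int) -> c_int;
      fn cufftDestroy(plan: c_int) -> c_int;
  }

  // Plan, execute a forward FFT in place, destroy. `data` must point
  // to `n` CufftComplex values in *device* memory.
  pub unsafe fn fft_forward_in_place(data: *mut CufftComplex, n: c_int) -> Result<(), c_int> {
      let mut plan: c_int = 0;
      let rc = cufftPlan1d(&mut plan, n, CUFFT_C2C, 1);
      if rc != 0 {
          return Err(rc);
      }
      let rc = cufftExecC2C(plan, data, data, CUFFT_FORWARD);
      cufftDestroy(plan);
      if rc != 0 { Err(rc) } else { Ok(()) }
  }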

LegNeato•3mo ago
Yes, it is all these folks getting together and getting resources to push those projects to the next level: https://www.vectorware.com/team/

wgpu, ash, and cudarc are great. We're focusing on the actual code that runs on the GPU in Rust, and we work with those projects. We have cust in rust-cuda, but that existed before cudarc and we have been seriously discussing just killing it in favor of cudarc.

jjallen•3mo ago
+1 for cudarc. I've been using it for a couple of years now and it has worked great. I'm using it for financial markets backtesting.
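
For anyone who hasn't seen cudarc, a hedged sketch of what that usage looks like. This follows the 0.11-era driver API from memory, and cudarc's API has shifted across releases, so check the docs for your version:

  use cudarc::driver::{CudaDevice, LaunchAsync, LaunchConfig};
  use cudarc::nvrtc::compile_ptx;

  // A trivial CUDA C kernel, compiled at runtime with NVRTC.
  const KERNEL: &str = r#"
  extern "C" __global__ void double_each(float *data, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) data[i] *= 2.0f;
  }
  "#;

  fn main() -> Result<(), Box<dyn std::error::Error>> {
      let dev = CudaDevice::new(0)?;                 // first CUDA device
      let ptx = compile_ptx(KERNEL)?;                // CUDA C -> PTX
      dev.load_ptx(ptx, "demo", &["double_each"])?;
      let f = dev.get_func("demo", "double_each").unwrap();

      let n = 1024usize;
      let buf = dev.htod_copy(vec![1.0f32; n])?;     // host -> device
      unsafe { f.launch(LaunchConfig::for_num_elems(n as u32), (&buf, n as i32)) }?;
      let out = dev.dtoh_sync_copy(&buf)?;           // device -> host
      assert!(out.iter().all(|&x| x == 2.0));
      Ok(())
  }
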
LegNeato•3mo ago
Pedantic note: rust-cuda was created by https://github.com/RDambrosio016 and he is not currently involved in VectorWare. rust-gpu was created by the folks at Embark Studios. We are the current maintainers of both.

We didn't post this or choose the title; we would never claim we created the projects from scratch.

ashvardanian•3mo ago
My bad! "contributors" is more accurate, but HN doesn't allow editing titles, sadly :(
LegNeato•3mo ago
No worries, just wanted to correct it for folks. Thanks for posting!
kibwen•3mo ago
HN allows the submitter to edit the title, at least it did last time I checked.
pjmlp•3mo ago
It still does, but only for the first few minutes after submission.

I routinely have to fix the autoformatting done by HN.

Keyframe•3mo ago
> folks at Embark Studios

Seems like Embark has disembarked from Rust and support for it altogether

LegNeato•3mo ago
One of the founders here, feel free to ask whatever. We purposefully didn't put much technical detail in the post as it is an announcement post (other people posted it here, we didn't).
structural•3mo ago
1. What does it mean to be a GPU-native process?

2. Can modern GPU hardware efficiently make system calls? (if you can do this, you can eventually build just about anything, treating the CPU as just another subordinate processor).

3. At what order-of-magnitude size might being GPU-native break down? (Can CUDA dynamically load new code modules into an existing process? That used to be problematic years ago)

Thinking about what's possible, this looks like an exceptionally fun project. Congrats on working on an idea that seems crazy at first glance but looks more and more possible the more you think about it. Still, it's all a gamble whether it'll perform well enough to be worth writing applications this way.

LegNeato•3mo ago
1. The GPU owns the control loop and only sparingly kicks out to the CPU when it can't do something.

2. Yes

3. We're still investigating the limitations. A lot of them are hardware-dependent; obviously data center cards have higher limits and more capability than desktop cards.

Thanks! It is super fun trailblazing and realizing more of the pieces are there than everybody expects.
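
For readers who want that inversion concrete, here is a CPU-only Rust analogy: the "GPU" is a worker thread running a persistent loop that owns control, and the "CPU" is reduced to a subordinate service loop handling the few requests (here, printing) the device-side loop can't. All names are illustrative:

  use std::sync::mpsc;
  use std::thread;

  // Requests the persistent "GPU" loop punts to the host because it
  // can't service them itself (here: I/O).
  enum HostRequest {
      Print(String),
      Done,
  }

  fn main() {
      let (to_host, from_device) = mpsc::channel::<HostRequest>();

      // The "GPU": a persistent loop that owns control and only
      // sparingly kicks out to the host.
      let device = thread::spawn(move || {
          let mut acc = 0u64;
          for step in 0..5u64 {
              acc += step * step; // bulk compute stays on the device
              if step % 2 == 0 {
                  // Syscall-like escape hatch: ask the host for I/O.
                  to_host
                      .send(HostRequest::Print(format!("step {step}: acc = {acc}")))
                      .unwrap();
              }
          }
          to_host.send(HostRequest::Done).unwrap();
          acc
      });

      // The "CPU": a subordinate service loop, not the orchestrator.
      while let Ok(req) = from_device.recv() {
          match req {
              HostRequest::Print(msg) => println!("[host] {msg}"),
              HostRequest::Done => break,
          }
      }
      println!("[host] device returned {}", device.join().unwrap());
  }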

jiehong•3mo ago
Sounds interesting.

But languages like Java or Python simply lack the programming constructs to program GPUs easily.

The lack of a standardised ISA across GPUs also means compilers can't really provide a translation layer.

Let’s hope things get better over time!

binarymax•3mo ago
Python has decorators, which can be used to add sugar to methods for things like true parallelization. For example, see modal.com's Python snippets.

https://modal.com/docs/examples/batched_whisper

LegNeato•3mo ago
You might be interested in a previous blog post where we showed one codebase running on many types of GPUs: https://rust-gpu.github.io/blog/2025/07/25/rust-on-every-gpu...
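
For reference, this is roughly what "the code that runs on the GPU, in Rust" looks like with rust-gpu, adapted from memory of the project's published examples (attribute syntax has varied between releases, so treat it as a sketch). It compiles to SPIR-V and is dispatched from a host API such as wgpu or ash:

  #![no_std]
  use spirv_std::glam::UVec3;
  use spirv_std::spirv;

  // One compute workgroup of 64 threads; each thread doubles one
  // element of a storage buffer.
  #[spirv(compute(threads(64)))]
  pub fn double_each(
      #[spirv(global_invocation_id)] id: UVec3,
      #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] data: &mut [f32],
  ) {
      let i = id.x as usize;
      if i < data.len() {
          data[i] *= 2.0;
      }
  }
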
cutlilacs•3mo ago
> If you look at existing GPU applications, their software implementations aren't truly GPU-native. Instead, they are architected as traditional CPU software with a GPU add-on. For example, pytorch uses the CPU by default and GPU acceleration is opt-in. Even after opting in, the CPU is in control and orchestrates work on the GPU. Furthermore, if you look at the software kernels that run on the GPU they are simplistic with low cyclomatic complexity. This is not unique to pytorch. Most software is CPU-only, a small subset is GPU-aware, an even smaller subset is GPU-only, and no software is GPU-native.

> We are building software that is GPU-native. We intend to put the GPU in control. This does not happen today due to the difficulty of programming GPUs, the immaturity of GPU software and abstractions, and the relatively few developers targeting GPUs.

Really feels like fad engineering. The CPU works better as a control structure, and the design of GPUs is not suited for orchestration the way CPUs are. What really worries me is their mention of GPU abstractions, which is completely the wrong way to think about hardware designed for HPC. Their point about PyTorch and kernels having low cyclomatic complexity is confusing to me. GPUs aren't optimized for control flow. The nature of SIMD/SIMT values throughput, and the hardware design forgoes things like branch prediction. Having many independent paths a GPU kernel could take would make it perform much worse. You could very well end up with kernels that are slower than their optimized CPU counterparts.

I'm sure the people behind this are talented and know what they're doing, but these statements don't make sense to me. GPU algorithms are harder to reason about and implement. You often need to do more work just to gain the parallelism benefit. There aren't actually that many use cases where the GPU being the primary compute platform is a better choice. My cynical view is that people like the GPU because they compare unoptimized, slow CPU code with decent GPU/tensorized code. They never see how much a modern CPU can actually do, and how fast it can be.
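
A toy model of that divergence cost, runnable on any CPU: a "warp" of lanes executes in lockstep, so when lanes disagree at a branch the hardware runs both sides with inactive lanes masked off, paying the sum of the two path lengths rather than the longer one. The lane count and path lengths below are made up for illustration:

  fn main() {
      const LANES: usize = 8;
      let inputs: [i32; LANES] = [1, -2, 3, -4, 5, -6, 7, -8];
      let mut outputs = [0i32; LANES];
      let mut slots_paid = 0u64;

      // Branch: lanes with positive input take path A, the rest path B.
      let take_a: Vec<bool> = inputs.iter().map(|&x| x > 0).collect();

      // Path A (say, 3 instructions): issued for ALL lanes, masked.
      if take_a.iter().any(|&t| t) {
          slots_paid += 3;
          for lane in 0..LANES {
              if take_a[lane] {
                  outputs[lane] = inputs[lane] * 2;
              }
          }
      }
      // Path B (say, 5 instructions): issued again for ALL lanes, masked.
      if take_a.iter().any(|&t| !t) {
          slots_paid += 5;
          for lane in 0..LANES {
              if !take_a[lane] {
                  outputs[lane] = -inputs[lane] + 1;
              }
          }
      }

      // Divergent warp pays 3 + 5 = 8 issue slots; a uniform warp
      // would pay only 3 or only 5.
      println!("outputs = {outputs:?}, slots paid = {slots_paid}");
  }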