I built this project because I wanted to understand the low-level mechanics of LLMs and how FFI overhead differs between languages.
Some key takeaways:
Architecture: It's a 6.9B-parameter MoE model implemented independently in Rust, Go, and Python.
Shared CUDA: All three languages bind to the exact same CUDA kernels (no PyTorch/TensorFlow).
Performance: I was surprised by how Go's per-call cgo overhead compares to Rust's essentially zero-cost FFI in this specific workload.
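To make the "shared CUDA kernels" point concrete: each language just declares the same C-ABI symbols exported by the kernel library and calls them directly. Here is a minimal Rust sketch of that pattern; since I can't ship the actual kernel library in a snippet, it binds libc's `abs` as a stand-in for a real exported wrapper (a name like `launch_matmul` would be hypothetical).

```rust
// Sketch of binding a C-ABI symbol from Rust -- the same mechanism used to
// call a shared CUDA kernel-launcher library from all three languages.
// We bind libc's `abs` here purely as a stand-in, since libc is already linked.
use std::os::raw::c_int;

extern "C" {
    // Any C-ABI function exported by a shared library can be declared like this.
    // In the real project this would be a kernel wrapper, not `abs`.
    fn abs(v: c_int) -> c_int;
}

fn main() {
    // FFI calls are `unsafe`: the compiler cannot verify that this signature
    // actually matches the symbol in the linked library.
    let r = unsafe { abs(-42) };
    println!("{}", r);
}
```

Go's equivalent goes through cgo, which inserts a runtime trampoline on every call (stack switching, scheduler bookkeeping), whereas Rust's `extern "C"` call compiles down to a plain function call; that difference is where the per-call overhead gap comes from.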
I know it's reinventing the wheel, but it was a great way to learn. Happy to answer any questions about the implementation or the FFI architecture!
fumi2026•1h ago