https://github.com/ratulb/tenmo
Tenmo focuses on:
- SIMD optimization
- explicit memory layout
- zero-copy views
- a minimal but practical autograd system

Status: Tenmo evolves alongside Mojo itself. APIs may change. Not production-ready yet.
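For readers unfamiliar with Mojo, the sketch below shows the language-level SIMD primitive such kernels build on. This is a generic Mojo illustration, not Tenmo's actual kernel code.

```mojo
# Generic Mojo illustration (not Tenmo internals): a SIMD value packs
# several lanes into one register-level value, so a single multiply
# covers all lanes at once, which is what vectorized kernels exploit
# on contiguous buffers.
fn main():
    var v = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var doubled = v * 2.0  # one vector multiply across all four lanes
    print(doubled)         # [2.0, 4.0, 6.0, 8.0]
```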
Performance

MNIST (4-layer MLP, 105K params, 15 epochs):

| Platform | Device         | Avg Epoch | Total | Test Acc |
|----------|----------------|-----------|-------|----------|
| Tenmo    | CPU (Mojo)     | 11.4s     | 171s  | 97.44%   |
| PyTorch  | CPU            | 14.5s     | 218s  | 98.26%   |
| PyTorch  | GPU (Tesla T4) | 15.2s     | 227s  | 97.87%   |

Notes:
- Tenmo uses SIMD-vectorized kernels on contiguous buffers.
- No BLAS was used in the MNIST run; everything executes as pure Mojo code.
- GPU overhead dominates for models of this size; larger models benefit more from GPU acceleration.

Quick Example

```mojo
from testing import assert_true
from tenmo import Tensor

fn main() raises:
    var a = Tensor.d1([1.0, 2.0, 3.0], requires_grad=True)
    var b = a * 2
    var c = a * 3
    var d = b + c
    d.backward()
    # d = 2a + 3a = 5a, so each element of a gets gradient 5
    assert_true(a.grad().all_close(Tensor.d1([5.0, 5.0, 5.0])))
```
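A quick hand check of the expected gradient (my arithmetic, not from the repo): d = b + c = 2a + 3a = 5a elementwise, so each element of a receives gradient 2 + 3 = 5, which is exactly what the assertion verifies. In plain scalar Mojo:

```mojo
# Scalar sanity check of the chain rule behind the assertion above:
# the gradients flowing through b = 2a and c = 3a accumulate in a.
fn main():
    var a = 4.0
    var d = 2.0 * a + 3.0 * a  # forward: d = 5a = 20.0
    var grad_a = 2.0 + 3.0     # backward: contributions from both paths sum to 5
    print(d, grad_a)           # 20.0 5.0
```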
Feedback highly appreciated!