So I built Axiom to make that rewrite mechanical. The API mirrors NumPy/PyTorch as closely as I could — same method names, broadcasting rules, operator overloading, dynamic shapes, runtime dtypes. Code that looks like this in PyTorch:
  scores = Q.matmul(K.transpose(-2, -1)) / math.sqrt(64)
  output = scores.softmax(-1).matmul(V)

looks like this in Axiom:

  auto scores = Q.matmul(K.transpose(-2, -1)) / std::sqrt(64.0f);
  auto output = scores.softmax(-1).matmul(V);
No mental translation. No debugging subtle API differences.
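For reference, here is that snippet as a full translation unit. Treat it as a sketch: the header path, the ax namespace, and the randn factory are my assumptions, not taken from the project; only the matmul/transpose/softmax calls come from the snippet above.

  // Minimal compilable sketch. Header, namespace, and randn() are assumed.
  #include <axiom/axiom.hpp>  // assumed umbrella header
  #include <cmath>

  int main() {
      // Hypothetical factory, named by analogy with torch.randn.
      auto Q = ax::randn({8, 128, 64});
      auto K = ax::randn({8, 128, 64});
      auto V = ax::randn({8, 128, 64});

      // Scaled dot-product attention, as in the snippet above.
      auto scores = Q.matmul(K.transpose(-2, -1)) / std::sqrt(64.0f);
      auto output = scores.softmax(-1).matmul(V);
      (void)output;
      return 0;
  }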
What's in the box (28k LOC):

- 100+ operations: arithmetic, reductions, activations (relu, gelu, silu, softmax), pooling, FFT, full LAPACK linear algebra (SVD, QR, Cholesky, eigendecomposition, solvers)
- Metal GPU via MPSGraph — all ops run on GPU, not just matmul. Compiled graphs are cached by (shape, dtype) to avoid recompilation
- Seamless CPU ↔ GPU: `auto g = tensor.gpu();` — unified memory on Apple Silicon avoids copies entirely (see the sketch after this list)
- Built-in einops: `tensor.rearrange("b h w c -> b c h w")`
- Highway SIMD across architectures (NEON, AVX2, AVX-512, SSE, WASM, RISC-V)
- Runtime dtypes via variant (readable errors, not template explosions)
- Row-major default, column-major supported via as_f_contiguous()
- Works on macOS, Linux, Windows, and WebAssembly
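Here's a minimal sketch of how the GPU, einops, and layout calls compose. Only .gpu(), .rearrange(), and as_f_contiguous() come from the list above; the header, namespace, and randn factory are assumptions:

  #include <axiom/axiom.hpp>  // assumed header

  int main() {
      // Hypothetical factory; an NHWC image batch.
      auto t = ax::randn({2, 224, 224, 3});

      // Move to Metal. On Apple Silicon, unified memory avoids a copy.
      auto g = t.gpu();

      // Built-in einops-style rearrange: NHWC -> NCHW.
      auto nchw = g.rearrange("b h w c -> b c h w");

      // Column-major layout for Fortran-order (LAPACK-style) consumers.
      auto f = nchw.as_f_contiguous();
      (void)f;
      return 0;
  }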
Performance on M4 Pro (vs Eigen with OpenBLAS, PyTorch, NumPy):
- Matmul 2048×2048: 3,196 GFLOPS (Eigen 2,911 / PyTorch 2,433)
- ReLU 4096×4096: 123 GB/s (Eigen 117 / PyTorch 70)
- FFT2 2048×2048: 14.9 ms (PyTorch 27.6 ms / NumPy 63.5 ms)
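If you want to sanity-check the matmul number yourself, a rough harness could look like this (ax::randn is a hypothetical factory; a serious run would add GPU synchronization and more iterations):

  #include <axiom/axiom.hpp>  // assumed header
  #include <chrono>
  #include <cstdio>

  int main() {
      const int N = 2048;
      auto A = ax::randn({N, N});  // hypothetical factory
      auto B = ax::randn({N, N});
      auto C = A.matmul(B);        // warmup (and graph compile, on GPU)

      const int iters = 20;
      auto t0 = std::chrono::steady_clock::now();
      for (int i = 0; i < iters; ++i) C = A.matmul(B);
      auto t1 = std::chrono::steady_clock::now();

      double secs = std::chrono::duration<double>(t1 - t0).count() / iters;
      // An N x N matmul costs 2*N^3 floating-point operations.
      std::printf("%.0f GFLOPS\n", 2.0 * N * N * N / secs / 1e9);
      return 0;
  }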
To try it:
  git clone https://github.com/frikallo/axiom.git
  cd axiom && make release
Or add it to your CMake project via FetchContent (sketch below). Example files live in examples/.
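FetchContent_Declare and FetchContent_MakeAvailable are standard CMake; the exported target name here is an assumption, so check the project's CMakeLists:

  include(FetchContent)
  FetchContent_Declare(
    axiom
    GIT_REPOSITORY https://github.com/frikallo/axiom.git
    GIT_TAG        main  # pin a release tag in practice
  )
  FetchContent_MakeAvailable(axiom)

  # "axiom" as a link target is an assumption.
  target_link_libraries(my_app PRIVATE axiom)

Happy to answer questions about the internals or take feedback on the API.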