I implemented a neural network from scratch in x86 assembly (no frameworks, no Python) to recognize handwritten digits from MNIST.
Feedback on performance optimizations or next steps is welcome.
Uses AVX-512 SIMD for parallel float32 ops (~7× faster than NumPy).
Runs in a lightweight Debian Slim Docker container.
The goal was to understand neural networks at the CPU level.
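For readers wondering what "parallel float32 ops" with AVX-512 looks like in practice, here is a minimal sketch of a dot product, the core operation of a dense layer. It is written in C with intrinsics rather than assembly and is not taken from the project's code; the function name `dot_f32` is hypothetical. A scalar fallback is used when the compiler is not targeting AVX-512.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not from the project): dot product of two
 * float32 arrays of length n. */
static float dot_f32(const float *a, const float *b, size_t n) {
    size_t i = 0;
    float sum = 0.0f;
#if defined(__AVX512F__)
    /* One 512-bit register holds 16 float32 lanes. */
    __m512 acc = _mm512_setzero_ps();
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);   /* unaligned load of 16 floats */
        __m512 vb = _mm512_loadu_ps(b + i);
        acc = _mm512_fmadd_ps(va, vb, acc);   /* acc += va * vb, per lane */
    }
    sum = _mm512_reduce_add_ps(acc);          /* horizontal sum of the 16 lanes */
#endif
    /* Scalar tail (or the whole loop when AVX-512 is not enabled). */
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

With GCC or Clang, the vector path is compiled in when building with `-mavx512f`; a binary built that way will only run on a CPU that supports AVX-512.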
checker659•11h ago
> ~7× faster than NumPy
Is that on the CPU? (Not sure if NumPy has a GPU backend.)
mghaderi•11h ago