Benchmarks, volresample vs PyTorch (Intel i7, 4 cores, PyTorch 2.8.0):
resample 512³→256³ trilinear: 34 ms vs 55 ms (1.6×)
area mode: 65 ms vs 613 ms (9.5×); PyTorch doesn't parallelize this well
int16 nearest: 8 ms vs 93 ms (11×, and 13× even on a single thread); PyTorch has no native int16 path
grid_sample 128³: 38 ms vs 169 ms (4.4×)
The main wins come from pre-computed index tables, fused-type specialization (no dtype casting), branchless inner loops, and OpenMP parallelization that actually scales for single-image workloads.
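To illustrate the pre-computed index table idea, here is a minimal pure-Python sketch for nearest-neighbor resampling. It is not volresample's actual code: the function names, the nested-list volume layout, and the floor(i * src / dst) index convention are all assumptions made for illustration, and a real implementation would run these loops in compiled, parallel code.

```python
def nearest_index_table(src_len, dst_len):
    """Source index for every output position along one axis (hypothetical helper)."""
    # Integer math gives floor(i * src_len / dst_len) exactly, with no float rounding.
    return [i * src_len // dst_len for i in range(dst_len)]


def resample_nearest(volume, out_shape):
    """Nearest-neighbor resample of a nested-list volume indexed as [z][y][x]."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    # Build one small table per axis up front instead of
    # recomputing source coordinates for every output voxel.
    z_table = nearest_index_table(depth, out_shape[0])
    y_table = nearest_index_table(height, out_shape[1])
    x_table = nearest_index_table(width, out_shape[2])
    # The inner loops are now pure lookups: values are copied as-is,
    # so an integer volume stays integer and no casting is needed.
    return [[[volume[z][y][x] for x in x_table] for y in y_table] for z in z_table]


# 4x4x4 test volume whose voxel value encodes its own (z, y, x) coordinates.
vol = [[[100 * z + 10 * y + x for x in range(4)] for y in range(4)] for z in range(4)]
small = resample_nearest(vol, (2, 2, 2))
```

The tables cost O(D + H + W) to build, while the voxel loop is O(D·H·W), so the setup is negligible for volumes like 512³. Because values are only copied, the same loop works unchanged for int16 data.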
No GPU, no autograd, float32-only for interpolation — just fast CPU resampling with a 2-function API.
pip install volresample
GitHub: https://github.com/JoHof/volresample
If you find it interesting, I wrote about the motivation and some implementation details here: https://johof.github.io/2026/02/volresample-3d-volume-resamp...