The theoretical improvement of DMMSY over Dijkstra is O(log^{2/3} n) vs O(log n). For n = 1M, that's a ratio of ~2.7x. To get even a 10x improvement from asymptotics alone, you'd need n ≈ 2^{1000}, far beyond any graph that fits in memory (or in the observable universe, for that matter). The ~800ns figure for 1M nodes also seems physically implausible. Likely the benchmark is comparing an optimized hot path against a naive Dijkstra, or measuring something other than what it claims.
Secondarily, I suspect that in the case of the "opt" version, the compiler might be optimising out the entire algorithm because it has no observable side effects.
Presumably the claimed contribution was the optimized version, but note that whatever DMMSY(res) is, it should still be O(m log^{2/3} n) or whatever DMMSY's time complexity is supposed to be.
But DMMSY(res)'s run times follow Dijkstra closely in the graph.
The only conclusion is that something is off: either the author measured the wrong thing, or they optimized the implementation to the point that the constant-factor gains overpower any asymptotic differences. Or the implementation is wrong.
At any rate, as you mentioned, the difference between `m log n` and `m log^{2/3} n` is insignificant at n=1M (and TBH won't be significant at any practical n). If the results and the implementation are both correct, it just means the author found a way to reduce the constant factor of the implementation.
https://en.wikipedia.org/wiki/General_number_field_sieve
It’s such an out-of-nowhere part of complexity statements (what’s special about ‘x/3’?!) that I have to wonder if there’s some underlying relation.
Integer factorisation and discrete logarithm solving do look a lot like a type of search.
I feel it’s possibly similar to monstrous moonshine, where different fields kept producing the same (or off-by-one from the same) out-of-nowhere number, and for the longest time no one saw the link between them.
Interesting - I hadn’t heard that term. Wikipedia link for anyone curious: https://en.wikipedia.org/wiki/Monstrous_moonshine
The term was coined by John Conway of Conway’s Game of Life.
- You did this with AI. I'm not going to dock points for this, it's $current_year and it's the normal thing to do, but it would be nice to explain your process a little bit? To put it bluntly, why does this improve upon typing "hey claude, read this paper and go nuts" into the proverbial text box?
- As others are pointing out, the 20,000x figure is quite something. For "normal sized" graphs we'd expect this algorithm to be actually slower, why aren't we seeing that? Could you explain how you tested your code?
- Why aren't you benchmarking against other established libraries? It seems like the Dijkstra implementation you claim to be improving upon is also your own.
Didn't have time to read the code more in depth.
The readme is very obviously Claude-written (or a similar model - certainly not GPT), if you check enough vibecoded projects you'll easily spot those readmes.
The style of the HTML page, as noted by others.
Useless comments in the source code, which humans also do, but LLMs do more often:
// Basic random double
static inline double rand_double() { return (double)rand() / (double)RAND_MAX; }
gcc -O3 -march=native -flto -DNDEBUG src/*.c -I include -o dmmsy_bench -lm
./dmmsy_bench
===========================================================================
n m Dijkstra DMMSY Res DMMSY Opt Spd(Opt)
---------------------------------------------------------------------------
1000 5000 0.0153 0.0171 0.0172 0.89 x
5000 25000 0.0006 0.0001 0.0000 16.26 x
10000 50000 1.0649 1.0729 1.0791 0.99 x
25000 125000 3.1196 3.1592 3.1291 1.00 x
50000 250000 6.9091 7.0232 7.0154 0.98 x
75000 375000 10.7737 10.9603 10.9621 0.98 x
100000 500000 14.9982 14.8546 14.8742 1.01 x
150000 750000 23.3425 23.8628 23.8416 0.98 x
200000 1000000 33.0851 33.8681 34.0710 0.97 x
250000 1250000 43.5518 45.6267 46.4157 0.94 x
350000 1750000 73.1518 75.8528 74.6965 0.98 x
500000 2500000 125.9957 130.0651 129.2500 0.97 x

(I think it's unlikely that it can be straight-up optimized out, because it dirties the thread-local workspace, and compilers generally don't optimize out writes to thread-local variables.)
Apparently all the LLMs get their design expertise and color palettes from the same place.
First, using a 4-ary heap means the performance is closer to O((E+V) log V) instead of the O(E + V log V) you can get using a Fibonacci heap. So it hides the drawback of the new method, which is that the dominant term goes from V log V to E log^{2/3} V.
If you look at that term, the new method is only asymptotically faster when log^{1/3}(n) > m/n (since m log^{2/3} n < m + n log n roughly reduces to m/n < log^{1/3} n), IF you're comparing to the actual O(E + V log V) version of Dijkstra's...
For m/n = 3, that cutover is at about 140M nodes.
Second, the "opt" seems to be applied only to the new method. There's no apples to apples comparison here.
danalec•2h ago
*Key implementation details:*

- Cache-optimized CSR layout for maximum spatial locality

- Zero-allocation design with pre-allocated workspaces

- Recursive subproblem decomposition instead of global priority queues
*Benchmarks:* On graphs with 250k–1M nodes, this shows 20,000x+ speedups over standard Dijkstra implementations. The DMMSY core runs in ~800ns for 1M nodes.
*Paper:* https://arxiv.org/pdf/2504.17033

*GitHub:* https://github.com/danalec/DMMSY-SSSP
This is experimental—focused on correctness first. Would love feedback on edge cases, better decomposition strategies, or cache-oblivious optimizations I might have missed.
crdrost•2h ago
reactordev•1h ago
This will have downstream impacts for sure. Keep doing stuff like this.