FWIW, on my 2020-era laptop, the following:
#include <stdio.h>

int main(void) {
    int a = 1, b = 1, c, n = 99999999;
    while (--n) {
        c = (a + b) % 9837;
        a = b;
        b = c;
    }
    printf("%d\n", a);
    return 0;
}
takes 330 ms measured as "time a.out". The problem with Fibonacci is that there are algorithmic ways to speed things up. The following (using the fast-doubling identities at https://en.wikipedia.org/wiki/Fibonacci_sequence to compute Fibonacci numbers recursively in O(log n) arithmetic operations) takes 0.1 ms in Python to compute the same value:
import functools

@functools.cache
def F(n):
    if n <= 2:
        return 1
    m = n // 2  # rounds down
    if n % 2:
        # odd
        return (F(m+1)**2 + F(m)**2) % 9837
    else:
        return ((2*F(m+1) - F(m))*F(m)) % 9837

print(F(99999999))

Or recursive hashing?
As mentioned, the topic comes from an end-of-chapter problem set on prefix sums, at https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf .
A prefix sum can be implemented on a GPU, as the article demonstrates.
However, as shown above, a prefix sum may not be the best way to compute Fibonacci numbers.
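For concreteness, here is a minimal sequential sketch of the scan formulation (Python; function names are mine, not Blelloch's): Fibonacci falls out of a prefix "sum" over copies of the step matrix [[1, 1], [1, 0]] under 2x2 matrix multiplication mod m. Because that operation is associative, a GPU can evaluate the scan in O(log n) depth, but it produces all of F(1)..F(n), which is exactly why it loses to fast doubling when you only want the last one:

def mat_mul_mod(x, y, m):
    """Multiply two 2x2 matrices (tuples of tuples) modulo m."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) % m for j in range(2))
        for i in range(2)
    )

def fib_scan_mod(n, m):
    """Inclusive scan of n copies of [[1, 1], [1, 0]].
    Returns [F(1), ..., F(n)] mod m; written sequentially here,
    the scan itself is what a GPU would parallelize."""
    step = ((1, 1), (1, 0))
    acc = ((1, 0), (0, 1))  # multiplicative identity
    out = []
    for _ in range(n):
        acc = mat_mul_mod(acc, step, m)
        out.append(acc[0][1])  # top-right entry of step^k is F(k)
    return out

print(fib_scan_mod(10, 9837))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]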
Probably Claude can do it? Maybe I'll give it a try later :)
#include <stdio.h>
#include <time.h>

int fib_mod_9837(int n) {
    /* Iterative fast doubling, consuming the bits of n from the top down.
       Invariant at the top of the loop: a = F(m), b = F(m-1), where m is
       the prefix of n processed so far. */
    int a = 0;
    int b = 1;
    int k = 1;
    while (k < n) {
        k <<= 1;
    }
    while (k) {
        /* Double: (F(m), F(m-1)) -> (F(2m), F(2m-1)). */
        int a2 = a*a;
        int b2 = b*b;
        int ab2 = a*b << 1;
        a = a2 + ab2;
        b = a2 + b2;
        if (n & k) {
            /* Advance one step: (F(2m), F(2m-1)) -> (F(2m+1), F(2m)). */
            int t = a;
            a += b;
            b = t;
        }
        a %= 9837;
        b %= 9837;
        k >>= 1;
    }
    return a;
}
int main() {
    struct timespec begin, end;
    clock_gettime(CLOCK_MONOTONIC_RAW, &begin);
    int result = fib_mod_9837(99999999);
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);
    printf("%d\n", result);
    printf("elapsed nanos: %lld\n",
           (long long)(end.tv_sec - begin.tv_sec) * 1000000000LL
               + (end.tv_nsec - begin.tv_nsec));
}

My favorite of the genre so far is https://www.youtube.com/watch?v=KzT9I1d-LlQ , which tries to compute a big Fibonacci number in under a second. It doesn't do the modulus, so it has to do big-integer arithmetic, and so is really about doing big multiplications with FFTs. But then it funnily spins off into a series of videos as the author keeps dryly realizing their code is extremely wasteful (ending in finally just using GMP).
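For reference, the no-modulus version that video is chasing is just fast doubling over exact big integers. A minimal Python sketch (CPython's bignum multiply is Karatsuba-class, so this is far from the FFT/GMP territory the series ends up in, but the structure is the same):

# Fast doubling with exact big integers -- no modulus, so the cost is
# dominated by the final multiplications of very large operands.

def fib_pair(n):
    """Return (F(n), F(n+1)) exactly."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)      # a = F(k), b = F(k+1), with k = n // 2
    c = a * (2 * b - a)          # F(2k)
    d = a * a + b * b            # F(2k+1)
    return (d, c + d) if n % 2 else (c, d)

n = 1_000_000
f = fib_pair(n)[0]
# Printing the decimal expansion is itself expensive (and recent CPython
# caps int->str conversion size by default), so just report the size:
print(f"F({n}) has {f.bit_length()} bits")

As in the video, essentially all the time goes into the last couple of multiplications of huge numbers.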
> That look in his eyes should have warned me even before he started talking. “We want to use a parallel supercomputer for a numerical calculation up to 1,000,000,000, but we need a faster algorithm to do it.” I’d seen that distant look before. Eyes dulled from too much exposure to the raw horsepower of supercomputers - machines so fast that brute force seemed to eliminate the need for clever algorithms; at least until the problems got hard enough.
lb-guilherme•7mo ago
Someone•7mo ago
It wouldn’t surprise me at all if doing those 5 matrix multiplications on a CPU were faster than doing 31 on the GPU.
Also, FTA: “To calculate extremely large fibonacci numbers we need to avoid integer overflow. A simple way to avoid that is to just take everything modulo a given number in the matrix multiplication”
⇒ This also is NOT calculating the Fibonacci numbers, copping out when things get difficult on a GPU.
If you want to compute F(n) mod some constant for 1 ≤ n ≤ some large bound fast on a GPU, I think I would compute quite a few of the intermediate matrices the fast way, then spin up some ‘threads’ that fill in the versions in between by multiplying those checkpoints with the ‘standard’ step matrix. A sketch follows.
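Roughly this, as a sequential Python sketch (the modulus and stride are placeholders I picked): fast doubling produces checkpoint pairs (F(s), F(s+1)) at block boundaries, and each block is then filled in independently with the plain recurrence — one GPU thread per block:

M = 9837  # placeholder modulus

def fib_pair_mod(n, m=M):
    """Fast doubling: (F(n), F(n+1)) mod m."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair_mod(n // 2, m)
    c = a * (2 * b - a) % m      # F(2k)
    d = (a * a + b * b) % m      # F(2k+1)
    return (d, (c + d) % m) if n % 2 else (c, d)

def fib_table_mod(n, stride=4, m=M):
    """All of F(0..n) mod m. Each stride-sized block starts from a
    fast-doubled checkpoint, then just iterates F(i+1) = F(i) + F(i-1).
    Blocks are independent, so a GPU could run them in parallel."""
    out = [0] * (n + 1)
    for start in range(0, n + 1, stride):
        a, b = fib_pair_mod(start, m)  # checkpoint (F(start), F(start+1))
        for i in range(start, min(start + stride, n + 1)):
            out[i] = a
            a, b = b, (a + b) % m
    return out

print(fib_table_mod(12))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]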
KeplerBoy•7mo ago
It only gets interesting if you need large matmuls, or millions of small matmuls in parallel.
meindnoch•7mo ago
Unintentionally, the article is a good cautionary tale of how algorithmic knowledge can beat raw hardware power, if you know what you're doing.
CBLT•7mo ago
In my experience, when I didn't reduce the numbers modulo anything, the compute time was dominated by the last couple of multiplications with megabyte-sized integers.