frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
209•theblazehen•2d ago•63 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
686•klaussilveira•15h ago•204 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
959•xnx•20h ago•553 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
127•matheusalmeida•2d ago•35 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
65•videotopia•4d ago•3 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
28•kaonwarb•3d ago•24 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
44•jesperordrup•5h ago•23 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
236•isitcontent•15h ago•26 comments

ga68, the GNU Algol 68 Compiler – FOSDEM 2026 [video]

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
8•matt_d•3d ago•2 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
230•dmpetrov•15h ago•122 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
334•vecti•17h ago•146 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
26•speckx•3d ago•16 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
499•todsacerdoti•23h ago•244 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
384•ostacke•21h ago•97 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
360•aktau•21h ago•183 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
295•eljojo•18h ago•186 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
421•lstoll•21h ago•280 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
67•kmm•5d ago•10 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
95•quibono•4d ago•22 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
21•bikenaga•3d ago•11 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
33•romes•4d ago•3 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
262•i5heu•18h ago•211 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
38•gmays•10h ago•13 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1074•cdrnsf•1d ago•460 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
61•gfortaine•13h ago•27 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
294•surprisetalk•3d ago•45 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
153•vmatsiiako•20h ago•72 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
14•1vuio0pswjnm7•1h ago•1 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
159•SerCe•11h ago•146 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
187•limoce•3d ago•103 comments
Open in hackernews

Calculating the Fibonacci numbers on GPU

https://veitner.bearblog.dev/calculating-the-fibonacci-numbers-on-gpu/
42•rbanffy•7mo ago

Comments

lb-guilherme•7mo ago
The algorithm here, fast doubling, is sequential (there is no parallelism) and is quite fast, with less than a hundred operations to arrive to the final result. This runs in nanosecond on a CPU and the benchmark in the article is mostly measuring the communication time rather than actual computation time.
Someone•7mo ago
They also do not make use of the fact that matrix multiplication is associative, so M⁴ = M² × M², M⁸ = M⁴ × M⁴, etc. That means one can compute F32 using 5 matrix multiplications. This uses 31.

It wouldn’t surprise me at all if doing those 5 matrix multiplications on a CPU were faster than doing 31 on the GPU.

Also, FTA: “To calculate extremly large fibonacci numbers we need to avoid integer overflow. A simple way to avoid that is to just take everything modulo a given number in the matrix multiplication“

⇒ This also is NOT calculating the Fibonacci numbers, copping out when things get difficult on a GPU.

If you want to compute Fn mod some constant for 1 ≤ n ≤ some large constant fast on a GPU, I think I would compute quite a few of the intermediate matrices using the fast way, then spin up some ‘threads’ to compute the intermediate versions by copying those with the ‘standard’ matrix.

KeplerBoy•7mo ago
The matrix in question is tiny (2x2). There's no hope a GPU would ever outperform a CPU on that.

It only gets interesting if you need a large matmuls or millions of small matmuls in parallel.

meindnoch•7mo ago
They are unnecessarily filling a 99999999-sized array, only to print the last element and throw away the rest. Most of the time is going to be spent on filling this 400MB array.

Unintentionally, the article is a good cautionary tale of how algorithmic knowledge can beat raw hardware power, if you know what you're doing.

CBLT•7mo ago
Writing C code that performs this fast doubling algorithm for Fibonacci numbers was actually quite fun. Highly recommend it.

In my experience where I didn't modulo the numbers, the compute time was dominated by the last couple of multiplications with Megabyte-sized integers.

eesmith•7mo ago
> Using the power of GPUs we are able to calculate F99999999 in only 17 milliseconds on my Consumer GPU

FWIW, on my 2020 era laptop, the following:

  #include <stdio.h>
  
  int main(void) {
    int a = 1, b = 1, c, n = 99999999;
    while (--n) {
      c = (a+b) % 9837;
      a = b;
      b = c;
    }
    printf("%d\n", a);
    return 0;
  }
takes 330 ms measured as "time a.out".

The problem with Fibonacci is there are algorithmic ways to speed things up. The following (using the method at https://en.wikipedia.org/wiki/Fibonacci_sequence to compute Fibonacci numbers recursively in O(log n) arithmetic operations) takes Python 0.1 ms to compute the same value:

  import functools

  @functools.cache
  def F(n):
      if n <= 2:
          return 1

      m = n // 2 # rounds down
      if n % 2:
          # odd
          return (F(m+1)**2 + F(m)**2) % 9837
      else:
          return ((2*F(m+1) - F(m))*F(m)) % 9837

  print(F(99999999))
amelius•7mo ago
Maybe a better test is digits of Pi?

Or recursive hashing?

staunton•7mo ago
A test of what?
amelius•7mo ago
Test of your GPU approach/library versus CPU.
eesmith•7mo ago
I think the author is more interested in showing how to implement a certain problem using a GPU approach than to test a GPU approach vs. a CPU one.

As mentioned, the topic comes from an end-of-chapter problem set on prefix sums, at https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf .

A prefix sum cam be implemented using a GPU, as demonstrated.

However, using a prefix sum may not be the best way to compute Fibonacci numbers. As demonstrated.

simon_vtr•7mo ago
yea, i wrote this blogpost rather to show how to use scan in different ways than the canonical example of calculating prefix sum of a vector shown in introductions on gpu programming.
qsort•7mo ago
Well, there's a closed formula right? I wonder if it's faster to carry out the computation in Z[sqrt(5]).
seanhunter•7mo ago
Yes. Binet’s formula https://mathworld.wolfram.com/BinetsFormula.html
qsort•7mo ago
Yeah, there's a similar formula for all homogeneous linear recurrences if I'm not mistaken, but you can't evaluate it naively with floating point arithmetic. We can view (1 + sqrt(5)) and (1 - sqrt(5)) as elements of Z[sqrt(5)] though. It's probably extremely annoying to write because you have to implement the arithmetic operations in that field, but it wouldn't surprise me if it had faster runtime.

Probably Claude can do it? Maybe I'll give it a try later :)

meindnoch•7mo ago
Good luck calculating (1+sqrt(5))^n in Z[sqrt(5)].
meindnoch•7mo ago
They are calculating F(99999999) mod 9837. This can be done by 54 matrix multiplications of 2x2 matrices. It takes nanoseconds on a CPU. There's absolutely no point in using the GPU for this.
Strilanc•7mo ago
Yeah, I measure ~250 nanoseconds on CPU single shot. ~125 nanos amortized if repeated 100 times. It's fast enough that I'm not sure I'm benchmarking the method, as opposed to overheads like reading the time.

    #include <stdio.h>
    #include <time.h>
    
    int fib_mod_9837(int n) {
        int a = 0;
        int b = 1;
        int k = 1;
        while (k < n) {
            k <<= 1;
        }
        while (k) {
            int a2 = a*a;
            int b2 = b*b;
            int ab2 = a*b << 1;
            a = a2 + ab2;
            b = a2 + b2;
            if (n & k) {
                int t = a;
                a += b;
                b = t;
            }
            a %= 9837;
            b %= 9837;
            k >>= 1;
        }
        return a;
    }
    
    int main() {
        struct timespec begin, end;
        clock_gettime(CLOCK_MONOTONIC_RAW, &begin);
        int result = fib_mod_9837(99999999);
        clock_gettime(CLOCK_MONOTONIC_RAW, &end);
        printf("%lld\n", result);
        printf("elapsed nanos: %lld\n", (end.tv_nsec - begin.tv_nsec) + (end.tv_sec  - begin.tv_sec)*1000000000);
    }
threatofrain•7mo ago
What's wrong with the closed form formula?
meindnoch•7mo ago
The closed-form formula won't work for F(99999999) unless you're using an arbitrary-precision floating point library, and even then, you'd have trouble getting an integer result due to the transcendental sqrt(5) factors.
Strilanc•7mo ago
Ah yes, another entry in the "I'll compute Fibonacci fast!... oh no" genre.

My favorite of the genre so far is https://www.youtube.com/watch?v=KzT9I1d-LlQ , which tries to compute a big Fibonacci number in under a second. It doesn't do the modulus, so it has to do big integer arithmetic, and so is really about doing big multiplications with FFTs. But then it funnily spins off into a series of videos as the author keeps dryly realizing their code is extremely wasteful (ending in finally just using GMP).

0xf00ff00f•7mo ago
Great video, thanks for the link!
foobarian•7mo ago
Reminds me of the opening in a war story in the "Algorithm Design Manual."

> That look in his eyes should have warned me even before he started talking. “We want to use a parallel supercomputer for a numerical calculation up to 1,000,000,000, but we need a faster algorithm to do it.” I’d seen that distant look before. Eyes dulled from too much exposure to the raw horsepower of supercomputers - machines so fast that brute force seemed to eliminate the need for clever algorithms; at least until the problems got hard enough.