frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Mirror Parliament where users vote on top of politicians and draft laws

https://github.com/fokdelafons/lustra
1•fokdelafons•48s ago•0 comments

Ask HN: Opus 4.6 ignoring instructions, how to use 4.5 in Claude Code instead?

1•Chance-Device•2m ago•0 comments

We Mourn Our Craft

https://nolanlawson.com/2026/02/07/we-mourn-our-craft/
1•ColinWright•4m ago•0 comments

Jim Fan calls pixels the ultimate motor controller

https://robotsandstartups.substack.com/p/humanoids-platform-urdf-kitchen-nvidias
1•robotlaunch•8m ago•0 comments

Exploring a Modern SMTPE 2110 Broadcast Truck with My Dad

https://www.jeffgeerling.com/blog/2026/exploring-a-modern-smpte-2110-broadcast-truck-with-my-dad/
1•HotGarbage•8m ago•0 comments

AI UX Playground: Real-world examples of AI interaction design

https://www.aiuxplayground.com/
1•javiercr•9m ago•0 comments

The Field Guide to Design Futures

https://designfutures.guide/
1•andyjohnson0•9m ago•0 comments

The Other Leverage in Software and AI

https://tomtunguz.com/the-other-leverage-in-software-and-ai/
1•gmays•11m ago•0 comments

AUR malware scanner written in Rust

https://github.com/Sohimaster/traur
3•sohimaster•14m ago•1 comments

Free FFmpeg API [video]

https://www.youtube.com/watch?v=6RAuSVa4MLI
3•harshalone•14m ago•1 comments

Are AI agents ready for the workplace? A new benchmark raises doubts

https://techcrunch.com/2026/01/22/are-ai-agents-ready-for-the-workplace-a-new-benchmark-raises-do...
2•PaulHoule•19m ago•0 comments

Show HN: AI Watermark and Stego Scanner

https://ulrischa.github.io/AIWatermarkDetector/
1•ulrischa•19m ago•0 comments

Clarity vs. complexity: the invisible work of subtraction

https://www.alexscamp.com/p/clarity-vs-complexity-the-invisible
1•dovhyi•20m ago•0 comments

Solid-State Freezer Needs No Refrigerants

https://spectrum.ieee.org/subzero-elastocaloric-cooling
2•Brajeshwar•21m ago•0 comments

Ask HN: Will LLMs/AI Decrease Human Intelligence and Make Expertise a Commodity?

1•mc-0•22m ago•1 comments

From Zero to Hero: A Brief Introduction to Spring Boot

https://jcob-sikorski.github.io/me/writing/from-zero-to-hello-world-spring-boot
1•jcob_sikorski•22m ago•1 comments

NSA detected phone call between foreign intelligence and person close to Trump

https://www.theguardian.com/us-news/2026/feb/07/nsa-foreign-intelligence-trump-whistleblower
8•c420•23m ago•1 comments

How to Fake a Robotics Result

https://itcanthink.substack.com/p/how-to-fake-a-robotics-result
1•ai_critic•23m ago•0 comments

It's time for the world to boycott the US

https://www.aljazeera.com/opinions/2026/2/5/its-time-for-the-world-to-boycott-the-us
3•HotGarbage•23m ago•0 comments

Show HN: Semantic Search for terminal commands in the Browser (No Back end)

https://jslambda.github.io/tldr-vsearch/
1•jslambda•23m ago•1 comments

The AI CEO Experiment

https://yukicapital.com/blog/the-ai-ceo-experiment/
2•romainsimon•25m ago•0 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
4•surprisetalk•29m ago•0 comments

MS-DOS game copy protection and cracks

https://www.dosdays.co.uk/topics/game_cracks.php
4•TheCraiggers•30m ago•0 comments

Updates on GNU/Hurd progress [video]

https://fosdem.org/2026/schedule/event/7FZXHF-updates_on_gnuhurd_progress_rump_drivers_64bit_smp_...
2•birdculture•30m ago•0 comments

Epstein took a photo of his 2015 dinner with Zuckerberg and Musk

https://xcancel.com/search?f=tweets&q=davenewworld_2%2Fstatus%2F2020128223850316274
14•doener•31m ago•2 comments

MyFlames: View MySQL execution plans as interactive FlameGraphs and BarCharts

https://github.com/vgrippa/myflames
1•tanelpoder•32m ago•0 comments

Show HN: LLM of Babel

https://clairefro.github.io/llm-of-babel/
1•marjipan200•32m ago•0 comments

A modern iperf3 alternative with a live TUI, multi-client server, QUIC support

https://github.com/lance0/xfr
3•tanelpoder•33m ago•0 comments

Famfamfam Silk icons – also with CSS spritesheet

https://github.com/legacy-icons/famfamfam-silk
1•thunderbong•34m ago•0 comments

Apple is the only Big Tech company whose capex declined last quarter

https://sherwood.news/tech/apple-is-the-only-big-tech-company-whose-capex-declined-last-quarter/
4•elsewhen•37m ago•0 comments
Open in hackernews

Calculating the Fibonacci numbers on GPU

https://veitner.bearblog.dev/calculating-the-fibonacci-numbers-on-gpu/
42•rbanffy•7mo ago

Comments

lb-guilherme•7mo ago
The algorithm here, fast doubling, is sequential (there is no parallelism) and is quite fast, with less than a hundred operations to arrive to the final result. This runs in nanosecond on a CPU and the benchmark in the article is mostly measuring the communication time rather than actual computation time.
Someone•7mo ago
They also do not make use of the fact that matrix multiplication is associative, so M⁴ = M² × M², M⁸ = M⁴ × M⁴, etc. That means one can compute F32 using 5 matrix multiplications. This uses 31.

It wouldn’t surprise me at all if doing those 5 matrix multiplications on a CPU were faster than doing 31 on the GPU.

Also, FTA: “To calculate extremly large fibonacci numbers we need to avoid integer overflow. A simple way to avoid that is to just take everything modulo a given number in the matrix multiplication“

⇒ This also is NOT calculating the Fibonacci numbers, copping out when things get difficult on a GPU.

If you want to compute Fn mod some constant for 1 ≤ n ≤ some large constant fast on a GPU, I think I would compute quite a few of the intermediate matrices using the fast way, then spin up some ‘threads’ to compute the intermediate versions by copying those with the ‘standard’ matrix.

KeplerBoy•7mo ago
The matrix in question is tiny (2x2). There's no hope a GPU would ever outperform a CPU on that.

It only gets interesting if you need a large matmuls or millions of small matmuls in parallel.

meindnoch•7mo ago
They are unnecessarily filling a 99999999-sized array, only to print the last element and throw away the rest. Most of the time is going to be spent on filling this 400MB array.

Unintentionally, the article is a good cautionary tale of how algorithmic knowledge can beat raw hardware power, if you know what you're doing.

CBLT•7mo ago
Writing C code that performs this fast doubling algorithm for Fibonacci numbers was actually quite fun. Highly recommend it.

In my experience where I didn't modulo the numbers, the compute time was dominated by the last couple of multiplications with Megabyte-sized integers.

eesmith•7mo ago
> Using the power of GPUs we are able to calculate F99999999 in only 17 milliseconds on my Consumer GPU

FWIW, on my 2020 era laptop, the following:

  #include <stdio.h>
  
  int main(void) {
    int a = 1, b = 1, c, n = 99999999;
    while (--n) {
      c = (a+b) % 9837;
      a = b;
      b = c;
    }
    printf("%d\n", a);
    return 0;
  }
takes 330 ms measured as "time a.out".

The problem with Fibonacci is there are algorithmic ways to speed things up. The following (using the method at https://en.wikipedia.org/wiki/Fibonacci_sequence to compute Fibonacci numbers recursively in O(log n) arithmetic operations) takes Python 0.1 ms to compute the same value:

  import functools

  @functools.cache
  def F(n):
      if n <= 2:
          return 1

      m = n // 2 # rounds down
      if n % 2:
          # odd
          return (F(m+1)**2 + F(m)**2) % 9837
      else:
          return ((2*F(m+1) - F(m))*F(m)) % 9837

  print(F(99999999))
amelius•7mo ago
Maybe a better test is digits of Pi?

Or recursive hashing?

staunton•7mo ago
A test of what?
amelius•7mo ago
Test of your GPU approach/library versus CPU.
eesmith•7mo ago
I think the author is more interested in showing how to implement a certain problem using a GPU approach than to test a GPU approach vs. a CPU one.

As mentioned, the topic comes from an end-of-chapter problem set on prefix sums, at https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf .

A prefix sum cam be implemented using a GPU, as demonstrated.

However, using a prefix sum may not be the best way to compute Fibonacci numbers. As demonstrated.

simon_vtr•7mo ago
yea, i wrote this blogpost rather to show how to use scan in different ways than the canonical example of calculating prefix sum of a vector shown in introductions on gpu programming.
qsort•7mo ago
Well, there's a closed formula right? I wonder if it's faster to carry out the computation in Z[sqrt(5]).
seanhunter•7mo ago
Yes. Binet’s formula https://mathworld.wolfram.com/BinetsFormula.html
qsort•7mo ago
Yeah, there's a similar formula for all homogeneous linear recurrences if I'm not mistaken, but you can't evaluate it naively with floating point arithmetic. We can view (1 + sqrt(5)) and (1 - sqrt(5)) as elements of Z[sqrt(5)] though. It's probably extremely annoying to write because you have to implement the arithmetic operations in that field, but it wouldn't surprise me if it had faster runtime.

Probably Claude can do it? Maybe I'll give it a try later :)

meindnoch•7mo ago
Good luck calculating (1+sqrt(5))^n in Z[sqrt(5)].
meindnoch•7mo ago
They are calculating F(99999999) mod 9837. This can be done by 54 matrix multiplications of 2x2 matrices. It takes nanoseconds on a CPU. There's absolutely no point in using the GPU for this.
Strilanc•7mo ago
Yeah, I measure ~250 nanoseconds on CPU single shot. ~125 nanos amortized if repeated 100 times. It's fast enough that I'm not sure I'm benchmarking the method, as opposed to overheads like reading the time.

    #include <stdio.h>
    #include <time.h>
    
    int fib_mod_9837(int n) {
        int a = 0;
        int b = 1;
        int k = 1;
        while (k < n) {
            k <<= 1;
        }
        while (k) {
            int a2 = a*a;
            int b2 = b*b;
            int ab2 = a*b << 1;
            a = a2 + ab2;
            b = a2 + b2;
            if (n & k) {
                int t = a;
                a += b;
                b = t;
            }
            a %= 9837;
            b %= 9837;
            k >>= 1;
        }
        return a;
    }
    
    int main() {
        struct timespec begin, end;
        clock_gettime(CLOCK_MONOTONIC_RAW, &begin);
        int result = fib_mod_9837(99999999);
        clock_gettime(CLOCK_MONOTONIC_RAW, &end);
        printf("%lld\n", result);
        printf("elapsed nanos: %lld\n", (end.tv_nsec - begin.tv_nsec) + (end.tv_sec  - begin.tv_sec)*1000000000);
    }
threatofrain•7mo ago
What's wrong with the closed form formula?
meindnoch•7mo ago
The closed-form formula won't work for F(99999999) unless you're using an arbitrary-precision floating point library, and even then, you'd have trouble getting an integer result due to the transcendental sqrt(5) factors.
Strilanc•7mo ago
Ah yes, another entry in the "I'll compute Fibonacci fast!... oh no" genre.

My favorite of the genre so far is https://www.youtube.com/watch?v=KzT9I1d-LlQ , which tries to compute a big Fibonacci number in under a second. It doesn't do the modulus, so it has to do big integer arithmetic, and so is really about doing big multiplications with FFTs. But then it funnily spins off into a series of videos as the author keeps dryly realizing their code is extremely wasteful (ending in finally just using GMP).

0xf00ff00f•7mo ago
Great video, thanks for the link!
foobarian•7mo ago
Reminds me of the opening in a war story in the "Algorithm Design Manual."

> That look in his eyes should have warned me even before he started talking. “We want to use a parallel supercomputer for a numerical calculation up to 1,000,000,000, but we need a faster algorithm to do it.” I’d seen that distant look before. Eyes dulled from too much exposure to the raw horsepower of supercomputers - machines so fast that brute force seemed to eliminate the need for clever algorithms; at least until the problems got hard enough.