USS Preble Used Helios Laser to Zap Four Drones in Expanding Testing

https://www.twz.com/sea/uss-preble-used-helios-laser-to-zap-four-drones-in-expanding-testing
1•breve•57s ago•0 comments

Show HN: Animated beach scene, made with CSS

https://ahmed-machine.github.io/beach-scene/
1•ahmedoo•1m ago•0 comments

An update on unredacting select Epstein files – DBC12.pdf liberated

https://neosmart.net/blog/efta00400459-has-been-cracked-dbc12-pdf-liberated/
1•ks2048•1m ago•0 comments

Was going to share my work

1•hiddenarchitect•5m ago•0 comments

Pitchfork: A devilishly good process manager for developers

https://pitchfork.jdx.dev/
1•ahamez•5m ago•0 comments

You Are Here

https://brooker.co.za/blog/2026/02/07/you-are-here.html
3•mltvc•9m ago•0 comments

Why social apps need to become proactive, not reactive

https://www.heyflare.app/blog/from-reactive-to-proactive-how-ai-agents-will-reshape-social-apps
1•JoanMDuarte•10m ago•1 comments

How patient are AI scrapers, anyway? – Random Thoughts

https://lars.ingebrigtsen.no/2026/02/07/how-patient-are-ai-scrapers-anyway/
1•samtrack2019•10m ago•0 comments

Vouch: A contributor trust management system

https://github.com/mitchellh/vouch
1•SchwKatze•10m ago•0 comments

I built a terminal monitoring app and custom firmware for a clock with Claude

https://duggan.ie/posts/i-built-a-terminal-monitoring-app-and-custom-firmware-for-a-desktop-clock...
1•duggan•11m ago•0 comments

Tiny C Compiler

https://bellard.org/tcc/
1•guerrilla•13m ago•0 comments

Y Combinator Founder Organizes 'March for Billionaires'

https://mlq.ai/news/ai-startup-founder-organizes-march-for-billionaires-protest-against-californi...
1•hidden80•13m ago•1 comments

Ask HN: Need feedback on the idea I'm working on

1•Yogender78•14m ago•0 comments

OpenClaw Addresses Security Risks

https://thebiggish.com/news/openclaw-s-security-flaws-expose-enterprise-risk-22-of-deployments-un...
1•vedantnair•14m ago•0 comments

Apple finalizes Gemini / Siri deal

https://www.engadget.com/ai/apple-reportedly-plans-to-reveal-its-gemini-powered-siri-in-february-...
1•vedantnair•15m ago•0 comments

Italy Railways Sabotaged

https://www.bbc.co.uk/news/articles/czr4rx04xjpo
3•vedantnair•15m ago•0 comments

Emacs-tramp-RPC: high-performance TRAMP back end using MsgPack-RPC

https://github.com/ArthurHeymans/emacs-tramp-rpc
1•fanf2•17m ago•0 comments

Nintendo Wii Themed Portfolio

https://akiraux.vercel.app/
1•s4074433•21m ago•1 comments

"There must be something like the opposite of suicide"

https://post.substack.com/p/there-must-be-something-like-the
1•rbanffy•23m ago•0 comments

Ask HN: Why doesn't Netflix add a “Theater Mode” that recreates the worst parts?

2•amichail•24m ago•0 comments

Show HN: Engineering Perception with Combinatorial Memetics

1•alan_sass•30m ago•2 comments

Show HN: Steam Daily – A Wordle-like daily puzzle game for Steam fans

https://steamdaily.xyz
1•itshellboy•32m ago•0 comments

The Anthropic Hive Mind

https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b
1•spenvo•32m ago•0 comments

Just Started Using AmpCode

https://intelligenttools.co/blog/ampcode-multi-agent-production
1•BojanTomic•33m ago•0 comments

LLM as an Engineer vs. a Founder?

1•dm03514•34m ago•0 comments

Crosstalk inside cells helps pathogens evade drugs, study finds

https://phys.org/news/2026-01-crosstalk-cells-pathogens-evade-drugs.html
2•PaulHoule•35m ago•0 comments

Show HN: Design system generator (mood to CSS in <1 second)

https://huesly.app
1•egeuysall•35m ago•1 comments

Show HN: 26/02/26 – 5 songs in a day

https://playingwith.variousbits.net/saturday
1•dmje•36m ago•0 comments

Toroidal Logit Bias – Reduce LLM hallucinations 40% with no fine-tuning

https://github.com/Paraxiom/topological-coherence
1•slye514•38m ago•1 comments

Top AI models fail at >96% of tasks

https://www.zdnet.com/article/ai-failed-test-on-remote-freelance-jobs/
5•codexon•38m ago•2 comments

Cache-friendly, low-memory Lanczos algorithm in Rust

https://lukefleed.xyz/posts/cache-friendly-low-memory-lanczos/
141•lukefleed•2mo ago

Comments

sfpotter•2mo ago
Nice result! Arnoldi is a beautiful algorithm, and this is a good application of it.

What are you using this for and why are you working on it?

I admit I'm not personally convinced of the value of Rust in numerics, but that's just me, I guess...

lukefleed•2mo ago
Hi there, thanks! I started doing this for a university exam and got carried away a bit.

Regarding Rust for numerical linear algebra, I kinda agree with you. I think that, theoretically, it's a great language for writing low-level "high-performance mathematics." That's why I chose it in the first place.

The real wall is that the past four decades of research in this area have primarily been conducted in C and Fortran, making it challenging for other languages to catch up without relying heavily on BLAS/LAPACK and similar libraries.

I'm starting to notice that more people are trying to move to Rust for this stuff, so it's worth keeping an eye on libraries like the one I used, faer.

sfpotter•2mo ago
Nice. I'd be curious to see if this has already been done in the literature. It is a very nice and useful result, but it's also kind of an obvious one, so I have to assume people who work on computing matrix functions are aware of it... (This is not to take anything away from the hard work you've done! You may just appreciate having a reference to any existing work that is already out there.)

Of course, what you're doing depends on the matrix being Hermitian, which reduces the upper Hessenberg matrix in the Arnoldi iteration to tridiagonal form. Trying to do a similar streaming computation on a general matrix is going to run into problems.

That said... one area of numerical linear algebra research which is very active is randomized numerical linear algebra. There is a paper by Nakatsukasa and Tropp ("Fast and accurate randomized algorithms for linear systems and eigenvalue problems") which presents some randomized algorithms, including a "randomized GMRES" which IIRC is compatible with streaming. You might find it interesting trying to adapt the machinery this algorithm is built on to the problem you're working on.

As for Rust, having done a lot of this research myself... there is no problem relying on BLAS or LAPACK, and I'm not sure this could be called a "wall". There are also many alternative libraries actively being worked on. BLIS, FLAME, and MAGMA are examples that come to mind... but there are so many more. Obviously Eigen is also available in C++. So, I'm not sure this alone justifies using Rust... Of course, use it if you like it. :)

lukefleed•2mo ago
Sorry for the late answer.

The blog post is a simplification of the actual work; you can check out the full report here [1], where I also reference the literature about this algorithm.

On the cache effects: I haven't seen this "engineering" argument made explicitly in the literature either. There are other approaches to the basis storage problem, like the compression technique in [2]. Funny enough, the authors gave a seminar at my university literally this afternoon about exactly that.

I'm also unfamiliar with randomised algorithms for numerical linear algebra beyond the basics. I'll dig into that, thanks!

On the BLAS point, let me clarify what I meant by "wall": when you call BLAS from Rust, you're essentially making a black-box call to pre-compiled Fortran or C code. The compiler loses visibility into what happens across that boundary. You can't inline, can't specialise for your specific matrix shapes or use patterns, can't let the compiler reason about memory layout across the whole computation. You get the performance of BLAS, sure, but you lose the ability to optimise the full pipeline.

Also, Rust's compilation model flattens everything into one optimisation unit: your code, dependencies, all compiled together from source. The compiler sees the full call graph and can inline, specialise generics, and vectorise across what would be library boundaries in C/C++. The borrow checker also proves at compile time that operations like our pointer swaps are safe and that no aliasing occurs, which enables more aggressive optimisations; the compiler can reorder operations and keep values in registers because it has proof about memory access patterns. With BLAS, you're calling into opaque binaries where none of this analysis is possible.

My point is that if the core computation just calls out to pre-compiled C or Fortran, you lose much of what makes Rust interesting for numerical work in the first place. That's why I hope to see more efforts directed towards expanding the Rust ecosystem in this area in the future :)
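To make the pointer-swap point concrete, here's a minimal sketch of the three-buffer rotation (illustrative names and a naive dense matvec just for demonstration; the real implementation is in the repo linked above). The recurrence only ever needs three live vectors, which `std::mem::swap` rotates without allocating, and the borrow checker proves the buffers never alias:

```rust
// Dense matvec: out = A * x (naive, for illustration only).
fn matvec(a: &[Vec<f64>], x: &[f64], out: &mut [f64]) {
    for (i, row) in a.iter().enumerate() {
        out[i] = row.iter().zip(x).map(|(aij, xj)| aij * xj).sum();
    }
}

fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Lanczos tridiagonalization of a symmetric matrix, keeping only
/// three cycling buffers. Returns the diagonal (alphas) and
/// off-diagonal (betas) of T_k.
fn lanczos_diag(a: &[Vec<f64>], b: &[f64], k: usize) -> (Vec<f64>, Vec<f64>) {
    let n = b.len();
    let norm = dot(b, b).sqrt();
    // The three cycling buffers: previous, current, and scratch.
    let mut v_prev = vec![0.0; n];
    let mut v_curr: Vec<f64> = b.iter().map(|x| x / norm).collect();
    let mut w = vec![0.0; n];
    let (mut alphas, mut betas) = (Vec::new(), Vec::new());
    let mut beta = 0.0;
    for _ in 0..k {
        matvec(a, &v_curr, &mut w);
        let alpha = dot(&w, &v_curr);
        // Three-term recurrence: w = A v_k - alpha_k v_k - beta_k v_{k-1}.
        for i in 0..n {
            w[i] -= alpha * v_curr[i] + beta * v_prev[i];
        }
        alphas.push(alpha);
        beta = dot(&w, &w).sqrt();
        if beta < 1e-12 {
            break; // invariant subspace found
        }
        for x in w.iter_mut() {
            *x /= beta;
        }
        betas.push(beta);
        // Rotate buffers: no allocation, and the compiler has proof
        // that the three vectors never alias.
        std::mem::swap(&mut v_prev, &mut v_curr);
        std::mem::swap(&mut v_curr, &mut w);
    }
    (alphas, betas)
}

fn main() {
    // 3x3 symmetric tridiagonal matrix; starting from e_1 the
    // recurrence reproduces its own diagonals.
    let a = vec![
        vec![2.0, 1.0, 0.0],
        vec![1.0, 2.0, 1.0],
        vec![0.0, 1.0, 2.0],
    ];
    let (alphas, betas) = lanczos_diag(&a, &[1.0, 0.0, 0.0], 3);
    println!("alpha = {:?}", alphas);
    println!("beta  = {:?}", betas);
}
```

Because the whole thing is plain Rust compiled from source, every one of those loops is visible to LLVM for inlining and vectorisation; swap a BLAS call in for `matvec` and that visibility stops at the FFI boundary.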

[1] https://github.com/lukefleed/two-pass-lanczos/raw/master/tex...

[2] https://arxiv.org/abs/2403.04390

sfpotter•2mo ago
Thanks for clarifying.

I think the argument you're making is compelling and interesting, but my two concerns with this are: 1) how does it affect compile time? and 2) how easy is it to make major structural changes to an algorithm?

I haven't tried Rust, but my worry is that the extensive compile-time checks would make quick refactors difficult. When I work on numerical algorithms, I often want to try many different approaches to the same problem until I hit on something with the right "performance envelope". And usually memory safety just isn't that hard... the data structures aren't that complicated...

Basically, I worry the extra labor involved in making Rust code work would affect prototyping velocity.

On the other hand, what you're saying about compiling everything together at once, proving more about what is being compiled, enabling a broader set of performance optimizations to take place... That is potentially very compelling and worth exploring if the gains are big. Do you have any idea how big? :)

This is also a bit reminiscent of the compile time issues with Eigen... If I have to recompile my dense QR decomposition (which never changes) every time I compile my code because it's inlined in C++ (or "blobbed together" in Rust), then I waste that compile time every single time I rebuild... Is that worth it for a 30% speedup? Maybe... Maybe not... Really depends on what the code is for.

spockz•2mo ago
If code is split into sufficiently small crates, compile times are not that big of a deal for iteration. There is also the faster development build profile, and I would think most of the time will be spent running the benchmark and checking perf for processor usage, dwarfing any time needed for compilation.

adgjlsfhk1•2mo ago
Have you looked into Julia at all? IMO it's a pretty great mix of performance but with a lot fewer restrictions than what Rust ends up with.

uecker•2mo ago
The advantage of having stuff in C and Fortran is that it can easily be used from other languages. I would also argue that your algorithm written in C would be far more readable.

imtringued•2mo ago
BLAS/LAPACK don't do any block level optimizations. Heck, they don't even let you define a fixed block sparsity pattern. Do the math yourself and write down all 16 sparsity patterns for a 2x2 block matrix and try to find the inverse or LU decomposition on paper.

https://lukefleed.xyz/posts/cache-friendly-low-memory-lanczo...

I mean just look at the saddle point problem you mentioned in that section. It's a block matrix with highly specific properties and there is no BLAS call for that. Things get even worse once you have parameterized matrices and want to operate on a series of changing and non-changing matrix multiplications. Some parts can be factorized offline.
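As a concrete instance of block structure that a generic BLAS/LAPACK call can't see: a saddle-point matrix with invertible $A$ admits a block LU factorization in which the Schur complement $-BA^{-1}B^T$ appears explicitly, something you can exploit on paper but not through a generic GEMM/GETRF routine:

```latex
\begin{pmatrix} A & B^{T} \\ B & 0 \end{pmatrix}
=
\begin{pmatrix} I & 0 \\ B A^{-1} & I \end{pmatrix}
\begin{pmatrix} A & B^{T} \\ 0 & -B A^{-1} B^{T} \end{pmatrix}
```

If $A$ is fixed across solves, its factorization (and hence $A^{-1}B^T$) can be computed offline, leaving only the smaller Schur-complement system per solve.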

manbash•2mo ago
Nice work. I have gone through the fairly straightforward paper.

May I ask what you've used to confirm the cache hit/miss rate? Thanks!

lukefleed•2mo ago
Thanks! I used perf to look at cache miss rates and memory bandwidth during runs. The measurements showed the pattern I expected, but I didn't do a rigorous profiling study (different cache sizes, controlled benchmarks across architectures, or proper statistical analysis).

This was for a university exam, and I ran out of time to do it properly. The cache argument makes intuitive sense (three vectors cycling vs. scanning a growing n×k matrix), and the timing data supports it, but I'd want to instrument it more carefully in the future :)
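For reference, the kind of perf invocation I mean is along these lines (the binary name is a placeholder):

```shell
# Hardware cache counters, averaged over 5 runs.
perf stat -r 5 -e cache-references,cache-misses,LLC-load-misses \
    ./target/release/lanczos-bench

# Sample where the misses happen, then inspect per-function.
perf record -e cache-misses ./target/release/lanczos-bench
perf report
```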

vatsachak•2mo ago
I leafed through your thesis and will now set aside some time to learn more about succinct data structures.

I hope you get your pay day, your blog is great!

lukefleed•2mo ago
Thanks!! I'm currently working on expanding that work. I will post something for sure when it's done.
gigatexal•2mo ago
the comments here might be a good precursor to defending your thesis -- good luck with that btw!
chrisweekly•2mo ago
Fantastic post; I'm not much of a mathematician, but the writing and logical progression were so clearly articulated, I was able to follow the gist the whole way through. Kudos!
_ks3e•2mo ago
It's nice to see some high-performance linear algebra code done in a modern language! Would love to see more!

Is your approach specific to the case where the matrix fits inside cache, but the memory footprint of the basis causes performance issues? Most of the communication-avoiding Krylov works I've seen, e.g. [0,1], seem to assume that if the matrix fits, so will its basis, and so end up doing some row-wise partitioning for the 'large matrix' case; I'm curious what your application is.

[0] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-..., e.g. page 25. [1] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-...

adgjlsfhk1•2mo ago
You might be interested in ExponentialUtilities.jl then. Julia has a really unique ability to make high performance linear algebra look like the math. See https://github.com/SciML/ExponentialUtilities.jl (specifically src/kiops.jl and src/krylov_phiv.jl) for an example of a good matrix exponential operator in ~600 lines of code+comments.
zamalek•2mo ago
I have massive hopes for Julia, especially for ML. What really held me back last I looked at it was a lack of cargo-tier tooling, has that changed?
adgjlsfhk1•2mo ago
When did you look and what tooling was missing? Julia's package manager Pkg was pretty heavily inspired by cargo, and IMO it does a very good job. Also in the past 2-3 years Juliaup (modeled after rustup) has become the primary way of installing and managing Julia versions
jkafjanvnfaf•2mo ago
How accurate is this two-pass approach in general? From my outsider's perspective, it always looked like most of the difficulty in implementing Lanczos was reorthogonalization, which will be hard to do with the two-pass algorithm.

Or is this mostly a problem when you actually want to calculate the eigenvectors themselves, and not just matrix functions?

lukefleed•2mo ago
That's an interesting question. I don't have too much experience, but here's my two cents.

For matrix function approximations, loss of orthogonality matters less than for eigenvalue computations. The three-term recurrence maintains local orthogonality reasonably well for moderate iteration counts. My experiments [1] show orthogonality loss stays below $10^{-13}$ up to k=1000 for well-conditioned problems, and only becomes significant (jumping to $10^{-6}$ and higher) around k=700-800 for ill-conditioned spectra. Since you're evaluating $f(T_k)$ rather than extracting individual eigenpairs, you care about convergence of $\|f(A)b - x_k\|$, not spectral accuracy. If you need eigenvectors themselves or plan to run thousands of iterations, you need the full basis, and the two-pass method won't help. Maybe methods like [2] would be more suitable?

[1] https://github.com/lukefleed/two-pass-lanczos/raw/master/tex...

[2] https://arxiv.org/abs/2403.04390
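For anyone else following along, the quantity at stake is the standard Lanczos approximation (this is my reading of the two-pass idea, not a quote from the post):

```latex
f(A)b \;\approx\; \|b\|_2\, V_k\, f(T_k)\, e_1
      \;=\; \|b\|_2 \sum_{j=1}^{k} \bigl(f(T_k)\,e_1\bigr)_j\, v_j
```

The first pass runs the recurrence keeping only the $\alpha_j, \beta_j$ that define $T_k$; the second pass reruns the same recurrence to regenerate each $v_j$ and accumulate the sum, so only three basis vectors are ever live. Orthogonality loss then only affects the quality of $T_k$, not the storage scheme.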

Sesse__•2mo ago
It seems the DNS servers for lukefleed.xyz are subtly misconfigured, causing occasional connectivity problems:

https://dns.squish.net/traverses/de494a9fe3310415f30369a9cb1...

Or more precisely, lukefleed.xyz has NS records pointing to ns[1234].afraid.org, and the DNS servers for _afraid.org_ are subtly misconfigured (one of the six nameservers for afraid.org is evergreen.v6.afraid.org; since you are trying to look up something in afraid.org while still resolving afraid.org itself, you need extra “glue records” as part of the NS response, and those are missing for that specific server).