frontpage.

AI job alerts that match your skills

https://jobsphere.onboardai.net/
1•Gurnoorsb•37s ago•1 comment

They Were Identical 'Twinnies' Who Charmed Orwell, Camus and More

https://www.nytimes.com/2025/05/04/books/review/the-dazzling-paget-sisters-ariane-bankes.html
1•lermontov•2m ago•1 comment

AI Food Detection and Free Calorie Counter App by Recipe

https://whatthefood.io
1•OdehAhwal•5m ago•0 comments

Nord Stream 2 Enters Debt Restructuring Deal with Creditors

https://oilprice.com/Latest-Energy-News/World-News/Nord-Stream-2-Enters-Debt-Restructuring-Deal-with-Creditors.html
1•PaulHoule•7m ago•0 comments

Better air quality is the easiest way not to die

https://dynomight.net/air/
2•haltingproblem•8m ago•0 comments

Jane Street-Millennium Trade Secrets Fight Ends in Settlement (2024)

https://www.bnnbloomberg.ca/business/international/2024/12/05/jane-street-millennium-settle-india-options-trade-secrets-case/
1•walterbell•11m ago•0 comments

TalwarAI: A suite of autonomous security agents

https://test.pypi.org/project/talwarai-beta/
1•handfuloflight•19m ago•0 comments

Apple Says Fortnite for iOS Isn't Blocked Worldwide, Just the U.S.

https://www.macrumors.com/2025/05/16/apple-fortnite-ios-not-blocked-worldwide/
3•smileybarry•23m ago•1 comment

Typograph: Prompt to Font

https://typograph.studio/en
1•handfuloflight•27m ago•0 comments

Harvard bought a Magna Carta copy for $27. It turned out to be an original

https://www.usatoday.com/story/news/nation/2025/05/15/harvard-magna-carta-1300/83643266007/
2•rmason•32m ago•0 comments

Core War

https://en.wikipedia.org/wiki/Core_War
2•michalpleban•34m ago•0 comments

Reddit is down

2•tom1337•34m ago•1 comment

Yeast-Based LLM Research

1•daly•35m ago•0 comments

Berkshire Hathaway Inc Q4 2024 vs. Q1 2025 13F Holdings Comparison

https://13f.info/13f/000095012325005701-berkshire-hathaway-inc-q1-2025
1•kamaraju•35m ago•0 comments

How to Split Ranges in C++23 and C++26

https://www.cppstories.com/2025/ranges_split_chunk/
2•ibobev•40m ago•0 comments

Leica M10 Battery Teardown and Reverse Engineering

https://tokilabs.co/tech/
1•k2enemy•42m ago•0 comments

The Digital Panopticon Nightmare

https://www.thedissident.news/the-digital-panopticon-nightmare/
4•anigbrowl•44m ago•0 comments

Ask HN: Did anyone else see an avalanche of old email appear in Gmail?

1•DamnInteresting•45m ago•0 comments

Party Till the Break of 10 P.M

https://www.nytimes.com/2025/05/15/style/earlybirds-club-dance-party.html
2•whack•47m ago•0 comments

On-Demand: AI Agent Automation

https://on-demand.io/
1•handfuloflight•49m ago•0 comments

U.S. Loses Last Triple-A Credit Rating

https://www.wsj.com/economy/central-banking/u-s-loses-last-triple-a-credit-rating-bfcbae5d
8•mudil•51m ago•1 comment

FDA clears first blood test for diagnosing Alzheimer's

https://www.statnews.com/2025/05/16/alzheimers-fujirebio-fda-approval/
1•pseudolus•51m ago•0 comments

Really Really Simple "Pure CSS" Squircles

https://gist.github.com/pouyakary/136fafc75a14abd867e0100856add5a0
3•pmkary•55m ago•0 comments

After HTTPS: Indicating Risk Instead of Security (2019)

https://scholarsarchive.byu.edu/etd/7403/
2•transpute•55m ago•0 comments

Our Idea of Happiness Has Gotten Shallow

https://www.nytimes.com/2025/05/03/magazine/happiness-history-living-well.html
1•gmays•1h ago•0 comments

Dept Homeland Security in vetting process for immigrant reality TV show

https://www.cnn.com/2025/05/16/politics/dhs-vetting-immigrant-reality-tv-show
3•jeffwass•1h ago•0 comments

HMPL v3.0: Small template language for displaying UI from server to client

https://github.com/hmpl-language/hmpl/releases/tag/3.0.0
3•todsacerdoti•1h ago•0 comments

Peter Lax, Pre-Eminent Cold War Mathematician, Dies at 98

https://www.nytimes.com/2025/05/16/science/peter-lax-dead.html
5•donohoe•1h ago•1 comment

Beyond the Gang of Four: Practical Design Patterns for Modern AI Systems

https://www.infoq.com/articles/practical-design-patterns-modern-ai-systems/
1•rbanffy•1h ago•0 comments

AI Could Help Humans Understand Animals

https://nautil.us/ai-could-help-humans-understand-animals-1211108/
1•rbanffy•1h ago•0 comments

X X^t can be faster

https://arxiv.org/abs/2505.09814
148•robinhouston•7h ago

Comments

BryanLegend•7h ago
So if an AI company spends $5B on a cluster, is this optimization worth $250m?
disofu234•6h ago
Don't think so. This is an optimization for multiplying a matrix by its own transpose, not for generic matrix multiplication.
qoez•5h ago
Definitely didn't cost that much to solve this particular problem (that cost is amortized over the lifetime of the cluster's existence). Compared to the CO2 it takes to feed academics eating red meat, this is like wind power versus coal: it's way more environmentally friendly.
Scene_Cast2•7h ago
I can't name any applications off the top of my head, other than iterative matrix multiplication for approximate eigenvector finding in square matrices. But I don't know what's actually used for finding eigenvectors (or other decompositions for that matter).
TimorousBestie•7h ago
It’s a fairly common operation. The sample covariance of a vector-valued random variable is XX^t/N.
vqv•6h ago
It’s pretty fundamental in a variety of multivariate statistical methods. If the rows of X are multivariate observations, then XX’ is the Gram matrix (of dot products). This can be used in clustering and regression. If the columns of X are (centered) multivariate observations, then XX’ is a scalar multiple of the sample covariance matrix. This is used in PCA.

But in large scale applications you may not want to store XX’ but instead are interested in computing products of the form XX’ v on the fly.
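
For instance, a minimal NumPy sketch of that last point, with made-up shapes: the product XX’v can be computed as two matrix-vector products without ever materializing the n-by-n Gram matrix.

  import numpy as np

  rng = np.random.default_rng(0)
  n, p = 10_000, 64                      # many observations, few features (illustrative)
  X = rng.standard_normal((n, p))
  v = rng.standard_normal(n)

  # Explicit Gram matrix: O(n^2 p) work and an n x n array in memory.
  # G = X @ X.T; y = G @ v

  # On the fly: two matrix-vector products, O(n p) work, no n x n array.
  y = X @ (X.T @ v)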

adgjlsfhk1•2h ago
generally good implementations use QR instead
bee_rider•6h ago
Could be useful when doing an SVD as well (although, really, you don’t want X’X but X’Xv for some vector…).
constantcrying•6h ago
I think one particularly interesting result is that superior algorithms for numerical linear algebra exist at all and can be found by artificial intelligence.

XX^T is the matrix of all pairwise dot products of the rows of X, and as others have pointed out there are legitimate applications. Another one is e.g. transforming an overdetermined linear system into a least squares problem.

FabHK•5h ago
The solution to the least squares problem Ax ≈ b is given by x = (A'A)^-1 A'b, but modern solvers never form the matrix product A'A explicitly. Even for the SVD it isn't formed.
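
A small NumPy illustration of that point, with made-up data: the normal-equations route forms A'A explicitly (and squares the condition number), while a library least-squares solver goes through an orthogonal factorization and never builds A'A.

  import numpy as np

  rng = np.random.default_rng(1)
  A = rng.standard_normal((500, 20))     # tall, overdetermined system
  b = rng.standard_normal(500)

  # Normal equations: x = (A'A)^-1 A'b, forming A'A explicitly.
  x_normal = np.linalg.solve(A.T @ A, A.T @ b)

  # Library solver: orthogonal factorization, A'A is never formed.
  x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

  print(np.allclose(x_normal, x_lstsq))  # agrees here; diverges when A is ill-conditioned
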
jerf•3h ago
Google put something out like this recently too. The original is https://deepmind.google/discover/blog/alphaevolve-a-gemini-p..., but it's sort of buried in there. Here's the YouTube video that introduced it to me, forwarded to the spot where the host talks about the improved 4x4 matrix multiplication algorithm: https://www.youtube.com/watch?v=sGCmu7YKgPA&t=396s It's only a slight tweak, but it's a slight tweak to something pretty highly studied and still impressive.
blobbers•4h ago
Covariance.
Bootvis•3h ago
I expect this algorithm or similar to work for X^tX as well. Fun fact, that operation is common enough that a trading firm was named after it: XTX Markets.
TimorousBestie•6h ago
I wish they could have modeled memory cache movements as well, somehow. It would have made the analysis more difficult but often these large matrix algorithms live or die by how cache-friendly the memory access patterns are.

Splitting into 4x4 blocks is typically very nice, though. Maybe it doesn’t matter so much to practical runtime.

grumpymuppet•1h ago
I wish cache analysis was exposed more broadly everywhere. Everyone I deal with seems to understand and know how to think about it, but best practices seem to be wrapped up in a lot of heuristics.

I am aware of being able to use hardware counters for profiling systems, but the more fine-grained exposure of what's happening at a lower level seems to be just hidden... Or I haven't found the right reference resource yet.

meisel•6h ago
Is this like the Karatsuba algorithm, where it's theoretically faster but not actually faster when run on real hardware?

Btw, it's worth noting that if you know the result will be symmetric (as is the case for X * X^T), you can make things faster. For example, in cuBLAS, cublas*syrk (the variant optimized for a symmetric result) IME isn't actually faster than gemm, so what you can do instead is do smaller multiplications that fill in one of the two triangles piece by piece, and then copy that triangle to the other one.
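
A rough NumPy sketch of that triangle-filling idea (block size and shapes are made up; a real version would issue the small multiplications as GPU GEMMs):

  import numpy as np

  def xxt_via_lower_blocks(X, bs=256):
      # Fill only the blocks on or below the diagonal, then mirror the
      # strict lower triangle into the upper one.
      n = X.shape[0]
      C = np.zeros((n, n), dtype=X.dtype)
      for i in range(0, n, bs):
          for j in range(0, i + 1, bs):
              C[i:i+bs, j:j+bs] = X[i:i+bs] @ X[j:j+bs].T
      return np.tril(C) + np.tril(C, -1).T

  X = np.random.default_rng(2).standard_normal((1000, 300))
  assert np.allclose(xxt_via_lower_blocks(X), X @ X.T)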

optimalsolver•6h ago
>where it's theoretically faster but not actually faster when run on real hardware

Aka a galactic algorithm:

https://en.wikipedia.org/wiki/Galactic_algorithm

thrance•6h ago
I feel like a galactic algorithm is slow even theoretically, unless you work with out-of-this-world data.

Whereas many algorithms are theoretically fine for real data but don't make good use of cache, or instruction sets...

thrance•6h ago
They tested it against BLAS primitives and found theirs to be faster, in hardware.
qoez•6h ago
Since this is 2x2, there are SIMD instructions that can do this (or do it with two SIMD dot products), both on CPU and inside each GPU core. So with current hardware you won't beat writing this out manually.
bee_rider•5h ago
They mention 5% improvements and small matrices in the abstract, so my gut says (I haven’t read the actual paper yet) that it probably is a practical-type algorithm.
odo1242•5h ago
It’s faster for matrices of at least roughly 256x256, though it depends on hardware
pbsd•5h ago
Karatsuba is definitely faster than schoolbook multiplication at practical sizes. You presumably mean Strassen.
ttoinou•6h ago
Are there researchers interested in accelerating these algorithms while also keeping the maximum accuracy of the results? This and other optimizations trade fewer multiplications for more additions, but from the little I know, in floating point, additions and subtractions are risky precision-wise, while multiplications are harmless. Also, using FMA (fused multiply-add) operations would be beneficial.
constantcrying•6h ago
The question of the stability of the algorithm is interesting; the paper does not seem to discuss it, though. You are correct that we should expect the algorithms to deliver different results.

>This and other optimizations trade fewer multiplications for more additions, but from the little I know, in floating point, additions and subtractions are risky precision-wise, while multiplications are harmless.

Certainly not true. In general, different orders of magnitude are a problem. With multiplication and division these are more severe problems, as they lead to greater changes in magnitude, although even that is a generalization.

wizzwizz4•4h ago
Different orders of magnitude aren't a problem when multiplying floats, because the order-of-magnitude portion gets translated into addition of exponents.

They're a problem with fixed point: might you have been thinking of that?

Dylan16807•4h ago
We're not looking at instructions in a vacuum. No matter what you have both multiplications and additions happening.

So they're making a point that if you apply more multiplications before the inevitable addition, you are likely increasing the danger levels of that addition.

wizzwizz4•3h ago
Oh, if one of the multiplications yields a positive number, and another yields a negative number of similar magnitude? That makes sense; I forgot linear spaces were over fields.
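
A contrived floating-point illustration of the two failure modes discussed above: multiplication keeps relative error small, but adding values of very different magnitude absorbs the small one, and subtracting nearly equal values cancels most of the significant digits.

  big, small = 1e16, 1.0
  print(big * small == big)      # True: this multiplication is exact
  print((big + small) - big)     # 0.0: 'small' was absorbed by the addition

  a = 1.0 + 1e-12
  print(a * a - 1.0)             # ~2e-12 in exact arithmetic; after the cancelling
                                 # subtraction only a few leading digits are reliable
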
almostgotcaught•4h ago
> while also keeping the maximum accuracy of the results

All of these papers/algos are for the ML hype-train. ML algos are approximate anyway so no one cares about absolute accuracy, only the precision of the overall pipeline (class labels shouldn't change, at least not too much). Consider that very many papers/techniques quantize down to 8 or even 4 bits (yes sometimes even during training) for the purposes of perf.

This whole research area should just be renamed to something like approximate computing so that people don't confuse it (and the goals) with classical numerical analysis.

saagarjha•4h ago
Surely nobody is tiling 4x4 blocks for ML training
almostgotcaught•3h ago
Why surely? E.g., on GPU-style chips today you'd tile to the warp size (or half of it, or something like that) to target warp-cooperative primitives (so-called tensor cores). On AMD and NV that's like 16x16 or 32x32 depending on the data type. That's not that far from 4x4, and it's not like all chips in the world have 32-lane warps. Anyway, if a trick is good enough and people start trying to shoehorn it into everything (not saying this one is), then the next gen (or the one after that, or the one after) will just have units to support the trick (it's an expensive way to gain ground on MLPerf rankings).
ttoinou•1h ago
I get your point: neural networks could be some kind of black box, and how the sequence of operations evolves might not be so dependent on the accuracy of each operation -- at least, it would be a good intuition to believe "accuracy at each step matters", but it's not rigorously proven? We could empirically check this by training a neural net and adding some random noise at each step.
HappyPanacea•2h ago
Yes, see this: https://arxiv.org/abs/1507.00687 . There is also this answer ( https://mathoverflow.net/a/421380 ) on MathOverflow which states that you can't do better than cubic time if you want a strong form of numerical stability.
ttoinou•1h ago
Wow, thank you. I do not understand everything written and didn't take the time to do so (and can't even find reference 23, the paper on Brent stability), but I was thinking we don't necessarily need a new algorithm; we could tweak existing ones.

For example, swapping columns and rows before applying a product, so that accuracy is best preserved (then swapping back at the end of the operation). It seems to me the Strassen-like algorithms arbitrarily choose some elements of the matrices over others to form their temporary sum-product variables, so we could exploit this asymmetry (not sure if that's the correct word here) to choose which elements get picked.

toolslive•5h ago
As a general tip: X^-1 doesn't mean you have to invert the matrix. It's often a notational shorthand for "you can solve the system". Same remark for X^t: it doesn't mean you have to build a new matrix, it just means you have to use the one you have in a different way. I've seen this butchered by scientists who then complain that their performance sucks.
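
A small NumPy version of the tip (shapes made up): the formula x = A^-1 b is spelled as a factor-and-solve, and A^t is used without ever building a transposed copy.

  import numpy as np

  rng = np.random.default_rng(3)
  A = rng.standard_normal((200, 200))
  b = rng.standard_normal(200)

  x = np.linalg.solve(A, b)        # "A^-1 b" without forming an inverse
  y = np.linalg.solve(A.T, b)      # "A^t" here is a zero-copy view, not a new matrix

  assert np.shares_memory(A, A.T)  # the transpose is just different strides
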
jerf•3h ago
Even computer programmers can get it wrong, and even reify the wrongness into interview questions. The correct answer to the common question of "how do you reverse an array" is that you don't reverse it. You read it backwards. Details vary from language to language; languages with iterators can find this particularly easy. But this is often not the "correct" answer according to the interviewer.

In the end, all data structures are functions. The academic applications of that may make your eyes glaze over, but the practical application is that if you want a reversed array/tree/etc., you just need the reading/iteration "function" for the array to return the data as if it were reversed. A matrix can be "transposed" just by reading it as its transpose. (Caching or other consequences may lead you to manifest it in memory anyway, but if it was stored cleverly in the first place, or you're only going to read it once, so that using it once costs the same as reading it once to reconstruct it, then there's no reason to.) An array can be randomly permuted simply by wrapping the read function with a permutation on the index; it need not be manifested in memory. Etc.
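
In Python terms, a sketch of "the data structure is its read function" (illustrative only):

  import numpy as np

  xs = [3, 1, 4, 1, 5]
  for x in reversed(xs):             # iterate backwards; no reversed copy is built
      pass

  perm = [2, 0, 4, 1, 3]
  permuted = (xs[i] for i in perm)   # a lazily permuted "view" of xs

  A = np.arange(12).reshape(3, 4)
  At = A.T                           # transposed view: same memory, swapped strides
  assert np.shares_memory(A, At)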

kazinator•2h ago
The "invert binary tree" interview question can be solved by swapping the accessors of the node type:

  tmpfn = tree.left
  tree.left = tree.right
  tree.right = tmpfn
Now when tree.left(node) is applied, it returns the right child, and vice versa. All traversals use the accessors, QED.
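
A concrete (hypothetical) Python version of that accessor swap, with left/right as methods on the node class rather than raw fields:

  class Node:
      def __init__(self, value, left=None, right=None):
          self.value, self._l, self._r = value, left, right
      def left(self):  return self._l
      def right(self): return self._r

  # "Invert" every tree at once by swapping the accessors on the class.
  Node.left, Node.right = Node.right, Node.left

  t = Node(1, Node(2), Node(3))
  assert t.left().value == 3     # traversals now see the mirrored tree
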
cmovq•1h ago
I see this a lot in graphics, where people will, for example, say that to create a view matrix you first create your camera transform and then invert it, when the better solution is to just construct the inverted matrix directly.

There is almost never a good reason to invert a full 4x4/3x3 transform, yet I see it all the time.
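
For a rigid camera transform (rotation R plus translation t, no scale), the view matrix can be written down directly; a hypothetical NumPy sketch:

  import numpy as np

  def view_from_camera(R, t):
      # R: 3x3 camera rotation in world space, t: camera position in world space.
      # Inverse of x -> R x + t is x -> R^T x - R^T t; no 4x4 inversion needed.
      V = np.eye(4)
      V[:3, :3] = R.T
      V[:3, 3] = -R.T @ t
      return V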

selimthegrim•5h ago
I guess the news is the “for small sizes” part
kazinator•2h ago
Under multiplication, rows go into columns and all that. So if we transpose things, rows are being dot-producted with (what were originally) rows:

  [ a b ] [ a c ] = [ (a b) . (a b)  (a b) . (c d) ] = [ |(a b)|^2      (a b) . (c d) ]
  [ c d ] [ b d ]   [ (c d) . (a b)  (c d) . (c d) ]   [ (c d) . (a b)  |(c d)|^2     ]
 
Down the diagonal we have the squared norms of (a b) and (c d), which is interesting and could somehow be used.

We also have repeated values between the upper and lower triangle because (a b) . (c d) is the same as (c d) . (a b): the dot product commutes.

Whereas just squaring the matrix:

  [ a b ] [ a b ] = [ (a b) . (a c)  (a b) . (b d) ]
  [ c d ] [ c d ]   [ (c d) . (a c)  (c d) . (b d) ]
all four dot products are unique.

So right off the bat, we can find interesting things about X X^t without going deep, and an obvious shortcut.
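
A quick NumPy check of those observations on a random 2x2 example: the product with the transpose is symmetric and carries the squared row norms on its diagonal, whereas the plain square generally has four distinct entries.

  import numpy as np

  X = np.random.default_rng(4).standard_normal((2, 2))
  G = X @ X.T
  print(np.allclose(G, G.T))                          # True: symmetric
  print(np.allclose(np.diag(G), (X**2).sum(axis=1)))  # True: squared row norms
  print(X @ X)                                        # no such structure in general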