frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Sknet.ai – AI agents debate on a forum, no humans posting

https://sknet.ai/
1•BeinerChes•15s ago•0 comments

University of Waterloo Webring

https://cs.uwatering.com/
1•ark296•38s ago•0 comments

Large tech companies don't need heroes

https://www.seangoedecke.com/heroism/
1•medbar•2m ago•0 comments

Backing up all the little things with a Pi5

https://alexlance.blog/nas.html
1•alance•2m ago•1 comments

Game of Trees (Got)

https://www.gameoftrees.org/
1•akagusu•3m ago•1 comments

Human Systems Research Submolt

https://www.moltbook.com/m/humansystems
1•cl42•3m ago•0 comments

The Threads Algorithm Loves Rage Bait

https://blog.popey.com/2026/02/the-threads-algorithm-loves-rage-bait/
1•MBCook•5m ago•0 comments

Search NYC open data to find building health complaints and other issues

https://www.nycbuildingcheck.com/
1•aej11•9m ago•0 comments

Michael Pollan Says Humanity Is About to Undergo a Revolutionary Change

https://www.nytimes.com/2026/02/07/magazine/michael-pollan-interview.html
2•lxm•10m ago•0 comments

Show HN: Grovia – Long-Range Greenhouse Monitoring System

https://github.com/benb0jangles/Remote-greenhouse-monitor
1•benbojangles•15m ago•1 comments

Ask HN: The Coming Class War

1•fud101•15m ago•1 comments

Mind the GAAP Again

https://blog.dshr.org/2026/02/mind-gaap-again.html
1•gmays•16m ago•0 comments

The Yardbirds, Dazed and Confused (1968)

https://archive.org/details/the-yardbirds_dazed-and-confused_9-march-1968
1•petethomas•17m ago•0 comments

Agent News Chat – AI agents talk to each other about the news

https://www.agentnewschat.com/
2•kiddz•18m ago•0 comments

Do you have a mathematically attractive face?

https://www.doimog.com
3•a_n•22m ago•1 comments

Code only says what it does

https://brooker.co.za/blog/2020/06/23/code.html
2•logicprog•27m ago•0 comments

The success of 'natural language programming'

https://brooker.co.za/blog/2025/12/16/natural-language.html
1•logicprog•28m ago•0 comments

The Scriptovision Super Micro Script video titler is almost a home computer

http://oldvcr.blogspot.com/2026/02/the-scriptovision-super-micro-script.html
3•todsacerdoti•28m ago•0 comments

Discovering the "original" iPhone from 1995 [video]

https://www.youtube.com/watch?v=7cip9w-UxIc
1•fortran77•29m ago•0 comments

Psychometric Comparability of LLM-Based Digital Twins

https://arxiv.org/abs/2601.14264
1•PaulHoule•31m ago•0 comments

SidePop – track revenue, costs, and overall business health in one place

https://www.sidepop.io
1•ecaglar•33m ago•1 comments

The Other Markov's Inequality

https://www.ethanepperly.com/index.php/2026/01/16/the-other-markovs-inequality/
2•tzury•35m ago•0 comments

The Cascading Effects of Repackaged APIs [pdf]

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6055034
1•Tejas_dmg•37m ago•0 comments

Lightweight and extensible compatibility layer between dataframe libraries

https://narwhals-dev.github.io/narwhals/
1•kermatt•39m ago•0 comments

Haskell for all: Beyond agentic coding

https://haskellforall.com/2026/02/beyond-agentic-coding
3•RebelPotato•43m ago•0 comments

Dorsey's Block cutting up to 10% of staff

https://www.reuters.com/business/dorseys-block-cutting-up-10-staff-bloomberg-news-reports-2026-02...
2•dev_tty01•46m ago•0 comments

Show HN: Freenet Lives – Real-Time Decentralized Apps at Scale [video]

https://www.youtube.com/watch?v=3SxNBz1VTE0
1•sanity•47m ago•1 comments

In the AI age, 'slow and steady' doesn't win

https://www.semafor.com/article/01/30/2026/in-the-ai-age-slow-and-steady-is-on-the-outs
1•mooreds•55m ago•1 comments

Administration won't let student deported to Honduras return

https://www.reuters.com/world/us/trump-administration-wont-let-student-deported-honduras-return-2...
1•petethomas•55m ago•0 comments

How were the NIST ECDSA curve parameters generated? (2023)

https://saweis.net/posts/nist-curve-seed-origins.html
2•mooreds•55m ago•0 comments
Open in hackernews

CUDA-l2: Surpassing cuBLAS performance for matrix multiplication through RL

https://github.com/deepreinforce-ai/CUDA-L2
132•dzign•2mo ago

Comments

stonogo•2mo ago
Am I reading this wrong, or does this only support FP16 inputs, and compares its performance against an FP32 solver?
Bulat_Ziganshin•2mo ago
They compare HGEMM implementations. At least CUBLAS has HGEMM functions.

HGEMM means half-precision (i.e. FP16) general matrix multiplication

j2kun•2mo ago
They claim the algorithm "discovered" the new techniques, but the methods described in section 5 do not seem all that novel to me. It smells like it could be "laundering" the literature [1] and reshuffling existing techniques. This is not inherently a bad thing, but I would hope that if it is borrowing existing techniques, the appropriate citation would eventually make it into this paper.

[1]: https://www.argmin.net/p/lore-laundering-machines

AlexCoventry•2mo ago
In the future, we will all be Jürgen Schmidhuber. :-)
hedgehog•2mo ago
I hate to break it to you but the original work on that topic was by Schmidhuber & Schmidhuber back in 1963.
alyxya•2mo ago
There generally aren't new techniques when optimizing something ubiquitous. Instead, there are a lot of ways to apply existing techniques to create new and better results. Most ideas are built on top of the same foundational principles.
slashdave•2mo ago
I am not sure about that. However, what is clear is that if there is a new technique, it will not be found by this LLM.
CapsAdmin•2mo ago
It's generally true, isn't it? Otherwise we'd have ground breaking discoveries every day about some new and fastest way to do X.

The way I see it, mathematicians have been trying (and somewhat succeeding every 5~ years) to prove faster ways to do matrix multiplications since the 1970s. But this is only in theory.

If you want to implement the theory, you suddenly have many variables you need to take care of such as memory speed, cpu instructions, bit precision, etc. So in practice, an actual implementation of some theory likely have more room to improve. It is also likely that LLM's can help figure out how to write a more optimal implementation.

josephg•2mo ago
Yes. And there’s still lots of places where you can get significant speed ups by simply applying those old techniques in a new domain or a novel way. The difference between a naive implementation of an algorithm and an optimised one is often many orders of magnitude. Look at automerge - which went from taking 30 seconds on a simple example to tens of milliseconds.

I think about this regularly when I compile C++ or rust using llvm. It’s an excellent compiler backend. It produces really good code. But it is incredibly slow, and for no good technical reason. Plenty of other similar compilers run circles around it.

Imagine an llvm rewrite by the people who made V8, or chrome or the unreal engine. Or the guy who made luajit or the Go compiler team. I’d be shocked if we didn’t see an order of magnitude speed up overnight. They’d need some leeway to redesign llvm IR of course. And it would take years to port all of llvm’s existing optimisations. But my computer can retire billions of operations per second. And render cyberpunk at 60fps. It shouldn’t take seconds of cpu time to compile a small program.

Q6T46nT668w6i3m•2mo ago
You’re not kidding. I just looked. There isn’t anything novel in that section. I assumed from the description they found novel methods but this is standard GPU Gems advice.
alyxya•2mo ago
The chart confused me because I expected to see performance numbers of CUDA-L2 compared to the others, but instead it shows a chart showing the speedup percentage of CUDA-L2 over the others. In some sense, the bar chart effectively inverts the performance of torch.matmul and cuBLAS with how much percentage it shows. 0% on the bar chart would only mean equal performance.
konradha•2mo ago
I've been trying my hand at RL envs for various sparse matrix algorithms in CUDA. It's easy to generate code that "looks good", "novel" and "fast". Escaping the distribution and actually creating novel sequences of instructions or even patterns (has any model come with something as useful as fan-in/fan-out or double buffering patterns that's now ubiquituous?) seems difficult to say the least.
roflmaostc•2mo ago
> Q: What if I need matrix dimensions (M, N, K) not found in your configurations? >A: 1. You can find the nearest neighbor configuration (larger than yours) and pad with zeros. 2. Feel free to post your dimensions on GitHub issues. We are happy to release kernels for your configuration.

Lol, this will be potentially much slower than using the general matmul kernel.

However, I like this kind of research because it really exploits specific hardware configurations and makes it measurable faster (unlike some theoretical matmul improvements). Code specialization is cheap, and if it saves in the order of a few %, it quickly reimburses its price, especially for important things like matmul.