Show HN: FlashAttention-2 in Cute, from Scratch

https://blog.echen.io/p/flashattention-2-in-cute-from-scratch/

1•echen314•51m ago

Comments

echen314•46m ago

Author here. Some context on why I wrote this and how it's different from other public FA-2 content.

Most FA-2 walkthroughs cover either the algorithm or a Triton implementation. Both abstract away the parts that take the longest to understand if you're trying to read the original CUDA source: swizzling, the LDSM/LDSM_T atoms, the V-transpose, the SMEM layouts, the fragment/register layout, partition_fragment behavior, and the async pipelining. This post walks through the production CUTLASS 3.x (CuTe) kernel line by line on Ampere. I've added diagrams that visualize the weird layout stuff going on in the background, to make this approachable even for beginner CUDA devs.
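For readers who haven't met swizzling before, here is a minimal Python sketch of the XOR address remap that CuTe's `Swizzle<B, M, S>` functor performs for a positive shift. The `<3, 3, 3>` parameters and the 64-element fp16 row are illustrative assumptions, not necessarily the exact layout the post's kernel uses, and `chunk_of` is a hypothetical helper added only for this example.

```python
def swizzle(offset, bits=3, base=3, shift=3):
    """XOR-swizzle an element offset (positive shift only).

    Takes `bits` bits starting at position base + shift and XORs them
    into the `bits` bits starting at position `base`.
    """
    mask = ((1 << bits) - 1) << (base + shift)
    return offset ^ ((offset & mask) >> shift)


# Assumed tile shape for illustration: rows of 64 fp16 elements (128 bytes),
# split into eight 8-element (16-byte) chunks.
ROW_ELEMS = 64
CHUNK_ELEMS = 8


def chunk_of(row, col):
    """Hypothetical helper: which 16-byte chunk (row, col) lands in after swizzling."""
    swizzled = swizzle(row * ROW_ELEMS + col)
    return (swizzled % ROW_ELEMS) // CHUNK_ELEMS


if __name__ == "__main__":
    # Without the swizzle, column 0 of every row sits in chunk 0.
    # With it, eight consecutive rows land in eight different chunks.
    print([chunk_of(r, 0) for r in range(8)])
```

The remap only touches the chunk bits inside a row, so each element stays in its own row and the mapping is a permutation of the tile. That is what lets a column-wise access across eight rows spread over distinct 16-byte chunks instead of piling onto one.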

Repro details: the kernel hits 88–105% of production FA-2's throughput on A100 across hdim=64/128 at sequence lengths up to 64K, peaking at ~63% of fp16 tensor-core peak. The algorithm is just a stripped-down version of the original, so there's no novelty on that front. Tested against the production reference with bitwise-identical output.
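As a rough guide to reading a "percent of peak" figure like the one above, the sketch below uses the conventional forward-pass FLOP count for attention (two matmuls, each 2·N²·d FLOPs per head) and assumes the peak in question is the A100's 312 TFLOPS dense fp16 tensor-core number. The batch size, head count, and timing in the usage lines are hypothetical placeholders, not measurements from the post.

```python
A100_FP16_PEAK_TFLOPS = 312.0  # assumed reference peak (dense fp16 tensor core)


def attention_fwd_flops(batch, heads, seqlen, hdim, causal=False):
    """FLOPs for one attention forward pass: QK^T plus PV, 2*N*N*d each per head."""
    flops = 4 * batch * heads * seqlen * seqlen * hdim
    # Causal masking skips roughly half of the score matrix.
    return flops // 2 if causal else flops


def percent_of_peak(flops, seconds, peak_tflops=A100_FP16_PEAK_TFLOPS):
    """Achieved throughput as a percentage of the given peak."""
    achieved_tflops = flops / seconds / 1e12
    return 100.0 * achieved_tflops / peak_tflops


if __name__ == "__main__":
    # Hypothetical shape and timing, for illustration only.
    flops = attention_fwd_flops(batch=1, heads=16, seqlen=65536, hdim=128)
    print(f"{percent_of_peak(flops, seconds=0.18):.1f}% of peak")
```

Under these assumptions, ~63% of peak corresponds to roughly 197 TFLOPS of achieved throughput.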

Happy to dig into specifics in the comments — particularly interested if anyone has counterexamples to the sVtNoSwizzle no-op claim, or has done the equivalent investigation on Hopper.
