I rebuilt FlashAttention in Triton to understand the performance archaeology

https://aminediro.com/posts/flash_attn/
2•amindiro•1h ago

Comments

amindiro•1h ago
I’ve spent the last few weeks deconstructing FlashAttention. While the original paper is brilliant, I found that just reading it didn't give me a "gut feeling" for why certain engineering choices were made (for example, in the transition from v1 to v2).

I decided to rebuild it from scratch using Triton. This post is a chronicle of that journey, moving beyond the high-level algorithm and into the "performance archaeology" of the GPU:

- Profiling with Nsight Compute to find the real bottlenecks.

- Looking at the generated PTX and SASS code.

- Debugging shared memory bank conflicts and MIO bottlenecks.

- Iterating through the logic to see why tiling and online softmax are hardware-necessitated, not just mathematical tricks.
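
To make the last point concrete, here is a minimal pure-Python sketch of online softmax over tiles for a single query. It is not the Triton kernel from the post: the function names, the scalar values, and the `block_size` parameter are all hypothetical and chosen only for illustration. The idea it shows is that a running max, a running sum, and a rescaled accumulator let you process the scores one block at a time, so the full score row never has to be held at once.

```python
import math


def reference_attention(scores, values):
    """Standard softmax-weighted sum; needs every score before it can normalize."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def online_attention(scores, values, block_size=2):
    """Same result, but reads scores and values one block (tile) at a time.

    Hypothetical helper for illustration; scalar values stand in for the
    value vectors a real attention kernel would accumulate.
    """
    running_max = float("-inf")  # largest score seen so far
    running_sum = 0.0            # softmax denominator, relative to running_max
    acc = 0.0                    # unnormalized output, relative to running_max

    for start in range(0, len(scores), block_size):
        s_blk = scores[start:start + block_size]
        v_blk = values[start:start + block_size]

        new_max = max(running_max, max(s_blk))
        # Rescale everything accumulated so far to the new max.
        # On the first block this is exp(-inf) == 0.0, which is harmless.
        scale = math.exp(running_max - new_max)
        p = [math.exp(s - new_max) for s in s_blk]

        running_sum = running_sum * scale + sum(p)
        acc = acc * scale + sum(pi * vi for pi, vi in zip(p, v_blk))
        running_max = new_max

    # Normalize once at the end.
    return acc / running_sum


if __name__ == "__main__":
    scores = [0.5, 2.0, -1.0, 3.5, 0.0]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(reference_attention(scores, values))
    print(online_attention(scores, values, block_size=2))
```

The two printed numbers should agree up to floating-point rounding. In a GPU kernel the per-block state would live in registers or shared memory, which is where the tile size starts to matter.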

I’ve tried to keep it in the spirit of Simon Boehm’s matmul deep dive. Would love to hear from any GPU engineers on whether my interpretations of the SASS/bank conflict behavior match what you've seen in production.

Seven Stages of Open Software

https://docs.coiled.io/blog/stages-of-openness.html
1•fanf2•4m ago•0 comments

Show HN: The equation for smoke vortices also describes 100M° fusion plasma

https://github.com/Lulzx/driftmap
2•lulzx•5m ago•0 comments

I Don't Play the Game

https://silence.bearblog.dev/i-dont-play-the-game/
1•thejamesbox•6m ago•0 comments

Recent discoveries on the acquisition of the highest levels of human performance

https://www.science.org/doi/10.1126/science.adt7790
1•Anon84•6m ago•0 comments

Luke Howard's Essay on the Modification of Clouds (1865)

https://publicdomainreview.org/collection/essay-on-the-modification-of-clouds/
1•Petiver•7m ago•0 comments

Google's Boomerang Year: 20% of 2025 AI Engineers Were Former Employees

https://www.cnbc.com/2025/12/19/google-boomerang-year-20percent-ai-software-devs-hired-2025-ex-em...
1•birdculture•9m ago•0 comments

A Plug and Play all Purpose Robotic OS

https://axiomrobotics.netlify.app/
1•Akshaiy•9m ago•1 comments

A Tool Born from Prompt Frustration

https://chromewebstore.google.com/detail/best-nano-banana-prompt/phmmhcemapcpjjcggghiobjmncdipmdp
1•AI_kid1412•10m ago•0 comments

Show HN: Shittp – Volatile Dotfiles over SSH

https://github.com/FOBshippingpoint/shittp
4•sdovan1•12m ago•1 comments

Ask HN: How to handle this prompt allowing AI to self reflect on human risks

1•liefde•15m ago•0 comments

I Wouldn't Want John Solomon's New CMO Job at Mozilla

https://fossforce.com/2025/12/why-i-wouldnt-want-john-solomons-new-cmo-job-at-mozilla/
1•speckx•16m ago•0 comments

14 years ago: the day Teller gave me the secret to my career in magic. (2009)

http://shwood.squarespace.com/news/2009/9/21/14-years-ago-the-day-teller-gave-me-the-secret-to-my...
1•Tomte•17m ago•0 comments

Things Software Developers Should Learn about Learning (2024)

https://cacm.acm.org/research/10-things-software-developers-should-learn-about-learning/
1•Tomte•18m ago•0 comments

Show HN: Zero Trust API – Image CDR in Rust/WASM (Rebuild Images from Pixels)

https://zero-trust-web.vercel.app/
1•Raviteja_•21m ago•0 comments

Affordability, Part IV

https://paulkrugman.substack.com/p/affordability-part-iv
1•rbanffy•21m ago•0 comments

Efficacy and safety of transcranial AC stimulation in adults with ADHD [pdf]

https://www.nature.com/articles/s41380-025-03407-0
1•thunderbong•22m ago•0 comments

Show HN: Modern Trello Website

https://tasklanes.app
1•fcuk112•25m ago•0 comments

iPhone/iPad/Android Notifications for ClaudeCode

https://github.com/teito-dev/claudecode-pushover-integration
1•teitoklien•27m ago•1 comments

Elon Musk becomes first person worth $700B following pay package ruling

https://www.reuters.com/business/autos-transportation/elon-musk-becomes-first-person-worth-700-bi...
1•ksec•29m ago•0 comments

Show HN: The Official National Train Map Sucked, So I Made My Own

https://www.bdzmap.com/
3•Pavlinbg•30m ago•1 comments

Concurrent JavaScript: It can work

https://webkit.org/blog/7846/concurrent-javascript-it-can-work/
2•samwillis•31m ago•0 comments

Show HN: A practical database of AI SEO strategies for founders and marketers

https://www.aiseodatabase.com/
1•mohitvaswani•32m ago•0 comments

Show HN: Free tool to blur images instantly without signup or watermarks

https://www.blurimageonline.com/
2•teroquyiqwu•34m ago•0 comments

Against Likes and Subscribers

https://metanomad.blog/against-likes-and-subscribers/
2•speckx•35m ago•0 comments

Inception X TicTacToe: a fractal game

https://tic-tac-toe-inception-49280791970.us-west1.run.app/
1•apitaru•38m ago•1 comments

An 11-qubit atom processor in silicon with all fidelities from 99.10% to 99.99%

https://www.nature.com/articles/s41586-025-09827-w
1•giuliomagnifico•38m ago•1 comments

Eye Sentry: 5-Day Built macOS Eye Care Tool

https://eye-sentry.vercel.app
1•lispking•39m ago•0 comments

Language Switcher Guide: Java, JavaScript, Python, Go Comparison for B

https://blog.blockingqueue.com/language-switcher-cheatsheet-java-javascript-python-go
1•liviu31•40m ago•0 comments

Research Reveals the Optimal Way to Optimize

https://www.wired.com/story/researchers-discover-the-optimal-way-to-optimize/
2•quapster•41m ago•0 comments

NY Gov. vetoes bill to mandate 2-person subway train crews

https://gothamist.com/news/ny-gov-hochul-vetoes-bill-to-mandate-2-person-subway-train-crews
1•geox•41m ago•0 comments