frontpage.
Made with ♥ by @iamnishanth

Open Source @Github

Open Source Must Win

https://twitter.com/0xSero/status/2035022588439581076
1•simonebrunozzi•1m ago•0 comments

Couples where both WFH have 0.32 more babies

https://www.nber.org/papers/w34963
1•mandevil•1m ago•0 comments

Norwegian AI Championship

https://ainm.no/en
1•whirlwin•1m ago•0 comments

Show HN: Sift, a small CLI that groups noisy test failures into root causes

https://github.com/bilalimamoglu/sift
2•bimamoglu•3m ago•0 comments

Claude Code as Career Coach

https://seanplusplus.github.io/2026/03/20/claude-code-as-career-coach/
1•seanplusplus•6m ago•0 comments

Slap your MacBook, it yells back (Apple Silicon accelerometer)

https://github.com/taigrr/spank
1•ex-aws-dude•6m ago•0 comments

You and Your Spinner Can Go to Hell

https://catskull.net/you-and-your-spinner-can-go-to-hell.html
1•speckx•7m ago•0 comments

Microsoft update breaks internet access to Windows 11 Teams, Edge, OneDrive

https://www.neowin.net/news/microsoft-kb5079473-breaks-internet-access-to-windows-11-teams-edge-o...
1•binsquare•7m ago•0 comments

Portless replaces port numbers with stable .localhost URLs for local development

https://port1355.dev/
1•tanelpoder•8m ago•0 comments

RustCC: Bringing Rust-Style Safety to C++17 via Policy Enforcement

https://github.com/yunquleonliu/RustCC-Profiler/blob/main/Rust_Cpp_Manifesto.md
1•leontheyellow•8m ago•0 comments

Top US Lawyers' Fees Have Skyrocketed. Be Prepared to Pay $3,400 an Hour

https://www.wsj.com/business/lawyer-hourly-rate-bill-3400-807cf6ce
1•tchalla•8m ago•0 comments

Postgres's Costs

https://www.flowercomputer.com/news/postgres/
1•edouerd•9m ago•0 comments

i am a software mechanic

https://russell.ballestrini.net/i-am-a-software-mechanic/
1•unfirehose•10m ago•0 comments

Talking Postgres Podcast Ep37 Transcript: Building Postgres Services on Azure

https://talkingpostgres.com/episodes/building-postgres-services-on-azure-with-charles-feddersen/t...
1•clairegiordano•13m ago•0 comments

Show HN: Qwack – Collaborative steering for AI agents built on OpenCode

https://qwack.ai
1•zfleeman•14m ago•0 comments

Show HN: Reverse Engineering Tools MCP Server

https://github.com/daedalus/mcp_reverse_engineering
1•dclavijo•15m ago•0 comments

Software in the AI Era

https://www.philipithomas.com/software-in-the-ai-era
2•speckx•15m ago•0 comments

How we are looking for cofounders is stupid

1•shoman3003•15m ago•0 comments

Cambalache's First Major Milestone

https://blogs.gnome.org/gtk/2026/03/20/cambalaches-first-major-milestone/
1•samtheDamned•16m ago•0 comments

Using LLMs to Study Deregulation

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6440018
1•paulpauper•17m ago•0 comments

Startup Will Pay You $100 an Hour to 'Bully' AI

https://www.entrepreneur.com/business-news/memvid-is-paying-100-an-hour-to-bully-ai
1•paulpauper•17m ago•0 comments

Saaspocalypse: Real or Hype?

https://iamcharliegraham.substack.com/p/saaspocalypse-real-or-hype
3•grahac•18m ago•1 comments

MST3K – KTMA – K03 – Star Force [video]

https://www.youtube.com/watch?v=u_Z4hnagGLE
1•jmward01•19m ago•1 comments

How we cut our GTM workflow time by 70%

https://www.operator23.com/
1•Mrakermo•20m ago•0 comments

Why Many People Misunderstand Dividends, and the Damage This Does

https://www.mutualfundobserver.com/discuss/discussion/56300/why-many-people-misunderstand-dividen...
2•paulpauper•21m ago•0 comments

Even Before the Iran War, There Was a Growing Inflation Problem

https://newsletter.mikekonczal.com/p/even-before-the-iran-war-there-was
2•NomNew•22m ago•0 comments

Show HN: The Cost of Manual Workflows

2•Mrakermo•22m ago•0 comments

I Quit Editing Photos

https://jamesbaker.uk/i-quit-editing-photos/
1•speckx•22m ago•0 comments

Twttr is a service for friends, family, and co-workers to communicate

https://twttr.eu/
1•doener•24m ago•0 comments

World Models: Computing the Uncomputable

https://www.notboring.co/p/world-models
2•gmays•24m ago•0 comments

Attention Residuals

https://github.com/MoonshotAI/Attention-Residuals
25•GaggiX•1h ago

Comments

jszymborski•56m ago
This reminds me of the input gates of an LSTM.
jjcm•34m ago
Two things stand out to me with this:

1. Drops compute required for training by ~20%. This approach won't just help the ever-escalating model sizes larger companies are pushing for; it means things like autoresearch can iterate on new model architectures faster.

2. WAY lower bandwidth requirements for inference. Means with approaches like this it should run on consumer hardware far better. It apparently requires 1/6th the memory bandwidth of a traditional approach for better results.

This is a big improvement if it can be generalized. They're claiming it's a drop-in replacement, so it seems like it can be.

dvt•21m ago
> Drops compute required for training by ~20%.

This is not true. The authors claim that w.r.t. training, their method adds negligible overhead for AttnRes with no memory impact (but it is way more complicated for Block AttnRes, since we need to use pipelining for larger models).

com2kid•17m ago
> 2. WAY lower bandwidth requirements for inference. Means with approaches like this it should run on consumer hardware far better. It apparently requires 1/6th the memory bandwidth of a traditional approach for better results.

That should be the headline right there. Giant size-60-font headline.

Some people have PhDs in burying the lede!

westurner•12m ago
ScholarlyArticle: "Attention Residuals" (2026) https://arxiv.org/abs/2603.15031 :

> Abstract: Residual connections with PreNorm are standard in modern LLMs, yet they accumulate all layer outputs with fixed unit weights. This uniform aggregation causes uncontrolled hidden-state growth with depth, progressively diluting each layer's contribution. We propose Attention Residuals (AttnRes), which replaces this fixed accumulation with softmax attention over preceding layer outputs, allowing each layer to selectively aggregate earlier representations with learned, input-dependent weights. To address the memory and communication overhead of attending over all preceding layer outputs for large-scale model training, we introduce Block AttnRes, which partitions layers into blocks and attends over block-level representations, reducing the memory footprint while preserving most of the gains of full AttnRes. [...]
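The core substitution the abstract describes — replacing the fixed unit-weight sum over layer outputs with a softmax-weighted, input-dependent aggregation — can be sketched in a few lines of numpy. This is a minimal illustration only: the query vector, the scaling, and the shapes are assumptions for the sketch, not the paper's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (illustrative)
L = 4  # number of preceding layer outputs (illustrative)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def plain_residual(outputs):
    # Standard PreNorm residual stream: every layer output is
    # accumulated with a fixed weight of 1, so the hidden-state
    # norm grows with depth.
    return outputs.sum(axis=0)

def attn_res(outputs, q):
    # AttnRes sketch: score each preceding layer output against a
    # query, then aggregate with softmax weights instead of fixed 1s.
    scores = outputs @ q / np.sqrt(d)  # one score per layer
    w = softmax(scores)                # (L,), sums to 1
    return w @ outputs                 # learned, input-dependent mix

outputs = rng.normal(size=(L, d))  # stand-ins for layer outputs
q = rng.normal(size=d)             # stand-in for a learned query

plain = plain_residual(outputs)
mixed = attn_res(outputs, q)
```

Because the softmax weights form a convex combination, the aggregated state's norm is bounded by the largest layer-output norm, whereas the plain sum keeps growing with depth — which is the "uncontrolled hidden-state growth" the abstract points at. Block AttnRes, per the abstract, would apply the same idea over block-level representations rather than every individual layer.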