
Building an Evolutionary Search for Attention Mechanisms

https://github.com/drhemanm/evo-attention
3•hemanm•3h ago

Comments

hemanm•3h ago
Building an Evolutionary Search for Attention Mechanisms (on Free Colab) I spent the last few weeks building a framework that allows evolution to design attention mechanisms instead of hand-crafting them. The results were interesting enough to share.

The Question: Transformers use scaled dot-product attention because it was shown to be effective in the "Attention Is All You Need" paper. But was it actually optimal, or just the first thing that worked well enough? Most research tweaks hyperparameters; I wanted to explore the mechanism design space itself.

The Constraint: I have no computing budget. No lab. No institutional backing. Just free Colab and curiosity.

This meant:

- Small models only (~500K parameters)
- Fast training (5K steps per model)
- Limited search (120 evaluations total)
- WikiText-2 (small enough to iterate quickly)

The Approach: I encoded attention mechanisms as genes with 4 components:

```python
gene = AttentionGene(
    similarity='dot',            # How Q and K compute scores
    normalization='sparsemax',   # How scores become weights
    gating='output_gate',        # Optional gating mechanism
    temperature='learned',       # How to scale attention
)
```

This creates a discrete search space of 384+ possible mechanisms.
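For scale, the size of the space is just the product of the options per gene component. The option lists below are illustrative assumptions, not the repo's actual lists; only the values mentioned in the post are grounded, and the placeholders are chosen so the product works out to 384:

```python
# Illustrative option lists. Only 'dot', 'softmax', 'sparsemax', 'output_gate',
# 'highway', and 'learned' appear in the post; the rest are placeholders.
SIMILARITIES   = ['dot', 'additive', 'cosine', 'bilinear']              # 4
NORMALIZATIONS = ['softmax', 'sparsemax', 'relu_norm', 'sigmoid_norm']  # 4
GATINGS        = ['none', 'output_gate', 'highway', 'input_gate',
                  'value_gate', 'head_gate']                            # 6
TEMPERATURES   = ['fixed_sqrt_d', 'learned', 'per_head', 'none']        # 4

size = len(SIMILARITIES) * len(NORMALIZATIONS) * len(GATINGS) * len(TEMPERATURES)
print(size)  # 4 * 4 * 6 * 4 = 384
```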

Then I ran a simple genetic algorithm (a minimal sketch of the loop is below):

- Initialize 12 random attention mechanisms
- Train each for 5K steps on WikiText-2
- Keep the top 3 (elitism)
- Generate 9 offspring via crossover + mutation
- Repeat for 10 generations

Each generation takes ~2 hours on free Colab. Total: ~20 GPU hours.
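Here `random_gene`, `train_and_eval`, `crossover`, and `mutate` stand in for whatever the repo actually calls its operators; this is a sketch of the loop described above, not the repo's code:

```python
import random

def evolve(random_gene, train_and_eval, crossover, mutate,
           pop_size=12, elite=3, generations=10, steps=5000):
    """Tiny elitist GA over attention genes (illustrative sketch)."""
    population = [random_gene() for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        # Train each candidate briefly and rank by validation perplexity (lower is better).
        scored = sorted(((train_and_eval(g, steps), g) for g in population),
                        key=lambda pair: pair[0])
        best = scored[0]
        elites = [g for _, g in scored[:elite]]           # elitism: keep the top 3
        offspring = [mutate(crossover(*random.sample(elites, 2)))
                     for _ in range(pop_size - elite)]    # 9 children per generation
        population = elites + offspring
    return best                                           # (perplexity, gene)
```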

What Evolution Found

Best mechanism: dot + sparsemax + output_gate + learned_temperature

Results:

- Evolved: 98.45 perplexity
- Baseline (dot + softmax): 102.90 perplexity
- Improvement: 4.3%

The interesting part isn't the 4% improvement. It's what evolution consistently chose:

Finding #1: Sparsemax > Softmax. Every top performer used sparsemax normalization instead of softmax. Sparsemax (from a 2016 paper) creates sparse attention - many weights become exactly zero. The ML community largely ignored it. Evolution rediscovered it works.
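For reference, sparsemax (Martins & Astudillo, 2016) is a Euclidean projection of the score vector onto the probability simplex, so scores below a data-dependent threshold get exactly zero weight. A minimal NumPy version:

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex (Martins & Astudillo, 2016)."""
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum    # which entries stay nonzero
    k_max = k[support][-1]
    tau = (cumsum[support][-1] - 1) / k_max  # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax(np.array([1.5, 1.0, 0.1])))  # [0.75 0.25 0.  ] -> exact zeros
```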

Finding #2: Output Gating is Universal. Every top mechanism used output gating:

```python
output = attention_result
gate = sigmoid(linear(input))
output = output * gate
```

This wasn't in the original Transformer. Evolution found it's critical.
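In framework terms this is a sigmoid gate, computed from the layer input and applied elementwise to the attention output. A rough PyTorch sketch; the layer names and shapes are my assumptions, not the repo's:

```python
import torch
import torch.nn as nn

class OutputGatedAttention(nn.Module):
    """Wraps any attention module with a sigmoid output gate (illustrative sketch)."""
    def __init__(self, attention: nn.Module, d_model: int):
        super().__init__()
        self.attention = attention
        self.gate_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        attn_out = self.attention(x)              # (batch, seq, d_model)
        gate = torch.sigmoid(self.gate_proj(x))   # gate computed from the input
        return attn_out * gate                    # elementwise gating
```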

Finding #3: Highway Gating Always Fails. Highway connections (borrowed from Highway Networks) were the worst performers across all generations. Average perplexity: 115.8. This surprised me - highway connections work elsewhere. But for attention, they consistently failed.
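For contrast with the output-gate sketch above, a highway-style gate interpolates between the attention output and the raw input rather than just scaling the output. This is my reading of the failed variant, not necessarily the repo's exact implementation:

```python
import torch

def highway_gated_output(attn_out, x, gate_proj):
    # Highway-style gating: blend the attention output with the raw input.
    gate = torch.sigmoid(gate_proj(x))
    return gate * attn_out + (1.0 - gate) * x
```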

Finding #4: Dot-Product is Actually Good. The winner uses standard dot-product similarity, not some exotic function. The improvement comes from normalization + gating, not from replacing the core similarity function. This makes the result more practical - dot-product is fast.

The Honest Part: This is a proof of concept, not production-ready.

Not tested:

- Large models (100M+ params)
- Other datasets
- Other domains (vision, audio)
- Production deployment

Known issues:

- Training variance is ±1 perplexity
- Only 93 mechanisms evaluated (~24% of the search space)
- Single run per mechanism (no statistical tests)
- Baseline wasn't hyperparameter-tuned

With enough evolutionary steps, you can probably find "good" hyperparameters for any mechanism. I don't know if I discovered better mechanisms or just better hyperparameters.

What I Learned

1. Evolutionary Search is Viable at Small Scale. You don't need massive compute to explore architecture spaces. 20 GPU hours found something interesting.

2. Training Noise is Real. Run-to-run variance is about 0.8 perplexity points, so my "4% improvement" has ~1 point of uncertainty baked in. Proper validation requires multiple runs; I didn't do this (compute constraints).

3. Search Space Design is Everything. I spent more time designing the search space than writing the evolution code. What components to include? What ranges? What's too complex? Bad search space = wasted compute.

There's no resolution to the crisis in sight

https://punchbowl.news/archive/102325-am/
1•zerosizedweasle•1m ago•0 comments

Why do you love Fedora?

https://old.reddit.com/r/Fedora/comments/1od5xpr/why_do_you_love_fedora/
1•sipofwater•1m ago•0 comments

Ask HN: How many 9s did AWS lose due to the last outage (us-east-1 incident)?

1•2dvisio•2m ago•0 comments

Echoes of Memory

https://neurofrontiers.blog/echoes-of-memory-a-conversation-beyond-the-lab/
3•lentoutcry•5m ago•0 comments

China releases 'UBIOS' standard to replace UEFI

https://www.tomshardware.com/software/china-releases-ubios-standard-to-replace-uefi-huawei-backed...
3•paulgdp•8m ago•0 comments

Developers spend 1% of coding time using VS Code's debugger (11K sessions)

https://floustate.com/blog/developers-spend-1-percent-time-vscode-debugger
2•skrid•8m ago•0 comments

Introduction to Telecom for Software Engineers

https://www.youtube.com/watch?v=hu30dhkrN_U
1•vances•10m ago•0 comments

ChatGPT's Horny Era Could Be Its Stickiest Yet

https://www.wired.com/story/chatgpt-horny-era/
2•quapster•10m ago•0 comments

JSON Schemas in Go

https://www.airs.com/blog/archives/675
1•Bogdanp•10m ago•0 comments

Government shutdown reaching a tipping point, could send the economy into spiral

https://www.marketwatch.com/story/the-government-shutdown-is-reaching-a-tipping-point-that-could-...
2•zerosizedweasle•15m ago•0 comments

Perpetual ML Suite on Snowflake Marketplace

https://app.snowflake.com/marketplace/listing/GZSYZX0EMJ/perpetual-ml-perpetual-ml-suite
1•deadsoul•15m ago•0 comments

Did YouTube censor a well-known Indian documentary about the dairy industry?

https://maakadoodh.in/
2•metta2uall•16m ago•1 comments

Aerospace firms link up to create European rival to Musk's SpaceX

https://www.theguardian.com/business/2025/oct/23/airbus-leonardo-thales-european-rival-elon-musk-...
2•n1b0m•16m ago•0 comments

Cyberus Technology is hiring a Business Development Manager (m/f/d)

https://www.cyberus-technology.de/en/about/careers/business-development-manager
2•CyberusTech•28m ago•1 comments

PyTorch Monarch

https://pytorch.org/blog/introducing-pytorch-monarch/
3•jarbus•29m ago•0 comments

Importing vs. Fetching JSON in JavaScript

https://jakearchibald.com/2025/importing-vs-fetching-json/
3•jaffathecake•37m ago•0 comments

Cyberus Technology is hiring a Full-Stack Software Engineer (m/f/d) in Rust

https://www.cyberus-technology.de/en/about/careers/rust-engineer
1•CyberusTech•37m ago•1 comments

Dash.Monster – The Unified Game API Platform

https://dash.monster/
1•prompt2tool•38m ago•1 comments

Build DApps on BNB Chain – $400K+ in Prizes and Launchpad Opportunities

https://dorahacks.io/hackathon/predictionmarketshackathon
1•seedify•39m ago•1 comments

Is Zscaler Considered a VPN? The Key Differences Explained

https://cloudexplorer.ai/zscaler-vpn-key-differences-explained/
3•BlackPlot•41m ago•0 comments

Redis LangCache

https://redis.io/langcache/
3•mattbit•45m ago•0 comments

Airbus, Leonardo, Thales to Launch Space Tie-Up to Compete with Musk's SpaceX

https://www.wsj.com/business/airbus-leonardo-thales-to-launch-space-tie-up-to-compete-with-musks-...
3•thm•46m ago•0 comments

HunyuanWorld-Mirror: Universal 3D World Reconstruction with Any-Prior Prompting

https://github.com/Tencent-Hunyuan/HunyuanWorld-Mirror
1•SweetSoftPillow•47m ago•0 comments

Min Chess Puzzles

https://play.google.com/store/apps/details?id=com.rooktook&hl=en_US
1•shubhamrrawal•48m ago•0 comments

The China Model's Fatal Flaw

https://www.foreignaffairs.com/china/china-models-fatal-flaw-lizzi-lee
1•imastrategist•48m ago•0 comments

What the 3.0 Release Tells Us About WebAssembly's Uncertain Future

https://redmonk.com/kholterhoff/2025/10/17/wasms-identity-crisis/
1•pjmlp•48m ago•0 comments

Bank chief says US firm collapses ring 'alarm bells'

https://www.bbc.com/news/articles/cvgv102n4gwo
2•zerosizedweasle•49m ago•0 comments

15h.org – Maintaining coreboot for the bleeding edge of blob-free x86

https://15h.org/index.php/Home
2•15h•49m ago•1 comments

AIWallpaper: A free wallpaper generation website – no login required

https://aiwallpaper.help/
1•JoahYi•50m ago•1 comments

Ruby Butler: It's Time to Rethink RubyGems and Bundler

https://rubyelders.com/writings/2025-10-ruby-butler-1.html
3•amalinovic•50m ago•0 comments