
Building an Evolutionary Search for Attention Mechanisms

https://github.com/drhemanm/evo-attention
3•hemanm•3mo ago

Comments

hemanm•3mo ago
Building an Evolutionary Search for Attention Mechanisms (on Free Colab)

I spent the last few weeks building a framework that allows evolution to design attention mechanisms instead of hand-crafting them. The results were interesting enough to share.

The Question: Transformers use scaled dot-product attention because it was shown to be effective in the "Attention Is All You Need" paper. But was it actually optimal, or just the first thing that worked well enough? Most research tweaks hyperparameters. I wanted to explore the mechanism design space itself.

The Constraint: I have no compute budget. No lab. No institutional backing. Just free Colab and curiosity.

This meant:

- Small models only (~500K parameters)
- Fast training (5K steps per model)
- Limited search (120 evaluations total)
- WikiText-2 (small enough to iterate quickly)

The Approach: I encoded attention mechanisms as genes with 4 components:

```python
gene = AttentionGene(
    similarity='dot',            # How Q and K compute scores
    normalization='sparsemax',   # How scores become weights
    gating='output_gate',        # Optional gating mechanism
    temperature='learned',       # How to scale attention
)
```

This creates a discrete search space of 384+ possible mechanisms.
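For concreteness, here is a minimal sketch of what the gene encoding and random sampling could look like. The option lists below are illustrative guesses (4 × 4 × 6 × 4 = 384 combinations), not the repo's actual components:

```python
import random
from dataclasses import dataclass

# Illustrative option lists; the actual components in evo-attention may differ.
SIMILARITIES   = ['dot', 'scaled_dot', 'cosine', 'additive']
NORMALIZATIONS = ['softmax', 'sparsemax', 'relu_norm', 'sigmoid']
GATINGS        = ['none', 'output_gate', 'highway', 'input_gate', 'forget_gate', 'dual_gate']
TEMPERATURES   = ['fixed', 'learned', 'per_head', 'scheduled']

@dataclass(frozen=True)
class AttentionGene:
    similarity: str
    normalization: str
    gating: str
    temperature: str

def random_gene() -> AttentionGene:
    # Sample one point from the discrete search space uniformly at random.
    return AttentionGene(
        similarity=random.choice(SIMILARITIES),
        normalization=random.choice(NORMALIZATIONS),
        gating=random.choice(GATINGS),
        temperature=random.choice(TEMPERATURES),
    )
```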

Then I ran a simple genetic algorithm:

1. Initialize 12 random attention mechanisms
2. Train each for 5K steps on WikiText-2
3. Keep the top 3 (elitism)
4. Generate 9 offspring via crossover + mutation
5. Repeat for 10 generations

Each generation takes ~2 hours on free Colab. Total: ~20 GPU hours.
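A minimal sketch of that loop, reusing AttentionGene and random_gene from above. The train_and_eval helper is stubbed out here, and the crossover/mutation details are my guesses, not the repo's exact operators:

```python
import random

POP, ELITE, GENERATIONS = 12, 3, 10
FIELDS = ['similarity', 'normalization', 'gating', 'temperature']

def train_and_eval(gene: AttentionGene) -> float:
    # Stub: in the real run this trains a ~500K-param model for 5K steps
    # on WikiText-2 and returns validation perplexity (lower is better).
    return random.random()

def crossover(a: AttentionGene, b: AttentionGene) -> AttentionGene:
    # Each component of the child comes from a randomly chosen parent.
    return AttentionGene(**{f: getattr(random.choice([a, b]), f) for f in FIELDS})

def mutate(g: AttentionGene, rate: float = 0.25) -> AttentionGene:
    # Resample each component with probability `rate`.
    fresh = random_gene()
    return AttentionGene(**{f: getattr(fresh if random.random() < rate else g, f)
                            for f in FIELDS})

population = [random_gene() for _ in range(POP)]
for gen in range(GENERATIONS):
    ranked = sorted(population, key=train_and_eval)   # best (lowest perplexity) first
    elites = ranked[:ELITE]                           # elitism: keep the top 3
    children = [mutate(crossover(*random.sample(elites, 2)))
                for _ in range(POP - ELITE)]          # 9 offspring
    population = elites + children
```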

What Evolution Found: Best mechanism: dot + sparsemax + output_gate + learned_temperature

Results:

- Evolved: 98.45 perplexity
- Baseline (dot + softmax): 102.90 perplexity
- Improvement: 4.3%

The interesting part isn't the 4% improvement. It's what evolution consistently chose:

Finding #1: Sparsemax > Softmax. Every top performer used sparsemax normalization instead of softmax. Sparsemax (from a 2016 paper) produces sparse attention: many weights become exactly zero. The ML community largely ignored it. Evolution rediscovered that it works.
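For reference, a self-contained PyTorch sketch of sparsemax (Martins & Astudillo, 2016); this is my own minimal version, not necessarily the repo's:

```python
import torch

def sparsemax(scores: torch.Tensor) -> torch.Tensor:
    # Euclidean projection of scores onto the probability simplex, applied
    # over the last dim. Low-scoring entries come out exactly zero.
    z, _ = torch.sort(scores, dim=-1, descending=True)
    k = torch.arange(1, scores.size(-1) + 1, device=scores.device, dtype=scores.dtype)
    z_cumsum = z.cumsum(dim=-1)
    # Support size k(z): the largest k with 1 + k * z_k > sum of top-k scores.
    k_z = (1 + k * z > z_cumsum).sum(dim=-1, keepdim=True)
    # Threshold tau chosen so the surviving weights sum to 1.
    tau = (z_cumsum.gather(-1, k_z - 1) - 1) / k_z.to(scores.dtype)
    return torch.clamp(scores - tau, min=0.0)
```

For example, sparsemax(torch.tensor([2.0, 1.0, 0.1])) gives tensor([1., 0., 0.]), where softmax would spread mass over all three entries.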

Finding #2: Output Gating Is Universal. Every top mechanism used output gating:

```python
output = attention_result
gate = sigmoid(linear(input))
output = output * gate
```

This wasn't in the original Transformer. Evolution found it's critical.
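A runnable version of that pattern, with a hypothetical gate_proj layer standing in for whatever projection the repo actually uses:

```python
import torch
import torch.nn as nn

d_model = 64
gate_proj = nn.Linear(d_model, d_model)   # hypothetical gate projection

x = torch.randn(8, 16, d_model)           # (batch, seq, d_model) layer input
attention_result = x                      # stand-in for the attention sublayer output
gate = torch.sigmoid(gate_proj(x))        # per-dimension gate computed from the input
output = attention_result * gate          # elementwise gating of the attention output
```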

Finding #3: Highway Gating Always Fails. Highway connections (borrowed from Highway Networks) were the worst performers across all generations, with an average perplexity of 115.8. This surprised me: highway connections work elsewhere, but for attention they consistently failed.
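For contrast, reusing the names from the snippet above: if the highway variant follows Highway Networks (my reading, not necessarily the repo's exact formulation), the gate interpolates between the attention output and the raw input instead of just scaling the output:

```python
gate = torch.sigmoid(gate_proj(x))                  # transform gate T(x)
output = gate * attention_result + (1 - gate) * x   # highway: blend output with input
```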

Finding #4: Dot-Product Is Actually Good. The winner uses standard dot-product similarity, not some exotic function. The improvement comes from normalization + gating, not from replacing the core similarity function. This makes the result more practical: dot-product is fast.
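Putting the winning gene together: a minimal single-head sketch that reuses the sparsemax helper above. The shapes, the temperature parameterization, and the absence of multi-head projection and causal masking are my simplifications, not the repo's actual module:

```python
import torch
import torch.nn as nn

class EvolvedAttention(nn.Module):
    # dot similarity + sparsemax + output gate + learned temperature
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)
        # Learned temperature, initialized at the usual 1/sqrt(d_model) scale.
        self.log_temp = nn.Parameter(torch.log(torch.tensor(d_model ** -0.5)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = (q @ k.transpose(-2, -1)) * self.log_temp.exp()  # dot + learned scale
        weights = sparsemax(scores)       # sparse normalization (sketched earlier)
        out = weights @ v
        return out * torch.sigmoid(self.gate(x))                  # output gating
```

EvolvedAttention(64)(torch.randn(2, 16, 64)) runs the whole pipeline end to end.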

The Honest Part: This is proof-of-concept, not production-ready.

Not tested:

- Large models (100M+ params)
- Other datasets
- Other domains (vision, audio)
- Production deployment

Known issues:

- Training variance is ±1 perplexity
- Only 93 mechanisms evaluated (~24% of the search space)
- Single run per mechanism (no statistical tests)
- Baseline wasn't hyperparameter-tuned

With enough evolutionary steps, you can probably find "good" hyperparameters for any mechanism. I don't know if I discovered better mechanisms or just better hyperparameters.

What I Learned

1. Evolutionary Search Is Viable at Small Scale. You don't need massive compute to explore architecture spaces. 20 GPU hours found something interesting.

2. Training Variance Is Real. Run-to-run noise is about 0.8 perplexity points, so my "4% improvement" has ~1 point of uncertainty baked in. Proper validation requires multiple runs; I didn't do this (compute constraints).

3. Search Space Design Is Everything. I spent more time designing the search space than writing the evolution code. What components to include? What ranges? What's too complex? Bad search space = wasted compute.

The P in PGP isn't for pain: encrypting emails in the browser

https://ckardaris.github.io/blog/2026/02/07/encrypted-email.html
1•ckardaris•2m ago•0 comments

Show HN: Mirror Parliament where users vote on top of politicians and draft laws

https://github.com/fokdelafons/lustra
1•fokdelafons•3m ago•1 comment

Ask HN: Opus 4.6 ignoring instructions, how to use 4.5 in Claude Code instead?

1•Chance-Device•4m ago•0 comments

We Mourn Our Craft

https://nolanlawson.com/2026/02/07/we-mourn-our-craft/
1•ColinWright•7m ago•0 comments

Jim Fan calls pixels the ultimate motor controller

https://robotsandstartups.substack.com/p/humanoids-platform-urdf-kitchen-nvidias
1•robotlaunch•10m ago•0 comments

Exploring a Modern SMPTE 2110 Broadcast Truck with My Dad

https://www.jeffgeerling.com/blog/2026/exploring-a-modern-smpte-2110-broadcast-truck-with-my-dad/
1•HotGarbage•10m ago•0 comments

AI UX Playground: Real-world examples of AI interaction design

https://www.aiuxplayground.com/
1•javiercr•11m ago•0 comments

The Field Guide to Design Futures

https://designfutures.guide/
1•andyjohnson0•12m ago•0 comments

The Other Leverage in Software and AI

https://tomtunguz.com/the-other-leverage-in-software-and-ai/
1•gmays•14m ago•0 comments

AUR malware scanner written in Rust

https://github.com/Sohimaster/traur
3•sohimaster•16m ago•1 comment

Free FFmpeg API [video]

https://www.youtube.com/watch?v=6RAuSVa4MLI
3•harshalone•16m ago•1 comment

Are AI agents ready for the workplace? A new benchmark raises doubts

https://techcrunch.com/2026/01/22/are-ai-agents-ready-for-the-workplace-a-new-benchmark-raises-do...
2•PaulHoule•21m ago•0 comments

Show HN: AI Watermark and Stego Scanner

https://ulrischa.github.io/AIWatermarkDetector/
1•ulrischa•21m ago•0 comments

Clarity vs. complexity: the invisible work of subtraction

https://www.alexscamp.com/p/clarity-vs-complexity-the-invisible
1•dovhyi•22m ago•0 comments

Solid-State Freezer Needs No Refrigerants

https://spectrum.ieee.org/subzero-elastocaloric-cooling
2•Brajeshwar•23m ago•0 comments

Ask HN: Will LLMs/AI Decrease Human Intelligence and Make Expertise a Commodity?

1•mc-0•24m ago•1 comment

From Zero to Hero: A Brief Introduction to Spring Boot

https://jcob-sikorski.github.io/me/writing/from-zero-to-hello-world-spring-boot
1•jcob_sikorski•24m ago•1 comment

NSA detected phone call between foreign intelligence and person close to Trump

https://www.theguardian.com/us-news/2026/feb/07/nsa-foreign-intelligence-trump-whistleblower
9•c420•25m ago•1 comment

How to Fake a Robotics Result

https://itcanthink.substack.com/p/how-to-fake-a-robotics-result
1•ai_critic•25m ago•0 comments

It's time for the world to boycott the US

https://www.aljazeera.com/opinions/2026/2/5/its-time-for-the-world-to-boycott-the-us
3•HotGarbage•26m ago•0 comments

Show HN: Semantic Search for terminal commands in the Browser (No Back end)

https://jslambda.github.io/tldr-vsearch/
1•jslambda•26m ago•1 comment

The AI CEO Experiment

https://yukicapital.com/blog/the-ai-ceo-experiment/
2•romainsimon•27m ago•0 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
5•surprisetalk•31m ago•0 comments

MS-DOS game copy protection and cracks

https://www.dosdays.co.uk/topics/game_cracks.php
4•TheCraiggers•32m ago•0 comments

Updates on GNU/Hurd progress [video]

https://fosdem.org/2026/schedule/event/7FZXHF-updates_on_gnuhurd_progress_rump_drivers_64bit_smp_...
2•birdculture•33m ago•0 comments

Epstein took a photo of his 2015 dinner with Zuckerberg and Musk

https://xcancel.com/search?f=tweets&q=davenewworld_2%2Fstatus%2F2020128223850316274
14•doener•33m ago•2 comments

MyFlames: View MySQL execution plans as interactive FlameGraphs and BarCharts

https://github.com/vgrippa/myflames
1•tanelpoder•34m ago•0 comments

Show HN: LLM of Babel

https://clairefro.github.io/llm-of-babel/
1•marjipan200•34m ago•0 comments

A modern iperf3 alternative with a live TUI, multi-client server, QUIC support

https://github.com/lance0/xfr
3•tanelpoder•36m ago•0 comments

Famfamfam Silk icons – also with CSS spritesheet

https://github.com/legacy-icons/famfamfam-silk
1•thunderbong•36m ago•0 comments