Reverse Proxy Deep Dive: Why Load Balancing at Scale Is Hard

https://startwithawhy.com/reverseproxy/2025/08/08/ReverseProxy-Deep-Dive-Part4.html

86•miggy•6mo ago

Comments

betaby•5mo ago

On the subject I can recommend the original paper from Google about Maglev https://static.googleusercontent.com/media/research.google.c...

and subsequent enhancement from Yandex folks https://github.com/kndrvt/mhs

An explanation is at https://habr.com/ru/companies/yandex/articles/858662/ (use your favorite translation site).

nimbius•5mo ago

it's honestly not, but younger developers can be forgiven for assuming traefik is all you need. the learn-to-code camps really did a number on kids these days :(

use DSR and 50% of your traffic is taken care of. https://www.loadbalancer.org/blog/direct-server-return-is-si...

explore load balancing lower in the stack based on ASN to preroute stuff for divide and conquer. (geolocated, etc...)

weighted load balancing only works for uniform traffic sources. you'll need to weight connections based on priority or location, backend-heavy transactions (checkout vs just browsing the store) and other conditions that can change the affinity of your user (sometimes dynamically). keepalived isn't mentioned once, nor is .1q trunk optimization, or SRV records and the failover/HA that's performed in most modern browsers based on DNS information itself.

SteveNuts•5mo ago

> most modern browsers based on DNS information itself.

I went down this rabbit hole and was surprised at how all over the place the behavior was across various HTTP clients (not just browsers). There is very little consistency in how the IPs in the DNS response are retried, if at all.

miggy•5mo ago

Author here. Thanks for sharing these thoughts. You’re right that DSR, ASN-based routing, SRV records, and other lower-layer approaches are important in certain setups.

This post is focused primarily on Layer 7 load balancing, connection and request routing based on application-level information, so it doesn’t go into Layer 3/4 techniques like DSR or network-level optimizations. Those are certainly worth covering in a broader series that spans the full stack.

gerdesj•5mo ago

HA Proxy has been doing this sort of thing for a very, very long time.

You have stick tables and a very rich way of populating them, and then you can use these in-RAM tables of data to make routing decisions.

Sometimes you need another proxy too - eg Apache/nginx or whatever, perhaps for authn/authz.

Yes it is a tricky concept and this series of articles merely scratches the surface. Good effort though.

miggy•5mo ago

Author here. Absolutely, HAProxy’s stick tables are a powerful way to implement advanced routing logic, and they’ve been around for years. This series focuses on explaining the broader concepts and tradeoffs rather than diving deep into any single implementation, and since it also covers other aspects of reverse proxies, the focus on load balancing here is mostly to present the challenges and high-level ideas.

Glad you found it a good effort, and I agree there’s room to go deeper in future posts.

gerdesj•5mo ago

"Common load balancing algoithims and challenges"

algorithms is pretty hard as a spelling: it's derived from something like Al Gorism - the name of an Arab chap who documented an early notion. By the time English has decided to create a word, you can be sure it will be ... painful!

Keep going mate, you have a great writing style and presentation.

ExoticPearTree•5mo ago

Shower thoughts: since we can do service discovery pretty easily to know when a server was added or removed from a pool, we can also discover a metrics endpoint with a limited set like CPU load, memory load, threads available etc. With a helper process/thread running alongside the load balancer's main processes, it could populate/update in almost real time the equivalent of HAProxy's stick tables but with much richer information. When the next request hits the load balancer, you know “exactly” where to route it for best performance.

miggy•5mo ago

Author here. Two quick thoughts: 1. As I covered in an earlier part of this series, service discovery is not always easy at scale. High churn, partial failures, and the cost of health checks can make it tricky to get right. 2. Using server-side metrics for load balancing is a great idea. In many setups, feedback is embedded in response headers or health check responses so the LB can make more informed routing decisions. Hodor at LinkedIn is a good example of this in practice: https://www.linkedin.com/blog/engineering/data-management/ho...
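To make the header-feedback idea concrete, here is a minimal sketch in Python. It is not how Hodor or any particular proxy does it: the `X-Backend-Load` header name, the 0.0 to 1.0 load scale, the `FeedbackBalancer` class, and the smoothing factor are all assumptions made up for illustration. Each response updates a smoothed load estimate for the backend that served it, and the next pick samples two backends at random and takes the less loaded one (the "power of two choices" heuristic, chosen here only because it is short).

```python
import random
from typing import Dict, Iterable, Mapping


class FeedbackBalancer:
    """Tracks a smoothed load estimate per backend from response headers."""

    def __init__(self, backends: Iterable[str], alpha: float = 0.3) -> None:
        self.alpha = alpha  # weight given to the newest sample (assumed value)
        self.load: Dict[str, float] = {b: 0.0 for b in backends}

    def observe(self, backend: str, headers: Mapping[str, str]) -> None:
        """Fold the load reported in a response into the running estimate."""
        raw = headers.get("X-Backend-Load")  # hypothetical header name
        if raw is None:
            return
        try:
            sample = float(raw)
        except ValueError:
            return  # ignore malformed feedback rather than fail the request
        sample = min(max(sample, 0.0), 1.0)
        old = self.load[backend]
        self.load[backend] = (1 - self.alpha) * old + self.alpha * sample

    def pick(self) -> str:
        """Sample two backends at random and return the less loaded one."""
        names = list(self.load)
        if len(names) == 1:
            return names[0]
        a, b = random.sample(names, 2)
        return a if self.load[a] <= self.load[b] else b


lb = FeedbackBalancer(["app-1", "app-2"])
lb.observe("app-1", {"X-Backend-Load": "1.0"})
chosen = lb.pick()
```

Sampling two candidates instead of always taking the global minimum keeps many balancer instances from piling onto the same "best" backend when their estimates are slightly stale.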

ExoticPearTree•5mo ago

I was thinking something along the lines of a “map” with all the backends and their capabilities that would be recomputed every N seconds and atomically switched with the previous one. The LB would then be able to decide where to send a request and also have a precomputed backup option in case the first choice became unavailable. You could also use those metrics to signal that a node needs to be drained of traffic, for example, so no new connections go to it.

I understand the complexities of having a large set of distributed services behind load balancers, I just think there could be a better way of choosing a backend based not only on least requests, TTFB and an OK response from a health check every N seconds.
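A rough sketch of that "map" idea, in Python. Everything named here is hypothetical (`Metrics`, `RoutingTable`, `Router`, `start_refresher`, and the ranking by CPU then free threads), and the swap relies on the assumption that rebinding a single attribute is effectively atomic in CPython; a real balancer would use whatever atomic pointer swap its runtime offers. A helper thread rebuilds an immutable table every N seconds, and the request path only ever reads whichever table is currently published.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Metrics:
    cpu: float             # 0.0 to 1.0, as reported by the backend
    free_threads: int
    draining: bool = False  # set when the node should get no new connections


@dataclass(frozen=True)
class RoutingTable:
    primary: Optional[str]
    backup: Optional[str]


def build_table(metrics: Dict[str, Metrics]) -> RoutingTable:
    """Rank non-draining backends by CPU, then free threads, then name."""
    eligible = [n for n, m in metrics.items()
                if not m.draining and m.free_threads > 0]
    ranked = sorted(eligible,
                    key=lambda n: (metrics[n].cpu, -metrics[n].free_threads, n))
    primary = ranked[0] if ranked else None
    backup = ranked[1] if len(ranked) > 1 else None
    return RoutingTable(primary, backup)


class Router:
    def __init__(self) -> None:
        self._table = RoutingTable(None, None)

    def publish(self, metrics: Dict[str, Metrics]) -> None:
        # Build the new table off to the side, then swap one reference,
        # so a reader sees either the old table or the new one, never a mix.
        self._table = build_table(metrics)

    def route(self, is_up: Callable[[str], bool] = lambda name: True) -> Optional[str]:
        table = self._table  # take one snapshot for the whole decision
        for candidate in (table.primary, table.backup):
            if candidate is not None and is_up(candidate):
                return candidate
        return None


def start_refresher(router: Router, fetch: Callable[[], Dict[str, Metrics]],
                    interval_s: float, stop: threading.Event) -> threading.Thread:
    """Helper thread: re-fetch metrics and republish every interval_s seconds."""
    def loop() -> None:
        while not stop.is_set():
            router.publish(fetch())
            stop.wait(interval_s)
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


router = Router()
router.publish({
    "app-1": Metrics(cpu=0.9, free_threads=4),
    "app-2": Metrics(cpu=0.2, free_threads=8),
    "app-3": Metrics(cpu=0.1, free_threads=8, draining=True),
})
first_choice = router.route()
fallback = router.route(is_up=lambda name: name != "app-2")
```

Sending every request to a single precomputed primary is a simplification; with real traffic you would more likely precompute weights or a ranked candidate list per table so one backend is not flooded between refreshes.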
