Postgres Postmaster does not scale

https://www.recall.ai/blog/postgres-postmaster-does-not-scale
63•davidgu•13h ago

Comments

vel0city•1h ago
Isn't this kind of the reason why teams will tend to put database proxies in front of their postgres instances, to handle massive sudden influxes of potentially short lived connections?

This sounds exactly like the problem tools like pgbouncer were designed to solve. If you're on AWS one could look at RDS Proxy.

evanelias•1h ago
Also check out ProxySQL [1][2], it's an extremely powerful and battle-tested proxy. Originally it was only for MySQL/MariaDB, where it is very widely used at scale, even though MySQL already has excellent built-in scalable threaded connection management. ProxySQL added Postgres support in 2024, and that has become a major focus.

[1] https://proxysql.com/

[2] https://github.com/sysown/proxysql

sroussey•1h ago
And lets you rewrite queries on the fly. :)

pmontra•53m ago
The article is very well written but is somewhat lacking at the end.

The conclusion lists pgbouncer as one of the solutions but it does not explain it clearly.

> Many pieces of wisdom in the engineering zeitgeist are well preached but poorly understood. Postgres connection pooling falls neatly into this category. In this expedition we found one of the underlying reasons that connection pooling is so widely deployed on postgres systems running at scale. [...] an artificial constraint that has warped the shape of the developer ecosystem (RDS Proxy, pgbouncer, pgcat, etc) around it.

The artificial constraint is the single core nature of postmaster.

Other points at the end of the article that can be improved:

> we can mechanically reason about a solution.

Mechanically as in letting an AI find a solution, or as in reasoning like a mechanic, or? Furthermore:

> * Implementing jitter in our fleet of EC2 instances reduced the peak connection rate

How? Did they wait a random amount of milliseconds before sending queries to the db?

> * Eliminating bursts of parallel queries from our API servers

How?
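
On the jitter question above, one plausible reading (a sketch only, not a description of what the article's authors actually did) is that each client sleeps for a random delay before opening its connection, so a fleet that wakes at the same instant spreads its connects over a window. The names `jittered_delay` and `connect_with_jitter`, the `connect` callable, and the 5-second default window are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def jittered_delay(max_jitter_s: float, rng: Optional[random.Random] = None) -> float:
    """Pick a random delay between 0 and max_jitter_s seconds."""
    rng = rng or random.Random()
    return rng.uniform(0.0, max_jitter_s)


def connect_with_jitter(
    connect: Callable[[], T],
    max_jitter_s: float = 5.0,  # assumed window; tune to the fleet's burst size
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Sleep for a random delay, then call the (hypothetical) connect function."""
    sleep(jittered_delay(max_jitter_s, rng))
    return connect()
```

With N clients and a window of W seconds, the expected peak drops from N simultaneous connects to roughly N/W per second, which is the effect the quoted bullet describes.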

tbrownaw•41m ago
> Mechanically as in letting an AI find a solution, or as in reasoning like a mechanic, or?

As in it's fully characterized, so you can use only math and logic rather than relying on experience and guesswork.

vivzkestrel•1h ago
very stupid question: similar to how we had a GIL replacement in Python, can't we replace postmaster with something better?

lfittl•39m ago
Specifically on the cost of forking a process for each connection (vs using threads), there are active efforts to make Postgres multi-threaded.

Since Postgres is a mature project, this is a non-trivial effort. See the Postgres wiki for some context: https://wiki.postgresql.org/wiki/Multithreading

But I'm hopeful that 2-3 years from now we'll see this bear fruit. The recent asynchronous read I/O improvements in Postgres 18 show that Postgres can evolve, one just needs to be patient, potentially help contribute, and find workarounds (connection pooling, in this case).

levkk•1h ago
One of the many problems PgDog will solve for you!

eatonphil•1h ago
The article addresses this, sort of. I don't understand how you can run multiple postmasters.

> Most online resources chalk this up to connection churn, citing fork rates and the pid-per-backend yada, yada. This is all true but in my opinion misses the forest from the trees. The real bottleneck is the single-threaded main loop in the postmaster. Every operation requiring postmaster involvement is pulling from a fixed pool, the size of a single CPU core. A rudimentary experiment shows that we can linearly increase connection throughput by adding additional postmasters on the same host.

btown•52m ago
You don't need multiple postmasters to spawn connection processes, if you have a set of Postgres proxies each maintaining a set pool of long-standing connections, and parceling them out to application servers upon request. When your proxies use up all their allocated connections, they throttle the application servers rather than overwhelming Postgres itself (either postmaster or query-serving systems).
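
A minimal sketch of that throttling behaviour, assuming a hypothetical `BoundedPool` class (this is not how pgbouncer or PgDog are actually implemented): a fixed set of long-lived connections is opened once, and callers that arrive when all of them are checked out wait or time out instead of asking Postgres for a new backend.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, connect, size):
        # `connect` is a hypothetical zero-argument factory for a connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # the only place connections are created

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (or raises queue.Empty after `timeout` seconds) when the pool
        # is exhausted, so excess demand waits here rather than reaching the
        # database as a burst of new connections.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse
```

The database only ever sees `size` connections from this pool, however many callers are queued behind it.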

That said, proxies aren't perfect. https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html outlines some dangers of using them (particularly when you might need session-level variables). My understanding is that PgDog does more tracking that mitigates some of these issues, but some of these are fundamental to the model. They're not a drop-in component the way other "proxies" might be.

atherton94027•1h ago
I'm a bit confused here, do they have a single database they're writing to? Wouldn't it be easier and more reliable to shard the data per customer?

atsjie•34m ago
I wouldn't call that "easier" per se.

haki•1h ago
Seems like a prime example of a service that naturally peaks at round hours.

We have a habit of never scheduling long-running processes at round hours. Usually because they tend to be busier.

https://hakibenita.com/sql-tricks-application-dba#dont-sched...

abrookewood•47m ago
I wish more applications would adopt the "H" option that Jenkins uses in its cron notation - essentially it is a randomiser, based on some sort of deterministic hashing function. So you say you want this job to run hourly and it will always run at the same minute past the hour, but you don't know (or care) what that minute is. Designed to prevent the thundering herd problem with scheduled work.
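
A rough sketch of the idea, not Jenkins's actual algorithm: hash a stable job name to pick a minute, so each job always lands on the same minute while different jobs spread across the hour. The function name `hashed_minute` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def hashed_minute(job_name: str, period: int = 60) -> int:
    """Map a job name to a stable slot in the range [0, period)."""
    digest = hashlib.sha256(job_name.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and fold it into the period.
    return int.from_bytes(digest[:8], "big") % period


# The same name always yields the same minute; different names usually differ.
minute = hashed_minute("nightly-report")
```

A cryptographic hash is used here instead of Python's built-in `hash()` because the latter is salted per process for strings, which would move the job to a different minute on every restart.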

parentheses•49m ago
I think this is the kind of investigation that AI can really accelerate. I imagine it did. I would love to see someone walk through a challenging investigation assisted by AI.

moomoo11•45m ago
maybe this is silly but these days cloud resources are so cheap. just loading up instances and putting this stuff into memory and processing it is so fast and scalable. even if you have billions of things to process daily you can just split if needed.

you can keep things synced across databases easily and keep it super duper simple.

mannyv•37m ago
Note that they were running Postgres on a 32 CPU box with 256GB of ram.

I'm actually surprised that it handled that many connections. The data implies that they have 4000 new connections/sec...but is it 4000 connections handled/sec?

kayson•22m ago
> sudo echo $NUM_PAGES > /proc/sys/vm/nr_hugepages

This won't work :) echo will run as root but the redirection is still running as the unprivileged user. Needs to be run from a privileged shell or by doing something like sudo sh -c "echo $NUM_PAGES > /proc/sys/vm/nr_hugepages"

The point gets across, though, technicality notwithstanding.

thayne•14m ago
Or

    echo $NUM_PAGES | sudo tee /proc/sys/vm/nr_hugepages

I've always found it odd that there isn't a standard command to write stdin to a file that doesn't also write it to stdout. Or that tee doesn't have an option to suppress writing to stdout.
