The highlights of what I ran into:
The Swiss Cheese Heap: Even with idiomatic code, I kept hitting Heroku R14 (memory quota exceeded) errors. It turns out the heap was fragmenting so badly that the OS couldn't reclaim RAM. Instead of jumping straight to jemalloc, I forced glibc to be frugal by setting MALLOC_ARENA_MAX=2 and manually triggering GC.compact every 10k records to smash the "holes" closed.
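A minimal sketch of that compaction cadence, assuming a hypothetical per-record `upsert` step (MALLOC_ARENA_MAX=2 is an environment setting applied before boot, so only the in-process GC side is shown):

```ruby
# Compact the Ruby heap every N records to close fragmentation "holes".
COMPACT_EVERY = 10_000

def process_records(records, every: COMPACT_EVERY, compactor: -> { GC.compact })
  compactions = 0
  records.each_with_index do |_record, i|
    # upsert(_record)  # hypothetical: write this record to Postgres
    if ((i + 1) % every).zero?
      compactor.call   # GC.compact (Ruby 2.7+) defragments the object heap
      compactions += 1
    end
  end
  compactions
end
```

Passing the compactor in as a lambda keeps the cadence logic testable without actually forcing a GC cycle.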
Sanitization Boundaries: MongoDB’s schema-less nature meant I had null bytes (\u0000) hiding in my text. Postgres (rightfully) hates those, so I had to build a sanitization boundary into the upsert logic to keep transactions from aborting.
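The boundary itself can be as small as a recursive scrub applied to each document before the upsert. This is a sketch, not the exact code from the migration; `sanitize` is a name I'm introducing here:

```ruby
# Strip NUL bytes (\u0000) from strings anywhere in a Mongo-shaped
# document, since Postgres text columns reject them and abort the
# transaction.
def sanitize(value)
  case value
  when String then value.delete("\u0000")
  when Hash   then value.transform_values { |v| sanitize(v) }
  when Array  then value.map { |v| sanitize(v) }
  else value # numbers, booleans, nil pass through untouched
  end
end
```

Running this on every record just before the upsert means one bad document can no longer poison a whole batch's transaction.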
The "Murder by Console" Problem: I learned the hard way that opening a production Rails console on a limited Redis plan can instantly grab half your available connections, starving and killing your background workers. I ended up capping Sidekiq concurrency at 1 and RAILS_MAX_THREADS at 2.
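Concretely, those caps are a one-line config var and Sidekiq's documented `-c` concurrency flag; the Heroku-style commands below are an assumption about the deployment platform, not the literal commands used:

```shell
# Keep each Rails process (web, worker, console) to a tiny pool
heroku config:set RAILS_MAX_THREADS=2 MALLOC_ARENA_MAX=2

# Run the backfill worker single-threaded so it holds few Redis connections
bundle exec sidekiq -c 1
```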
The Flow State: Counter-intuitively, throughput went up 40 percent when I silenced ActiveRecord logs and dropped Sidekiq concurrency to 1. Removing the context switching and disk I/O noise allowed the worker to stay in a tight loop.
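The log-silencing half of that is just raising the log level so per-record INFO lines never hit disk. A plain stdlib sketch of the idea (in Rails this would be ActiveRecord's own logger rather than a fresh `Logger`):

```ruby
require "logger"
require "stringio"

buffer = StringIO.new
logger = Logger.new(buffer)

# WARN suppresses the per-record INFO chatter that was costing disk I/O
logger.level = Logger::WARN

10.times { |i| logger.info("UPSERT record #{i}") } # silently dropped
logger.warn("batch done")                          # still recorded
```

The same effect in a migration script is one line around the hot loop instead of globally disabling logging.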
The goal was to move from a complex polyglot setup to a boring stack that just works. If you are dealing with large-scale backfills in Ruby, I’d love to hear how you handle the memory fragmentation side of things.