I migrated 15M records of 13GB data from Mongo to Rails Postgres on 512MB budget

1•Fh_•1h ago
I recently finished a migration for a document extraction engine, moving 15 million paragraphs from MongoDB to a boring, standard PostgreSQL stack. Since I was running this on Heroku with tight resource constraints, I ran into some interesting "physics" problems with Ruby’s memory and the Linux allocator.

The highlights of what I ran into:

The Swiss Cheese Heap: Even with idiomatic code, I kept hitting R14 memory errors. It turns out the heap was fragmenting so badly that the OS couldn't reclaim RAM. Instead of just jumping to jemalloc, I forced glibc to be frugal by setting MALLOC_ARENA_MAX=2 and manually triggering GC.compact every 10k records to smash the "holes" closed.
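A minimal sketch of the periodic-compaction idea, not the actual migration code: the method name `each_with_periodic_compaction` and its `every:` and `compactor:` parameters are hypothetical, and only the 10k interval and the `GC.compact` call come from the description above. `MALLOC_ARENA_MAX=2` is an environment variable, so it has to be set before the Ruby process starts rather than from inside the script.

```ruby
# Hypothetical helper: hands each record to the block and runs a compaction
# step after every `every` records. The compactor is injectable so the loop
# can be exercised without actually compacting the heap.
def each_with_periodic_compaction(records, every: 10_000, compactor: -> { GC.compact })
  processed = 0
  records.each do |record|
    yield record
    processed += 1
    # GC.compact (Ruby 2.7+) moves live objects together to reduce fragmentation.
    compactor.call if (processed % every).zero?
  end
  processed
end
```

With ActiveRecord, `records` could be something like the enumerator returned by `find_each`, so the whole table never has to sit in memory at once; that pairing is an assumption, not something the post states.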

Sanitization Boundaries: MongoDB’s schema-less nature meant I had null bytes (\u0000) hiding in my text. Postgres (rightfully) hates those, so I had to build a sanitization boundary into the upsert logic to keep transactions from aborting.
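One way such a sanitization boundary might look, as a sketch under assumptions: `strip_null_bytes` is a hypothetical helper name, and the post does not say whether the real code walks nested documents the way this one does. The idea is to run every attribute hash through it immediately before the upsert.

```ruby
NULL_BYTE = "\u0000"

# Hypothetical helper: recursively removes null bytes from strings, including
# strings nested inside arrays and hashes. Other values pass through unchanged.
def strip_null_bytes(value)
  case value
  when String then value.delete(NULL_BYTE)
  when Array  then value.map { |item| strip_null_bytes(item) }
  when Hash   then value.transform_values { |item| strip_null_bytes(item) }
  else value
  end
end
```

Cleaning at this single choke point means a stray null byte in one paragraph is stripped before Postgres sees it, instead of aborting the transaction for the whole batch.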

The "Murder by Console" Problem: I learned the hard way that opening a production Rails console on a limited Redis plan can grab half of your available connections instantly, starving your background workers. I ended up capping Sidekiq concurrency at 1 and RAILS_MAX_THREADS at 2.

The Flow State: Counter-intuitively, throughput went up 40 percent when I silenced ActiveRecord logs and dropped Sidekiq concurrency to 1. Removing the context switching and disk I/O noise allowed the worker to stay in a tight loop.

The goal was to move from a complex polyglot setup to a boring stack that just works. If you are dealing with large-scale backfills in Ruby, I’d love to hear how you handle the memory fragmentation side of things.
