Video (29 min): https://www.youtube.com/watch?v=qqtE_BrjVkM
If you're someone who prefers text, here's the quick TL;DR:
Why Debezium became a drag for them:
1. Long full loads on multi-million-row MongoDB collections, where any failure meant restarting from scratch
2. Kafka and Connect infrastructure felt heavy when the end goal was “Parquet/Iceberg on S3”
3. Handling heterogeneous arrays required custom SMTs
4. Continuous streaming only; they still had to glue together ad-hoc batch pulls for some workflows
5. Ongoing schema drift demanded extra code to keep Iceberg tables aligned
What changed with OLake?
-> Writes directly from MongoDB (and friends) into Apache Iceberg, no message broker in between
-> Two modes: full load for the initial dump, then CDC for ongoing changes, exposed by a single flag in the job config (see the sketch after this list)
-> Automatic schema evolution: new MongoDB fields appear as nullable columns; complex sub-docs land as JSON strings you can parse later
-> Resumable, chunked full loads: a pod crash resumes from the last completed chunk instead of restarting the whole load
-> Runs as either a Kubernetes CronJob or an Airflow task; config is one YAML/JSON file.
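To make the "single flag" point concrete, here's a rough sketch of what such a job config could look like. To be clear, the field names below are illustrative guesses, not OLake's actual config schema; check the repo docs for the real format.

```yaml
# Illustrative only: field names are hypothetical, not OLake's real schema.
source:
  type: mongodb
  hosts: ["mongo-0.example.internal:27017"]   # made-up host
  database: orders_db
  collection: orders

sync_mode: cdc          # hypothetical single flag: "full_load" for the initial dump, "cdc" for ongoing changes
full_load:
  chunked: true         # resumable chunks, so a crashed pod picks up where it left off
  chunk_size: 100000

destination:
  type: iceberg
  warehouse: s3://my-lake/warehouse   # made-up bucket
  table: raw.orders
```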
Their stack in one line: MongoDB → OLake writer → Iceberg on S3 → Spark jobs → Trino / occasional Redshift, all orchestrated by Airflow and/or K8s.
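For the downstream half of that line, here's a minimal PySpark sketch of querying the landed Iceberg table. The catalog name, warehouse path, table, and column names are placeholders I made up, and it assumes the iceberg-spark-runtime jar is on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F, types as T

# Names and paths below are placeholders; a Glue or REST catalog works here too.
spark = (
    SparkSession.builder
    .appName("query-olake-iceberg")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://my-lake/warehouse")
    .getOrCreate()
)

# Read the table the CDC writer landed; newly added MongoDB fields show up
# as nullable columns, so downstream jobs keep working without code changes.
orders = spark.table("lake.raw.orders")
orders.groupBy("status").count().show()

# Complex sub-documents arrive as JSON strings; parse them on read.
# The column name and schema here are guesses for illustration.
addr_schema = T.StructType([
    T.StructField("city", T.StringType()),
    T.StructField("zip", T.StringType()),
])
parsed = orders.withColumn("address", F.from_json("address_json", addr_schema))
```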
Posting here because many of us still bolt Kafka onto CDC just to land files. If you only need Iceberg tables, a simpler path might exist now. Curious to hear others’ experiences with broker-less CDC tools.
(Disclaimer: I work on OLake and hosted the meetup, but the talk is purely technical.)
Check out the GitHub repo: https://github.com/datazip-inc/olake