A Python-first data lakehouse

https://www.bauplanlabs.com/blog/everything-as-python

65•akshayka•2d ago

Comments

flakiness•3h ago

There have been so many "better notebook" implementations over the years that I can't keep up. Which are the promising ones? Is this "marimo" one of them, or is it a newcomer?

simonw•3h ago

Marimo is very impressive. It's effectively a cross between Jupyter and https://observablehq.com/: it adds "reactivity". In Jupyter, cells can be run in any order, which can make a notebook's behavior unpredictable; in Marimo (and Observable), updating a cell automatically re-executes the cells that depend on it, similar to a spreadsheet.
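To make the spreadsheet-style reactivity concrete, here is a toy sketch in Python. It assumes each cell explicitly declares the names it defines and reads (marimo infers this by statically analyzing each cell's code). It is not marimo's implementation; `ReactiveNotebook` and `add_cell` are hypothetical names used only for illustration.

```python
from graphlib import TopologicalSorter


class ReactiveNotebook:
    """Toy reactive runtime (hypothetical): each cell declares the names it defines and reads."""

    def __init__(self):
        self.cells = {}  # cell_id -> (func, defines, refs)
        self.env = {}    # shared namespace of values defined by cells

    def add_cell(self, cell_id, func, defines, refs):
        # Adding a cell with an existing id replaces (edits) that cell.
        self.cells[cell_id] = (func, set(defines), set(refs))

    def _children(self, cell_id):
        # Cells that read at least one name this cell defines.
        _, defines, _ = self.cells[cell_id]
        return {cid for cid, (_, _, refs) in self.cells.items()
                if cid != cell_id and refs & defines}

    def _descendants(self, cell_id):
        seen, stack = set(), [cell_id]
        while stack:
            for child in self._children(stack.pop()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def run(self, cell_id):
        # Re-run the cell plus everything downstream of it, in dependency order.
        affected = {cell_id} | self._descendants(cell_id)
        graph = {cid: {p for p in affected if cid in self._children(p)}
                 for cid in affected}  # node -> its predecessors
        order = list(TopologicalSorter(graph).static_order())
        for cid in order:
            func, _, _ = self.cells[cid]
            self.env.update(func(self.env))
        return order


nb = ReactiveNotebook()
nb.add_cell("a", lambda env: {"x": 2}, defines=["x"], refs=[])
nb.add_cell("b", lambda env: {"y": env["x"] * 10}, defines=["y"], refs=["x"])
nb.add_cell("c", lambda env: {"z": env["y"] + 1}, defines=["z"], refs=["y"])
print(nb.run("a"))   # ['a', 'b', 'c']
nb.add_cell("a", lambda env: {"x": 5}, defines=["x"], refs=[])  # edit cell a
nb.run("a")          # b and c re-run automatically
print(nb.env["z"])   # 51
```

Because execution order comes from the dependency graph rather than from the order in which you happened to click, the notebook's state after any edit is determined by its code alone.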

Marimo is pretty new (first release January 2025) but has a high rate of improvement. It's particularly good for WebAssembly stuff - that's been one of their key features almost from the start.

My notes on it so far are here: https://simonwillison.net/tags/marimo/

lvl155•2h ago

I think it’s safe to say Observable’s inability to properly price their services made people look elsewhere. Their new offering is interesting but also ridiculously priced.

ayhanfuat•22m ago

I was also wondering about their pricing, because Canvas seemed so cool at first. Having seen your comment, I checked, and $900/month (including 10 users) is indeed very high. I guess they are primarily targeting big enterprises.

akshayka•2h ago

Thanks Simon for the kind words!

For those new to marimo, we have affordances for working with expensive (ML/AI/pyspark) notebooks too, including lazy execution that gives you guarantees on state without running automatically.
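A toy sketch of what lazy execution can look like, assuming "lazy" means that running a cell marks its dependents as stale instead of executing them, so you always know which outputs may be out of date and can refresh them on demand. This is not marimo's implementation; `LazyNotebook`, `run_stale`, and the explicit defines/refs lists are hypothetical, and cells are assumed to be added in a valid dependency order.

```python
class LazyNotebook:
    """Toy lazy runtime (hypothetical): running a cell marks dependents stale, not re-run."""

    def __init__(self):
        self.cells = {}    # cell_id -> (func, defines, refs); insertion order assumed topological
        self.env = {}      # shared namespace of values defined by cells
        self.stale = set() # cells whose outputs may be out of date

    def add_cell(self, cell_id, func, defines, refs):
        self.cells[cell_id] = (func, set(defines), set(refs))

    def _dependents(self, cell_id):
        # Transitive closure of "reads a name defined upstream".
        result, frontier = set(), [cell_id]
        while frontier:
            _, defines, _ = self.cells[frontier.pop()]
            for cid, (_, _, refs) in self.cells.items():
                if cid != cell_id and cid not in result and refs & defines:
                    result.add(cid)
                    frontier.append(cid)
        return result

    def _execute(self, cell_id):
        func, _, _ = self.cells[cell_id]
        self.env.update(func(self.env))
        self.stale.discard(cell_id)

    def run(self, cell_id):
        # Run one cell, then flag (but do not run) everything downstream.
        self._execute(cell_id)
        self.stale |= self._dependents(cell_id)

    def run_stale(self):
        # Bring the notebook back to a consistent state on demand.
        for cid in [c for c in self.cells if c in self.stale]:
            self._execute(cid)


nb = LazyNotebook()
nb.add_cell("a", lambda env: {"x": 2}, defines=["x"], refs=[])
nb.add_cell("b", lambda env: {"y": env["x"] * 10}, defines=["y"], refs=["x"])
nb.add_cell("c", lambda env: {"z": env["y"] + 1}, defines=["z"], refs=["y"])
for cid in ["a", "b", "c"]:
    nb.run(cid)
nb.add_cell("a", lambda env: {"x": 5}, defines=["x"], refs=[])  # edit cell a
nb.run("a")
print(sorted(nb.stale))  # ['b', 'c']
print(nb.env["z"])       # 21 (stale, not recomputed yet)
nb.run_stale()
print(nb.env["z"])       # 51
```

The expensive downstream cells (say, a PySpark job or model training) only run when you ask, but the stale set tells you exactly which values no longer reflect the code.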

One small note: marimo was actually first launched publicly (on HN) in January 2024 [1]. Our first open-source release was in 2023 (a quiet soft launch). And we've been in development since 2022, in close consultation with Stanford scientists. We're used pretty broadly today :)

[1] https://news.ycombinator.com/item?id=38971966

theLiminator•3h ago

I personally really like marimo. It's very easy to use and for data analysis type tasks it seems to work a lot better than jupyter in most cases.

cantdutchthis•2h ago

marimo is open source and uses a reactive model which makes it fun to mix/match widgets with Python code. It even supports gamepads if you wanted to go nuts!

https://youtu.be/4fXLB5_F2rg?si=jeUj77Cte3TkQ1j-

disclaimer: I work for marimo and I made that video, but the gamepad support is awesome and really shows the flexibility

Snakes3727•2h ago

One of the most critical aspects of a lakehouse is protecting data for security and compliance reasons, and this article glosses over it entirely, which makes me really uncomfortable.

jtagliabuetooso•2h ago

Thanks for the feedback. Bauplan actually has a few innovative features in this area, and fully Pythonic at that: Git for Data (https://docs.bauplanlabs.com/en/latest/concepts/git_for_data...) to sandbox any data change, tag it for compliance, and make it queryable; and full code and data auditability in one command (AFAIK, the only platform offering this), since every change is automatically versioned and tagged with the exact run and code that produced it (https://docs.bauplanlabs.com/en/latest/concepts/git_for_data...).

Our sandbox with public data is free for you to try, or just reach out and ask any question!

jtagliabuetooso•2h ago

Hey, founder of Bauplan here. Happy to field any questions or thoughts. Yes, marimo is great, and it's the only way to work within a real Python ecosystem for production use cases shipping proper code.

waffletower•1h ago

Rolling a notebook out to a service rapidly is an attractive idea -- but, as mentioned, has security implications -- I can add that there are also a host of monitoring implications as well -- service quality & continuity, model quality etc.

jtagliabuetooso•1h ago

You mean on the data side? Data access in the example (and in the real world) is mediated by a production-grade, Iceberg-compatible catalog, sandboxed changes, and a full audit trail (https://docs.bauplanlabs.com/en/latest/concepts/git_for_data...). Or do you mean something else?

waffletower•1h ago

I don't think python is always the best suited language for managing models and agents, but it certainly is the most popular and has the largest choice of related libraries. "Python first" or "pythonic" invites skepticism from me.

davistreybig•1h ago

Huge fan of Marimo - fixes so many of the annoying problems w/ notebooks

blooalien•34m ago

I find Marimo best for when you're trying to build something "app-like": an interactive tool to perform a specific task. I find JupyterLab more appropriate for random experimentation and exploration, and for documenting your learnings. Each absolutely has its place in the toolbox and does its thing well, but for me at least, there's not much overlap between the two other than the cell-based, notebook-like similarity. That similarity works well for me when migrating from exploration mode to app design mode: the familiar interface makes it easy to take ideas from Jupyter into Marimo and build out a proper application.
