
A Python-first data lakehouse

https://www.bauplanlabs.com/blog/everything-as-python
140•akshayka•7mo ago

Comments

flakiness•7mo ago
There have been so many "better notebook" implementations over the years that I cannot keep up. Which are the promising ones? Is this "marimo" one of them, or rather a newcomer?
simonw•7mo ago
Marimo is very impressive. It's effectively a cross between Jupyter and https://observablehq.com/ - it adds "reactivity", which solves the issue where Jupyter cells can be run in any order which can make the behavior of a notebook unpredictable, whereas in Marimo (and Observable) updating a cell automatically triggers other dependent cells to re-execute, similar to a spreadsheet.

Marimo is pretty new (first release January 2025) but has a high rate of improvement. It's particularly good for WebAssembly stuff - that's been one of their key features almost from the start.

My notes on it so far are here: https://simonwillison.net/tags/marimo/

lvl155•7mo ago
I think it’s safe to say Observable’s inability to properly price their services made people look elsewhere. Their new offering is interesting but also ridiculously priced.
ayhanfuat•7mo ago
I was also wondering about their pricing because Canvas seemed so cool at first. Now that I've seen your comment I checked, and $900/month (includes 10 users) is indeed very high. I guess they are primarily targeting big enterprises.
akshayka•7mo ago
Thanks Simon for the kind words!

For those new to marimo, we have affordances for working with expensive (ML/AI/pyspark) notebooks too, including lazy execution that gives you guarantees on state without running automatically.

One small note: marimo was actually first launched publicly (on HN) in January 2024 [1]. Our first open-source release was in 2023 (a quiet soft launch). And we've been in development since 2022, in close consultation with Stanford scientists. We're used pretty broadly today :)

[1] https://news.ycombinator.com/item?id=38971966

Peritract•7mo ago
> it adds "reactivity", which solves the issue where Jupyter cells can be run in any order

This is one of the key features of Jupyter to me; it encourages quick experimentation.

sodality2•7mo ago
Once you get to a certain complexity of notebooks, I find it only serves to complicate my mental model to “experiment” out of order. It makes me far more likely to forget to “commit” an ordering change.
abdullahkhalids•7mo ago
Jupyter notebooks do store the execution order of the cells. Just enforce a pre-commit or pre-merge hook that doesn't allow adding notebooks that have out-of-order cells.
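A minimal sketch of the hook suggested above, assuming the standard nbformat v4 JSON layout (a top-level `"cells"` list in which code cells carry an `"execution_count"`, null if never run). The rule enforced here (executed code cells must have strictly increasing counts in document order) and the script itself are illustrative, not an existing tool.

```python
import json
import sys


def execution_counts(nb: dict) -> list:
    """execution_count of each code cell in document order (None if never run)."""
    return [
        cell.get("execution_count")
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    ]


def is_in_order(nb: dict) -> bool:
    """True if the executed code cells have strictly increasing execution counts."""
    counts = [n for n in execution_counts(nb) if n is not None]
    return all(a < b for a, b in zip(counts, counts[1:]))


def main(paths: list) -> int:
    """Return a non-zero exit status if any notebook was run out of order."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            nb = json.load(f)
        if not is_in_order(nb):
            print(f"{path}: code cells were executed out of order", file=sys.stderr)
            status = 1
    return status


if __name__ == "__main__":
    # pre-commit passes the staged .ipynb files as command-line arguments
    exit_code = main(sys.argv[1:])
    if exit_code:
        sys.exit(exit_code)
```

One way to wire it up is a `repo: local` hook in `.pre-commit-config.yaml` whose `entry` runs this script on files matching `\.ipynb$`. A stricter variant could require the counts to be exactly 1..n, i.e. a clean "restart and run all".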
akshayka•7mo ago
marimo still allows you to run cells one at a time (and has many built-in UI elements for very rapid experimentation). But the distinction is that in marimo, running a cell runs the subtree rooted at it (or if you have enabled lazy execution, marks its descendants as stale), keeping code and outputs consistent while also facilitating very rapid experimentation. The subtree is determined by statically parsing code into a dependency graph on cells.
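To make the mechanism described above concrete, here is a minimal sketch of the idea, not marimo's actual implementation: statically parse each cell with Python's `ast` module to find the names it defines and reads, add an edge from cell A to cell B when B reads a name A defines, and on each run either re-execute the dependent subtree in topological order or, in a lazy mode, only mark it stale. The cell ids and the `build_graph` / `run_cell` helpers are hypothetical, and the analysis ignores scoping, so names bound inside function bodies are over-counted as definitions.

```python
import ast
from collections import deque
from graphlib import TopologicalSorter


def defs_and_refs(source: str) -> tuple[set, set]:
    """Names a cell defines, and names it reads without defining them itself.

    Simplification: walks the whole tree, so names bound inside function
    bodies also count as definitions; a real implementation would track scopes.
    """
    defs, refs = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                refs.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            defs.update((a.asname or a.name).split(".")[0] for a in node.names)
    return defs, refs - defs


def build_graph(cells: dict) -> dict:
    """children[a] = ids of cells that read a name defined by cell a."""
    info = {cid: defs_and_refs(src) for cid, src in cells.items()}
    children = {cid: set() for cid in cells}
    for parent, (parent_defs, _) in info.items():
        for child, (_, child_refs) in info.items():
            if child != parent and parent_defs & child_refs:
                children[parent].add(child)
    return children


def descendants(children: dict, root: str) -> set:
    """Every cell reachable from root, i.e. the subtree that depends on it."""
    seen, queue = set(), deque([root])
    while queue:
        for nxt in children[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def run_cell(cells: dict, root: str, namespace: dict, lazy: bool = False):
    """Run root, then re-run its descendants in dependency order or (lazy) mark them stale.

    Returns (cells executed, cells marked stale).
    """
    children = build_graph(cells)
    exec(cells[root], namespace)
    subtree = descendants(children, root)
    if lazy:
        return [root], subtree
    # predecessors restricted to the subtree; root itself has already run
    preds = {c: {p for p in subtree if c in children[p]} for c in subtree}
    order = list(TopologicalSorter(preds).static_order())
    for cid in order:
        exec(cells[cid], namespace)
    return [root] + order, set()
```

With cells `a: x = 1`, `b: y = x + 1`, `c: z = y * 10` and `d: w = 5`, running `a` re-executes `b` and then `c` and never touches `d`; with `lazy=True`, `b` and `c` are only reported as stale.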
theLiminator•7mo ago
I personally really like marimo. It's very easy to use and for data analysis type tasks it seems to work a lot better than jupyter in most cases.
cantdutchthis•7mo ago
marimo is open source and uses a reactive model which makes it fun to mix/match widgets with Python code. It even supports gamepads if you wanted to go nuts!

https://youtu.be/4fXLB5_F2rg?si=jeUj77Cte3TkQ1j-

disclaimer: I work for marimo and I made that video, but the gamepad support is awesome and really shows the flexibility

kernelsanderz•7mo ago
Marimo is really special and solves most of the problems that you have with Jupyter. For the Marimo-curious, I strongly recommend checking out their YouTube channel; so much effort has gone into making these videos really great. https://youtube.com/@marimo-team?si=ZGaf8Zgq5WN3LKRg
Snakes3727•7mo ago
One of the most critical aspects of a lakehouse is protecting data for security and compliance reasons, and this article just glosses over it completely, which makes me really uncomfortable.
jtagliabuetooso•7mo ago
Thanks for the feedback. Bauplan actually features a few innovative points in this area, fully Pythonic at that: Git for Data (https://docs.bauplanlabs.com/en/latest/concepts/git_for_data...) to sandbox any data change, tag it for compliance and make it queryable; and full code and data auditability in one command (AFAIK, the only platform offering this), as every change is automatically versioned and tagged with the exact run and code that produced it (https://docs.bauplanlabs.com/en/latest/concepts/git_for_data...).

Our sandbox with public data is free for you to try, or just reach out and ask any question!
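The "Git for Data" idea mentioned above can be illustrated with a toy model that assumes nothing about Bauplan's actual API or storage format. Branches and tags are names pointing at immutable snapshots. Every write on a branch creates a new snapshot, so `main` is unaffected until a merge. Each snapshot also records the run that produced it, which gives an audit trail. `ToyCatalog`, its methods, and the fast-forward-only merge rule are hypothetical simplifications.

```python
import itertools


class ToyCatalog:
    """Toy git-for-data model (hypothetical, not Bauplan's API).

    Branches and tags are names pointing at immutable snapshot ids; a write on a
    branch creates a new snapshot, so other branches never see unmerged changes.
    """

    def __init__(self):
        self.snapshots = {0: {}}   # snapshot id -> {table name: tuple of rows}
        self.parents = {0: None}   # snapshot id -> parent snapshot id
        self.runs = {0: None}      # snapshot id -> run that produced it (audit trail)
        self.branches = {"main": 0}
        self.tags = {}
        self._ids = itertools.count(1)

    def _resolve(self, ref):
        return self.tags[ref] if ref in self.tags else self.branches[ref]

    def create_branch(self, name, source="main"):
        # zero-copy: the new branch starts at the source's current snapshot
        self.branches[name] = self._resolve(source)

    def write(self, branch, table, rows, run_id):
        # copy-on-write: the new snapshot shares every untouched table with its parent
        parent = self.branches[branch]
        new = next(self._ids)
        self.snapshots[new] = {**self.snapshots[parent], table: tuple(rows)}
        self.parents[new] = parent
        self.runs[new] = run_id
        self.branches[branch] = new
        return new

    def read(self, ref, table):
        return self.snapshots[self._resolve(ref)].get(table, ())

    def tag(self, name, ref):
        # a tag pins a snapshot, e.g. the exact state signed off for compliance
        self.tags[name] = self._resolve(ref)

    def log(self, ref):
        """Run ids that produced ref's history, newest first."""
        out, node = [], self._resolve(ref)
        while node is not None:
            if self.runs[node] is not None:
                out.append(self.runs[node])
            node = self.parents[node]
        return out

    def merge(self, source, target="main"):
        # fast-forward only: refuse if target has moved since source branched off
        head, node = self.branches[target], self.branches[source]
        while node is not None and node != head:
            node = self.parents[node]
        if node != head:
            raise ValueError(f"{target} has diverged from {source}")
        self.branches[target] = self.branches[source]
```

In a real lakehouse the snapshots would be table metadata in a catalog (for example Iceberg snapshots) rather than in-memory tuples, and merges would need conflict handling beyond fast-forward.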

zelphirkalt•7mo ago
When I first quickly glanced at this heading, I read "Leakhouse" instead of "Lakehouse" :D And then I saw your comment...
jtagliabuetooso•7mo ago
Hey, founder of Bauplan here. Happy to field any questions or thoughts. Yes, marimo is great, and it's the only way to work within a real Python ecosystem for production use cases shipping proper code.
benrutter•7mo ago
Hey! Congrats on the product. Do you have any more complex examples anywhere?

I'm a data engineer and make decisions around what software we use for pipelines. A lot of examples for these types of tools showcase the simple case, which is a handy intro, but I'd love to see a real world example of Bauplan scaling to interconnected pipelines!

jtagliabuetooso•7mo ago
Hey Ben, thanks for your message.

We have people building stuff featured here (https://www.bauplanlabs.com/build-with-bauplan) as well as online (e.g. https://blog.det.life/bauplan-the-serverless-data-lakehouse-...), plus of course our examples repo on GitHub that you can check as part of the tutorial.

Our largest client is a USD 5BN/year company running thousands of jobs on bauplan. If you have something in mind, you can try out the public sandbox for free and join our Slack, and I'm happy to build something with you.

waffletower•7mo ago
Rolling a notebook out to a service rapidly is an attractive idea -- but, as mentioned, it has security implications -- and I can add that there is a host of monitoring implications as well: service quality & continuity, model quality, etc.
jtagliabuetooso•7mo ago
You mean on the data side? Data access in the example (and in the real world) is mediated by a production-grade, Iceberg-compatible catalog, sandboxed changes, and a full audit trail (https://docs.bauplanlabs.com/en/latest/concepts/git_for_data...). Or do you mean something else?
waffletower•7mo ago
I don't think python is always the best suited language for managing models and agents, but it certainly is the most popular and has the largest choice of related libraries. "Python first" or "pythonic" invites skepticism from me.
davistreybig•7mo ago
Huge fan of Marimo - fixes so many of the annoying problems w/ notebooks
blooalien•7mo ago
I find Marimo best for when you're trying to build something "app-like": an interactive tool to perform a specific task. I find Jupyter Lab more appropriate for random experimentation and exploration, and documenting your learnings. Each absolutely has its place in the toolbox, and does its thing well, but for me at least, there's not much overlap between the two other than the cell-based, notebook-like similarity. That similarity works well for me when migrating from exploration mode to app design mode. The familiar interface makes it easy for me to take ideas from Jupyter into Marimo to build out a proper application.
marcoalopez•7mo ago
This is exactly my impression.
akshayka•7mo ago
Thanks for the kind words. Many of our users have switched entirely from Jupyter to marimo for experimentation (including the scientists at Stanford's SLAC alongside whom marimo was originally designed).

I have spent a lot of time in Jupyter notebooks for experimentation and research in a past life, and marimo's reactivity, built-in affordances for working with data (table viewer, database connections, and other interactive elements), lazy execution, and persistent caching make me far more productive when working with data, regardless of whether I am making an app-like thing.

But as the original developer of marimo I am obviously biased :) Thanks for using marimo!

blooalien•7mo ago
I just like the Jupyter Lab overall IDE-like interface. It's really well designed for general random exploration, and works well with Will McGugan's "Rich" console output library. On the other hand, it's not really at all well suited for building web application type stuff. It's capable of it (with a whole lotta "hackery" and jumping through hoops) but it's not really built for it the way Marimo is. Marimo just feels like the right choice once you want to build a real, repeatable, end-usery type application for day-to-day use on a specific task. The widget set seems really well designed in Marimo too. I'm also really pleased with Marimo's use of the uv Python package tool. I fully intend to keep both Marimo and Jupyter within easy reach, as they're both really excellent at what they do.
debarshri•7mo ago
If you click on "See certifications" in the Security section [1], it resolves to an empty section.

[1] https://security.bauplanlabs.com/#resources-b2152df0-4179-48...

jtagliabuetooso•7mo ago
Mhmm, it doesn't resolve to empty but the full SecureFrame monitoring: https://security.bauplanlabs.com/#resources-b2152df0-4179-48... - if you wait a second, this is the entire report: https://www.loom.com/share/7cfc9c2f020645ddab2b1850b9c47619?...
markhahn•7mo ago
I am strangely unmoved by some new SaaS which is not open-source and self-hostable.
jtagliabuetooso•7mo ago
Thanks for checking out bauplan (which also supports BYOC, so I guess it is indeed hostable by you in a sense!).

We've done quite a lot of open source in our life, at Bauplan (you can check our github), and before (you can check me ;-)), so the comment seems unfair!

We understand the importance of being clear on how the platform works, and for that we have a long series of blog posts and, if you're so inclined, quite a few peer-reviewed papers in top conferences, ranging from low-level memory optimizations (https://arxiv.org/abs/2504.06151), columnar caching (https://arxiv.org/abs/2411.08203), novel FaaS runtimes (https://arxiv.org/pdf/2410.17465), pipeline reproducibility (https://arxiv.org/pdf/2404.13682) and more.

We are also always happy to chat about our tech choices if you're interested.

bluehex•7mo ago
"Data lake", "data lakehouse"...

Who comes up with these weird names for patterns? What the heck is "lake" supposed to evoke?

jtagliabuetooso•7mo ago
Yeah, terms are confusing sometimes! "Data lakehouse" is weirdly enough a "technical term". The canonical reference is from CIDR https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf, but we have our own version from VLDB https://arxiv.org/pdf/2308.05368
Noumenon72•7mo ago
> Option 2 - Hand it off to DevOps. The other option is to have data science produce prototypes that can be on Notebooks and then have a devops team whose job is to refactor those into an application that runs in production. This process makes things less fragile, but it is slow and very expensive.

I've never understood why this is so hard. Every time data science gives me a notebook it feels like I have been handed a function that says `doFeature()` and should just have to put it behind an endpoint called /do_feature, but it always takes forever and I'm never even able to articulate why. It feels like I am clueless at reading code but just this one particular kind of code.

dcreater•7mo ago
I'll do you one better. Productionizing a data science prototype is exactly the kind of grunt work AI is able to take over.

I think it's a much better result to have a data science prototype translated into a performant production version, rather than a Databricks-type approach or what bauplan is proposing.

stingraycharles•7mo ago
Maybe, but it would still need to work within a well defined framework. Usually the data science part is “solve the problem”, the data engineering part is “make it work reliably, fast, at scale”.

What that looks like is highly dependent upon the environment at hand, and letting AI take that over may be one of those “now you have 2 problems” things.

jtagliabuetooso•7mo ago
We are not proposing or advocating for any approach to development (I personally almost never use notebooks these days and run Bauplan with preview).

The blog post we wrote with our marimo friends is meant to showcase that you can have notebook development if you like it AND cloud scaling (which you need) without code changes, thanks to the fact that both marimo and Bauplan are basically Python (maybe a small thing, but there is nothing else in the market remotely close).

On the AI part, we agree: the fact that bauplan is just Python, including data management and infra-as-code, makes it trivial for AI to build pipelines in Bauplan, which is not something that can be said about other data platforms. If you follow our blog, in a few weeks we are releasing a full "agentic" implementation of production ETL workloads with the Bauplan API, which you may find interesting.

jtagliabuetooso•7mo ago
Thanks for the comment: your frustration is the default in the industry, and it's part of the reason why Bauplan was built.

"but it always takes forever and I'm never even able to articulate why." -> there are way more factors at play than `doFeature()`, unfortunately; see for example Table 1 (https://arxiv.org/pdf/2404.13682). Even knowing which data people have developed on is hard, which is why bauplan has git-for-data semantics built in: everyone works on production data, but safely and reliably, to avoid data skew.

Each computer is different, which is why bauplan adopts FaaS with isolated and fully containerized functions: you are always in the cloud, so there is no skew in the infra, etc.

The problem of "going to production" is still the biggest issue in the industry, and solving it is not a one-fix kind of thing; unfortunately, it takes a combination of good ergonomics, new abstractions and reliable infra.

mr_toad•7mo ago
A data scientist wants results with minimum programming effort, and efficiency be damned. Pull all the data and join it all together in a honking great data frame, use brute force to analyse it.

This isn’t necessarily what you want in a daily production environment, let alone a real-time environment.

dcreater•7mo ago
Not open source. DOA.
jtagliabuetooso•7mo ago
Thanks for your comment. As stated elsewhere, we understand the need for people to know how the system works, and have contributed back our ideas (and quite a bit of open source code) to the community: if you want to check our blogs and / or papers, I'm sure you'll find many interesting things.

If you're worried about data movement or secure deployment, none of that is an issue because of Iceberg + BYOC option.

Databricks and Snowflake, just to mention two players in a similar space, are not OS: did you feel that would prevent you from adopting them as well?

dcreater•7mo ago
> Databricks and Snowflake, just to mention two players in a similar space, are not OS: did you feel that would prevent you from adopting them as well?

Yes, absolutely. Snowflake is a modern Oracle. It may survive, but it will be more of a barnacle/legacy system for big corporations. Neither is the right solution for the next generation of companies that are starting up today.

dcreater•7mo ago
Posing the sentiment differently: Why not go open source? Follow the same model as marimo, astral etc. that are enriching the python ecosystem?
jtagliabuetooso•7mo ago
Marimo and astral are great and we use them both, they are not however infrastructure companies, so the parallel is a bit imperfect. Wouldn't you use AWS because it's closed source? And BigQuery? Or Motherduck?

There is no "one size fits all" when it comes to building companies, and the right answers depend on many factors: it would be interesting to know your choices for example!

I do agree with you that it is important to give back to the ecosystem, but per size / dollar, bauplaners have done and continue doing as much as anyone. All in all, we have shared our ideas with the community in 50+ research papers in top venues (with thousands of citations), and we have quite a few popular open source contributions, with millions of downloads and >10k GitHub stars in total (our FaaS scheduler simulator was just open sourced with our VLDB25 WS paper).

You can be a good citizen of the Python / AI / database ecosystem without doing open source as a business strategy: the reality is more nuanced I believe!

haikuya•7mo ago
For reproducibility https://kedro.org/
jtagliabuetooso•7mo ago
Importantly, kedro does not run things for you, resulting in a suboptimal experience because the runtime and the DSL are separated: in particular, it does not solve the problem of having K different systems with scattered logs and APIs that are hard to integrate.

If you want to dive deeper into one-line reproducibility, you can check our SIGMOD24 paper: https://arxiv.org/pdf/2404.13682. Let us know what you think!

b0a04gl•7mo ago
> "you can define assets in pure python using any framework or engine you want."

sounds flexible, but what does that actually mean in practice? are there guardrails to keep things interoperable?

> "engine-agnostic execution"

how does that hold up when switching between, say, pandas and spark? are dependencies and semantics actually preserved, or is it up to us to manually patch the gaps every time the backend shifts?

jtagliabuetooso•7mo ago
Spark is technically not Python, even if we support PySpark with the relevant decorator but it's a very niche use case for us.

As for all the other Python packages, including proprietary ones, the FaaS model is such that you can declare any package you want in a function as node in the pipeline DAG, and any other in another: every function is fully isolated, and you can even selectively use pandas 1 in one, pandas 2 in another, or update the Python interpreter only in node X.
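The per-function isolation described above can be sketched under stated assumptions. A hypothetical `node` decorator records each pipeline step, its inputs, and the Python version and pip pins it needs. A planner then orders the DAG with `graphlib.TopologicalSorter` and derives one environment key per step, so `raw_orders` can pin pandas 1.x while `cleaned_orders` uses pandas 2.x. None of these names are Bauplan's real decorators, the version pins are illustrative, and nothing is actually installed or run in a container here.

```python
from graphlib import TopologicalSorter

REGISTRY = {}  # step name -> spec recorded by the hypothetical @node decorator


def node(*, python, pip, inputs=()):
    """Hypothetical decorator: register a pipeline step and the environment it needs."""
    def register(fn):
        REGISTRY[fn.__name__] = {
            "fn": fn,
            "python": python,
            "pip": dict(pip),
            "inputs": tuple(inputs),
        }
        return fn
    return register


@node(python="3.10", pip={"pandas": "1.5.3"})
def raw_orders():
    ...  # step body would run inside its own pandas 1.x environment


@node(python="3.11", pip={"pandas": "2.2.2"}, inputs=["raw_orders"])
def cleaned_orders(raw_orders):
    ...  # same pipeline, different pandas major version


@node(python="3.12", pip={"polars": "1.0.0"}, inputs=["cleaned_orders"])
def revenue(cleaned_orders):
    ...  # and a different interpreter and library entirely


def env_key(spec) -> tuple:
    # hashable description of one isolated environment (interpreter + pinned packages)
    return spec["python"], tuple(sorted(spec["pip"].items()))


def plan() -> list:
    """(step name, environment key) pairs in dependency order."""
    graph = {name: set(spec["inputs"]) for name, spec in REGISTRY.items()}
    order = TopologicalSorter(graph).static_order()
    return [(name, env_key(REGISTRY[name])) for name in order]
```

A real runner would build (and cache) one isolated environment per distinct key and pass each step's output to its children; steps that happen to share a key could share an image.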

If you're interested in containerization and FaaS abstractions, this is a good deep dive: https://arxiv.org/pdf/2410.17465

If you're more the practical type, just try out a few runs in the public sandbox which is free even if we are not GA.

mrsofty•7mo ago
marimo has made development FUN again for me. From the way I can use uv to manage my packages through the UI (I left Julia because of that issue) to the reactive nature of the cells (goodbye Jupyter), it's a wonderful experience. It's fast-moving and the community is wonderful. Having another lakehouse with such ease of use is another example of its attraction. WELL DONE marimo j215