Buckaroo is my open source project. It is a dataframe viewer that has the basic features we expect in a modern table: scroll, search, and sort. In addition, it offers summary stats and histograms. Buckaroo supports Pandas and Polars dataframes and works in Jupyter, Marimo, VSCode, and Google Colab notebooks.
All of this is extensible. I think of Buckaroo as a framework for building table UIs, plus an initial data exploration app built on top of that framework. AG-Grid is used for the core table display, and it has been customized with a declarative layer so you don't have to pass JS functions around for customizations. On the Python side there is a framework for adding summary stats (with a small DAG for handling dependencies). There is also an entire lowcode UI for point-and-click selection of common commands (such as dropping a column). The lowcode UI also generates a Python function that accomplishes the same tasks. This is built on top of JLisp, a small Lisp interpreter that reads JSON-flavored Lisp.
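To make the JLisp idea concrete, here is a minimal sketch of a JSON-flavored Lisp evaluator that both runs lowcode commands and generates equivalent Python source from the same command list. This is not Buckaroo's actual JLisp: the `{"symbol": ...}` encoding, the `dropcol`/`fillna` operators, and every function name are hypothetical, and a plain dict of column lists stands in for a pandas dataframe so the example has no dependencies. Only the generated text targets real pandas methods (`DataFrame.drop(columns=...)` and `Series.fillna`).

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a dataframe: column name -> list of values.
Frame = Dict[str, List[Any]]


def drop_col(df: Frame, col: str) -> Frame:
    # Return a new frame without `col`; the input frame is not modified.
    return {name: values for name, values in df.items() if name != col}


def fill_na(df: Frame, col: str, value: Any) -> Frame:
    # Return a new frame where None entries in `col` are replaced by `value`.
    out = dict(df)
    out[col] = [value if v is None else v for v in df[col]]
    return out


# Hypothetical operator tables: one for evaluation, one for code generation.
# Codegen lambdas receive arguments already rendered as Python source text.
OPS: Dict[str, Callable[..., Any]] = {"dropcol": drop_col, "fillna": fill_na}
CODEGEN: Dict[str, Callable[..., str]] = {
    "dropcol": lambda df, col: f"{df} = {df}.drop(columns=[{col}])",
    "fillna": lambda df, col, val: f"{df}[{col}] = {df}[{col}].fillna({val})",
}


def is_symbol(x: Any) -> bool:
    # Symbols are encoded as {"symbol": name} so they can't clash with string literals.
    return isinstance(x, dict) and "symbol" in x


def evaluate(expr: Any, env: Dict[str, Any]) -> Any:
    # Lists are calls, symbols are variable lookups, anything else is a literal.
    if isinstance(expr, list):
        head, *args = expr
        return OPS[head["symbol"]](*(evaluate(a, env) for a in args))
    if is_symbol(expr):
        return env[expr["symbol"]]
    return expr


def to_source(expr: List[Any]) -> str:
    # Render one call as a line of pandas code instead of executing it.
    head, *args = expr
    rendered = [a["symbol"] if is_symbol(a) else repr(a) for a in args]
    return CODEGEN[head["symbol"]](*rendered)


def run_pipeline(commands: List[Any], df: Frame) -> Frame:
    # Apply each command in order, threading the frame through the "df" binding.
    env: Dict[str, Any] = {"df": df}
    for cmd in commands:
        env["df"] = evaluate(cmd, env)
    return env["df"]


def generate_function(commands: List[Any]) -> str:
    # Emit a standalone Python function that performs the same steps.
    body = [f"    {to_source(cmd)}" for cmd in commands]
    return "\n".join(["def clean(df):", *body, "    return df"])


commands = json.loads(
    '[[{"symbol": "dropcol"}, {"symbol": "df"}, "a"],'
    ' [{"symbol": "fillna"}, {"symbol": "df"}, "b", 0]]'
)
print(run_pipeline(commands, {"a": [1, 2], "b": [None, 3]}))  # {'b': [0, 3]}
print(generate_function(commands))
```

Because the commands are plain JSON, the same list could be produced by a frontend, evaluated in Python, and turned into source text. Marking symbols explicitly is a design choice that keeps a column literally named `"df"` from being confused with the dataframe variable.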
Auto Cleaning looks at columns and heuristically suggests common cleaning operations. The operations are added to the lowcode UI, where they can be edited. Multiple cleaning strategies can be applied and the best fit retained. Without a UI for reviewing the operations and comparing multiple strategies, autocleaning is very opaque. Since this runs heuristically (not with an LLM), it's fast and data stays local.
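One way to picture "multiple strategies, best fit retained" is the sketch below: each candidate parser is tried on a column of raw strings, scored by the fraction of non-empty values it converts, and the top scorer is kept only if it clears a threshold. The parsers, the scoring rule, and the 0.9 default threshold are assumptions for illustration, not Buckaroo's actual heuristics. Returning every score rather than just the winner is what would let a UI show the alternatives instead of cleaning opaquely.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical per-value parsers; each returns None when a value doesn't fit.
def parse_int(s: str) -> Optional[int]:
    try:
        return int(s.replace(",", ""))  # tolerate thousands separators like "1,000"
    except ValueError:
        return None


def parse_float(s: str) -> Optional[float]:
    try:
        return float(s.replace(",", ""))
    except ValueError:
        return None


TRUTHY = {"true", "yes", "y", "t", "1"}
FALSY = {"false", "no", "n", "f", "0"}


def parse_bool(s: str) -> Optional[bool]:
    key = s.strip().lower()
    if key in TRUTHY:
        return True
    if key in FALSY:
        return False
    return None


# On a tie, max() returns the first strategy listed, so "int" wins over "float".
STRATEGIES: Dict[str, Callable[[str], object]] = {
    "int": parse_int,
    "float": parse_float,
    "bool": parse_bool,
}


def score(values: List[str], parser: Callable[[str], object]) -> float:
    # Fraction of non-empty values the parser converts successfully.
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return 0.0
    return sum(parser(v) is not None for v in non_empty) / len(non_empty)


def suggest_cleaning(
    values: List[str], threshold: float = 0.9
) -> Tuple[Optional[str], Dict[str, float]]:
    # Score every strategy, then keep the best one only if it clears the threshold.
    scores = {name: score(values, fn) for name, fn in STRATEGIES.items()}
    best = max(scores, key=scores.__getitem__)
    return (best if scores[best] >= threshold else None), scores


def apply_strategy(values: List[str], name: str) -> List[object]:
    # Empty strings and values the parser rejects both become None.
    parser = STRATEGIES[name]
    return [parser(v) if v.strip() else None for v in values]


best, scores = suggest_cleaning(["1,000", "2", "3", "n/a"], threshold=0.7)
print(best, scores)  # int {'int': 0.75, 'float': 0.75, 'bool': 0.0}
print(apply_strategy(["1,000", "", "n/a"], "int"))  # [1000, None, None]
```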
I'm eager to hear feedback from data scientists and other users of dataframes/notebooks.
ZeroCool2u•2h ago
1: https://marketplace.visualstudio.com/items?itemName=ms-tools...
paddy_m•2h ago
The Buckaroo lowcode UI is capable of working with Polars, but I don't currently have any commands plumbed in. I will work on that.
I'm aware of Data Wrangler and they did nice work, but it's closed source and, from what I can tell, non-extensible. What features do you like in Data Wrangler? What do you wish it did differently?
paddy_m•1h ago
I need to make some updates to the Polars functionality. I just completed some extensive refactorings of the lowcode UI focused on pandas; now it's time to clean that up for Polars too.
Also, the Python codegen for Polars is non-idiomatic, with multiple reassignments to a dataframe instead of one big select block. I have some ideas for how to fix that, but they'll take time.
https://marimo.io/p/@paddy-mullen/notebook-sctuj8
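The contrast between reassignment-style codegen and a single select block can be sketched as two small generators. Both functions are hypothetical and not Buckaroo's codegen; the `(method, column, argument_source)` op tuple is an assumed encoding. The emitted text uses real Polars methods (`DataFrame.drop`, `DataFrame.with_columns`, `DataFrame.select`, `Expr.fill_null`, `Expr.cast`). The select version chains every op for a column into one expression and drops a column simply by leaving it out of the select list.

```python
from typing import List, Optional, Tuple

# Hypothetical op encoding: (Polars method name, column, argument as source text).
# The "drop" op takes no argument.
Op = Tuple[str, str, Optional[str]]


def codegen_reassign(ops: List[Op]) -> str:
    # One statement per op, reassigning `df` each time.
    lines = []
    for method, col, arg in ops:
        if method == "drop":
            lines.append(f"df = df.drop({col!r})")
        else:
            lines.append(f"df = df.with_columns(pl.col({col!r}).{method}({arg}))")
    return "\n".join(lines)


def codegen_select(columns: List[str], ops: List[Op]) -> str:
    # One select: dropped columns are omitted, other ops are chained per column.
    dropped = {col for method, col, _ in ops if method == "drop"}
    exprs = []
    for col in columns:
        if col in dropped:
            continue
        expr = f"pl.col({col!r})"
        for method, op_col, arg in ops:
            if op_col == col:
                expr += f".{method}({arg})"
        exprs.append(f"    {expr},")
    return "\n".join(["df = df.select(", *exprs, ")"])


ops: List[Op] = [("drop", "a", None), ("fill_null", "b", "0"), ("cast", "b", "pl.Int64")]
print(codegen_reassign(ops))
print(codegen_select(["a", "b", "c"], ops))
```

One caveat of the single-select style: every expression inside `select` is evaluated against the input frame, so an op that needs another column's already-modified value cannot be folded in this simple way.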