Show HN: Typedframes – Pandas/polars column name checking at lint time

https://github.com/w-martin/typedframes
3•w-martin•4h ago

Comments

w-martin•4h ago
I built a static type checker (both a standalone Rust binary and a mypy plugin) to catch dataframe schema errors before they hit production. Here is why I built it, the gap in current tooling, and how it works. For code examples, skip to the end.

I've been working in the data science (DS) space for nearly 10 years, and weakly typed column references have been a pet peeve for most of that time. One character off and pandas raises a KeyError at runtime; you find out in production on an edge case.

Most teams handle the DS-to-production handoff in one of two ways: either a technical DS deploys the code themselves, or they throw the notebook over the fence to a machine learning engineer (MLE). SageMaker and Vertex AI made the former common. In the latter case, the MLE's job is often to rewrite the code entirely: strip code smells, write tests against fake data, and catch schema issues. Sculley et al.'s 2015 NeurIPS paper on ML technical debt documented how badly this debt accumulates; AWS and Google platforms actively discourage the rewrite that addresses it, because it creates friction that doesn't fit a scientist's workflow.

My team put together a shared setup around this. We pair with the DS, write acceptance tests and well-factored code, and things get caught. It works, but imperfectly. Scientists find it unnatural, and acceptance tests give feedback only when someone actually runs them or extends them to new logical branches. A code review takes hours to days. Things still get missed.

Most ML code is Python. Type checkers are what actually reduce runtime errors, but historically we were limited to mypy: strict, but slow. Recently, Rust-based tools (ty, pyrefly) have popped up that run in under a second. For human workflows, IDEs run language servers that scan continuously; for agentic workflows, the same checker wired into a pre-commit hook means code must pass before a human is involved.

I was previously hesitant to enforce strong type checking on scientists' code, as I've observed it slowing workflows. However, the proportion of LLM-drafted code has shifted considerably, and the guardrails that worked when humans wrote every line are no longer adequate. LLMs replicate antipatterns. Copilot and human reviews of column mismatches are not deterministic. Thus, we've ramped up CI/CD linting and type checking rules (ruff, bandit, complexipy, ty, pyrefly) and love the results.

Unfortunately, it hasn't helped with dataframes. Type checkers don't test dataframe contracts in pandas or polars. I raised issues in the ty tracker (#2551) and pyrefly tracker (#2805). Both teams were interested but neither has near-term plans. This led me to develop typedframes.

typedframes works in two modes. The simpler one needs no annotation:

import pandas as pd

df = pd.read_csv("orders.csv", usecols=["order_id", "customer_name", "total"])
result = df["custmer_name"]  # error: did you mean 'customer_name'?

No retrofitting, no schema to write, though inference is narrower. For LLM-generated code I want harder edges. The stronger mode uses explicit annotations:

def process(df: Annotated[pd.DataFrame, OrderSchema]) -> pd.Series:
    return df["custmer_name"]  # error: did you mean 'customer_name'?

The schema encodes hard expectations against every subscript. It is closer in spirit to a DTO than to a runtime validator, and it is what I want LLMs writing against.
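
To make the lint-time idea concrete, here is a toy sketch of the annotation-free mode using only the Python standard library (Python 3.9+ assumed). It is not how typedframes is implemented; the function name check_columns, the message format, and the use of difflib for the suggestion are all assumptions made for illustration. It reads the column list from a read_csv(..., usecols=[...]) assignment and flags string subscripts that are not in that list, without ever running the code or importing pandas.

```python
import ast
import difflib


def check_columns(source: str) -> list[str]:
    """Report string subscripts that name a column missing from usecols.

    Hypothetical helper for illustration only: flow-insensitive, and it only
    understands `name = <obj>.read_csv(..., usecols=[...literals...])`.
    """
    tree = ast.parse(source)
    known: dict[str, list[str]] = {}  # variable name -> inferred column names
    errors: list[str] = []

    # Pass 1: infer columns from read_csv calls with a literal usecols list.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Attribute)
            and node.value.func.attr == "read_csv"
        ):
            for kw in node.value.keywords:
                if kw.arg == "usecols" and isinstance(kw.value, ast.List):
                    known[node.targets[0].id] = [
                        elt.value
                        for elt in kw.value.elts
                        if isinstance(elt, ast.Constant)
                        and isinstance(elt.value, str)
                    ]

    # Pass 2: check every `name["literal"]` subscript on a known variable.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id in known
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)
        ):
            columns = known[node.value.id]
            column = node.slice.value
            if column not in columns:
                # Suggest the closest known column name, if any is close enough.
                hint = difflib.get_close_matches(column, columns, n=1)
                suffix = f"; did you mean '{hint[0]}'?" if hint else ""
                errors.append(
                    f"line {node.lineno}: unknown column '{column}'{suffix}"
                )
    return errors


# The source under test is only parsed, never executed.
SAMPLE = '''
df = pd.read_csv("orders.csv", usecols=["order_id", "customer_name", "total"])
result = df["custmer_name"]
ok = df["total"]
'''

errors = check_columns(SAMPLE)
for message in errors:
    print(message)
# prints: line 3: unknown column 'custmer_name'; did you mean 'customer_name'?
```

A real checker has to track reassignment, renames, merges, and function boundaries, which is where explicit schema annotations start to pay off over inference.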

typedframes is available on PyPI. There is a standalone Rust checker that runs in under a second, as well as a mypy plugin. I would rather ty or pyrefly built this natively; I am not a type system author and the implementation has rough edges. However, this is a proof of concept demonstrating that the gap is real and closeable.

rfgplk•4h ago
brilliant
