Show HN: Typedframes – Pandas/polars column name checking at lint time

https://github.com/w-martin/typedframes
2•w-martin•1h ago

Comments

w-martin•1h ago
I built a static type checker (both a standalone Rust binary and a mypy plugin) to catch dataframe schema errors before they hit production. Here is why I built it, the gap in current tooling, and how it works. For code examples, skip to the end.

I've been working in the data science (DS) space for nearly 10 years, and weakly typed column references have been a pet peeve for most of that time. One character off and pandas raises a KeyError at runtime; you find out in production on an edge case.

Most teams handle the DS-to-production pipeline in one of two ways: either a technical DS deploys the work themselves, or they throw the notebook over the fence to a machine learning engineer (MLE). SageMaker and Vertex AI made the former common. An MLE's job is often to rewrite the notebook entirely: strip code smells, write tests against fake data, and catch schema issues. Sculley et al.'s 2015 NeurIPS paper on ML technical debt documented how badly this debt accumulates; AWS and Google platforms actively discourage the rewrite that addresses it, because it creates friction that doesn't fit a scientist's workflow.

My team put together a shared setup around this. We pair with the DS, write acceptance tests and well-factored code, and things get caught. It works, but imperfectly. Scientists find it unnatural, and acceptance tests give feedback only when someone actually runs them or extends them to new logical branches. A code review takes hours to days. Things still get missed.

Most ML code is Python. Type checkers are the things that actually reduce runtime errors, but historically we were limited to mypy: strict, but slow. Recently, Rust-based tools (ty, pyrefly) have popped up, running sub-second. For human workflows, IDEs run language servers that scan continuously; for agentic workflows, the same checker wired into a pre-commit hook means code must pass before the human is involved.

I was previously tentative about enforcing strong type checking on scientists' code, as I've observed it slowing workflows. However, the proportion of LLM-drafted code has shifted considerably, and the guardrails that worked when humans wrote every line are no longer adequate. LLMs replicate antipatterns. Copilot and human reviews of column mismatches are not deterministic. Thus, we've ramped up CI/CD linting and type checking rules (ruff, bandit, complexipy, ty, pyrefly) and love the results.

Unfortunately, it hasn't helped with dataframes. Type checkers don't test dataframe contracts in pandas or polars. I raised issues in the ty tracker (#2551) and pyrefly tracker (#2805). Both teams were interested but neither has near-term plans. This led me to develop typedframes.

typedframes works in two modes. The simpler one needs no annotation:

df = pd.read_csv("orders.csv", usecols=["order_id", "customer_name", "total"])

result = df["custmer_name"] # error: did you mean 'customer_name'?
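As a rough illustration of how this kind of annotation-free inference could work, here is a minimal sketch using only the Python standard library (Python 3.9+). It is not typedframes' implementation: the function name `check_columns` and the message format are hypothetical, and it only handles the narrow case of a literal `usecols=[...]` list followed by a literal string subscript.

```python
import ast
import difflib


def check_columns(source: str) -> list[str]:
    """Report string subscripts that do not match the columns a frame was loaded with."""
    tree = ast.parse(source)
    known: dict[str, list[str]] = {}  # variable name -> columns inferred from usecols
    errors: list[str] = []

    # Pass 1: infer columns from `name = <module>.read_csv(..., usecols=[...])`.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Attribute)
            and node.value.func.attr == "read_csv"
        ):
            for kw in node.value.keywords:
                if kw.arg == "usecols" and isinstance(kw.value, ast.List):
                    known[node.targets[0].id] = [
                        elt.value
                        for elt in kw.value.elts
                        if isinstance(elt, ast.Constant) and isinstance(elt.value, str)
                    ]

    # Pass 2: check every `name["literal"]` against the inferred columns.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id in known
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)
        ):
            column = node.slice.value
            columns = known[node.value.id]
            if column not in columns:
                close = difflib.get_close_matches(column, columns, n=1)
                hint = f" (did you mean '{close[0]}'?)" if close else ""
                errors.append(f"line {node.lineno}: unknown column '{column}'{hint}")
    return errors


SOURCE = '''
df = pd.read_csv("orders.csv", usecols=["order_id", "customer_name", "total"])
result = df["custmer_name"]
'''

for message in check_columns(SOURCE):
    print(message)
```

The source is only parsed, never executed, so the check needs neither pandas nor the CSV file. That is what makes it cheap enough for a language server or pre-commit hook.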

No retrofitting, no schema to write, though inference is narrower. For LLM-generated code I want harder edges. The stronger mode uses explicit annotations:

def process(df: Annotated[pd.DataFrame, OrderSchema]) -> pd.Series:
    return df["custmer_name"]  # error: did you mean 'customer_name'?

The schema encodes hard expectations that every subscript is checked against. It is closer in spirit to a DTO than to a runtime validator, and it is what I want LLMs writing against.
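To make the annotation mechanism concrete, here is a small sketch of how a tool could recover declared column names from an `Annotated` parameter using only the standard `typing` module. Everything in it is an assumption for illustration: the post does not show how `OrderSchema` is defined, so the class below is hypothetical, `Frame` is a stand-in for `pd.DataFrame` so the snippet runs without pandas, and `declared_columns` is not part of typedframes.

```python
from typing import Annotated, get_args, get_type_hints


class Frame:
    """Stand-in for pd.DataFrame so the sketch runs without pandas installed."""


class OrderSchema:
    """Hypothetical schema: one annotated attribute per expected column."""

    order_id: int
    customer_name: str
    total: float


def process(df: Annotated[Frame, OrderSchema]) -> None:
    """Placeholder function whose parameter carries the schema as metadata."""


def declared_columns(func, param: str) -> list[str]:
    """Return the column names declared by the schema attached to `param`."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hints = get_type_hints(func, include_extras=True)
    _base, *metadata = get_args(hints[param])
    for item in metadata:
        # A class's __annotations__ preserves the order the columns were declared in.
        columns = getattr(item, "__annotations__", None)
        if columns:
            return list(columns)
    return []


print(declared_columns(process, "df"))
```

A static checker would read the same information from the syntax tree rather than at import time, but the contract is the same: the schema travels with the type, so every subscript on `df` has a fixed set of names to be checked against.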

typedframes is available on PyPI. There is a standalone Rust checker running sub-second, as well as a mypy plugin. I would rather ty or pyrefly built this natively; I am not a type system author and the implementation has rough edges. However, this is a proof of concept demonstrating that the gap is real and closeable.

rfgplk•1h ago
brilliant
