Now HTTPX’s async capabilities and Pydantic’s model building took this project over the top. By leaning on both, I shifted the codebase from a data mapper to a pipeline orchestrator. I added every format I could find that has an established Python library. Right now I believe it supports JSON, NDJSON, XML, CSV, TSV, PSV, SQLite, and HTML out of the box. Optional extras (~30 MB for pyarrow) unlock Parquet, Feather, and ORC; Avro and XLSX have their own extras. I also added every compression codec I could find. Benchmarks, at least on a Windows machine, are on par with other ELT packages.
By focusing on function wrappers that keep the developer-facing syntax for the original data-mapping calls as simple as possible, I established automated pipelines that run from one CLI command and one JSON reference file. The JSON uses essentially the same syntax you would write in Python.
Both stream and fjord accept inflow and outflow Python code. Inflow code lets you set custom conversion functions and mappings for the incoming data; outflow code lets you reshape the exported data into an entirely new object.
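In rough terms, a pair of hooks might look like this; the signatures below are my assumptions for illustration, not Incorporator's documented API:

    # Hypothetical hook signatures -- assumptions for illustration only.
    def inflow(record: dict) -> dict:
        # Custom conversion on the way in: normalise a timestamp field.
        record["net"] = record["net"].replace("Z", "+00:00")
        return record

    def outflow(records: list[dict]) -> dict:
        # Reshape the exported data into an entirely new object.
        return {"count": len(records), "launches": records}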
Also, because your pipeline is essentially defined by a JSON file, you should eventually be able to automate the creation of the entire pipeline. Enjoy.
https://github.com/PyPlumber/Incorporator/
How you use it: declare a subclass with no fields, point it at a URL, and it infers a Pydantic model from the response at runtime, with full strict typing, dot-notation access, and an optional registry lookup by any key.

    class Launch(Incorporator):
        pass

    launches = await Launch.incorp(
        inc_url="https://ll.thespacedevs.com/2.2.0/launch/upcoming/"
    )
These functions handle the rest of your data-mapping and export-format needs:

- test() lets the framework write the call kwargs for you
- refresh() re-fetches with the seed call's params auto-replayed
- export() serialises to any of the 13 formats
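Continuing from the Launch example above, usage might look roughly like this; only the method names come from the list, the keyword names on export() are guesses for illustration:

    # Sketch only: fmt= and path= are assumed keyword names, not confirmed API.
    await launches.refresh()                 # re-fetch, seed params auto-replayed
    launches.export(fmt="csv", path="launches.csv")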
Then these functions create a pipeline:

- stream() runs a chunked daemon with bounded memory. It works in two modes: pass-through, or stateful (in-RAM) updates that can be manipulated in real time.
- fjord() fans out N sources and fuses them through a user reducer. It accepts multiple sources and exports (see the sketch after this list).
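A rough sketch of a fjord fan-in under stated assumptions: the sources= and reducer= keyword names, the source-config dicts, and the agencies endpoint are all illustrative guesses, not confirmed API:

    # Hypothetical sketch: only fjord() itself and inc_url come from the post.
    launch_source = {"inc_url": "https://ll.thespacedevs.com/2.2.0/launch/upcoming/"}
    agency_source = {"inc_url": "https://ll.thespacedevs.com/2.2.0/agencies/"}

    def reducer(launches: list[dict], agencies: list[dict]) -> list[dict]:
        # User reducer: fuse the two sources by attaching agency details.
        by_id = {a["id"]: a for a in agencies}
        return [{**l, "agency": by_id.get(l.get("agency_id"))} for l in launches]

    results = await Launch.fjord(sources=[launch_source, agency_source], reducer=reducer)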
After that all works, copy the parameters into pipeline.json and the command can be as simple as:

    incorporator validate pipeline.json
    incorporator fjord pipeline.json --logs
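For a sense of shape only, a hypothetical pipeline.json; every key except inc_url is an illustrative guess mirroring the Python call, not the library's actual schema:

    {
      "inc_url": "https://ll.thespacedevs.com/2.2.0/launch/upcoming/",
      "exports": [{"format": "csv", "path": "launches.csv"}]
    }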
The appendices have more advanced examples. There's a fantasy racing league example with six API calls and one file source feeding three outflows, all in the form of one automated fjord pipeline CLI call.