Show HN: fenic – LLMs as dataframe operators, query meaning and structure

https://github.com/typedef-ai/fenic

2•cpard•1h ago

Hey friends. I'd like to share a project that's dear to me. fenic is a dataframe API that treats LLMs as first-class citizens: a classic lazy dataframe API extended with new operators backed by LLMs.

What this gets you is the ability to work with structured and unstructured data in the same context. Most importantly, the LLMs aren't integrated as opaque UDF black boxes. They're exposed as "semantic" operators that the planner can reason about alongside the classic ones.

(There are examples and code snippets on the repo to see how everything works together)

Why build this? I'm a data infra / systems person. When LLMs showed up, I saw a new type of compute that changes the characteristics of the workloads we deal with. I wanted to experiment with how our current systems can absorb these new workloads and compute types, and with what it would take to make the DX as seamless as possible. That's where the UDF-plus-arbitrary-prompt approach felt too problematic.

To support this properly, we had to introduce a few really cool things:

New plan operators. You don't just send prompts at an LLM. You use operators like semantic join, semantic map and reduce, and semantic filter, among others. They mix with the classic operators, and because the planner sees them as real operators rather than black boxes, it can reorder work around them.
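To make "the planner can reorder work around them" concrete, here is a minimal sketch in plain Python. It is not fenic code: the plan classes (`Scan`, `Filter`, `SemanticFilter`), the `push_down_filters` rewrite, and the keyword-matching `fake_llm_judge` are all hypothetical stand-ins. The point is that once the semantic filter is a visible plan node, a cheap classic filter can be pushed below it, so fewer rows ever reach the model.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Hypothetical logical-plan nodes; fenic's real plan classes are not shown here.
@dataclass
class Scan:
    rows: list

@dataclass
class Filter:  # classic predicate: cheap, evaluated in-process
    child: "Plan"
    pred: Callable[[dict], bool]

@dataclass
class SemanticFilter:  # LLM-backed predicate: one model call per input row
    child: "Plan"
    question: str

Plan = Union[Scan, Filter, SemanticFilter]

def push_down_filters(plan: Plan) -> Plan:
    """Move classic filters below semantic filters so fewer rows reach the model.

    Valid here because a semantic filter only drops rows and adds no columns,
    so a classic predicate sees the same values on either side of it.
    """
    if isinstance(plan, Filter):
        child = push_down_filters(plan.child)
        if isinstance(child, SemanticFilter):
            # Swap: Filter(SemanticFilter(x)) -> SemanticFilter(Filter(x))
            pushed = push_down_filters(Filter(child.child, plan.pred))
            return SemanticFilter(pushed, child.question)
        return Filter(child, plan.pred)
    if isinstance(plan, SemanticFilter):
        return SemanticFilter(push_down_filters(plan.child), plan.question)
    return plan

def fake_llm_judge(question: str, row: dict) -> bool:
    # Stand-in for a model call: a keyword check instead of real inference.
    return question in row["text"]

def execute(plan: Plan, llm_calls: list) -> list:
    """Evaluate a plan bottom-up, logging every row sent to the (fake) model."""
    if isinstance(plan, Scan):
        return list(plan.rows)
    rows = execute(plan.child, llm_calls)
    if isinstance(plan, Filter):
        return [r for r in rows if plan.pred(r)]
    llm_calls.extend(rows)
    return [r for r in rows if fake_llm_judge(plan.question, r)]

rows = [
    {"lang": "en", "text": "refund please"},
    {"lang": "de", "text": "refund bitte"},
    {"lang": "en", "text": "great product"},
]
plan = Filter(SemanticFilter(Scan(rows), "refund"), lambda r: r["lang"] == "en")
naive_calls, optimized_calls = [], []
same = execute(plan, naive_calls) == execute(push_down_filters(plan), optimized_calls)
print(same, len(naive_calls), len(optimized_calls))  # True 3 2
```

Same result, one fewer model call. With an opaque UDF the planner couldn't know the swap was safe.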

Typed outputs. There are ergonomics for turning the output of a semantic operator straight into a typed dataframe column: a Pydantic schema for the LLM output becomes a typed struct column you can unnest, explode, and so on.
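A rough sketch of the schema-to-struct-column idea, under loud assumptions: this is not fenic's API, a stdlib dataclass stands in for the Pydantic model so there are no dependencies, and `Episode`, `parse_struct`, `unnest`, and `explode` are hypothetical names. Each raw JSON response is validated into a typed record; the records then behave like a struct column you can flatten.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import get_origin

# Hypothetical output schema (a dataclass standing in for a Pydantic model).
@dataclass
class Episode:
    title: str
    guests: list[str]
    minutes: int

def parse_struct(raw: str, schema):
    """Parse one JSON model response into a typed record, rejecting bad shapes."""
    data = json.loads(raw)
    values = {}
    for f in fields(schema):
        if f.name not in data:
            raise ValueError(f"missing field {f.name!r}")
        expected = get_origin(f.type) or f.type  # list[str] -> list (outer type only)
        if not isinstance(data[f.name], expected):
            raise ValueError(f"field {f.name!r} is not {expected.__name__}")
        values[f.name] = data[f.name]
    return schema(**values)

def unnest(column, schema):
    """Struct column -> one plain column per schema field."""
    return {f.name: [getattr(rec, f.name) for rec in column] for f in fields(schema)}

def explode(column, field):
    """One output row per element of a list field; empty lists produce no rows."""
    return [{**asdict(rec), field: item} for rec in column for item in getattr(rec, field)]

raw_outputs = [
    '{"title": "Ep 1", "guests": ["Ana", "Bo"], "minutes": 42}',
    '{"title": "Ep 2", "guests": [], "minutes": 30}',
]
column = [parse_struct(r, Episode) for r in raw_outputs]
print(unnest(column, Episode)["minutes"])  # [42, 30]
print([row["guests"] for row in explode(column, "guests")])  # ['Ana', 'Bo']
```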

New data types, such as a markdown type. Markdown has become an important way to share information with LLMs, even though it started life as a way to format text for presentation. It carries structure, and being able to access that structure the way you would a struct or JSON type adds to the developer experience I mentioned.
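As an illustration of "markdown as structured data", here is a deliberately simplified sketch. It is not fenic's markdown type or its functions; `md_sections` is a hypothetical helper that only understands ATX (`#`) headings and ``` fences. It turns a document into section records you can look up by heading, the way you'd pull a field out of a struct.

```python
import re

# ATX heading: 1-6 '#', whitespace, text, optional closing '#' run preceded by whitespace.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")

def md_sections(md: str) -> list[dict]:
    """Split markdown into flat sections keyed by heading; '#' lines inside ``` fences are body text."""
    sections = []
    current = {"level": 0, "heading": "", "body": []}
    in_fence = False
    for line in md.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            sections.append(current)
            current = {"level": len(match.group(1)), "heading": match.group(2), "body": []}
        else:
            current["body"].append(line)
    sections.append(current)
    # Drop an empty preamble and join body lines back into text.
    return [
        {**s, "body": "\n".join(s["body"]).strip()}
        for s in sections
        if s["heading"] or "".join(s["body"]).strip()
    ]

doc = "# Show notes\nIntro text.\n## Guests\nAna, Bo\n## Links\nhttps://example.com\n"
by_heading = {s["heading"]: s["body"] for s in md_sections(doc)}
print(by_heading["Guests"])  # Ana, Bo
```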

Async UDFs. One of the more interesting shifts in workloads from the LLM explosion is the need to put heavily I/O-bound steps in your pipeline: fetching a response from an API, crawling a website, and so on. Async UDFs fill that gap, and the implementation handles the nuances for you: concurrency, retries, and the rest.
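The "concurrency, retries, and the rest" part can be sketched with plain `asyncio`. This is an assumption-laden illustration, not fenic's async UDF implementation: `run_async_udf` and its parameters are hypothetical, concurrency is capped with a semaphore, failures are retried with exponential backoff plus jitter, and a value that fails every attempt comes back as `None` (fenic's actual failure semantics may differ).

```python
import asyncio
import random

async def run_async_udf(fn, values, *, max_concurrency=8, max_retries=3, base_delay=0.1):
    """Apply an async function to every value with bounded concurrency and retries.

    Results keep input order; a value that fails every attempt yields None.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def one(value):
        for attempt in range(max_retries + 1):
            try:
                async with sem:  # at most max_concurrency calls in flight
                    return await fn(value)
            except Exception:
                if attempt == max_retries:
                    return None
                # Back off outside the semaphore so waiting doesn't hold a slot.
                await asyncio.sleep(base_delay * 2 ** attempt * random.uniform(0.5, 1.0))

    return await asyncio.gather(*(one(v) for v in values))

# Usage: a fake fetch that fails once per key before succeeding.
attempts = {}

async def flaky_fetch(url):
    attempts[url] = attempts.get(url, 0) + 1
    if attempts[url] < 2:
        raise ConnectionError("transient")
    return len(url)

print(asyncio.run(run_async_udf(flaky_fetch, ["a", "bb", "ccc"], base_delay=0.01)))  # [1, 2, 3]
```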

An LLM-inference-aware planner and runtime. This is one of the parts I'm most excited about, and there's a lot still to do. Today: identical prompts within a batch collapse to a single model call, so duplicates cost zero tokens; requests are dispatched concurrently under per-provider rpm/tpm limits with retries and backoff; null and empty cells skip the model entirely; and you get token and cost metrics per operator. There's also an optional persistent response cache so re-runs skip the model.
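A toy version of three of those runtime behaviors (within-batch prompt dedup, skipping null/empty cells, and an optional response cache), plus a per-operator token counter. Everything here is hypothetical: `BatchInference` is not fenic's runtime, the cache is an in-memory dict standing in for a persistent one, and tokens are counted by whitespace split rather than a real tokenizer. Rate limiting and backoff are left out.

```python
from typing import Callable, Optional

class BatchInference:
    """Hypothetical sketch of per-batch inference optimizations (not fenic's runtime)."""

    def __init__(self, model: Callable[[str], str], cache: Optional[dict] = None):
        self.model = model
        self.cache = cache  # stand-in for a persistent response cache; None disables it
        self.model_calls = 0
        self.prompt_tokens = 0  # crude metric: whitespace-split word count

    def run(self, prompts: list) -> list:
        results = [None] * len(prompts)
        rows_by_prompt: dict[str, list[int]] = {}
        for i, prompt in enumerate(prompts):
            if prompt is None or not prompt.strip():
                continue  # null/empty cell: result stays None, zero tokens spent
            rows_by_prompt.setdefault(prompt, []).append(i)
        # Without a cache, responses live only for this batch.
        cache = self.cache if self.cache is not None else {}
        for prompt, rows in rows_by_prompt.items():
            if prompt not in cache:  # one model call per distinct prompt
                cache[prompt] = self.model(prompt)
                self.model_calls += 1
                self.prompt_tokens += len(prompt.split())
            for i in rows:
                results[i] = cache[prompt]
        return results

batch = BatchInference(model=str.upper)  # str.upper plays the role of the model
print(batch.run(["summarize A", None, "summarize A", "", "summarize B"]))
# ['SUMMARIZE A', None, 'SUMMARIZE A', None, 'SUMMARIZE B']
print(batch.model_calls, batch.prompt_tokens)  # 2 4
```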

MCP as a new catalog primitive. Much like a registered view, you can register a dataframe pipeline as an MCP tool in the catalog. fenic then serves an MCP server with that pipeline as the tool's logic, executed over your data.
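Here's a hedged sketch of what "a pipeline registered as a tool" looks like at the protocol level. `ToolCatalog`, `register`, and the `long_episodes` pipeline are hypothetical (a plain Python function stands in for a dataframe pipeline), and this is not how fenic does it. Only the JSON-RPC 2.0 message shapes of MCP's `tools/list` and `tools/call` are mimicked; transport and the initialization handshake are omitted.

```python
import json
from typing import Callable

class ToolCatalog:
    """Hypothetical catalog exposing registered pipelines as MCP-style tools."""

    def __init__(self) -> None:
        self._tools: dict[str, dict] = {}

    def register(self, name: str, description: str, input_schema: dict,
                 pipeline: Callable[..., list]) -> None:
        self._tools[name] = {"description": description, "inputSchema": input_schema,
                             "pipeline": pipeline}

    def handle(self, request: dict) -> dict:
        rid, method = request.get("id"), request.get("method")
        if method == "tools/list":
            tools = [{"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
                     for n, t in self._tools.items()]
            return {"jsonrpc": "2.0", "id": rid, "result": {"tools": tools}}
        if method == "tools/call":
            params = request.get("params", {})
            tool = self._tools.get(params.get("name"))
            if tool is None:  # JSON-RPC "invalid params"
                return {"jsonrpc": "2.0", "id": rid, "error": {"code": -32602, "message": "unknown tool"}}
            rows = tool["pipeline"](**params.get("arguments", {}))  # run the pipeline over the data
            content = [{"type": "text", "text": json.dumps(rows)}]
            return {"jsonrpc": "2.0", "id": rid, "result": {"content": content, "isError": False}}
        return {"jsonrpc": "2.0", "id": rid, "error": {"code": -32601, "message": "method not found"}}

EPISODES = [{"title": "Ep 1", "minutes": 42}, {"title": "Ep 2", "minutes": 18}]

def long_episodes(min_minutes: int) -> list:
    # Stand-in for a registered dataframe pipeline: filter + project.
    return [{"title": r["title"]} for r in EPISODES if r["minutes"] >= min_minutes]

catalog = ToolCatalog()
catalog.register(
    "long_episodes",
    "Episodes at least min_minutes long",
    {"type": "object", "properties": {"min_minutes": {"type": "integer"}}, "required": ["min_minutes"]},
    long_episodes,
)
reply = catalog.handle({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                        "params": {"name": "long_episodes", "arguments": {"min_minutes": 30}}})
print(reply["result"]["content"][0]["text"])  # [{"title": "Ep 1"}]
```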

These are just some of what's gone into fenic while experimenting with how LLMs can become part of our compute infrastructure. There's more, and plenty more to polish on what's already there.

I've been using fenic for all sorts of things. On the small/personal end, I use it to take my podcast audio recordings and turn them into nicely structured tables of metadata I can research. On the heavier end, I use it as tooling for agents to analyze agent traces exported from Pydantic Logfire, to discover evals and turn them into reproducible artifacts in the form of dataframe pipelines.

  pip install fenic
  Repo: https://github.com/typedef-ai/fenic
  Docs: https://docs.fenic.ai

There's also a skill you can use with Claude Code, Codex, etc. to quickly get started with fenic in your favourite agentic coding environment.

I'd love to hear your thoughts, criticism, and anything else that comes to mind.

I'm here to answer questions.
