We’ve released Agents as part of Datatune:
https://github.com/vitalops/datatune
In a single prompt, you can define multiple data transformation tasks, and Datatune applies them to your data row by row, with contextual understanding.
Example prompt:
"Extract categories from the product description and name. Keep only electronics products. Add a column called ProfitMargin = (Total Profit / Revenue) * 100"
Datatune interprets the prompt and applies the right operation (map, filter, or an LLM-powered agent pipeline) on your data using OpenAI, Azure, Ollama, or other LLMs via LiteLLM.
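To make the three steps in the example prompt concrete, here is a minimal sketch in plain Python of what they amount to on a per-row basis. This is not Datatune's API: `classify_category` is a hypothetical keyword-based stand-in for the LLM call, `run_pipeline` is an illustrative name, and the two sample rows are made up for the demo.

```python
# Illustrative sketch only: plain Python, not Datatune's API.
# classify_category is a hypothetical stand-in for the LLM call.

def classify_category(name, description):
    """Keyword-based stand-in for an LLM that reads name and description."""
    text = f"{name} {description}".lower()
    electronics_words = ("headphones", "laptop", "charger", "bluetooth")
    return "Electronics" if any(w in text for w in electronics_words) else "Other"


def run_pipeline(rows):
    # Step 1 (map): add a Category to every row.
    mapped = [
        {**row, "Category": classify_category(row["Name"], row["Description"])}
        for row in rows
    ]
    # Step 2 (filter): keep only electronics products.
    kept = [row for row in mapped if row["Category"] == "Electronics"]
    # Step 3 (computed column): ProfitMargin = (Total Profit / Revenue) * 100.
    return [
        {**row, "ProfitMargin": row["Total Profit"] / row["Revenue"] * 100}
        for row in kept
    ]


# Made-up sample rows, for illustration only.
rows = [
    {"Name": "Wireless Headphones", "Description": "Bluetooth over-ear headphones",
     "Total Profit": 25.0, "Revenue": 100.0},
    {"Name": "Oak Desk", "Description": "Solid wood writing desk",
     "Total Profit": 40.0, "Revenue": 200.0},
]
result = run_pipeline(rows)
```

The first two steps are where a model's contextual understanding matters; the third is plain arithmetic and needs no LLM call at all.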
Key Features
- Row-level map() and filter() operations using natural language
- Agent interface for auto-generating multi-step transformations
- Built-in support for Dask DataFrames (for scalability)
- Works with multiple LLM backends (OpenAI, Azure, Ollama, etc.)
- Compatible with LiteLLM for flexibility across providers
- Automatic token batching, metadata tracking, and smart pipeline composition
Token & Cost Optimization
Datatune gives you explicit control over which columns are sent to the LLM, reducing token usage and API cost:
- Use input_fields to send only relevant columns
- Automatically handles batching and metadata internally
- Supports setting tokens-per-minute and requests-per-minute limits
- Falls back to known model limits (e.g., those for GPT-3.5) if none are specified
This makes it possible to run LLM-based transformations over large datasets without incurring runaway costs.
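As a rough illustration of why column selection and batching cut costs, here is a small sketch in plain Python. It does not show Datatune's internals: `select_fields`, `estimate_tokens`, and `batch_rows` are hypothetical helpers, the "about 4 characters per token" rule is a crude assumption rather than a real tokenizer, and the sample rows are made up.

```python
import json


def select_fields(row, input_fields):
    """Keep only the columns the prompt actually needs."""
    return {key: row[key] for key in input_fields}


def estimate_tokens(text):
    """Crude assumption: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def batch_rows(rows, input_fields, max_tokens_per_batch):
    """Group serialized rows into batches that stay under a token budget.

    A single row larger than the budget still gets its own batch.
    """
    batches, current, used = [], [], 0
    for row in rows:
        payload = json.dumps(select_fields(row, input_fields))
        cost = estimate_tokens(payload)
        # Start a new batch when adding this row would exceed the budget.
        if current and used + cost > max_tokens_per_batch:
            batches.append(current)
            current, used = [], 0
        current.append(payload)
        used += cost
    if current:
        batches.append(current)
    return batches


# Made-up rows with one bulky column the prompt does not need.
rows = [
    {"Name": f"Item {i}", "Description": "Bluetooth speaker", "Notes": "x" * 400}
    for i in range(6)
]
all_columns = ["Name", "Description", "Notes"]
needed_columns = ["Name", "Description"]

full_cost = sum(estimate_tokens(json.dumps(select_fields(r, all_columns))) for r in rows)
narrow_cost = sum(estimate_tokens(json.dumps(select_fields(r, needed_columns))) for r in rows)
batches = batch_rows(rows, needed_columns, max_tokens_per_batch=30)
```

Comparing `full_cost` with `narrow_cost` shows how much of the estimated token count comes from a column the prompt never reads. A tokens-per-minute limit could be layered on top by pausing between batches once the running total for the current minute reaches the cap.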