We've released Datatune (https://github.com/vitalops/datatune), a tool that connects a user's entire dataset to LLMs and Agents, so users can work with all of their data using just a prompt.
While building Agents for a customer with large amounts of data, we saw that their Agent struggled with certain data transformation tasks that it would have handled better if the LLM had access to the full user data. We built Datatune as a first step toward solving this issue.
Datatune supports:
- Diverse data backends such as databases, DataFrames, etc.
- Closed-source and open-source LLMs from a wide variety of providers.
- Batch processing of the data passed to LLMs, plus distributed computing using Dask, for faster and more efficient transformations, while also helping reduce cost and avoid context length limits.
- First-order primitive data engineering operations such as Map, Filter, etc.
- Chaining multiple transformations together.
- An internal data engineering agent that acts as a super orchestrator for complex chained transformations, splitting the user prompt into sub-prompts for the respective Map and Filter primitive agents or for code-generating agents.
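
To make the batching and Map/Filter ideas above concrete, here is a minimal, self-contained sketch. It is not Datatune's API: `llm_map`, `llm_filter`, `batched`, and `fake_llm` are hypothetical names made up for illustration, and the "LLM" is an offline stub so the example runs without any provider. Distributing batches across Dask workers is left out.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]
# Hypothetical LLM interface: one batch of prompts in, one reply per prompt out.
LLM = Callable[[List[str]], List[str]]


def batched(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def llm_map(rows: List[Row], prompt: str, output_field: str,
            llm: LLM, batch_size: int = 2) -> List[Row]:
    """Map: add `output_field` to every row, sending rows to the LLM in batches."""
    out: List[Row] = []
    for batch in batched(rows, batch_size):
        replies = llm([f"{prompt}\nRow: {row}" for row in batch])
        for row, reply in zip(batch, replies):
            out.append({**row, output_field: reply.strip()})
    return out


def llm_filter(rows: List[Row], prompt: str,
               llm: LLM, batch_size: int = 2) -> List[Row]:
    """Filter: keep only the rows for which the LLM answers 'yes'."""
    kept: List[Row] = []
    for batch in batched(rows, batch_size):
        replies = llm([f"{prompt} Answer yes or no.\nRow: {row}" for row in batch])
        kept.extend(row for row, reply in zip(batch, replies)
                    if reply.strip().lower().startswith("yes"))
    return kept


def fake_llm(prompts: List[str]) -> List[str]:
    """Offline stand-in for a real model (assumption: keyword rules, not an LLM)."""
    replies = []
    for p in prompts:
        if "Answer yes or no" in p:
            replies.append("yes" if "'category': 'electronics'" in p else "no")
        else:
            is_device = "laptop" in p.lower() or "phone" in p.lower()
            replies.append("electronics" if is_device else "other")
    return replies


rows = [{"name": "Laptop 13 inch"}, {"name": "Wooden chair"}, {"name": "Smartphone"}]

# Chain two transformations: Map adds a category, Filter keeps electronics.
mapped = llm_map(rows, "Classify the product into a category.", "category", fake_llm)
result = llm_filter(mapped, "Keep only electronics.", fake_llm)
print(result)
```

Because each call only ever sees `batch_size` rows, the prompt size stays bounded no matter how large the dataset is, which is the point of batching for context length limits.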
Next Steps:
- Build an embedding layer to work in parallel with LLMs and Agents.
- Use the embedding layer to build semantic deduplication, tabular querying, etc.
GitHub: https://github.com/vitalops/datatune