As a data scientist, you spend the majority of your time wrangling data. Even if you have a set of go-to techniques and tricks, how you treat a particular data source tends to be fairly bespoke, so you end up writing custom logic each time.
Ragnerock was born from the observation that modern LLMs can automate much of the grunt work in this process while still allowing fully customizable pipelines. What’s more, by leveraging techniques like constrained decoding, it can offer a unified query interface regardless of the source, bridging raw data like text and images with the structured data already living in your databases.
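To make the constrained-decoding idea concrete, here is a toy, dependency-free sketch (not Ragnerock's actual implementation, and the token scores are made up): at each decoding step, tokens that would violate the target schema are masked out before picking the best remaining one. Real systems apply the same masking to an LLM's logits using a grammar or JSON-schema automaton, which is what guarantees structured, queryable output.

```python
def constrained_decode(logits_per_step, allowed_per_step):
    """Greedy decoding, restricted to the allowed token set at each step."""
    out = []
    for logits, allowed in zip(logits_per_step, allowed_per_step):
        # Mask: keep only tokens the constraint permits at this position.
        candidates = {tok: score for tok, score in logits.items() if tok in allowed}
        # Greedily take the highest-scoring legal token.
        out.append(max(candidates, key=candidates.get))
    return out

# Hypothetical scenario: the schema demands {"sentiment": "pos"} or {"sentiment": "neg"}.
steps = [
    {'{"sentiment":': 0.9, "The": 2.1},           # unconstrained, the model prefers prose
    {'"pos"}': 1.2, '"neg"}': 0.4, "cat": 3.0},   # ...and a nonsense continuation
]
allowed = [
    {'{"sentiment":'},           # only the schema prefix is legal here
    {'"pos"}', '"neg"}'},        # then one of the two enum values
]
print("".join(constrained_decode(steps, allowed)))
# → {"sentiment":"pos"}
```

Even though the raw scores favor free-form text ("The cat"), the mask forces output that always parses against the schema, which is why results from text or images can sit next to structured data behind one query interface.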
Ragnerock has four main components:
- A workflow designer that lets you build LLM-driven data processing and analysis pipelines
- A job orchestration layer that runs those workflows
- A query interface which lets you inspect the results of those workflows with plain SQL
- A notebook system which is 100% API-compatible with Jupyter and runs on your existing kernels, so you can easily pull data into your existing environments and analyses
Ragnerock also supports bring-your-own AI (OpenAI, Anthropic, and Google APIs), databases, and blob storage, so you can join with your existing datasets and have all outputs flow to your data lake. We’re particularly excited about our web crawling feature, which allows you to scrape websites and trigger workflows on updates: for example, you might point Ragnerock at your favorite blog and run a workflow to assess posts for topics and sentiment.
You can try it out at https://www.ragnerock.com: no credit card needed, and the first 20 hours of compute are free. It’s an early-stage product, so we’re especially interested in feedback.
Happy to answer any questions - John and I will be around in the comments today.