EDIT: I meant to link to the github, not the website: https://github.com/max-hq/max
Like many of us here, I've found myself reaching for the same "pull data into a db; give it to claude" pattern for a while now, whilst doing data spelunking or building tooling - for the same reasons mentioned by thellimist over here [1] and in a few other recent "CLI vs MCP" posts.
To that end, about a month ago I started building a project called `max` - its goal is to cut out the middleman and schematise any data source for you. Essentially, it provides a lingua franca for synchronising and searching data.
In short: Max exposes a CLI for any given data source and mirrors the data locally - i.e. it puts that data right next to the agent. Search is local and fast, and the output is ready for cut, sed, grep, sort, etc.
More concretely:
> max connect @max/connector-gmail --name gmail-1
> max sync gmail-1
> # show me what data i can search for
> max schema @max/connector-gmail
> # do a search
> max search gmail-1 --filter "subject ~= Apples" --fields=subject,from,time
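Because results come back as plain text, they compose with the usual unix tools. A quick illustrative sketch (the filter value and field here are placeholders, not from a real mailbox):

> # count senders matching a filter, most frequent first
> max search gmail-1 --filter "from ~= example.com" --fields=from | sort | uniq -c | sort -rn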
I've built a few connectors over at `max-hq/max-connectors` - but the goal is that they're easy to create (sync is done via a graph walk - max has you provide field resolution so it can figure out how to sync).
In practice, I've found that telling claude to run "max -g llm-bootstrap" to get acquainted, and then asking it to "make a connector for X", works pretty well too :).
There's a lot still to come(!) - realtime, somewhere to host connectors, exposing and serving max nodes... I'll be updating the roadmap over the next couple of days - but I didn't want to wait any longer before sharing here.
(on that note - max is designed for federation. The core is platform agnostic)
In terms of what this approach makes possible - I ran a benchmark on a challenge (it's the one on the website) asking claude to find me names of a particular form in a fairly chunky hubspot (100k contacts). The metrics are roughly what you'd expect from putting the data local and keeping the raw records out of claude's context window:
MCP: 18M tokens | 80m time | $180 cost
Max: 238 tokens | 27s time | $0.003 cost
(I'll explain how these numbers were calculated in a new reply)
It's still early (alpha) but if you're building agents or just want local data, please try it and tell me what breaks.
Thanks!
benvan•5h ago
- Claude (via the Hubspot MCP) was paginating over contacts at 40s and ~150k tokens per 800-contact page (triggering compaction) - the full run was 120 of these loops, at 80 minutes and 18M tokens
- Claude + Max was one `max search hubspot --filter` command piped to `sort | uniq -c`, plus one `max search gdrive` query matching each of the results of the first, also piped to `sort | uniq -c` (roughly the shape sketched below) - the rest of the tokens were spent producing an output from 20 words + 20 numbers
(Both of these calculations ignore cached tokens)
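For a sense of the shape of that run (the field names and filter patterns below are placeholders, not the exact queries):

> # illustrative only - real field names/patterns were different
> max search hubspot --filter "name ~= <pattern>" --fields=name | sort | uniq -c
> max search gdrive --filter "name ~= <match from above>" --fields=name | sort | uniq -c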