Here’s a quick walkthrough: https://www.youtube.com/watch?v=ZosDytyf1fg
As agents move into real workflows, they need access to more tools (e.g. Slack, Salesforce, Linear). That means a ton of API plumbing: authentication, pagination, filters, schema handling, and matching entities across systems.
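To make the pagination cost concrete, here is a minimal sketch against a hypothetical, in-memory API that only offers cursor pagination and no search. `list_accounts`, `find_by_name`, and the page size are illustrative assumptions, not any vendor's real API. An agent looking for one record has to keep calling until the record turns up:

```python
from typing import Optional

# Hypothetical in-memory stand-in for a vendor API that only supports
# cursor-based pagination (no server-side search).
RECORDS = [{"id": i, "name": f"Account {i}"} for i in range(1, 1001)]
PAGE_SIZE = 50


def list_accounts(cursor: int = 0) -> dict:
    """Return one page of records plus the cursor for the next page (None at the end)."""
    page = RECORDS[cursor:cursor + PAGE_SIZE]
    has_more = cursor + PAGE_SIZE < len(RECORDS)
    return {"data": page, "next_cursor": cursor + PAGE_SIZE if has_more else None}


def find_by_name(name: str) -> tuple[Optional[dict], int]:
    """Page through the API until the named record turns up; return it and the call count."""
    cursor: Optional[int] = 0
    calls = 0
    while cursor is not None:
        response = list_accounts(cursor)
        calls += 1
        for record in response["data"]:
            if record["name"] == name:
                return record, calls
        cursor = response["next_cursor"]
    return None, calls  # scanned every page without a match
```

`find_by_name("Account 990")` only finds the record on the 20th call. In an agent loop, each of those calls is a tool call whose response lands in the context window.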
Most MCPs don’t fix this. They’re thin wrappers over APIs, so agents inherit their weak primitives and still get it wrong most of the time, especially when working across tools.
An even deeper issue is that APIs assume you already know what to query (think endpoints, object IDs, fields), whereas agents usually start one step earlier: they first need to discover what matters before they can even start reasoning.
So we built Airbyte Agents to be a context layer between your agents and all of your data. The core of this is something we call Context Store: a data index optimized for agentic search, populated by our replication connectors. All that work on data connectors over the last six years comes in handy here!
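The post doesn't describe how Context Store works internally, so the following only illustrates the general idea of a pre-built search index: a toy inverted index over records from several hypothetical sources. `TinyIndex` and the sample records are made up for this sketch. The point is that lookup runs against data indexed ahead of time rather than through live API calls:

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical inverted index keyed by (source, record_id)."""

    def __init__(self) -> None:
        self.docs: dict[tuple[str, str], dict] = {}
        self.postings: defaultdict[str, set] = defaultdict(set)

    def add(self, source: str, record_id: str, record: dict) -> None:
        key = (source, record_id)
        self.docs[key] = record
        # Index every token found in any field value.
        for value in record.values():
            for token in tokenize(str(value)):
                self.postings[token].add(key)

    def search(self, query: str) -> list[tuple[str, str]]:
        """Return keys of records containing every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


# Records that a replication job might have synced ahead of time (made up).
index = TinyIndex()
index.add("salesforce", "001", {"name": "Acme Corp", "stage": "Negotiation"})
index.add("zendesk", "42", {"subject": "Acme login broken", "status": "open"})
index.add("linear", "ENG-7", {"title": "Fix SSO for Globex"})
```

Here `index.search("acme open")` returns only the Zendesk ticket, without any call to the upstream systems.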
This gives agents a structured way to discover data, while still allowing them to read and write directly to the upstream system when needed.
What got us working on this was an insane trace from an agent we were migrating to our new SDK. It was supposed to answer "which customers are at risk of leaving this quarter?" The trace had 47 steps. Most were API calls. The agent first had to find a bunch of accounts, then map them to the right customers, then look for tickets, and so on. When the agent finally responded, the answer sounded OK but was wrong. On top of that, it was excruciatingly slow. So we had to do something about it.
That 47-step agent is one example of a question where Airbyte Agents does particularly well. Other examples:
- “Show me all enterprise deals closing this month with open support tickets.”
- “Find every support ticket that doesn’t have a GitHub issue opened.”
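Assuming both datasets are already indexed locally, the last question reduces to an anti-join. This sketch assumes, purely for illustration, that GitHub issues reference tickets with a `Zendesk #<id>` marker in the body; the records, the marker format, and `tickets_without_issue` are all hypothetical, and real linking conventions vary by team:

```python
import re

# Hypothetical pre-indexed records from two systems.
tickets = [
    {"id": 101, "subject": "Export fails on large files"},
    {"id": 102, "subject": "SSO redirect loop"},
    {"id": 103, "subject": "Billing page typo"},
]
issues = [
    {"number": 7, "title": "Fix export timeout", "body": "Reported in Zendesk #101"},
    {"number": 8, "title": "Refactor auth", "body": "No linked ticket"},
]

# Assumed linking convention: issue bodies mention "Zendesk #<ticket id>".
TICKET_REF = re.compile(r"Zendesk #(\d+)")


def tickets_without_issue(tickets: list[dict], issues: list[dict]) -> list[dict]:
    """Anti-join: keep tickets whose id is never referenced by any issue."""
    linked = {int(ref) for issue in issues for ref in TICKET_REF.findall(issue["body"])}
    return [t for t in tickets if t["id"] not in linked]
```

With the sample data, tickets 102 and 103 come back. The hard part in practice is the matching rule, not the join itself.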
Some of these might sound simple, but the quality of the answer changes dramatically when the agent doesn’t have to assemble all that context at runtime.
Once we had an early version of the product, I spent a weekend building a benchmark harness to see if it worked. Also, I just like writing benchmarks :). I compared calling the Airbyte Agent MCP against calling a bunch of vendor MCPs directly. I tested retrieval and search.
For the sake of simplicity, I used token consumption as the unit of measure. I think it's a good proxy for how well agents are working. A failing agent (like the one that took 47 steps) will churn through lots of tokens while getting nowhere, while a successful one will get straight to the point.
Here's what I found when measuring: for Gong, it used up to 80% fewer tokens than Gong's own MCP; for Zendesk, up to 90% fewer; for Linear, up to 75%; and for Salesforce, up to 16% (Salesforce’s own SOQL does a good job here).
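Since "X% fewer tokens" is easy to misread, a small worked example of the arithmetic may help. The helper names are hypothetical and the token counts in the comments are illustrative, not benchmark results:

```python
def percent_fewer(baseline_tokens: int, candidate_tokens: int) -> float:
    """Percentage reduction of the candidate run relative to the baseline run."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - candidate_tokens) / baseline_tokens


def times_fewer(pct_fewer: float) -> float:
    """Convert 'X% fewer' into a 'baseline is N times larger' ratio."""
    if not 0 <= pct_fewer < 100:
        raise ValueError("pct_fewer must be in [0, 100)")
    return 100.0 / (100.0 - pct_fewer)


# Illustrative only: a vendor MCP run using 10,000 tokens versus 2,000 tokens
# for the same question would be reported as 80% fewer tokens.
example = percent_fewer(10_000, 2_000)
```

By this arithmetic, 80% fewer tokens means 5x fewer, 90% means 10x, 75% means 4x, and 16% means roughly 1.19x.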
Of course there is the usual obvious bias: we are the builders of what we are benchmarking. So we made the test harness public: https://github.com/airbytehq/airbyte-agents-benchmarks. Feel free to poke at it, and please tell us what you find if you do!
It's still early and some parts are rough, but we wanted to share this with the community ASAP. We'd love to hear from people building agents:
- Are you indexing data ahead of time, or letting the agent call APIs live?
- How are you matching entities across systems?
Would also love to hear any thoughts, comments, or ideas for how we could make this better, and whether there are obvious things we’re missing. For now, we’re excited to keep building!
ecares•1h ago
aaronsteers•1h ago
Yes, we've definitely found that some API data models are easier for models to navigate than others.
The largest sources of agent inefficiency we've identified so far are:
1. Many APIs lack robust enough search, forcing agents to page through hundreds or thousands of paginated responses until they find the record they are looking for (our Context Store addresses this).
2. Many APIs have HUGE response sets. Our MCP helps handle this by letting the agent decide exactly which fields to return.
3. With our SDK, you can build your own MCP on top of any source we support (50+ right now, with more coming). This is super powerful: it lets you build more ergonomic MCP servers and tools, even if the underlying API data models are not intuitive or easy for the LLM to use directly.
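Point 2 is essentially field projection. The post doesn't show the MCP's actual parameters, so this is a generic sketch with a hypothetical `project` helper and a made-up wide ticket record; serialized JSON length stands in as a rough proxy for tokens:

```python
import json


def project(records: list[dict], fields: list[str]) -> list[dict]:
    """Keep only the requested fields from each record (missing fields are skipped)."""
    return [{f: r[f] for f in fields if f in r} for r in records]


# Hypothetical wide API response: 100 ticket records carrying many fields
# the agent does not need for its question.
full = [
    {
        "id": i,
        "subject": f"Ticket {i}",
        "status": "open",
        "description": "x" * 500,
        "custom_fields": {f"cf_{j}": None for j in range(20)},
        "via": {"channel": "email", "source": {"from": {}, "to": {}}},
    }
    for i in range(100)
]

slim = project(full, ["id", "status"])

# Character counts as a crude stand-in for the tokens each payload would cost.
full_size = len(json.dumps(full))
slim_size = len(json.dumps(slim))
```

In this made-up example the projected payload is more than an order of magnitude smaller, which is the kind of saving an agent gets when it can ask for only the fields it needs.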
Combining all three of these, we see that the vast majority of challenges can be addressed with a strong system prompt for guidance. Fine-tuning could get you further, but you'd still want your fine-tuned model to build on this same foundation, since the efficiencies transfer across use cases and models.
@ecares - Does this answer your question? What do you think?