AI support agents live and die by the data they can access and the way they shape the customer experience. Basic automations worked well in the past, but people now expect more.
Here are some semi-technical notes:
--- 1) Controlled data syncing ---
We wanted tight integrations with existing helpdesks, but also needed to give users flexibility over what the agent actually knows. Building integrations that sync data while still being tweakable turned out to be harder than expected.
People want one source of truth, right up until they need an exception. We built a sync policy system that lets us manage sync and overwrite behaviors for external data.
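Roughly, a per-resource policy boils down to something like this (a minimal sketch; the names and fields here are hypothetical, not our actual schema):

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical sketch of a per-resource sync policy, not our exact schema.
class OverwriteMode(Enum):
    REMOTE_WINS = "remote_wins"  # the helpdesk stays the source of truth
    LOCAL_WINS = "local_wins"    # user edits survive future syncs

@dataclass
class SyncPolicy:
    resource: str                # e.g. "helpdesk.articles"
    enabled: bool = True
    overwrite: OverwriteMode = OverwriteMode.REMOTE_WINS
    pinned_fields: set[str] = field(default_factory=set)  # never overwritten

def apply_policy(policy: SyncPolicy, remote: dict, local: dict) -> dict:
    """Compute the post-sync record from the remote and local versions."""
    if not policy.enabled:
        return local
    if policy.overwrite is OverwriteMode.LOCAL_WINS:
        merged = {**remote, **local}  # local edits take precedence
    else:
        merged = dict(remote)         # remote wins by default
    for f in policy.pinned_fields:    # the exceptions: pinned fields stick
        if f in local:
            merged[f] = local[f]
    return merged
```

The pinned-fields escape hatch is the whole point: one source of truth by default, with a sanctioned way to carve out exceptions.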
The data itself is synced via Airbyte’s pyairbyte package. I ended up working with the pyairbyte team on some performance issues around memory leaks with certain types of pagination on large datasets.
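For reference, pyairbyte usage looks roughly like this (the connector name and config keys are illustrative and vary by helpdesk; auth is omitted):

```python
import airbyte as ab  # pip install airbyte

# Illustrative connector and config; see the connector's spec for auth.
source = ab.get_source(
    "source-zendesk-support",
    config={"subdomain": "yourcompany", "start_date": "2024-01-01T00:00:00Z"},
    install_if_missing=True,
)
source.check()                      # validate config and credentials
source.select_streams(["tickets"])  # only pull what the sync policy allows
result = source.read(cache=ab.get_default_cache())  # local DuckDB cache

for record in result["tickets"]:
    ...  # hand each record to the sync-policy layer above
```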
--- 2) Retrieval, escalation, and conversation behavior ---
Users hate fighting with an AI agent, especially when it’s obvious the agent can’t help. We wanted ours to search well, but also to know when it was time to bow out gracefully.
At first it was too eager to give up. We didn’t want hallucinations, so it would just fail with “I don’t know.” The experience sucked.
We found our retrieval pipeline was subpar. We built evals and found some quick wins, but the agent was still too eager to escalate. We considered fine-tuning, but first built more evals. In the end, better retrieval, tools, and prompts were enough.
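Our actual evals are more involved, but the core retrieval check is basically recall@k over a labeled set (the names below are hypothetical):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    query: str        # a real user question
    gold_doc_id: str  # the document a correct answer must draw from

def recall_at_k(
    cases: list[EvalCase],
    retrieve: Callable[[str, int], list[str]],  # (query, k) -> ranked doc ids
    k: int = 5,
) -> float:
    """Fraction of cases where the gold document lands in the top k."""
    hits = sum(c.gold_doc_id in retrieve(c.query, k) for c in cases)
    return hits / len(cases)
```

Tracking escalation rate alongside a number like this is what showed us the agent was bowing out on questions retrieval could actually answer.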
We also added an action engine that lets users define custom actions. It uses a two-tiered approach: a pre-qualifying model decides whether an action is appropriate, and only then are tools invoked. Too many tools up front made the agent worse, so this tiered staging helped a lot.
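A rough sketch of the tiered staging (the `llm` client and model ids are stand-ins, not a real API):

```python
# Stand-in sketch of the two-tier action engine; `llm` is a generic client.
def handle_turn(message: str, actions: list[dict], llm) -> str:
    # Tier 1: a cheap pre-qualifying model sees only names + descriptions,
    # never the full tool schemas.
    menu = "\n".join(f"- {a['name']}: {a['description']}" for a in actions)
    verdict = llm.complete(
        model="small-prequalifier",  # assumed model id
        prompt=(
            f"User message: {message}\n"
            f"Available actions:\n{menu}\n"
            "Reply with the single most appropriate action name, or NONE."
        ),
    ).strip()

    if verdict == "NONE":
        # No action applies: answer normally, with zero tools in context.
        return llm.complete(model="main-agent", prompt=message)

    # Tier 2: invoke the main model with only the qualified tool attached,
    # instead of flooding it with every tool definition up front.
    tool = next(a for a in actions if a["name"] == verdict)
    return llm.complete(model="main-agent", prompt=message, tools=[tool])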
--- 3) Analysis of historical ticket data ---
We built a pipeline that analyzes historical ticket data, creates derivatives, and evaluates the agent’s performance against them. It then finds gaps in your agent’s knowledge that you can fill with AI-generated answers from ticket history. Doing this safely, reliably, and at scale has been a challenge.
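Conceptually, the gap-finding loop looks something like this (`agent_answer` and `grade`, e.g. an LLM-as-judge, are stand-ins for our real components):

```python
# Conceptual sketch; `agent_answer` and `grade` are stand-ins.
def build_cases(tickets: list[dict]):
    """Derive (question, reference answer) pairs from resolved tickets."""
    for t in tickets:
        if t.get("status") == "solved" and t.get("resolution"):
            yield {"question": t["first_message"], "reference": t["resolution"]}

def find_gaps(tickets, agent_answer, grade, threshold=0.5):
    """Yield questions the agent answers poorly relative to the historical
    resolution -- candidates for AI-drafted knowledge-base answers."""
    for case in build_cases(tickets):
        answer = agent_answer(case["question"])
        if grade(answer, case["reference"]) < threshold:
            yield case
```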
We use Dagster to run these jobs, but are looking at moving everything to DBOS (which we use in other products and love). Ticket ingestion relies on Airbyte connectors running via pyairbyte. Some integrations use Airbyte’s connectors, others are our own built on their declarative schema.
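A minimal Dagster version of the ingestion step (asset names and config are illustrative):

```python
import airbyte as ab
from dagster import Definitions, asset

@asset
def raw_tickets() -> list[dict]:
    """Ingest tickets via a PyAirbyte connector (placeholder config)."""
    source = ab.get_source(
        "source-zendesk-support",
        config={"subdomain": "yourcompany"},
    )
    source.select_streams(["tickets"])
    return [dict(r) for r in source.read()["tickets"]]

@asset
def eval_cases(raw_tickets: list[dict]) -> list[dict]:
    """Derive (question, reference) eval cases from resolved tickets."""
    return [
        {"question": t["first_message"], "reference": t["resolution"]}
        for t in raw_tickets
        if t.get("status") == "solved" and t.get("resolution")
    ]

defs = Definitions(assets=[raw_tickets, eval_cases])
```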
We use JinaAI for embeddings, reduce dimensionality with UMAP, and currently cluster in-memory with FAISS (though we plan to move away from FAISS). The pipeline runs a small optimization step to tune UMAP and clustering parameters.
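Roughly, the clustering step looks like this (assuming the Jina embeddings are already computed into a float32 matrix; the parameter values are just starting points of the kind our optimization step tunes):

```python
import faiss      # pip install faiss-cpu
import numpy as np
import umap       # pip install umap-learn

def cluster_tickets(embeddings: np.ndarray, n_clusters: int = 50) -> np.ndarray:
    """Assign each ticket embedding to a cluster."""
    # Reduce dimensionality first; these UMAP parameters are among the
    # knobs the optimization step tunes.
    reduced = umap.UMAP(
        n_components=16, n_neighbors=15, min_dist=0.1, metric="cosine"
    ).fit_transform(embeddings).astype("float32")

    # In-memory k-means via FAISS.
    kmeans = faiss.Kmeans(d=reduced.shape[1], k=n_clusters, niter=25)
    kmeans.train(reduced)
    _, labels = kmeans.index.search(reduced, 1)  # nearest centroid per ticket
    return labels.ravel()
```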
This system is powerful but still a work in progress. We’ve got a new version coming, but it isn’t quite ready for prime time yet.
--- 4) Taste vs evals ---
You hear a lot about taste in AI, and we believe it’s incredibly important. Evals helped us catch the obvious failures, but nothing can replace taste.
We worked closely with our pre-release customers to develop that taste. We have so, so many external VIP Slack channels open. I’m really thankful for their input.
--- 5) Compliance ---
Compliance is the C-word for small tech companies, and we jumped straight into SOC 2 Type II. Because we’re a multi-product company, scoping everything appropriately was tricky.
We’re using Vanta to manage most of this, but there’s still a lot of manual grunt work that isn’t automated. I know several folks are trying to simplify this for startups… good luck to them.
---
I’m really proud of what our team has accomplished, and I’m excited to share it. Happy to answer any questions, technical or otherwise. You can sign up and try it directly on the site.