Sounds great in theory, until you realize everyone has a different definition of outcome.
Take, for instance, a customer support agent that is supposed to resolve tickets. Assume it resolves around 30% of tickets by an objective measure. Do you think that cannot be captured and agreed upon by both sides?
In many cases, though, you don’t know whether the outcome is correct or not, but we have evals for that.
Our product is a SOTA recall-first web search for complex queries. For example, let’s say your agent needs to find all instances of product launches in the past week.
“Classic” web search would return the top results, while ours returns a full dataset where each row is a unique product (with citations to web pages).
We charge a flat fee per record. So, if we found 100 records, you pay us for 100. If it’s 0, then it’s free.
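A rough sketch of how that per-record billing could be computed, in Python. The fee value, the `Record` type, and the rule that only rows with at least one citation are billable are all assumptions made for illustration; none of them come from the actual product.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical price per record; the real fee is not stated in this thread.
FEE_PER_RECORD = Decimal("0.10")


@dataclass(frozen=True)
class Record:
    product: str                 # one unique product per row
    citations: tuple[str, ...]   # web pages backing the row


def invoice_total(records: list[Record], fee: Decimal = FEE_PER_RECORD) -> Decimal:
    """Flat fee per unique record; zero records means zero charge."""
    # Assumption: duplicates of the same product are billed once,
    # and rows without any citation are not billed at all.
    billable = {r.product for r in records if r.citations}
    return fee * len(billable)
```

With these assumptions, 100 unique cited records cost 100 times the fee, and an empty result set costs nothing. The hard part is agreeing on what counts as "unique", which this sketch reduces to an exact match on the product name.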
I believe it was under British rule that a reward was offered to people who brought in dead cobras as proof of culling. That worked until people started breeding cobras just to collect the reward. Humans gamed the system, and it made the problem worse.
alberth•40m ago
And how do you programmatically measure it?
nerdjon•31m ago
/s (mostly because you know this will be the "solution" that many will just run with, despite the very real issue of how "persuadable" these systems are)...
The real answer is that even that will fail, and there will have to be a feedback loop with a human. In many cases that will likely lead to more churn from fixing the work the AI did than if the human had just done it in the first place.
All this instead of focusing on the places where an AI tool can truly cut down on time spent, like searching for something (which can still fail, but at least the risk from a failure is far lower than with producing output).
malux85•28m ago
But for anything that’s not this trivial example, the person who knows the value most accurately is … the customer! They are also the person paying the bill, so there’s a strong financial incentive for them not to reveal this info to you.
I don’t think this will work …
rajvarkala•21m ago
rajvarkala•11m ago
I'd assume an outcome is a negotiated agreement between the buyer and the agent provider.
Think of all the n8n workflows. If we take a simple example like an expense receipt processing workflow or a lead sourcing workflow, I'd think the outcomes can be counted pretty well: receipts successfully entered into the ERP, or the number of entries captured in Salesforce.
I am sure there are cases where outcomes are fuzzy, for instance an employer-employee agreement.
But in some cases, for instance, my accounting agent would only get paid if he successfully uploads my tax returns.
Surely this is not applicable in all cases. But in cases where a human is measured on outcomes, the same should apply to agents too, I guess.
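To make the receipt example concrete, here is a minimal Python sketch of counting outcomes for billing. The `ReceiptResult` type and the `entered_in_erp` flag are hypothetical; in practice that flag would have to come from whatever confirmation the buyer and provider agreed to trust, such as the ERP's own record.

```python
from dataclasses import dataclass


@dataclass
class ReceiptResult:
    receipt_id: str
    entered_in_erp: bool  # assumed to be set only when the ERP confirms the entry


def billable_outcomes(results: list[ReceiptResult]) -> int:
    """Count each receipt at most once, and only if it reached the ERP."""
    return len({r.receipt_id for r in results if r.entered_in_erp})


def success_rate(results: list[ReceiptResult]) -> float:
    """Share of distinct receipts that were successfully entered."""
    attempted = {r.receipt_id for r in results}
    if not attempted:
        return 0.0
    return billable_outcomes(results) / len(attempted)
```

Counting distinct receipt IDs means a retried receipt is not billed twice, which is one of the details the two sides would need to negotiate up front.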