60–70% of what makes an agent correct is fully deterministic: tool call schemas, execution order, cost budgets, content format. Routing all of this through an LLM judge is expensive, slow, and unnecessarily non-deterministic. Attest exhausts deterministic checks first and only escalates when necessary.
The 8 layers: schema validation → cost/perf constraints → trace structure (tool ordering, loop detection) → content validation → semantic similarity via local ONNX embeddings (no API key) → LLM-as-judge → simulation with fault injection → multi-agent trace tree evaluation.
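The payoff of this ordering is short-circuiting: cheap deterministic checks run first, and the expensive judge only runs if everything before it passes. A minimal sketch of that idea (hypothetical helper names, not the Attest internals):

```python
# Illustration of layered evaluation with short-circuiting: deterministic
# checks run in order, and evaluation stops at the first failure, so the
# costly LLM-judge layer is only ever reached by traces that already pass
# every cheap layer. This is the concept, not Attest's actual engine code.

def evaluate(trace, checks):
    """Run checks in order; stop at the first failing layer."""
    for name, check in checks:
        if not check(trace):
            return f"failed at layer: {name}"
    return "passed"

trace = {"cost_usd": 0.004, "tools": ["lookup_user", "reset_password"]}

checks = [
    ("schema", lambda t: isinstance(t.get("tools"), list)),
    ("cost", lambda t: t["cost_usd"] < 0.05),
    ("tool-order", lambda t: t["tools"] == ["lookup_user", "reset_password"]),
    # ("llm-judge", expensive_subjective_check),  # only reached if all above pass
]

print(evaluate(trace, checks))  # → passed
```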
Example:

```python
from attest import agent, expect
from attest.trace import TraceBuilder

@agent("support-agent")
def support_agent(builder: TraceBuilder, user_message: str):
    builder.add_tool_call(name="lookup_user", args={"query": user_message}, result={...})
    builder.add_tool_call(name="reset_password", args={"user_id": "U-123"}, result={...})
    builder.set_metadata(total_tokens=150, cost_usd=0.005, latency_ms=1200)
    return {"message": "Your temporary password is abc123."}

def test_support_agent(attest):
    result = support_agent(user_message="Reset my password")
    chain = (
        expect(result)
        .cost_under(0.05)
        .tools_called_in_order(["lookup_user", "reset_password"])
        .output_contains("temporary password")
        .output_similar_to("password has been reset", threshold=0.8)
    )
    attest.evaluate(chain)
```
The .output_similar_to() call runs locally via ONNX Runtime — no embeddings API key required. Layers 1–5 are free or near-free. The LLM judge is only invoked for genuinely subjective quality assessment.

Architecture: single Go binary engine (1.7ms cold start, <2ms for a 100-step trace eval) with thin Python and TypeScript SDKs. All evaluation logic lives in the engine — both SDKs produce identical assertion results. 11 adapters covering OpenAI, Anthropic, Gemini, Ollama, LangChain, Google ADK, LlamaIndex, CrewAI, and OpenTelemetry.
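For intuition, the similarity layer reduces to: embed both strings, compare with cosine similarity, pass if the score clears a threshold. A toy sketch of that thresholding logic (the "embedding" here is just a bag-of-words vector for illustration; Attest uses real ONNX embedding models):

```python
# Conceptual sketch of a semantic-similarity check: embed two strings,
# compute cosine similarity, compare against a threshold. The bag-of-words
# "embedding" is a stand-in for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real model maps text to a dense vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sim = cosine(embed("your temporary password is abc123"),
             embed("a temporary password abc123 was issued"))
print(sim >= 0.5)  # → True (shared words push the score past the threshold)
```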
v0.4.0 adds continuous evaluation with σ-based drift detection, a plugin system, result history, and CLI scaffolding. The engine and Python SDK are stable across four releases. The TypeScript SDK is newer — the API is stable, but it hasn't been battle-tested at scale yet.
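For readers unfamiliar with σ-based drift detection, the general technique is: keep a history of eval scores, and flag any new score more than k standard deviations from the historical mean. A minimal sketch of that idea (not Attest's implementation):

```python
# σ-based drift detection, the general technique: a new score is "drift"
# if it falls more than k standard deviations from the historical mean.
from statistics import mean, stdev

def is_drift(history: list[float], new_score: float, k: float = 3.0) -> bool:
    mu, sigma = mean(history), stdev(history)
    return abs(new_score - mu) > k * sigma

history = [0.91, 0.93, 0.92, 0.94, 0.92, 0.93]
print(is_drift(history, 0.93))  # → False (within the 3σ band)
print(is_drift(history, 0.60))  # → True (far below the historical mean)
```

The appeal for continuous evaluation is that the threshold adapts to each metric's own variance instead of requiring a hand-tuned cutoff per metric.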
The simulation runtime is the part I'm most curious about feedback on. You can define persona-driven simulated users (friendly, confused, adversarial), inject faults (latency, errors, rate limits), and run your agent against all of them in a single test suite. Is this useful in practice for CI, or is it a solution looking for a problem?
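To make the fault-injection idea concrete, here is a sketch in the spirit of that runtime (hypothetical wrapper, not the Attest API): wrap a tool so calls can fail with injected errors, then observe how the agent copes.

```python
# Sketch of fault injection: wrap a tool so a configurable fraction of
# calls raises an injected error, letting a test exercise the agent's
# retry/fallback behavior. Illustrative only; not the Attest API.
import random

class FaultyTool:
    def __init__(self, tool, error_rate=0.0, seed=None):
        self.tool = tool
        self.error_rate = error_rate
        self.rng = random.Random(seed)  # seeded for reproducible test runs

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.error_rate:
            raise TimeoutError("injected fault")
        return self.tool(*args, **kwargs)

def lookup_user(query):
    return {"user_id": "U-123"}

flaky = FaultyTool(lookup_user, error_rate=0.5, seed=7)
results = []
for _ in range(4):
    try:
        results.append(flaky("reset my password")["user_id"])
    except TimeoutError:
        results.append("error")
print(results)  # mix of "U-123" and "error", deterministic for a given seed
```

Personas would layer on top of this the same way: a "confused" or "adversarial" simulated user is just another deterministic, seedable perturbation of the agent's inputs.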
Apache 2.0 licensed. No platform to self-host, no BSL, no infrastructure requirements.
GitHub: https://github.com/attest-framework/attest
Examples: https://github.com/attest-framework/attest-examples
Website: https://attest-framework.github.io/attest-website/
Install: pip install attest-ai / npm install @attest-ai/core