In this (telecom) benchmark you can review the agent policies and manuals here: 1) https://github.com/sierra-research/tau2-bench/blob/main/data... 2) https://github.com/sierra-research/tau2-bench/blob/main/data...
Of course these are just parts of the prompt; you can inspect the benchmark code to see how they are rendered into actual LLM calls.
In case someone is not familiar with the framework's methodology, I've written a separate article covering it (with some of my thoughts) -> https://quesma.com/blog/tau2-from-llm-benchmark-to-blueprint...
1. Structure & Flow
- Decision Trees: Clear branching logic with ├── and └── notation
- Sequential Steps: Numbered, ordered procedures instead of scattered explanations
- Prerequisites: Explicit dependency checks before proceeding
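To make the decision-tree style concrete, here is a hypothetical fragment in that notation (the tool names and conditions are made up for illustration, not taken from the actual tau2-bench manuals):

```text
1. Verify customer identity (get_customer_by_phone)
2. Check line status
   ├── status == "suspended" → go to step 3 (billing check)
   └── status == "active"    → go to step 4 (device check)
3. Prerequisite: outstanding balance must be 0 before reactivation
```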
2. AI Agent Optimizations
- Tool Call Clarity: Exact function names and parameters
- Binary Decisions: Clear yes/no conditions instead of ambiguous language
- Error Handling: Specific failure conditions and next steps
- Verification Steps: "Recheck" instructions after each fix
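The "fix, then recheck, then escalate on failure" pattern above can be sketched as a small loop. This is just an illustrative sketch of the control flow, not code from tau2-bench; `fix` and `check` are stand-in callables:

```python
# Hypothetical sketch of the "verify after each fix" pattern:
# apply a fix, recheck the condition, and fall through to a
# specific failure action instead of leaving the outcome ambiguous.
def apply_fix_with_verification(fix, check, max_retries=2):
    """Run fix(), then check(); retry a bounded number of times."""
    for _ in range(max_retries):
        fix()
        if check():
            return "resolved"
    return "escalate"  # explicit next step on failure

# Usage with stand-in closures simulating a settings toggle:
state = {"roaming": False}
result = apply_fix_with_verification(
    fix=lambda: state.update(roaming=True),
    check=lambda: state["roaming"],
)
```

The point is the binary outcome: the agent either verifies success or lands on a named failure path, never an unchecked "should be fixed now".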
3. Cognitive Load Reduction
- Reference Tables: Quick lookup for tools and purposes
- Pattern Recognition: Common issue combinations and their solutions
- Critical Reminders: Common AI mistakes section to prevent errors
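A reference table in this style might look like the following (tool names are hypothetical examples, not the benchmark's actual tools):

```text
Tool                  | Purpose
----------------------|-----------------------------------
get_customer_by_phone | Look up account from caller number
check_network_status  | Verify outage in customer's area
reset_apn_settings    | Fix mobile data configuration
```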
4. Actionable Language
- Removed verbose explanations mixed with instructions
- Consolidated multiple documents' logic into single workflows
- Used imperative commands: "Check X", "If Y then Z"
- Added immediate verification steps
Into the trash it goes.