- How are you all identifying performance bottlenecks in agents?
- What types of changes have gotten you the biggest speedups?
For us, we vibe-coded a profiler to identify slow LLM calls. Sometimes we could then swap in a faster model for that step, or we'd realize we could shrink the input tokens by eliminating unnecessary context. For steps requiring external access (browser usage, API calls), we've moved to fast-start external containers plus thread pools for parallelization. We've also experimented a bit with UI changes to mask some of the latency.
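To make the profiler and thread pool ideas concrete, here is a minimal sketch of the general approach. It is not our actual code: `profile_step`, `slowest_steps`, `fake_external_call`, and `run_parallel` are hypothetical names, and the `time.sleep` call is just a stand-in for a real LLM, browser, or API call.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

# step name -> list of wall-clock durations in seconds
timings = defaultdict(list)
_timings_lock = threading.Lock()  # worker threads record timings concurrently


@contextmanager
def profile_step(name):
    """Record how long the wrapped block takes under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        with _timings_lock:
            timings[name].append(elapsed)


def slowest_steps():
    """Return (name, total_seconds, call_count) rows, slowest total first."""
    with _timings_lock:
        rows = [(name, sum(d), len(d)) for name, d in timings.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fake_external_call(i):
    # Hypothetical stand-in for a browser action or API request.
    with profile_step("external_call"):
        time.sleep(0.05)
        return i * 2


def run_parallel(items, max_workers=8):
    """Run independent I/O-bound steps concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_external_call, items))


if __name__ == "__main__":
    print(run_parallel(range(4)))
    for name, total, count in slowest_steps():
        print(f"{name}: {total:.3f}s over {count} calls")
```

Wrapping each agent step in `profile_step` gives a per-step breakdown of where the wall-clock time goes. The thread pool only helps when the steps are independent and I/O-bound, which is usually the case for external calls.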
What other performance-enhancing techniques are people using?