This is interesting — feels like a lot of effort is going into improving visibility for AI systems (logs, traces, evaluations), but debugging still feels fundamentally different from traditional systems.
In many cases, even if you can see what happened, it’s hard to reproduce the exact execution — especially when prompts, external tools, or timing are involved.
So you end up analyzing behavior rather than actually stepping through the same failure.
Curious how you think about reproducibility vs observability for AI debugging?
shashisrun•1h ago
In many cases, even if you can see what happened, it’s hard to reproduce the exact execution — especially when prompts, external tools, or timing are involved.
So you end up analyzing behavior rather than actually stepping through the same failure.
Curious how you think about reproducibility vs observability for AI debugging?