Three things I've tried so far, none of which are working well:
1. I used a Conductor setup script that runs my local dev setup in each worktree. This didn't work because of port collisions between docker containers.
2. I'm using terraform, so it was trivial to spin up a copy of my staging infra (with fewer resources) for every PR. This let each claude session in Conductor use Playright to test it's code. Two problems: first, this is pretty expensive ($2-5/per day/per pr). I'm pushing 20-30 prs a day, so this was costing me $XXX/month even with automated cleanups. Second, my deploy takes about 10-15 minutes, which isn't that long, but claude would often need to be re-prompted to check on the deployed changes.
3. For new features, I just had Claude yolo code to staging or prod behind feature flags. This caused regressions and requires that Claude have access to privileged data for testing, so not a great solution.
I'm thinking that something like local VMs tied to each worktree could make sense, but wanted to check if I'm just oblivious to an existing solution before diving into that.