Does this handle longer-running activities that have low CPU usage? Couldn't these be canceled during scale-in?
Temporal would retry them, but it would make some workflow runs take longer, which could be annoying for some user-interactive workflows.
Otherwise, I've seen people need to hit the metrics endpoint and query things like `worker_task_slots_available` to scale up, or query pending activities, pending workflows, etc. to scale down per worker.
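For illustration, here is a minimal sketch of that slot-based approach. It assumes the worker exposes Prometheus text-format metrics and that you already have a pending-task count from somewhere. The function names, the `worker_type` label, and the thresholds are hypothetical, and your SDK may add a prefix to the metric name.

```python
import re

# Matches simple Prometheus exposition lines: name{labels} value
_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)')


def slots_available(metrics_text, metric="worker_task_slots_available",
                    worker_type=None):
    """Sum the samples of `metric` in a scraped metrics payload.

    `worker_type` is an assumed label name; pass None to skip label filtering.
    Returns None when the metric is absent from the payload.
    """
    total, found = 0.0, False
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _LINE.match(line)
        if not m or m.group(1) != metric:
            continue
        labels = m.group(2) or ""
        if worker_type is not None and f'worker_type="{worker_type}"' not in labels:
            continue
        total += float(m.group(3))
        found = True
    return total if found else None


def scaling_decision(available, pending, low_watermark=2):
    """Hypothetical policy: scale out when slots run low, and only mark a
    worker as a scale-in candidate when nothing is pending on it."""
    if available is None:
        return "hold"  # no data, so do nothing rather than guess
    if available <= low_watermark:
        return "scale_out"
    if pending == 0:
        return "scale_in_candidate"
    return "hold"
```

Only treating a worker as a scale-in candidate when it has no pending work is what would keep a long-running, low-CPU activity from being killed mid-run.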
I originally ran this setup on Temporal Cloud, and pulling detailed worker/queue metrics directly from Cloud can be tricky: you need to expose custom worker metrics yourself, then pipe them into CloudWatch. If you host Temporal yourself, it is easier :)
DoofWarrior•3mo ago
norapap•3mo ago
In the GitHub repo you can find comments on how to easily switch to EC2 if your workload needs it.
leetrout•2mo ago
We're using Prefect, not Temporal, and each Prefect flow launches in a discrete ECS task, so the waiting added up.