Embarrassingly parallel problems are conceptually simple but can be operationally annoying. You want to run the same computation across many inputs, but end up spending more time on infrastructure than on the actual problem.
Here's a demo reprojecting 3,000 satellite images with GDAL across 100 EC2 instances in 5 minutes. The interesting part isn't the satellite imagery; it's that there's no Kubernetes YAML or Terraform. Just `--map-over-file`, and Coiled handles the distribution without the overhead of a Dask scheduler (no Dask at all, in fact).
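Roughly, the shape of a run looks something like the sketch below. Apart from `--map-over-file` itself, everything here (the `coiled batch run` command, the `inputs.txt` format, the `{}` placeholder) is an illustrative assumption rather than exact syntax, so check the Coiled docs for the real invocation:

```sh
# inputs.txt: one image per line, e.g. /vsis3/my-bucket/scene_0001.tif
# Hypothetical sketch: only --map-over-file is taken from the text above;
# the surrounding command and the {} placeholder are assumptions.
coiled batch run --map-over-file inputs.txt \
  gdalwarp -t_srs EPSG:4326 {} reprojected.tif
```

The idea is that each line of the file becomes one task, and Coiled fans those tasks out across the instances it launches.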
This works for any embarrassingly parallel job: running bash scripts N times, multi-node GPU training, stress-testing APIs, etc. The pattern is always the same: you have a function that works on one input, and you want to apply it to many inputs in parallel.
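For a single-machine reference point, GNU parallel (or xargs) already does that map step locally; the point here is getting the same one-command-per-input behavior fanned out over a fleet of cloud instances instead of your own cores:

```sh
# The same map-a-command-over-many-inputs pattern, run locally with GNU parallel:
# one gdalwarp call per .tif, as many at a time as the machine has cores.
mkdir -p reprojected
ls *.tif | parallel 'gdalwarp -t_srs EPSG:4326 {} reprojected/{}'
```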
Compare this to setting up AWS Batch or similar, where you'd typically need to handle job queues, compute environments, IAM roles, and container orchestration just to run a simple parallel workload.
Demo video: https://youtu.be/m3d2I6-EkEQ