FauxSpark is a discrete event simulator built with SimPy that models the internals of Apache Spark. It lets you experiment with Spark workloads under various cluster configurations without spinning up a real cluster, which is handy for testing failures, trying different job schedules, or doing capacity planning and observing the impact on your workload.
The current version includes:
- DAG scheduling with stages, tasks, and dependencies (though designing around "RDD" might have been the better call)
- Modeling of input, output, and shuffle partition sizes as probability distributions
- Automatic retries on executor or shuffle-fetch failures
- Single-job execution
- Simple CLI to tweak cluster configuration, simulate failures, and scale up executors
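
To give a feel for what the features above mean in discrete event terms, here is a minimal sketch in plain Python. It is not FauxSpark's code and it does not use SimPy; it swaps in a hand-rolled event queue built on `heapq` so it runs with the standard library alone. The names `sample_partition_sizes` and `simulate_stage`, the exponential size distribution, the fixed `mb_per_sec` throughput, and the choice to surface a failure at the end of a task's run are all assumptions made for illustration.

```python
import heapq
import random


def sample_partition_sizes(num_partitions, mean_mb, rng):
    """Draw partition sizes in MB from an exponential distribution (assumed)."""
    return [rng.expovariate(1.0 / mean_mb) for _ in range(num_partitions)]


def simulate_stage(sizes_mb, num_slots, mb_per_sec=100.0,
                   fail_prob=0.0, max_attempts=4, rng=None):
    """Simulate one stage of independent tasks on a fixed number of slots.

    Returns (finish_time_seconds, total_attempts). Raises RuntimeError when
    a task fails max_attempts times (4 mirrors Spark's default for
    spark.task.maxFailures).
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    rng = rng or random.Random(0)
    # Pending (task_id, attempt) pairs; reversed so pop() yields task 0 first.
    pending = [(task_id, 1) for task_id in reversed(range(len(sizes_mb)))]
    events = []  # min-heap of (finish_time, task_id, attempt, failed)
    now = 0.0
    free_slots = num_slots
    attempts = 0

    while pending or events:
        # Launch pending tasks while slots are free.
        while pending and free_slots > 0:
            task_id, attempt = pending.pop()
            free_slots -= 1
            attempts += 1
            duration = sizes_mb[task_id] / mb_per_sec
            # Simplification: a failure is only noticed when the task ends.
            failed = rng.random() < fail_prob
            heapq.heappush(events, (now + duration, task_id, attempt, failed))
        # Jump the simulated clock to the next completion event.
        now, task_id, attempt, failed = heapq.heappop(events)
        free_slots += 1
        if failed:
            if attempt >= max_attempts:
                raise RuntimeError(f"task {task_id} failed {attempt} times")
            pending.append((task_id, attempt + 1))  # schedule a retry
    return now, attempts
```

For example, `simulate_stage(sample_partition_sizes(200, 128.0, random.Random(42)), num_slots=8, fail_prob=0.05)` returns the simulated finish time and the number of task attempts for a 200-partition stage on 8 slots. A real simulator would also need stage dependencies, shuffle fetches, and executor loss, none of which this sketch covers.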
This tool might be relevant to the following folks:
- Data and infrastructure engineers running Apache Spark who want to experiment with cluster configurations
- Anyone curious about Spark internals
I'd appreciate feedback from anyone with experience in discrete event simulation, particularly on the planned features, and from anyone who might find this useful, to help shape its development.
A walkthrough section in the README demonstrates how it can be used.
dadbod•3h ago
GitHub repo: https://github.com/fhalde/fauxspark