I've built FauxSpark, a discrete event simulation of Apache Spark, built with SimPy.
It's designed to let users experiment with and understand the runtime characteristics of Apache Spark workloads under different cluster configurations, failure scenarios, and job schedules, without spinning up a real cluster.
In this initial version, FauxSpark implements a simplified version of Apache Spark which includes:
- DAG scheduling with stages, tasks, and dependencies
- Automatic retries of tasks & stages on executor failure
- Stage resubmission on shuffle-fetch failures
- Basic shuffle reads (very simple for now)
- Runs a single job at a time
- A simple CLI with a few knobs to configure the cluster, simulate failures, scale up, etc.
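For anyone new to discrete event simulation: the core idea behind tools like SimPy is an event queue ordered by simulated time, not wall-clock time. This is not FauxSpark's code, just a dependency-free sketch of that idea applied to Spark-style task scheduling, with hypothetical parameter names, using a heap of executor-free times instead of SimPy's process/resource API:

```python
import heapq

def simulate_stage(num_tasks, num_executors, task_duration):
    """Toy DES of one stage: run `num_tasks` identical tasks on
    `num_executors` slots and return the stage's makespan.
    SimPy wraps this event-queue machinery in a nicer generator-based
    process API; the heap here plays the role of its event queue."""
    # Every executor slot becomes free at simulated time 0.
    free_at = [0] * num_executors
    heapq.heapify(free_at)
    makespan = 0
    for _ in range(num_tasks):
        start = heapq.heappop(free_at)   # earliest-available executor
        end = start + task_duration
        makespan = max(makespan, end)
        heapq.heappush(free_at, end)     # slot frees up at `end`
    return makespan

# 10 tasks of 5 time units on 4 executors -> 3 waves -> makespan 15
print(simulate_stage(10, 4, 5))
```

In a real SimPy model each executor would be a `simpy.Resource` and each task a generator process that requests a slot and `yield`s a timeout, which makes it much easier to layer in failures and retries.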
Repo → https://github.com/fhalde/fauxspark
I'd appreciate your feedback and tips from anyone into discrete event simulation (DES).