I might be wrong regarding more sophisticated infra though.
With Stategraph, you'll get all the benefits and isolation of separate state files, but when you changed resources, you'll get meaningful plans around all of the infrastructure they impact, not just the statically defined boundaries of a state file.
Splitting state files is the common workaround but that only creates new problems like cross state dependencies and orchestration glue. The real issue is the storage model which is a single JSON blob with a global lock. Treating state as a graph with proper concurrency control avoids contention while keeping a cohesive view of infrastructure.
eschatology•37m ago
I don’t see the state file as a complete downside. It is very simple and very easy to understand. It makes it easy to tell or predict what terraform will do given the current state and desired state.
Its simpleness makes troubleshooting easier: the state files are easy to read and manipulate or repair in the event of a drift, mismatch, or botched provider update.
With the solution proposed it feels like the state becomes a black box I shouldn’t put my hands in. I wonder how the troubleshooting scenarios change with it.
Personally, I haven’t ran into the scaling issue described; at any given time there is usually only one entity working with the state file. We do use terragrunt for larger systems but it is manageable. ~1000 engineer org.