Why We're Building Stategraph: Terraform State as a Distributed Systems Problem

https://stategraph.dev/blog/why-stategraph/

11•lawnchair•1h ago

Comments

eschatology•37m ago

Hmm

I don’t see the state file as a complete downside. It is very simple and very easy to understand. It makes it easy to tell or predict what terraform will do given the current state and desired state.

Its simplicity makes troubleshooting easier: the state files are easy to read, manipulate, or repair in the event of drift, a mismatch, or a botched provider update.

With the solution proposed it feels like the state becomes a black box I shouldn’t put my hands in. I wonder how the troubleshooting scenarios change with it.

Personally, I haven’t run into the scaling issue described; at any given time there is usually only one entity working with the state file. We do use terragrunt for larger systems but it is manageable. ~1000 engineer org.

pst•32m ago

This is awesome. Having a single state for all resources in an environment is critical for keeping all the moving pieces in check and a core design aspect of Kubestack. But the growing state files quickly become a bottleneck. I'm definitely giving this a good test drive. Very excited.

sausagefeet•10m ago

Thank you, that is great to hear! We're pushing pretty hard to get a pre-alpha out to get some foundations testable by the community.

tuananh•24m ago

can it be a sqlite db in s3 with locking implemented with s3?

sausagefeet•14m ago

Hello, Stategraph developer here, the answer is: probably not. That doesn't resolve the core issue of state being managed as a big blob.

sausagefeet•15m ago

Hey! One of the Stategraph developers here, and I can answer any questions. The major motivation is how small a scale Terraform/Tofu starts to break down at, creating work for users who have to refactor for performance issues that shouldn't exist. So we want a drop-in solution that just dissolves those issues without the user having to do anything.

giveita•10m ago

Not an expert, but don't microservices help with this? Each microservice has its own YAMLesque resource descriptor (TF, CloudFormation, whatever) and is managed independently. My team can add an SQS queue or S3 bucket without locking your team.

I might be wrong regarding more sophisticated infra though.

sausagefeet•4m ago

Not necessarily. The guidance is to split your TF code across multiple states, which might feel like it makes sense, but for your microservices to communicate they need to share some base infrastructure, such as networking, so where does that live? Putting dependencies in their own state means that you lose the ability to understand how changing them impacts all of your infrastructure, because you have this information black hole at the boundary of their state.

With Stategraph, you'll get all the benefits and isolation of separate state files, but when you change resources, you'll get meaningful plans around all of the infrastructure they impact, not just the statically defined boundaries of a state file.
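
To make the "impact across state boundaries" point concrete, here is a minimal sketch of computing everything downstream of a changed resource when dependencies are kept as one graph. This is only an illustration, not Stategraph's implementation: the resource names, the `DEPENDS_ON` table, and the `impacted_by` helper are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges: resource -> resources it depends on.
# With split state files, the shared networking entries would sit in a
# separate state and these edges would not be visible in one place.
DEPENDS_ON = {
    "vpc.main": [],
    "subnet.a": ["vpc.main"],
    "svc_orders.sqs": [],
    "svc_orders.ecs": ["subnet.a", "svc_orders.sqs"],
    "svc_billing.ecs": ["subnet.a"],
}


def impacted_by(changed, depends_on):
    """Return every resource that transitively depends on `changed`."""
    # Invert the edges so we can walk from a resource to its dependents.
    dependents = defaultdict(set)
    for resource, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(resource)

    # Breadth-first walk over the inverted graph.
    seen, queue = set(), deque([changed])
    while queue:
        for nxt in dependents[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(impacted_by("vpc.main", DEPENDS_ON)))
# → ['subnet.a', 'svc_billing.ecs', 'svc_orders.ecs']
```

A change to the shared VPC surfaces both services, while a change to `svc_orders.sqs` only surfaces `svc_orders.ecs`.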

lawnchair•4m ago

Author here. You are right that splitting by microservice reduces overlap. The problem is that shared resources such as VPCs, IAM, or databases never go away, so contention shows up there.

Splitting state files is the common workaround, but that only creates new problems like cross-state dependencies and orchestration glue. The real issue is the storage model, which is a single JSON blob with a global lock. Treating state as a graph with proper concurrency control avoids contention while keeping a cohesive view of infrastructure.
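
As a rough illustration of the difference between a global lock and finer-grained concurrency control, the sketch below locks only the resources a change touches, so two changes conflict only when their resource sets overlap. It is a toy, in-memory model under assumed names (`ResourceLocks`, `try_acquire`, `release`, and the resource identifiers are hypothetical), not a description of how Stategraph actually does it.

```python
class ResourceLocks:
    """Toy in-memory lock table keyed by resource, not by state file."""

    def __init__(self):
        self._held = {}  # resource name -> owner currently holding it

    def try_acquire(self, owner, resources):
        """Lock all of `resources` for `owner`, or none of them."""
        wanted = set(resources)
        # Refuse if any wanted resource is held by a different owner.
        if any(self._held.get(r, owner) != owner for r in wanted):
            return False
        for r in wanted:
            self._held[r] = owner
        return True

    def release(self, owner):
        """Drop every lock held by `owner`."""
        self._held = {r: o for r, o in self._held.items() if o != owner}


locks = ResourceLocks()
# Disjoint changes proceed at the same time.
print(locks.try_acquire("team_a", ["svc_orders.sqs"]))    # → True
print(locks.try_acquire("team_b", ["svc_billing.s3"]))    # → True
# A change that overlaps with team_a's resources has to wait.
print(locks.try_acquire("team_c", ["vpc.main", "svc_orders.sqs"]))  # → False
```

With a single global lock, the second call would have been blocked as well, even though the two teams touch unrelated resources.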

arccy•5m ago

so kind of like crossplane where each resource is managed individually?
