Stategraph: Terraform state as a distributed systems problem

https://stategraph.dev/blog/why-stategraph/
136•lawnchair•4mo ago

Comments

eschatology•4mo ago
Hmm

I don’t see the state file as a complete downside. It is very simple and very easy to understand. It makes it easy to tell or predict what terraform will do given the current state and desired state.

Its simpleness makes troubleshooting easier: the state files are easy to read and manipulate or repair in the event of a drift, mismatch, or botched provider update.

With the solution proposed it feels like the state becomes a black box I shouldn’t put my hands in. I wonder how the troubleshooting scenarios change with it.

Personally, I haven’t run into the scaling issue described; at any given time there is usually only one entity working with the state file. We do use terragrunt for larger systems but it is manageable. ~1000 engineer org.

lawnchair•4mo ago
You are right that the simplicity of the state file is a strength and we do not want to lose that. One of our goals with Stategraph is to make state just as easy to inspect through both the command line and the UI.

Not every Terraform setup runs into scaling pain. The trouble tends to show up in larger repos with thousands of resources where teams share big chunks of infra. That is where global locks and full refreshes become a bottleneck and where we think graph semantics help.

philipallstar•4mo ago
> inspect through both the command line

This is a bit worrying, though. Do you mean through regular tools like cat or vim, or do we have to install a stategraph-manager tool (and upgrade it ad nauseam) just to look at the state?

lawnchair•4mo ago
Regular tools (jq, cat, etc.) still work. That ability doesn't go away.
philipallstar•4mo ago
Oh, nice.
mdaniel•4mo ago
> easy to tell or predict what terraform will do

predict is the operative word there, because Terraform is so disconnected from the underlying provider's mental model that it is the expression "no plan survives first contact with the enemy" made manifest

Now, I am one million percent open to the pushback that "well, that's a provider's problem" but I also can't easily tell if they are operating within the bounds of TF's mental model, or is it literally that every provider ever is just that lazy?

pst•4mo ago
This is awesome. Having a single state for all resources in an environment is critical for keeping all the moving pieces in check and a core design aspect of Kubestack. But the growing state files quickly become a bottleneck. I'm definitely giving this a good test drive. Very excited.
sausagefeet•4mo ago
Thank you, that is great to hear! We're pushing pretty hard to get a pre-alpha out to get some foundations testable by the community.
tuananh•4mo ago
can it be a sqlite db in s3 with locking implemented with s3?
sausagefeet•4mo ago
Hello, Stategraph developer here, the answer is: probably not. That doesn't resolve the core issue of state being managed as a big blob.
tuananh•4mo ago
but that big blob is a database. surely it's better than a json file right?
sausagefeet•4mo ago
In terms of the semantics we care about, not really. You still have to lock the whole thing to do something with it.
tuananh•4mo ago
for locking individual resources right?
sausagefeet•4mo ago
Hey! One of the Stategraph developers here, and I can answer any questions. The major motivation is just how small the scale is at which Terraform/Tofu start to break down and create work for users, who have to refactor for performance issues that shouldn't exist. So we want a drop-in solution that just dissolves those issues without the user having to do anything.
codethief•4mo ago
Hi sausagefeet! I'm a bit late to the party but maybe you'll still see my message.

First of all: Very cool project! I have spent the last couple months studying this problem space and arrived at the exact same conclusions as you. So Stategraph would be very interesting to us. However, we use Pulumi (with Azure blob storage as "DIY storage backend", i.e. rather similar to a TF state file) or are in the process of migrating to it. Do you think it would be feasible to write a storage backend (or a "meta" provider) for Pulumi which uses Stategraph behind the scenes?

giveita•4mo ago
Not an expert, but don't microservices help with this? Each microservice has its own YAMLesque resource descriptor (TF, CloudFormation, whatever) and is managed independently. My team can add an SQS queue or an S3 bucket without locking your team.

I might be wrong regarding more sophisticated infra though.

sausagefeet•4mo ago
Not necessarily. The guidance is to split your TF code across multiple states, which might feel like it makes sense, but for your microservices to communicate they need to share some base infrastructure, such as networking, so where does that live? Putting dependencies in their own state means that you lose the ability to understand how changing them impacts all of your infrastructure because you have this information black hole at the boundary of their state.

With Stategraph, you'll get all the benefits and isolation of separate state files, but when you change resources, you'll get meaningful plans around all of the infrastructure they impact, not just the statically defined boundaries of a state file.
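
As a rough illustration of what "all of the infrastructure they impact" could mean, here is a minimal sketch that walks reverse dependency edges from a changed resource. The dependency map, the resource addresses, and the `impacted_by` helper are all hypothetical and made up for this example; nothing here is Stategraph's actual API or data model.

```python
from collections import defaultdict, deque

# Hypothetical dependency map: resource -> resources it depends on.
DEPENDS_ON = {
    "aws_vpc.main": [],
    "aws_subnet.app": ["aws_vpc.main"],
    "aws_security_group.web": ["aws_vpc.main"],
    "aws_instance.api": ["aws_subnet.app", "aws_security_group.web"],
    "aws_s3_bucket.logs": [],
}


def impacted_by(changed, depends_on):
    """Return every resource that transitively depends on `changed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = defaultdict(set)
    for resource, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(resource)

    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(impacted_by("aws_vpc.main", DEPENDS_ON)))
```

With split state files, the edges that cross a file boundary are exactly the ones a traversal like this cannot see.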

lawnchair•4mo ago
Author here. You are right that splitting by microservice reduces overlap. The problem is shared resources never go away, such as VPCs, IAM, or databases, so contention shows up there.

Splitting state files is the common workaround but that only creates new problems like cross state dependencies and orchestration glue. The real issue is the storage model which is a single JSON blob with a global lock. Treating state as a graph with proper concurrency control avoids contention while keeping a cohesive view of infrastructure.
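
To make the contrast with a single global lock concrete, here is a toy sketch of per-resource locking. It assumes each change already knows the set of resource addresses it touches; the `ResourceLocks` class and the change ids are hypothetical and are not taken from Stategraph, which would presumably rely on database-level concurrency control rather than an in-process table.

```python
import threading


class ResourceLocks:
    """Toy lock table: a change must claim every resource it touches."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # resource address -> change id holding it

    def try_acquire(self, change_id, resources):
        """Claim all resources atomically, or claim none and return False."""
        with self._guard:
            if any(r in self._owners for r in resources):
                return False
            for r in resources:
                self._owners[r] = change_id
            return True

    def release(self, change_id):
        """Drop every claim held by this change."""
        with self._guard:
            self._owners = {
                r: owner for r, owner in self._owners.items() if owner != change_id
            }


locks = ResourceLocks()
# Two changes touching disjoint resources can both proceed.
a_ok = locks.try_acquire("change-a", {"aws_sqs_queue.orders"})
b_ok = locks.try_acquire("change-b", {"aws_s3_bucket.assets"})
# A third change that overlaps with change-a has to wait.
c_ok = locks.try_acquire("change-c", {"aws_sqs_queue.orders", "aws_iam_role.worker"})
print(a_ok, b_ok, c_ok)
```

Under a global lock, change-b would have waited on change-a even though the two share nothing.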

spinningarrow•4mo ago
Do you have an example you can share?

We have about 30 services with each managing their own terraform state. We also have a shared infra repo managing some top level items. We haven’t run into any issues (with any regularity at least) that I can think of but I’m wondering if this could be a good tool for us as we grow and things become even more complex?

lawnchair•4mo ago
The pain really shows up when teams manage large sets of infrastructure in one place with thousands of resources. Even a small change forces a global refresh and a global lock, so you end up waiting on operations that have nothing to do with your change. Splitting reduces contention but fragments your view of the system. We want state to behave like the dependency graph it already is.
giveita•4mo ago
Thanks this is fascinating. I now have a thread to pull on as I want to understand how my corp does this.
codethief•4mo ago
Hi lawnchair, I'm a bit late to the party, but in case you do see my message, I'd be very interested in an answer to this question -> https://news.ycombinator.com/item?id=45414381 . Thanks so much! :)
mystifyingpoi•4mo ago
It is the usual DRY/WET concern. Having microservices be completely independent and relying only on shared message broker or service discovery has its benefits, but the cost is generally duplication of things. Things like "whitelist this inbound IP for all services" or "configure telemetry endpoint" often end up in making N changes to N separate repos, and it becomes hell if you have to talk to N teams.
arccy•4mo ago
so kind of like crossplane where each resource is managed individually?
dwroberts•4mo ago
If you use a tool like Atmos (https://atmos.tools/) you kind of fix this issue already for free - because it takes the place of the root module, it actually manages the state of each sub module separately (they each have their own individual state file rather than being converged into one).
lawnchair•4mo ago
I don't think it fixes it. Atmos makes splitting and managing multiple states easier, but it still splits the graph. It doesn't change the underlying execution model.
angio•4mo ago
How does this compare with Pulumi? AFAIK they also don't have a state file and rely on an external database to store state. Is your locking granularity better?
lawnchair•4mo ago
I don't know enough about Pulumi to make a fair comparison on locking granularity. Pulumi's model is pretty different from Terraform/OpenTofu in general and state management is only one part of that. We're focused on optimizing the Terraform execution model and making the state layer match the graph semantics it already uses.
cyberpunk•4mo ago
I mean take this with a grain of salt and purely anecdotal; but everywhere I've heard of who chose pulumi over tf are no longer using pulumi. I'd love to hear some opposing experiences to that though!
cedws•4mo ago
I was in a platform team using Pulumi (TypeScript) for a while. An issue I observed is that the team members with weaker programming skills were contributing not so great changes, and parts of the codebase diverged in style. The Output type also took some time for us to get our heads round and it felt awkward to work with, we were having to chain a lot of calls and had callback hell sometimes.

We were all experienced with Go but at the time the Go SDK was very awkward, although I think some of that has been resolved with generics now. TF is less expressive but I think that’s actually better for 99% of cases.

angio•4mo ago
I'm also in the camp that stopped using Pulumi, in part because despite the lack of state file it feels even more sluggish than tf.
johanneskanybal•4mo ago
I think that’s the article but tl;dr that’s only part of the problem and already widely adopted with mutexes in say dynamo or whatever flavor you choose. This is about not having global locks or 10 arbitrary random locks per subdomain but rather figuring out the exact resources affected and locking only those.

Sounds very neat if you’re a big enough org.

sylens•4mo ago
It's an interesting proposal because they correctly call out that segmenting state files by workspace/environment in a very judicious way causes its own issues as you approach scale or have to work across environments. There is an entire industry of tools and services that help to streamline this process for you, but it still feels very hacky.

I'm curious if this will be compatible with tools like Spacelift or Env Zero, or if they are going to build their own runner/agent to compete in that space.

lawnchair•4mo ago
We are already in that space [0] though that's not the focus of this post. Working with teams at scale on orchestration is what pushed us to look deeper at state itself and eventually create this project.

0: https://terrateam.io

anonymousDan•4mo ago
Are there any statistics/analyses for the popularity of these different configuration management languages/frameworks (Terraform, Pulumi etc) in cloud settings? Trying to figure out which one(s) are worth learning.
sausagefeet•4mo ago
I believe the DORA report has some information on this. Terraform/Tofu dominate, by far.
sgarland•4mo ago
This is very cool. I love the idea of querying the state, and it opens up a ton of very easy reporting options.
lawnchair•4mo ago
Agreed. One of the frustrating things about using Terraform or OpenTofu is that all the data is sitting there in state but you can't really query or report on it. Making that information accessible is a big part of why we are building this.
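
For a sense of the kind of report being discussed, here is a minimal sketch that counts managed resource instances per type from a state-like JSON document. It assumes the layout that current Terraform state files use (a top-level `resources` list whose entries carry `mode`, `type`, `name`, and `instances`); that layout is an internal format and may change, and the sample document below is made up for illustration.

```python
import json
from collections import Counter

# Trimmed-down, made-up state document for illustration only.
STATE_JSON = """
{
  "version": 4,
  "resources": [
    {"mode": "managed", "type": "aws_instance", "name": "api", "instances": [{}, {}]},
    {"mode": "managed", "type": "aws_s3_bucket", "name": "logs", "instances": [{}]},
    {"mode": "data", "type": "aws_ami", "name": "base", "instances": [{}]}
  ]
}
"""


def count_managed_instances(state):
    """Count resource instances per type, skipping data sources."""
    counts = Counter()
    for resource in state.get("resources", []):
        if resource.get("mode") == "managed":
            counts[resource["type"]] += len(resource.get("instances", []))
    return counts


counts = count_managed_instances(json.loads(STATE_JSON))
print(dict(counts))
```

Any such report only describes what the state records, which may differ from what actually exists in the provider.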
stackskipton•4mo ago
Disagree since you are relying on non authoritative state to be correct. This can be a path to madness.
linuxftw•4mo ago
If you're at the point of managing thousands of resources inside a single statefile, and that makes the most sense to your setup: you've outgrown terraform.
lawnchair•4mo ago
From my experience the problem at scale isn't that Terraform stops being useful. The problem is how state gets managed. Every IaC approach still has to coordinate changes across shared resources and none of them escape that. Other tools just shift the trade offs around. In house systems usually end up rebuilding the same thing in another form.

At scale the choices are pretty simple. You split state and live with orchestration glue. You move to a controller model and take on the operational overhead (see Crossplane). Or you keep a cohesive graph and fix the state layer. Those are the real options (imo). It's not about outgrowing Terraform.

linuxftw•4mo ago
Changes across shared resources should be few and far between. Terraform really should only be used to set up long-lived resources, such as your VPC, initial IAM, and a bootstrapping system/management plane (e.g., your Kubernetes cluster). Once your infrastructure is up and running, further operations should be API-driven (aka controllers).

I'm not really a fan of crossplane, it's much simpler to roll your own custom operator, especially now that things like the Azure Service Operator exist (I think there's something equivalent for aws as well). This gives you a lot more flexibility for writing unit tests for your business logic.

alance•4mo ago
I'm wondering how access controls play into this.

Team A manages VPCs and Security groups, for example.

Team B manages autoscaling groups, EC2, etc.

It's great that now the two teams can look after their own things and not be too worried about resource contention with the other team. But if it's a centralized Postgres database (as you seem to be suggesting?) and both teams have write access to it...

How do we prevent teams from making changes to stuff that isn't "theirs" ?

And if the answer is "well this team only has IAM access to resources xyz", well then might it be a little tricky to represent the Stategraph DAG permission boundaries in IAM policy?

(ps: huge fan of terrateam's offerings -- Alex from tfstate.com)

lawnchair•4mo ago
Hi Alex. Great question and definitely something top of mind as we build Stategraph. The short answer is there is always a service layer in front of the database. Users and teams interact with that service, not the database directly. That is where access control and ownership boundaries live.
solatic•4mo ago
Very cool. Biggest question I have is how users with large setups where Terraform state has already been split would migrate to this. Would existing state blobs be namespaced into the One True State Graph in PG? Would Stategraph know how to merge different state blobs that are pointing to the same real-life resources? Will Stategraph promote some kind of convention for how state should be named, so that new projects can on-board to using the same state? Should Terraform fit a monorepo or polyrepo model with this kind of backend?
sausagefeet•4mo ago
The high-level vision of Stategraph is that the entire world's infrastructure should be representable as a single root module, with proper isolation and RBAC. It should scale and be secure. With that, the way Stategraph works best is everything being in a single repo and in a single root module.

Additionally, Stategraph should Just Work with your existing TF codebase. You import the state and you're off to the races.

Once all of your state is in Stategraph, though, moving state around really becomes a question of what names those pieces of state should have. So, if you want to merge two root modules, it could be the case that you can check "do my resource names overlap?" If no, you can tell Stategraph to merge the states and then copy your code into a single root module and go. Otherwise, you need to do some resource renaming.

While we don't have all the details in place, I think it is quite likely that Stategraph will support metadata on your resources, perhaps with a new block. This way you could provide namespaces to collections of resources, and that could make merging even easier. But, there is a bit to figure out before that is a reality or determined to be the best way to go.
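
The "do my resource names overlap?" check mentioned above can be sketched as a set intersection over resource addresses. This is only an illustration under stated assumptions: the state dicts are simplified stand-ins, the address format ignores data sources and instance keys, and `resource_addresses` and `overlapping_names` are hypothetical helpers, not Stategraph commands.

```python
def resource_addresses(state):
    """Build 'module.type.name'-style addresses from a state-like dict."""
    addresses = set()
    for resource in state.get("resources", []):
        prefix = resource.get("module", "")
        address = f'{resource["type"]}.{resource["name"]}'
        addresses.add(f"{prefix}.{address}" if prefix else address)
    return addresses


def overlapping_names(state_a, state_b):
    """Addresses present in both states; an empty set means no renames needed."""
    return resource_addresses(state_a) & resource_addresses(state_b)


networking = {"resources": [
    {"type": "aws_vpc", "name": "main"},
    {"type": "aws_iam_role", "name": "deploy"},
]}
services = {"resources": [
    {"type": "aws_sqs_queue", "name": "orders"},
    {"type": "aws_iam_role", "name": "deploy"},
]}

print(overlapping_names(networking, services))
```

Here the two root modules both define `aws_iam_role.deploy`, so one of them would need renaming before a merge.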

solatic•4mo ago
Yeah I get the feeling that the correct approach is something like defining (1) a naming convention for the One True State, (2) defining some kind of namespace, passed in the backend config, to which state identifiers are copied when their state IDs don't conform with One True State, (3) some kind of UI to propose/communicate + CLI to rewrite the state identifiers and add moved blocks to import code to start to use One True State.
VectorLock•4mo ago
I have done some work building a graph representation of compute resources (by importing tfstate and syncing from AWS), and it's a very useful representation for building things like visualizations and detecting dependencies. Moving tfstate to a real distributed graph system would help solve a LOT of the nasty hacks they mention in this article that I have run into HARD when working with very large numbers of resources and multiple teams/team sizes.
alphazard•4mo ago
I don't think a DAG makes sense as a model here. I've always likened terraform apply to setting what would be called a "waypoint" in robotics, or a "keyframe" in animation. The system doesn't immediately change state to the new state all at once, but it starts to tend towards the new desired state.

While the intermediary states might form a graph when considering different subsystems, the desire to move towards a given state should probably appear serial over time. When you make a terraform transaction, you have to coordinate in a serializable way with everyone else managing the desired state, even though the actual state will be messy along the way.

sausagefeet•4mo ago
Your Terraform/Tofu code is a graph. Resources are nodes and the dependencies between them are edges, so a graph is a very natural and useful representation of infrastructure. There are well-understood algorithms for transitioning a graph between two representations that can be leveraged as well. I think how you describe representing state would work fine, but a graph works quite well and is more natural for the types of operations we want to perform on the graph.
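
One of the well-understood graph algorithms in this space is topological ordering. The sketch below uses Python's standard-library `graphlib.TopologicalSorter` on a made-up dependency graph to group resources into batches that could be applied in parallel; the resource addresses are hypothetical, and this is not a description of how Terraform or Stategraph schedule work internally.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each resource maps to the resources it depends on.
graph = {
    "aws_instance.api": {"aws_subnet.app", "aws_security_group.web"},
    "aws_subnet.app": {"aws_vpc.main"},
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_vpc.main": set(),
}

sorter = TopologicalSorter(graph)
sorter.prepare()

# Each batch holds resources whose dependencies are already done,
# so everything inside a batch could be created in parallel.
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)

print(batches)
```

The VPC comes first, the subnet and security group can proceed together, and the instance waits for both.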
cyberax•4mo ago
I deeply believe that the whole "state" approach with TF is flawed. You end up doing a 3-way merge between the actual state, the desired state, and the recorded state every time you try to make changes.

Long time ago, I was simply doing stuff like this:

> resources = describe_resources_by_tag(env_name=env, some_tag=tag)
> if resource_doesnt_exist(resources, some_resource):
>     create_resource(some_resource)

This was very robust and easy to explain. You look around the system, using some tag-based filtering (in AWS, GCP, Azure) and then perform actions to bring the system to the desired state.
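
A runnable version of that stateless approach might look like the sketch below. It assumes an in-memory `FakeCloud` standing in for a real provider API; the class, its methods, and `ensure_resources` are hypothetical names chosen to mirror the pseudocode above, not calls from any real SDK.

```python
class FakeCloud:
    """In-memory stand-in for a provider API (hypothetical, for illustration)."""

    def __init__(self):
        self.resources = []  # each item: {"name": ..., "tags": {...}}

    def describe_resources_by_tag(self, **tags):
        return [
            r for r in self.resources
            if all(r["tags"].get(k) == v for k, v in tags.items())
        ]

    def create_resource(self, name, **tags):
        self.resources.append({"name": name, "tags": dict(tags)})


def ensure_resources(cloud, desired_names, **tags):
    """Create whatever is missing; return the names that were created."""
    existing = {r["name"] for r in cloud.describe_resources_by_tag(**tags)}
    created = []
    for name in desired_names:
        if name not in existing:
            cloud.create_resource(name, **tags)
            created.append(name)
    return created


cloud = FakeCloud()
cloud.create_resource("queue", env_name="staging")
first = ensure_resources(cloud, ["queue", "bucket"], env_name="staging")
second = ensure_resources(cloud, ["queue", "bucket"], env_name="staging")
print(first, second)
```

The second run creates nothing because the live system is the only record consulted, which is the property the comment is describing. Note that this sketch only handles creation, not updates or deletions.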

octopoc•4mo ago
The proper way to solve this would be to have a cloud provider (possibly based on AWS or Azure) whose UI simply edits an auto-generated IaC script. That way ALL changes go through the code. And if there's a need to do a 3-way merge, it's obvious, because you're having to do a 3-way merge in git.
elevation•4mo ago
> The Terraform ecosystem has spent a decade working around a fundamental architectural mismatch: we're using filesystem semantics to solve a distributed systems problem. The result is predictable and painful.

You portray this as a design flaw, but it's just the Hashicorp marketing funnel towards hosted Terraform, which solves the arbitration problems that you encounter at scale while allowing Hashicorp to give the cli tooling away for free.

lawnchair•4mo ago
Terraform Cloud does not actually solve this. It moves the state into their service and manages locking centrally, but the underlying model is unchanged. It is still one file with a global lock. That is why refresh bottlenecks and contention show up there too. What we are pointing at is the model itself, not just where the file is hosted. We know this space well from building Terrateam, which competes with Terraform Cloud on orchestration.
wiether•4mo ago
As someone who worked on projects which "solved" this issue by basically having one stack in one repo by resource sub-set (an ALB fronting an ASG will result in one repo for the ALB and one repo for the ASG), I can only welcome any solution trying to simplify team collaboration on Terraform stacks!
time0ut•4mo ago
Very interesting. I found myself nodding YES the whole way through the post. Something like this could lead to a large shift in how we manage infrastructure. We split terraform configs for more reasons than just splitting state of course, but something like this could make other approaches to organizing things more viable. Really cool and will be keeping an eye on this.
foreigner•4mo ago
This looks awesome but what is the end state (pun intended) going to be? Will this be an open source project? Commercial closed-source? Somewhere in between?
nikolay•4mo ago
This is the least of the problems. The big problem is the shared cache. Supposedly OpenTofu fixed it but Terraform proper is still not allowing multiple Terraform plans and/or applies in parallel over the same cache. Atlantis claimed that it fixed it, but it didn't. So pathetic!
otterley•4mo ago
I can't help but wonder whether the problem being addressed is the result of two antipatterns.

The first is that the scope being managed by a single Terraform application is too broad (e.g., thousands of resources instead of tens or hundreds). File-level locking is fine for small databases with few to no concurrent writers, but as more users come in, and the database gets bigger, you need record-level locking. For Terraform state files, it begs the question why the database got so big and why there were so many concurrent users in the first place.

Second, Terraform state files are a cache but they're being mistreated as a source of truth. This isn't the user's fault but it is the result of (understandable) impatience which results in inevitable shortcut-seeking. It's been a risk since Terraform's inception, and it won't go away as long as people complain that collecting current actual state from the resource provider is too slow.

thaneross•4mo ago
Unfortunately many providers don't expose all the state used to create certain resources through the API, so there's no way to download the real source of truth after the fact.
Too•4mo ago
This is cool and useful for teams with huge states and slow locks. Terraform, while great, still has lots of room for improvement.

I keep wondering though if split state files aren’t a good thing in some sense anyway. It isolates a lot of problems. Access, obviously. Another common theme is that applying tends to be a game of roulette, even if you haven’t changed anything. Cloud vendor added or renamed something in their backend. Provider update made some field deprecated. Expected state drift that wasn’t properly marked as "ignore_changes". Unexpected state drift by your over-excited intern. When I as an app-developer apply a simple config file change, I really don’t want to be bothered about dirty state in the networking backbone that I understand nothing about.

It’s also not entirely clear how the solution keeps track of the desired state if multiple actors are changing it at the same time. Wouldn’t one successful apply make the next person, who doesn’t have your local changes, revert it back on first apply? Does this expect heavy use of resource targeting?