Terraform requires a DAG. AWS allows cycles. Here's how I map the difference.

9•davidlu1001•2w ago

Error: Cycle: aws_security_group.app -> aws_security_group.db -> aws_security_group.app

If you've ever seen this error while importing AWS infrastructure to Terraform, you know the pain.

Terraform's core engine relies on a Directed Acyclic Graph (DAG). It needs to know: "Create A first, then B."

But AWS is eventually consistent and mutable: you can create resources first and wire them together afterwards, so it happily allows cycles.

The Deadlock

The most common culprit is Security Groups. Imagine two microservices:

- SG-App allows outbound traffic to SG-DB
- SG-DB allows inbound traffic from SG-App

If you write this with inline rules (which is what terraform import defaults to), you create a cycle:

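  # Other required arguments (ports, protocol, vpc_id) omitted for brevity.
  resource "aws_security_group" "app" {
    egress {
      security_groups = [aws_security_group.db.id]  # app depends on db
    }
  }

  resource "aws_security_group" "db" {
    ingress {
      security_groups = [aws_security_group.app.id]  # db depends on app
    }
  }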

Terraform cannot apply this. It can't create app without db's ID, and vice versa.

The Graph Theory View

When building an infrastructure reverse-engineering tool, I realized I couldn't just dump API responses to HCL. We model AWS as a graph: Nodes are Resources, Edges are Dependencies.

In a healthy config, dependencies are a DAG:

  [VPC] --> [Subnet] --> [EC2]

But Security Groups often form cycles:

      ┌──────────────┐
      ▼              │
  [SG-App]       [SG-DB]
      │              ▲
      └──────────────┘

Finding the Knots

To solve this for thousands of resources, we use Tarjan's algorithm to find Strongly Connected Components (SCCs). It identifies "knots" — clusters of nodes that are circularly dependent — and flags them for surgery.

In our testing, a typical enterprise AWS account with 500+ SGs contains 3-7 of these clusters.
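
The knot-finding step can be sketched in a few lines. This is a minimal illustration in plain Python rather than NetworkX, so it stays dependency-free; the `deps` graph, the resource names, and the `find_knots` helper are made up for the example and are not RepliMap's actual data model. Self-references are deliberately ignored here, on the assumption that they are handled separately.

```python
from typing import Dict, List


def tarjan_scc(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return the strongly connected components of a directed graph."""
    index: Dict[str, int] = {}    # discovery order of each node
    lowlink: Dict[str, int] = {}  # lowest index reachable from the node
    stack: List[str] = []
    on_stack = set()
    sccs: List[List[str]] = []

    def visit(node: str) -> None:
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, []):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                # Back edge into the component currently being built.
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop it off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            sccs.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return sccs


def find_knots(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Keep only components with more than one node (mutual references)."""
    return [sorted(c) for c in tarjan_scc(graph) if len(c) > 1]


# Hypothetical graph: each resource maps to the resources it references.
deps = {
    "vpc": [],
    "subnet": ["vpc"],
    "ec2": ["subnet", "sg_app"],
    "sg_app": ["vpc", "sg_db"],
    "sg_db": ["vpc", "sg_app"],
}
print(find_knots(deps))  # [['sg_app', 'sg_db']]
```

The recursive form is fine for a demo, but a graph with thousands of nodes in a long chain could hit Python's recursion limit, so an iterative variant or a library implementation is the safer choice at scale.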

The Fix: "Shell & Fill"

We use a strategy to break the cycle:

1. Create Empty Shells: Generate SGs with no rules. Terraform creates these instantly.
2. Fill with Rules: Extract rules into separate aws_security_group_rule resources that reference the shells.

  Step 1: Create Shells
    [SG-App (empty)]      [SG-DB (empty)]

  Step 2: Create Rules
          ▲                     ▲
          │                     │
    [Rule: egress->DB]    [Rule: ingress<-App]

The graph is now acyclic.
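
Viewed purely as a graph rewrite, the transformation looks roughly like the sketch below. It is an illustration under assumptions, not the tool's code: the `shell_and_fill` and `is_acyclic` helpers and the `rule:...` node names are hypothetical, and the acyclicity check uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def shell_and_fill(deps: Dict[str, List[str]], knot: List[str]) -> Dict[str, List[str]]:
    """Move SG-to-SG references inside `knot` onto separate rule nodes."""
    members = set(knot)
    result = {node: list(targets) for node, targets in deps.items()}
    for sg in knot:
        kept = []
        for target in deps[sg]:
            if target in members:
                # The rule depends on both shells; the shells themselves
                # no longer depend on each other.
                result[f"rule:{sg}->{target}"] = [sg, target]
            else:
                kept.append(target)
        result[sg] = kept  # the "empty shell": only non-knot dependencies
    return result


def is_acyclic(deps: Dict[str, List[str]]) -> bool:
    """True if the dependency graph admits a topological order."""
    try:
        list(TopologicalSorter(deps).static_order())
        return True
    except CycleError:
        return False


# Hypothetical graph: each resource maps to the resources it depends on.
deps = {
    "vpc": [],
    "sg_app": ["vpc", "sg_db"],
    "sg_db": ["vpc", "sg_app"],
}
fixed = shell_and_fill(deps, ["sg_app", "sg_db"])

print(is_acyclic(deps))   # False
print(is_acyclic(fixed))  # True
# One valid creation order: shells first, rules last.
print(list(TopologicalSorter(fixed).static_order()))
```

Only the edges inside the knot are rewritten; every other dependency is left as it was, which mirrors the goal of converting just the problematic groups.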

"Why not just always use separate rules?"

Fair question. The problem is:

1. terraform import often generates inline rules.
2. Many existing codebases prefer inline rules for readability.
3. The AWS API presents the "logical" view (rules bundled inside).

The tool needs to detect cycles and surgically convert only the problematic ones.

Why terraform import isn't enough

Standard import reads state as-is. It doesn't build a global dependency graph or perform topological sorting before generating code. It places the burden of refactoring on the human. For brownfield migrations with 2,000+ resources, that's not feasible.

---

I've implemented this graph engine in a tool called RepliMap. I've open-sourced the documentation and IAM policies needed to run read-only scans safely.

If you're interested in edge cases like this (or the root_block_device trap), the repo is here:

https://github.com/RepliMap/replimap-community

Happy to answer questions.

Comments

davidlu1001•2w ago

Author here. A few implementation notes:

1. We use NetworkX for the graph operations. Tarjan's SCC detection is O(V+E), so it scales well even for large accounts.

2. The trickiest part isn't the algorithm — it's mapping AWS API responses to graph edges. AWS APIs are... inconsistent. Some resources return IDs, some ARNs, some Names. Security Groups can reference themselves, reference by ID or by name, and have rules scattered across inline blocks and separate resources. Normalizing this soup into a clean adjacency matrix is where 80% of the engineering work lives.

3. For those wondering about the "Shell & Fill" naming: it's essentially a manual two-phase create. The shell fixes the resource's identity first, and its configuration is attached afterwards as separate resources, which decouples the resource identity from its configuration.

Would love to hear if others have hit similar graph problems with other IaC tools (Pulumi, CDK, CloudFormation).

talolard•2w ago

Not IaC, but I’ve been doing a similar trick to sequence adding type annotations to Python code.

E.g. take the module graph, break the SCCs in a similar manner, then take a reverse topological sort of the imports (now a DAG by construction).

davidlu1001•2w ago

That's a spot-on parallel! Python circular imports (especially for type hinting) are basically the software equivalent of this infrastructure deadlock.

Do you use string-based forward references ("ClassName") to break the cycles? That's essentially our "empty shell" trick — decoupling the resource identity from its configuration to satisfy the graph.

Did you stick with Tarjan's for the SCC detection on the module graph?

talolard•2w ago

I haven’t had major issues with SCCs yet. The linter enforces forward references, so the cycle pain we do have is with dynamic/deferred imports, and it’s usually solved by splitting a module.

If you look at the pyrefly repo (Meta's new type checker), there are some deep thoughts about SCCs, but I didn’t fully grok them.

davidlu1001•2w ago

Thanks for the Pyrefly pointer — I hadn't tracked Meta's Rust rewrite yet. Will dig into their SCC handling.

Your "splitting a module" framing is exactly right. In the IaC world, a Security Group with inline rules is like a Python module with circular imports — it couples identity with logic. The fix is the same: extract the logic into separate resources (or modules), keep the original as a pure identity/interface.

Interesting that the same pattern shows up in both compiler design and infrastructure tooling.

andyjohnson0•2w ago

Please don't do this. Ask HN isn't your blogging platform. Per the guidelines, it's for asking questions of the community.

davidlu1001•2w ago

Appreciate the feedback. To be transparent: I originally submitted this as a standard text post, but after it hit a spam filter, the HN moderators kindly restored it and moved it to /ask themselves to help with visibility.

I'm definitely here for the dialogue, specifically looking to compare notes on graph algorithms with other IaC engineers.
