
Why codex /goal fails on complex workflows: compaction amnesia and context rot

1•shaurya-sethi•50m ago

Hi HN,

When OpenAI released `/goal` earlier this month, I was really excited to try it for long-horizon tasks. But after using it, it didn't blow me away, so I did some digging and found a major architectural flaw when using it for complex multi-issue workflows: context rot.

This isn't anything new, but given how OpenAI positioned this feature to developers, I was let down by how they'd implemented context management.

Though /goal is a step forward in long-horizon coding, it lacks task decomposition and proper handling of context. It uses a multi-tier approach that includes persistent context chaining (PCC) to memory, local vector embeddings for RAG, sliding windows, and compaction.

In principle, giving codex a directive of `/goal work towards closing my open issues on github` should work, but this specific execution model hits a fatal wall. Even with massive context windows and RAG, LLM reasoning quality degrades significantly beyond roughly 100-150k tokens. The agent keeps working with worsening performance and finally, to prevent token exhaustion, uses compaction to summarize old logs. In practice, this causes compaction amnesia: the model is asked to summarize a massive blob of mixed-relevance information when its reasoning quality is already at its lowest. The compaction forgets critical constraints, opens the door to hallucinated past decisions, and introduces noise that makes the new context unreliable for long-horizon work.
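To make the failure mode concrete, here is a toy sketch of budget-driven compaction. It is not how codex implements compaction: the `compact` function, the word-count token proxy, and the clipping "summarizer" are all assumptions made for illustration. A real summarizer is an LLM call, but the structural problem is similar: old entries get squeezed into a small budget with no notion of which ones are binding constraints.

```python
def count_tokens(text: str) -> int:
    # Crude proxy (assumption): one token per whitespace-separated word.
    return len(text.split())


def compact(log: list[str], budget: int, keep_recent: int = 2,
            words_per_entry: int = 3) -> list[str]:
    """Toy compaction: once the log exceeds `budget`, collapse all but the
    last `keep_recent` entries into a single lossy summary line."""
    if sum(count_tokens(entry) for entry in log) <= budget:
        return list(log)
    cut = max(len(log) - keep_recent, 0)
    old, recent = log[:cut], log[cut:]
    # Stand-in for an LLM summarizer: keep only the first few words of each entry.
    clipped = [" ".join(entry.split()[:words_per_entry]) for entry in old]
    return ["SUMMARY: " + "; ".join(clipped)] + recent


# Hypothetical session log; the first entry is a constraint set early on.
log = [
    "Constraint: never change the public API of the parser module",
    "Read issue 12 and listed the failing tests",
    "Tried a fix in the lexer, reverted it after a regression",
    "Opened a branch for issue 13",
    "Ran the test suite, two failures remain",
]
compacted = compact(log, budget=20)
```

After compaction, the summary line still mentions that a constraint existed, but the part that mattered ("the public API of the parser module") is gone, and nothing in the remaining context signals that anything was lost.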

I wanted to see if enforcing strict outer-loop boundaries would solve this, so I put together an open-source Rust utility called Nightshift (https://github.com/Shaurya-Sethi/nightshift) to test this theory. Instead of running a single long-running session, it isolates the work like this:

1. You write a PRD as a parent GitHub issue that defines what needs to be implemented, and break it down into vertically sliced child issues with explicit kanban-style dependencies.
2. You run `nightshift --prd 1 --agent <any-agent-of-your-choice>`.
3. Nightshift uses the `gh` CLI to resolve the dependency graph and pick the next unblocked issue.
4. It syncs the repo, puts together the essential context for just that issue, and starts a new agent session, piping the PRD and issue context directly to stdin for the agent to pick up.
5. The agent is now responsible for the usual coding: new feature branch, implementation and testing, PR and self-review, and finally closing the issue.
6. Nightshift finds the next unblocked issue after maintaining git hygiene and loops until all issues linked to the PRD are resolved.
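The scheduling in steps 3 and 6 can be sketched as a small dependency check. This is a minimal sketch in Python, not Nightshift's actual Rust code: the `Issue` dataclass, its `blocked_by` field, and `next_unblocked` are hypothetical names, and the real tool gets this data from `gh` rather than an in-memory dict.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Issue:
    number: int
    closed: bool = False
    # Numbers of issues that must be closed before this one can start.
    blocked_by: list[int] = field(default_factory=list)


def next_unblocked(issues: dict[int, Issue]) -> Optional[Issue]:
    """Return the lowest-numbered open issue whose blockers are all closed."""
    for number in sorted(issues):
        issue = issues[number]
        if issue.closed:
            continue
        # A blocker that is missing from the map is treated as unresolved.
        if all(dep in issues and issues[dep].closed for dep in issue.blocked_by):
            return issue
    return None


# Hypothetical child issues of a PRD: 3 depends on 2, 4 depends on 2 and 3.
issues = {
    2: Issue(2),
    3: Issue(3, blocked_by=[2]),
    4: Issue(4, blocked_by=[2, 3]),
}
```

Picking the lowest-numbered candidate is one simple way to get a deterministic order; any stable tie-break would serve the same purpose.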

It's a very simple orchestration. The agent has no memory of previous runs and it doesn't need to: each task is isolated and gets a fresh agent session. The state is managed entirely through filesystem and git operations, and you get deterministic scheduling, failure isolation, and robust autonomy.
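As a rough illustration of the "fresh session per task" idea, the outer loop might look like the sketch below. It is written in Python for brevity and is an assumption about the shape of the loop, not a copy of Nightshift's implementation: `build_prompt`, `run_agent_session`, and `run_all` are hypothetical names, and `agent_cmd` is a placeholder for whatever command launches the agent.

```python
import subprocess
from typing import Callable


def build_prompt(prd: str, issue: str) -> str:
    """Context for one session: only the PRD and the single issue being worked."""
    return f"# PRD\n{prd}\n\n# Issue\n{issue}\n"


def run_agent_session(agent_cmd: list[str], prompt: str) -> bool:
    """Start a brand-new agent process and pipe the prompt to its stdin.
    Nothing carries over between calls except what is on disk and in git."""
    result = subprocess.run(agent_cmd, input=prompt, text=True)
    return result.returncode == 0


def run_all(prd: str, queue: list[str],
            session: Callable[[str], bool]) -> dict[str, bool]:
    """Run each issue in its own session; a failure is recorded, not propagated."""
    outcomes: dict[str, bool] = {}
    for issue in queue:
        outcomes[issue] = session(build_prompt(prd, issue))
    return outcomes
```

Passing the session as a callable keeps the loop testable with a fake agent. With a real one, the call would be something like `run_all(prd, queue, lambda p: run_agent_session(agent_cmd, p))`.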

It currently supports claude-code, codex, cursor, antigravity, and pi coding agent, and I'm working on adding support for more agents as this project grows. It's totally open-source if you want to inspect how the session management is implemented.

I'd love to hear your thoughts on this and check out your experiments with long-horizon task orchestration. Maybe the way forward is combining macro management with micro management?

I truly believe that by adding a dynamic task decomposition orchestrator that manages individual agents, /goal would solve half its problems.

Thanks!

