Why codex /goal fails on complex workflows: compaction amnesia and context rot

1•shaurya-sethi•50m ago
Hi HN,

When OpenAI released `/goal` earlier this month, I was really excited to try it for long-horizon tasks. After using it, though, it didn't blow me away, so I did some digging and found a major architectural flaw when using it for complex multi-issue workflows: context rot.

This isn't anything new, but given how OpenAI positioned this feature to developers, I was let down by how they'd implemented context management.

Though `/goal` is a step forward in long-horizon coding, it lacks task decomposition and proper handling of context. It uses a multi-tier approach that includes persistent context chaining (PCC) to memory, local vector embeddings for RAG, sliding windows, and compaction.

In principle, giving Codex a directive like `/goal work towards closing my open issues on github` should work, but this execution model hits a fatal wall. Even with massive context windows and RAG, LLM reasoning quality degrades significantly beyond roughly 100-150k tokens. The agent keeps working with worsening performance and finally, to prevent token exhaustion, uses compaction to summarize old logs. In practice, this causes compaction amnesia: the model is asked to summarize a massive blob of mixed-relevance information when its reasoning quality is already at its lowest. The compaction forgets critical constraints, opens the door to hallucinated past decisions, and introduces noise that makes the new context unreliable for long-horizon work.
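
To make the failure mode concrete, here is a toy model in Rust. It is not Codex's implementation: a real agent asks the model itself to write the summary, whereas this sketch uses a crude "keep the first few words" stand-in, and every name in it (`compact`, `estimate_tokens`, `maybe_compact`) is made up for illustration. The point is only that once old entries are squeezed into a summary, a constraint stated early in the session can silently lose its important half.

```rust
/// Toy stand-in for a summarizer: keeps only the first `keep` words of
/// each history entry. (Assumption: real compaction is model-written.)
fn compact(history: &[String], keep: usize) -> String {
    history
        .iter()
        .map(|entry| {
            entry
                .split_whitespace()
                .take(keep)
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect::<Vec<_>>()
        .join(" | ")
}

/// Crude token estimate: one token per whitespace-separated word.
fn estimate_tokens(history: &[String]) -> usize {
    history.iter().map(|e| e.split_whitespace().count()).sum()
}

/// If the history is over `budget`, replace everything except the latest
/// `recent` entries with a single summary entry.
fn maybe_compact(history: Vec<String>, budget: usize, recent: usize) -> Vec<String> {
    if estimate_tokens(&history) <= budget || history.len() <= recent {
        return history;
    }
    let split = history.len() - recent;
    let mut out = vec![format!("[summary] {}", compact(&history[..split], 3))];
    out.extend_from_slice(&history[split..]);
    out
}
```

Feeding it a history whose first entry is "Constraint: never force-push to the main branch" with a small budget leaves a summary that still mentions a constraint but no longer says which branch it protects.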

I wanted to see if enforcing strict outer-loop boundaries would solve this, so I put together an open-source Rust utility called Nightshift (https://github.com/Shaurya-Sethi/nightshift) to test this theory. Instead of running a single long-running session, it isolates the work like this:

1. You write a PRD as a parent GitHub issue that defines what needs to be implemented, and break it down into vertically sliced child issues with explicit kanban-style dependencies.
2. You run `nightshift --prd 1 --agent <any-agent-of-your-choice>`.
3. Nightshift uses the `gh` CLI to resolve the dependency graph and pick the next unblocked issue.
4. It syncs the repo, puts together the essential context for just that issue, and starts a new agent session, piping the PRD and issue context directly to stdin for the agent to pick up.
5. The agent is now responsible for the usual coding work: a new feature branch, implementation and testing, a PR and self-review, and finally closing the issue.
6. Nightshift maintains git hygiene, finds the next unblocked issue, and loops until all issues linked to the PRD are resolved.
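
The scheduling part of that loop (steps 3 and 6) can be sketched roughly as follows. This is a minimal illustration, not Nightshift's actual code: the `Issue` struct, `next_unblocked`, and `run_all` are hypothetical names, the issue data is assumed to be already loaded in memory rather than fetched through `gh`, and the agent run is reduced to a closure that reports success or failure.

```rust
use std::collections::BTreeMap;

/// A child issue of the PRD. `blocked_by` lists the issue numbers that
/// must be closed first. (Hypothetical shape, for illustration only.)
#[derive(Debug, Clone)]
struct Issue {
    number: u32,
    closed: bool,
    blocked_by: Vec<u32>,
}

/// Returns the lowest-numbered open issue whose blockers are all closed.
/// Assumption: a blocker missing from the map counts as still blocking.
fn next_unblocked(issues: &BTreeMap<u32, Issue>) -> Option<u32> {
    issues
        .values()
        .filter(|issue| !issue.closed)
        .find(|issue| {
            issue
                .blocked_by
                .iter()
                .all(|dep| issues.get(dep).map_or(false, |d| d.closed))
        })
        .map(|issue| issue.number)
}

/// Outer loop: pick an issue, hand it to `run_agent`, and mark it closed
/// on success. Stops at the first failure or when nothing is unblocked.
/// Returns the order in which issues were completed.
fn run_all<F>(issues: &mut BTreeMap<u32, Issue>, mut run_agent: F) -> Vec<u32>
where
    F: FnMut(&Issue) -> bool,
{
    let mut order = Vec::new();
    while let Some(n) = next_unblocked(issues) {
        if !run_agent(&issues[&n]) {
            break;
        }
        issues.get_mut(&n).expect("issue exists").closed = true;
        order.push(n);
    }
    order
}
```

With issues 2, 3, and 4, where 4 is blocked by 2 and 3 is blocked by both 2 and 4, this picks 2, then 4, then 3. Because the choice depends only on the issue states, the same inputs always produce the same order.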

It's a very simple orchestration. The agent has no memory of previous runs, and it doesn't need any: each task is isolated and gets a fresh agent session. The state is managed entirely through filesystem and git operations, so you get deterministic scheduling, failure isolation, and robust autonomy.
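
A fresh session per task with the context piped over stdin might look something like the sketch below, using only the Rust standard library. The function names (`build_context`, `run_fresh_session`) and the context layout are assumptions made for illustration, and `program` and `args` stand in for whichever agent CLI is being driven; none of this is taken from the Nightshift source.

```rust
use std::io::Write;
use std::process::{Command, Stdio};
use std::thread;

/// Joins the PRD and a single issue into one prompt.
/// (Hypothetical layout; the real format may differ.)
fn build_context(prd: &str, issue: &str) -> String {
    format!("# PRD\n{}\n\n# Issue\n{}\n", prd, issue)
}

/// Starts a brand-new process for one task, writes the context to its
/// stdin, and returns (exit success, captured stdout). No state from any
/// earlier run is passed in, so each session starts clean.
fn run_fresh_session(
    program: &str,
    args: &[&str],
    context: &str,
) -> std::io::Result<(bool, String)> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write on a separate thread so a large context cannot deadlock
    // against a child that is busy filling its stdout pipe. The stdin
    // handle is dropped when the thread ends, which signals EOF.
    let mut stdin = child.stdin.take().expect("stdin was requested as piped");
    let payload = context.to_owned();
    let writer = thread::spawn(move || stdin.write_all(payload.as_bytes()));

    let output = child.wait_with_output()?;
    writer.join().expect("writer thread panicked")?;

    let stdout = String::from_utf8_lossy(&output.stdout).into_owned();
    Ok((output.status.success(), stdout))
}
```

On a Unix-like system, calling `run_fresh_session("cat", &[], &build_context(prd, issue))` is an easy way to check the plumbing, since `cat` echoes back whatever arrived on stdin.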

It currently supports claude-code, codex, cursor, antigravity, and pi coding agent, and I'm working on adding support for more agents as the project grows. It's fully open source if you want to inspect how the session management is implemented.

I'd love to hear your thoughts on this and check out your experiments with long-horizon task orchestration. Maybe the way forward is combining macro-management with micro-management?

I truly believe that adding a dynamic task decomposition orchestrator that manages individual agents would solve half of /goal's problems.

Thanks!
