    async def fetch_active_mandates():
        # Pull IN_PROGRESS and APPROVED work from Postgres
        ...

    async def fetch_pending_horizon():
        # Pull top PENDING items (awareness only, not authorization)
        ...

    qdrant_hits, mandates, horizon = await asyncio.gather(
        fetch_qdrant_hits(),
        fetch_active_mandates(),
        fetch_pending_horizon(),
        return_exceptions=True,  # enrichment failure never blocks execution
    )

The injected block looks like:

    --- INSTITUTIONAL MEMORY (read-only, do not modify):
    Prior relevant operations:
    - [SECURITY] Score:9/10 | Action:detonate_package
      Result: PERC H710 Mini does not support JBOD/Non-RAID on iDRAC 7. Used single-drive RAID-0 as workaround.
    Active mandates:
    - [Phase 15] R720xd Provisioning — APPROVED (priority 8)
    Upcoming pipeline (awareness only — not yet authorized):
    - [priority 6] Visual & Strategy Audit — PENDING
    CONSTRAINTS: Read this context to inform your work. You may NOT update mandates, write to memory, or modify fleet state. All outputs are returned as string payloads to the L3 Architect for review and commit.
    ---

The hard constraint block matters. Without it, a capable local model will attempt to act on context it shouldn't touch. The read-only boundary is enforced only by the prompt, not by any technical control, but in practice it holds because the model is told its role explicitly.

Results so far: The hardware-specific mistake that prompted this (a local model looping on invalid RAID commands for 20 minutes) wouldn't happen now. The correct workaround is in execution_memory and would surface on the next similar task.

The open question: How do you keep the context window from getting polluted over time as execution_memory grows? Right now I'm using score_threshold=0.5 and limit=3 on the semantic search. Curious whether others have found better filtering strategies for long-running agentic systems.

Code is self-hosted; the stack is Qdrant + Postgres + Neo4j + Ollama. Happy to share more details on any piece.
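
For anyone who wants to try the enrichment step, here is a minimal sketch of how the gather call and the block assembly could fit together. It is not the actual implementation: the three fetchers are hypothetical stand-ins (the Qdrant one deliberately fails to simulate an outage), and `or_empty`, `render_section`, and `build_context_block` are helper names invented for this example. The one detail worth showing is that `return_exceptions=True` hands back the exception object in place of the result, so each slot needs a check before it is rendered.

```python
import asyncio


async def fetch_qdrant_hits():
    # Hypothetical stand-in: simulate the vector store being unreachable.
    raise ConnectionError("qdrant unreachable")


async def fetch_active_mandates():
    # Hypothetical stand-in for the Postgres query.
    return ["[Phase 15] R720xd Provisioning - APPROVED (priority 8)"]


async def fetch_pending_horizon():
    # Hypothetical stand-in for the Postgres query.
    return ["[priority 6] Visual & Strategy Audit - PENDING"]


def or_empty(result):
    # With return_exceptions=True, gather returns the exception object in
    # place of the value, so a failed source is downgraded to an empty list.
    return [] if isinstance(result, BaseException) else result


def render_section(title, items):
    # Skip empty sections so a failed source leaves no dangling header.
    if not items:
        return []
    return [title] + [f"- {item}" for item in items]


async def build_context_block():
    results = await asyncio.gather(
        fetch_qdrant_hits(),
        fetch_active_mandates(),
        fetch_pending_horizon(),
        return_exceptions=True,  # enrichment failure never blocks execution
    )
    hits, mandates, horizon = (or_empty(r) for r in results)

    lines = ["--- INSTITUTIONAL MEMORY (read-only, do not modify):"]
    lines += render_section("Prior relevant operations:", hits)
    lines += render_section("Active mandates:", mandates)
    lines += render_section(
        "Upcoming pipeline (awareness only, not yet authorized):", horizon
    )
    lines.append("---")
    return "\n".join(lines)


if __name__ == "__main__":
    print(asyncio.run(build_context_block()))
```

With the simulated Qdrant failure, the block is still produced; it just omits the "Prior relevant operations" section instead of blocking the task. A real version would presumably also log the swallowed exception so outages do not go unnoticed.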