
Giving local LLMs read-only institutional memory before task execution

2•LavaDMan•2h ago

I run a three-tier agentic system: a cloud LLM for architecture and review, a local 32B model for code generation and execution, and smaller models for evaluation. The local model (Qwen2.5-Coder 32B) kept making avoidable mistakes. It suggested approaches that had already failed, ignored active project context, and reinvented solutions we had already discarded.

The problem isn't capability. It's that each delegation is stateless: the local model gets a task string and nothing else.

The fix: before every local model call, run a parallel enrichment pipeline and inject the results into the system prompt.

```python
import asyncio

async def fetch_qdrant_hits(task):
    # Embed the task, search execution_memory for similar prior operations.
    # Returns: operation_type, outcome_score, last_action, result summary
    ...

async def fetch_active_mandates():
    # Pull IN_PROGRESS and APPROVED work from Postgres
    ...

async def fetch_pending_horizon():
    # Pull top PENDING items: awareness only, not authorization
    ...

async def enrich(task):
    results = await asyncio.gather(
        fetch_qdrant_hits(task),
        fetch_active_mandates(),
        fetch_pending_horizon(),
        return_exceptions=True,  # enrichment failure never blocks execution
    )
    # With return_exceptions=True a failed fetch comes back as an exception
    # object instead of raising; map it to None so it never reaches the prompt.
    qdrant_hits, mandates, horizon = (
        None if isinstance(r, BaseException) else r for r in results
    )
    return qdrant_hits, mandates, horizon
```

The injected block looks like this:

```text
---
INSTITUTIONAL MEMORY (read-only, do not modify):

Prior relevant operations:
- [SECURITY] Score:9/10 | Action:detonate_package
  Result: PERC H710 Mini does not support JBOD/Non-RAID on iDRAC 7. Used single-drive RAID-0 as workaround.

Active mandates:
- [Phase 15] R720xd Provisioning - APPROVED (priority 8)

Upcoming pipeline (awareness only, not yet authorized):
- [priority 6] Visual & Strategy Audit - PENDING

CONSTRAINTS: Read this context to inform your work. You may NOT update mandates, write to memory, or modify fleet state. All outputs are returned as string payloads to the L3 Architect for review and commit.
---
```

The hard constraint block matters. Without it, a capable local model will try to act on context it shouldn't touch. The read-only boundary is enforced in the prompt, not technically. In practice it works because the model is explicitly told its role.

Results so far: the hardware-specific mistake that prompted this was the local model looping on invalid RAID commands for 20 minutes. That wouldn't happen now, because the correct workaround is in execution_memory and would surface on the next similar task.

The open question: how do you keep the context window from getting polluted over time as execution_memory grows? Right now I'm using score_threshold=0.5 and limit=3 on the semantic search. I'm curious whether others have found better filtering strategies for long-running agentic systems.

The code is self-hosted, and the stack is Qdrant + Postgres + Neo4j + Ollama. Happy to share more details on any piece.
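The sketch below shows one way to turn the three enrichment results into the injected block shown above. It is a minimal example under stated assumptions, not the author's rendering code, which the post does not show:

- The record types `MemoryHit` and `WorkItem`, their field names, and the helpers `select_hits` and `render_memory_block` are hypothetical. They are inferred from the fields the post lists.
- The post passes `score_threshold=0.5` and `limit=3` to the Qdrant search. Here the same two values are applied locally so the sketch runs without a Qdrant server.
- The filter assumes a higher-is-better metric such as cosine similarity.
- A source whose fetch failed (passed as `None`) is left out, so enrichment failure never blocks execution.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical record shapes, inferred from the fields named in the post.
@dataclass
class MemoryHit:
    similarity: float    # vector-search score (assumed higher = more similar)
    operation_type: str  # e.g. "SECURITY"
    outcome_score: int   # 0-10 outcome rating
    last_action: str
    result: str

@dataclass
class WorkItem:
    title: str
    status: str          # IN_PROGRESS, APPROVED, or PENDING
    priority: int

SCORE_THRESHOLD = 0.5    # values quoted in the post
LIMIT = 3

CONSTRAINTS = (
    "CONSTRAINTS: Read this context to inform your work. You may NOT update "
    "mandates, write to memory, or modify fleet state. All outputs are returned "
    "as string payloads to the L3 Architect for review and commit."
)

def select_hits(hits: Sequence[MemoryHit],
                threshold: float = SCORE_THRESHOLD,
                limit: int = LIMIT) -> list[MemoryHit]:
    """Drop hits below the threshold, then keep the `limit` most similar."""
    kept = [h for h in hits if h.similarity >= threshold]
    kept.sort(key=lambda h: h.similarity, reverse=True)
    return kept[:limit]

def render_memory_block(hits: Optional[Sequence[MemoryHit]],
                        mandates: Optional[Sequence[WorkItem]],
                        horizon: Optional[Sequence[WorkItem]]) -> str:
    """Render the read-only context block; None or empty sources are omitted."""
    lines = ["---", "INSTITUTIONAL MEMORY (read-only, do not modify):", ""]

    selected = select_hits(hits) if hits else []
    if selected:
        lines.append("Prior relevant operations:")
        for h in selected:
            lines.append(f"- [{h.operation_type}] Score:{h.outcome_score}/10 "
                         f"| Action:{h.last_action}")
            lines.append(f"  Result: {h.result}")
        lines.append("")

    if mandates:
        lines.append("Active mandates:")
        for m in mandates:
            lines.append(f"- {m.title} - {m.status} (priority {m.priority})")
        lines.append("")

    if horizon:
        lines.append("Upcoming pipeline (awareness only, not yet authorized):")
        for p in horizon:
            lines.append(f"- [priority {p.priority}] {p.title} - {p.status}")
        lines.append("")

    # The constraint text is always appended, even when every source is empty.
    lines += [CONSTRAINTS, "---"]
    return "\n".join(lines)

# Usage: the second hit falls below the threshold, and horizon=None stands in
# for a failed Postgres fetch.
block = render_memory_block(
    hits=[MemoryHit(0.82, "SECURITY", 9, "detonate_package",
                    "PERC H710 Mini does not support JBOD/Non-RAID on iDRAC 7. "
                    "Used single-drive RAID-0 as workaround."),
          MemoryHit(0.31, "NETWORK", 4, "probe", "unrelated")],
    mandates=[WorkItem("[Phase 15] R720xd Provisioning", "APPROVED", 8)],
    horizon=None,
)
print(block)
```

Keeping the selection step in its own function puts the filtering policy in one place. Changes to the threshold or limit can then be tried without touching the prompt layout.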
