Every session my agent would start fresh with no memory of the architectural decisions we had made. It would confidently ignore naming conventions, bypass security patterns, and quietly undo things I had spent weeks getting right.
I tried CLAUDE.md, .cursorrules, plan files, task files. They all have the same ceiling: the agent treats them as suggestions, context windows bury them as sessions grow, and there is zero enforcement when they get violated. The prompt is the spec in AI-native development, and right now that spec disappears every time the chat closes.
MarkdownLM is my attempt to fix the layer below the agent, not by writing better prompts, but by treating your team's engineering rules as infrastructure.
How it works:
Your knowledge base lives in structured categories: architecture decisions, security constraints, business logic, naming conventions, whatever your team actually cares about. When an agent makes a call, MarkdownLM uses semantic embeddings to pull only the relevant rules rather than flooding the prompt with your entire knowledge base. Out of 500 documents, the agent sees the 3 that matter for this specific task. That keeps context focused, tokens low, and the agent from getting lost in irrelevant rules.
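The retrieval step described above can be sketched as a plain top-k cosine-similarity lookup over precomputed embeddings. This is a minimal illustration, not MarkdownLM's actual implementation; the function name, toy vectors, and 4-dimensional embeddings are all made up for the example (real embeddings from a model like text-embedding-004 have hundreds of dimensions):

```python
import numpy as np

def top_k_rules(query_emb, rule_embs, k=3):
    """Return indices of the k knowledge-base docs most similar to the query.

    query_emb: (d,) embedding of the agent's current task
    rule_embs: (n, d) embeddings of every document in the knowledge base
    """
    # Cosine similarity: normalize both sides, then take dot products.
    q = query_emb / np.linalg.norm(query_emb)
    r = rule_embs / np.linalg.norm(rule_embs, axis=1, keepdims=True)
    sims = r @ q
    # Highest-similarity documents first; keep only the top k.
    return np.argsort(sims)[::-1][:k]

# Toy knowledge base: 4 "documents" as 2-dim embeddings.
rules = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.9, 0.1],
                  [0.5, 0.5]])
query = np.array([1.0, 0.0])
print(top_k_rules(query, rules, k=3))  # indices of the 3 most relevant docs
```

Only those k documents get injected into the prompt; the other n−k never consume context tokens.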
Before generation, relevant context is injected. After generation, a validation gate checks the output against your rules and blocks violations with a receipt showing the specific rule, the reason, and the smallest suggested fix. When the agent hits something ambiguous with no rule coverage, it does not guess and ship. It stops, flags it as a gap, and routes it to whoever you have designated as the decision maker for that category.
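The post-generation gate above can be pictured as a simple check-and-receipt loop. A hedged sketch, not MarkdownLM's real rule schema: the `Rule` fields, the receipt keys, and the snake_case example rule are all hypothetical, chosen only to show the shape of "block the violation, return the rule, the reason, and the smallest fix":

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    rule_id: str
    category: str
    check: Callable[[str], bool]  # returns True when the output complies
    reason: str
    suggested_fix: str

def validate(output: str, rules: list) -> Optional[dict]:
    """Run the gate; return a violation receipt, or None if the output is clean."""
    for rule in rules:
        if not rule.check(output):
            return {
                "rule": rule.rule_id,
                "reason": rule.reason,
                "fix": rule.suggested_fix,
            }
    return None

# Hypothetical rule: service identifiers must be snake_case.
naming = Rule(
    rule_id="naming/snake_case",
    category="naming conventions",
    check=lambda out: "camelCaseService" not in out,
    reason="Service identifiers must be snake_case.",
    suggested_fix="Rename camelCaseService to camel_case_service.",
)

receipt = validate("def camelCaseService(): ...", [naming])
print(receipt)  # a receipt dict: the violation is blocked, not silently shipped
```

A task with no matching rule at all would take the other branch: no receipt, but a logged gap routed to the category's decision maker.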
Everything is MCP-native so it works across Cursor, Claude Code, and any MCP-compatible host without changing your workflow. The CLI lets you manage your knowledge base from the terminal like code: clone, diff, push, sync across your team.
What I learned building it:
The interesting part was the cost structure. At $0.15 per million input tokens, using Google's text-embedding-004 to retrieve the right 3 documents makes the retrieval layer cost fractions of a cent per call. That cheap embedding lookup replaces what would otherwise be a 100k-token prompt: lower cost and better results, because focused context beats large context almost every time.
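The back-of-envelope arithmetic behind that claim, assuming a 500-token task description to embed; the $3 per million input tokens for the generation model is an illustrative assumption, not a quoted price:

```python
# Cost of the retrieval layer: embed one task description.
EMBED_PRICE_PER_M = 0.15   # USD per 1M input tokens (text-embedding-004 rate)
query_tokens = 500         # assumed size of one task description
embed_cost = query_tokens / 1_000_000 * EMBED_PRICE_PER_M
print(f"embedding lookup: ${embed_cost:.6f}")  # $0.000075 per call

# Compare: shipping a 100k-token knowledge base with every prompt,
# at an illustrative $3 per 1M input tokens for the generation model.
PROMPT_PRICE_PER_M = 3.00  # assumption; varies widely by model
full_prompt_cost = 100_000 / 1_000_000 * PROMPT_PRICE_PER_M
print(f"full-KB prompt:   ${full_prompt_cost:.2f}")  # $0.30 per call
```

Roughly four orders of magnitude apart per call under these assumptions, before even counting the quality cost of burying the relevant rules.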
The gap resolution feature surprised me most in practice. Teams do not just have rule violations. They have rule gaps, situations the agent encounters that nobody thought to write a rule for yet. Surfacing those gaps as actionable items rather than silent guesses turned out to be as useful as the enforcement itself.
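The gap path is mechanically simple; what matters is that it produces a routed, actionable item instead of a silent guess. A minimal sketch with a hypothetical routing table (category names, owners, and the fallback are invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class Gap:
    category: str
    question: str
    owner: str  # designated decision maker for this category

# Hypothetical routing table: category -> decision maker.
OWNERS = {"security": "alice", "naming": "bob"}

def raise_gap(category: str, question: str) -> Gap:
    """No rule covers this situation: stop, record it, route it to a human."""
    return Gap(category=category,
               question=question,
               owner=OWNERS.get(category, "team-lead"))

gap = raise_gap("security", "May background jobs use the service-account key?")
print(gap.owner)  # alice
```

Once the owner answers, that answer becomes a new rule, so the same gap never fires twice.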
Current state:
Public beta. BYOK is free, no credit card required. Your code never touches our servers. The CLI and MCP server are open source on GitHub. It will stay free for individuals because I know the pain.
Site: https://markdownlm.com
CLI: https://github.com/MarkdownLM/cli
MCP: https://github.com/MarkdownLM/mcp
I am the solo founder. Brutal feedback is the only feedback I want.