A few weeks ago I shared a design pattern I've been building: a governance protocol that lets Claude Code Skills accumulate domain knowledge across sessions without bloating.
The core idea is a Five-Gate protocol that controls what gets written into a living knowledge base; the most common outcome of the gates is "do nothing." In my first experiment, 63.6% of candidate writes were rejected.
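The post doesn't spell out the gate internals, but the shape is a chain of checks where any failing gate means nothing gets written. A hypothetical sketch, with illustrative gate names and checks that are my assumptions, not the repo's actual implementation:

```python
from typing import Callable, Optional

# Hypothetical sketch of a five-gate write-governance chain.
# Each gate returns a rejection reason, or None to pass the candidate on.
Gate = Callable[[dict], Optional[str]]

def make_gates() -> list[Gate]:
    # Gate names/criteria are invented for illustration.
    return [
        lambda c: None if c.get("novel") else "duplicate of existing entry",
        lambda c: None if not c.get("contradicts_verified") else "contradicts verified data",
        lambda c: None if c.get("evidence") else "no supporting evidence",
        lambda c: None if c.get("in_scope") else "outside skill domain",
        lambda c: None if c.get("durable") else "session-specific, not durable",
    ]

def govern(candidate: dict) -> tuple[bool, str]:
    """Run the candidate through every gate in order; first failure wins."""
    for i, gate in enumerate(make_gates(), start=1):
        reason = gate(candidate)
        if reason:
            return False, f"Gate {i}: {reason}"
    return True, "accepted"

# A novel claim that contradicts already-verified data is rejected at Gate 2,
# so the default outcome really is "do nothing".
ok, why = govern({"novel": True, "contradicts_verified": True})
assert not ok
```

The point of the chain structure is that acceptance requires passing every gate, while rejection needs only one, which is what skews outcomes toward "do nothing."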
What's new since the last post:
I've now completed three rounds of validation on the same database (smart building management, 29 tables), with each round testing progressively more sophisticated capabilities.
v3 passed 6/6 verification points. The highlight was T3.3: a user claimed incorrect enum values, and Gate 2 correctly rejected them because they contradicted SQL-verified data already in the knowledge base. The system defended its own knowledge integrity against incorrect human input.
v3 also found a critical bug in the inject command — it silently wrote to the wrong path when given a relative --target. Fixed, patched, and verified.
What the computation layer looks like now:
The confidence math no longer lives in SKILL.md prompts (which caused LLM calculation errors). It's been moved to a Python tool layer:
C(t) = C0 × e^( -λ_base × (β+1)/(α+1) × t )
143 pytest cases passing. LLM judges, Python computes.
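Moving the arithmetic out of the prompt is easy to picture. A minimal sketch of the decay computation, with parameter names taken from the formula above; treating α as confirmations and β as contradictions is my assumption, and the real tool layer in the repo may differ:

```python
import math

def confidence(c0: float, t: float, alpha: int, beta: int,
               lambda_base: float = 0.1) -> float:
    """C(t) = C0 * exp(-lambda_base * (beta+1)/(alpha+1) * t).

    Assumed semantics: alpha counts confirmations, beta counts
    contradictions, so confirmations slow decay and contradictions
    accelerate it. lambda_base = 0.1 is an arbitrary placeholder.
    """
    rate = lambda_base * (beta + 1) / (alpha + 1)
    return c0 * math.exp(-rate * t)

# An entry confirmed 4 times decays more slowly than one contradicted twice:
slow = confidence(0.9, t=10, alpha=4, beta=0)  # effective rate 0.1 * 1/5 = 0.02
fast = confidence(0.9, t=10, alpha=0, beta=2)  # effective rate 0.1 * 3/1 = 0.30
assert slow > fast
```

Because the LLM only supplies the inputs (α, β, t) and Python does the exponentiation, the calculation errors that prompt-side math produced simply can't occur here.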
What this is not:
This isn't a finished product. The knowledge base for my test domain has 8 entries after 17 tasks — deliberately sparse. The design philosophy is that a mature Skill that stops growing is healthy, not stalled. Convergence is the goal.
What I'm still iterating on: λ calibration across domains requires a second experiment (pending data ethics clearance on production data), the α/β upper bound question is open, and protocol compliance still depends on LLM discipline with no mechanical enforcement.
Repo: github.com/191341025/Self-Evolving-Skill
Still building. Feedback welcome.
chistev•1h ago
Why do the new green accounts post about Claude a lot?
tiansenxu•1h ago
Not promotion — I'm the author. This is an ongoing personal project I've been building and validating over the past few weeks. Happy to discuss the technical details.
tiansenxu•1h ago
v1: Basic Five-Gate protocol (rejection rate, self-correction behavior)
v2: Confidence decay model: C(t) = C0 × e^(-λ × (β+1)/(α+1) × t)
v3: Phase 5 enhancements (entity tagging, search-driven retrieval, hard/soft signal distinction)