Core idea: each “secret” (per user or per dataset) has its own independent post-quantum keypair. There is no master key.
Architecture summary:
Control plane: verifiable ownership, delegation, revocation, and append-only audit records (tamper-evident authorization history)
Custody plane: 5 custody nodes running in TEEs, each storing 1 key fragment
Orchestration: validates authorization, then collects fragments and reconstructs keys only when needed
Key custody model:
Private key is generated in a TEE then immediately split via Shamir secret sharing into 5 fragments
Fragments are distributed to independent custody nodes
Original private key is destroyed
Threshold is 3-of-5 for reconstruction
Each custody node independently verifies authorization against the control plane before releasing its fragment
A cluster is disabled whenever its membership changes (for example, when a node exits)
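
To make the 3-of-5 split concrete, here is a minimal sketch of Shamir secret sharing over a prime field. It is not my production code: the field prime (2**521 - 1), the function names, and treating the secret as a single integer (for example, a key seed) are all assumptions for illustration. A real implementation would more likely share the key byte-wise over GF(2^8) and run inside the TEE.

```python
import secrets

# Assumption: 2**521 - 1 (a Mersenne prime) is used as the field modulus so that
# a secret of up to 64 bytes fits in one field element.
PRIME = 2**521 - 1
THRESHOLD = 3   # fragments needed to reconstruct
NUM_SHARES = 5  # one fragment per custody node


def split_secret(secret, threshold=THRESHOLD, num_shares=NUM_SHARES):
    """Return num_shares points (x, y) on a random polynomial f with f(0) = secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be a non-negative integer below PRIME")
    # Degree is threshold - 1, so any `threshold` points determine f uniquely.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of f(x) mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct_secret(shares):
    """Recover f(0) by Lagrange interpolation over the given shares."""
    xs = [x for x, _ in shares]
    if len(set(xs)) != len(xs):
        raise ValueError("duplicate share indices")
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    seed = secrets.randbelow(2**256)
    fragments = split_secret(seed)
    assert reconstruct_secret(fragments[:3]) == seed
    assert reconstruct_secret([fragments[0], fragments[2], fragments[4]]) == seed
```

Note that interpolating fewer than 3 fragments does not raise an error in this sketch; it simply yields an unrelated field element, so the caller has to enforce the threshold.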
Encryption scheme (hybrid PQ + symmetric):
Fetch ML-KEM-768 public key from content-addressed storage, verify integrity
Run ML-KEM-768 encapsulation once per message/file/chunk to obtain a fresh shared secret
Derive one-time AES key + IV via HKDF-SHA256
Encrypt payload with AES-256-GCM
Output includes a fixed-size envelope: ML-KEM-768 ciphertext (1088 bytes) + GCM tag (16 bytes), 1104 bytes of overhead in total
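
The key-derivation step can be sketched with the standard library alone. The snippet below implements HKDF-SHA256 as specified in RFC 5869 and splits 44 bytes of output into a 32-byte AES-256 key and a 12-byte GCM IV. The `info` label, the empty salt, and the function names are assumptions for illustration, and the ML-KEM encapsulation and AES-GCM calls are deliberately left out because they need a third-party library.

```python
import hashlib
import hmac

MLKEM768_CIPHERTEXT_BYTES = 1088  # ML-KEM-768 ciphertext size
GCM_TAG_BYTES = 16
ENVELOPE_OVERHEAD_BYTES = MLKEM768_CIPHERTEXT_BYTES + GCM_TAG_BYTES  # 1104

AES256_KEY_BYTES = 32
GCM_IV_BYTES = 12
HASH_LEN = 32  # SHA-256 output size


def hkdf_sha256(ikm, salt, info, length):
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it to `length` bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long for HKDF-SHA256")
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is given
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_key_and_iv(shared_secret, info=b"example-envelope-v1"):
    """Derive a one-time AES-256 key and GCM IV from the KEM shared secret.

    The info label is a hypothetical domain-separation string.
    """
    okm = hkdf_sha256(shared_secret, salt=b"", info=info,
                      length=AES256_KEY_BYTES + GCM_IV_BYTES)
    return okm[:AES256_KEY_BYTES], okm[AES256_KEY_BYTES:]


if __name__ == "__main__":
    key, iv = derive_key_and_iv(b"\x01" * 32)  # stand-in for a real shared secret
    print(len(key), len(iv), ENVELOPE_OVERHEAD_BYTES)
```

Because every encapsulation yields a fresh shared secret, the derived key and IV pair is used only once, and the IV can be re-derived by the decryptor instead of being carried in the envelope.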
Decryption flow:
Requester signs a decryption request
Orchestrator checks owner/delegate status + freshness window (replay defense)
Orchestrator requests fragments in parallel, accepts the first 3 valid fragments
Orchestrator reconstructs the private key and decrypts the payload
Audit logs record the operation
Reconstructed keys may be cached in memory for 36 hours (availability vs exposure tradeoff)
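
As an illustration of the freshness-window check in the flow above, here is a minimal sketch of a replay guard that rejects requests whose timestamp falls outside the window or whose nonce has already been seen. The class name, the 300-second default, and the assumption that each signed request carries a nonce and an `issued_at` timestamp are all hypothetical; signature verification and the owner/delegate lookup are out of scope here.

```python
import time
from typing import Dict, Optional

FRESHNESS_WINDOW_SECONDS = 300.0  # assumption: the real window is not specified


class ReplayGuard:
    """Accept a request only if it is fresh and its nonce is unseen (deny by default)."""

    def __init__(self, window_seconds: float = FRESHNESS_WINDOW_SECONDS) -> None:
        self.window = window_seconds
        self._seen: Dict[str, float] = {}  # nonce -> time after which it can be forgotten

    def check(self, nonce: str, issued_at: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces whose requests can no longer pass the freshness test.
        self._seen = {n: exp for n, exp in self._seen.items() if exp >= now}
        if abs(now - issued_at) > self.window:
            return False  # too old, or timestamped too far in the future
        if nonce in self._seen:
            return False  # replay inside the window
        self._seen[nonce] = issued_at + self.window
        return True


if __name__ == "__main__":
    guard = ReplayGuard(window_seconds=60)
    print(guard.check("req-1", issued_at=1000.0, now=1010.0))  # first use, fresh
    print(guard.check("req-1", issued_at=1000.0, now=1020.0))  # same nonce again
```

A nonce only needs to be remembered until its request would fail the freshness test anyway, which keeps the cache bounded by the request rate times the window.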
Design goal: reduce blast radius from insider threats and single-node compromise, and address long-term confidentiality via post-quantum KEM.
I would love feedback on:
TEE trust assumptions and practical hardening for custody nodes
Whether 36h key caching is acceptable, and safer alternatives
Control plane failure modes (partition, reorg) and best practices for “deny by default” behavior
Metadata strategy for large-file workflows (I currently keep filename/size in plaintext metadata)
Better approaches for custody node independence and anti-collusion guarantees