> subtext-codec is a proof-of-concept codec that hides arbitrary binary data inside natural-looking LLM-generated text. It steers a language model's next-token choices using each token's rank in the model's logit distribution. Given the same model, tokenizer, prefix, and parameters, the process is fully reversible, yielding text that reads naturally while secretly encoding bytes.
In short, it exploits the fact that an LLM defines a deterministic probability distribution over next tokens to produce seemingly innocuous stegotext that is hard to detect.
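
As a rough illustration of the idea (a minimal sketch, not this repository's actual implementation), the snippet below encodes payload bits as token ranks and recovers them by replaying the same logits. It uses Hugging Face `transformers`; the model choice, `BITS_PER_TOKEN`, and all helper names are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"      # illustrative; encoder and decoder must use the same model
BITS_PER_TOKEN = 2       # assumed payload density: each token's rank carries 2 bits

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def bytes_to_symbols(data: bytes) -> list[int]:
    """Split the payload into BITS_PER_TOKEN-bit integers (MSB first)."""
    bits = "".join(f"{b:08b}" for b in data)
    return [int(bits[i:i + BITS_PER_TOKEN], 2)
            for i in range(0, len(bits), BITS_PER_TOKEN)]

def symbols_to_bytes(symbols: list[int]) -> bytes:
    """Inverse of bytes_to_symbols."""
    bits = "".join(f"{s:0{BITS_PER_TOKEN}b}" for s in symbols)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

@torch.no_grad()
def encode(prefix: str, payload: bytes) -> str:
    ids = tokenizer.encode(prefix, return_tensors="pt")
    for sym in bytes_to_symbols(payload):
        logits = model(ids).logits[0, -1]            # next-token logits
        ranked = torch.argsort(logits, descending=True)
        tok = ranked[sym].view(1, 1)                 # emit the token at rank `sym`
        ids = torch.cat([ids, tok], dim=1)
    return tokenizer.decode(ids[0])

@torch.no_grad()
def decode(prefix: str, stegotext: str, n_bytes: int) -> bytes:
    # Assumes re-tokenizing the stegotext reproduces the encoder's token ids;
    # a robust codec would guard against tokenization round-trip drift.
    prefix_ids = tokenizer.encode(prefix, return_tensors="pt")
    full_ids = tokenizer.encode(stegotext, return_tensors="pt")
    ids, symbols = prefix_ids, []
    for tok in full_ids[0, prefix_ids.shape[1]:]:
        logits = model(ids).logits[0, -1]
        ranked = torch.argsort(logits, descending=True)
        symbols.append((ranked == tok).nonzero().item())  # recover the rank
        ids = torch.cat([ids, tok.view(1, 1)], dim=1)
        if len(symbols) * BITS_PER_TOKEN >= n_bytes * 8:
            break
    return symbols_to_bytes(symbols)

secret = b"hi"
text = encode("The weather today is", secret)
assert decode("The weather today is", text, len(secret)) == secret
```

Note the capacity/detectability trade-off this sketch makes visible: packing more bits per token forces lower-ranked (less probable) tokens into the output, so the cover text reads less naturally as `BITS_PER_TOKEN` grows.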