A fast Trusted Execution Environment protocol using the H100's confidential computing mode: prompts are decrypted and processed inside the GPU enclave. The key point is that it stays fast, especially on ≥10B-parameter models, where the latency overhead is under 1%.
As with CPU-based confidential computing in the cloud, this opens a channel to cloud GenAI models that even the provider cannot intercept.
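For intuition, here's a minimal client-side sketch of the general pattern such a protocol would follow, not the actual implementation: verify the enclave's attestation report, then encrypt the prompt so only code inside the GPU enclave can read it. The attestation step, wire format, and the X25519 + HKDF + AES-GCM key exchange are all stand-ins chosen for illustration.

```python
# Hypothetical client-side flow for a GPU-TEE inference call.
# Assumes the enclave publishes an attestation report binding its
# ephemeral X25519 public key; report verification and the wire
# format shown here are placeholders, not the real protocol.
import os
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric.x25519 import (
    X25519PrivateKey, X25519PublicKey,
)
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


def encrypt_prompt_for_enclave(prompt: str, enclave_pubkey_bytes: bytes) -> dict:
    """Encrypt a prompt so only the attested GPU enclave can decrypt it."""
    # 1. (Not shown) verify the H100 attestation report and check that
    #    enclave_pubkey_bytes is the key bound inside that report.
    enclave_pub = X25519PublicKey.from_public_bytes(enclave_pubkey_bytes)

    # 2. Ephemeral ECDH: derive a secret shared only with the enclave.
    client_priv = X25519PrivateKey.generate()
    shared = client_priv.exchange(enclave_pub)

    # 3. Derive a symmetric session key from the shared secret.
    session_key = HKDF(
        algorithm=hashes.SHA256(), length=32,
        salt=None, info=b"tee-inference-session",
    ).derive(shared)

    # 4. Encrypt the prompt; only the enclave, holding the matching
    #    private key, can rerun the ECDH and recover the plaintext.
    nonce = os.urandom(12)
    ciphertext = AESGCM(session_key).encrypt(nonce, prompt.encode(), None)

    return {
        "client_pubkey": client_priv.public_key().public_bytes(
            serialization.Encoding.Raw, serialization.PublicFormat.Raw
        ),
        "nonce": nonce,
        "ciphertext": ciphertext,
    }
```

The provider only ever relays the ciphertext; decryption happens behind the enclave boundary, which is what keeps the prompt out of the provider's reach.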
I wonder whether something like that could boost trust in all the AI neoclouds out there.
wolecki•4h ago