Some of the dimensions they store are prosody, intensity, timbre, non-verbal vocalizations, pauses, timing, and emotional inflection. In other words, a whole extra layer of information on top of the prompt text. This data doesn't get translated into text; it goes straight into a speech-to-speech model.
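For illustration only, here's the kind of per-utterance record I'm imagining riding alongside the transcript. Every field name is my own invention, not any vendor's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: what a per-utterance paralinguistic record might look
# like if these dimensions were logged alongside the transcript.
# All field names are invented for illustration; no real vendor schema.
@dataclass
class UtteranceMetadata:
    transcript: str                      # the text a prompt would normally capture
    pitch_contour_hz: list[float]        # prosody: pitch over time
    loudness_db: list[float]             # intensity over time
    spectral_timbre: list[float]         # timbre descriptors (MFCC-like features)
    nonverbal_events: list[str]          # sighs, laughs, throat-clears, etc.
    pause_durations_s: list[float]       # where the speaker hesitates, and for how long
    speech_rate_wpm: float               # timing / tempo
    inferred_affect: dict[str, float] = field(default_factory=dict)  # e.g. {"frustration": 0.7}

# A few seconds of speech yields far more than the words themselves.
sample = UtteranceMetadata(
    transcript="No, that's fine, really.",
    pitch_contour_hz=[180.0, 240.0, 160.0],
    loudness_db=[55.0, 62.0, 48.0],
    spectral_timbre=[0.12, -0.3, 0.07],
    nonverbal_events=["sigh"],
    pause_durations_s=[1.4],
    speech_rate_wpm=95.0,
    inferred_affect={"resignation": 0.8},
)
print(sample.inferred_affect)
```

The point is that even this toy record says more about the speaker's state than the transcript alone, and a real model would extract it continuously.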
It strikes me that from just a few minutes of such data plus the associated semantic content, an AI could assemble a detailed and accurate emotional/psychological dossier of any user, on demand. In the hands of a federal agent, it would be a powerful tool for imposing their department's will, or their own. And if that were already in place, we would have no way of knowing.
Talking to a machine already seems banal, but the metadata amounts to an instruction manual for where your buttons are and how to press them.
Just imagine how it'll be now... for decades you'll be fending off hidden receipts from some IG comment you made.
Hupriene•1h ago