This is great. I think extensions that detect generated music, speech, video, or text will become really important. I'm curious how lightweight and performant these detection models can get. Maybe a single extension could handle multiple media types.
One concern (speaking as someone who doesn't know what these internal pipelines look like) is that Suno/Udio could tweak their model weights just enough to change the fingerprint, making a detector obsolete with each new release. Or even simpler: maybe just apply post-processing? I'd imagine a small reverb could diffuse the content enough to make the fingerprint difficult to detect. That turns it into a cat-and-mouse game. If it's cheaper for them to mutate models or tweak post-processing than for others to train new detectors, they could spin up a new fingerprint every day.
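To make the reverb idea a bit more concrete, here's a toy sketch. Big caveats: I'm assuming the "fingerprint" is a known pseudo-random sequence mixed into the audio and the "detector" is a plain normalized correlation, which is almost certainly not how Suno/Udio or any real detector works. The "reverb" is just a few delayed, attenuated copies mixed with the dry signal. All the names and numbers here (`make_fingerprint`, `small_reverb`, the tap delays, the strengths) are made up for illustration.

```python
import math
import random


def make_fingerprint(n, seed):
    """Hypothetical fingerprint: a seeded pseudo-random +/-1 sequence."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(audio, fingerprint, strength):
    """Mix the fingerprint into the audio at a low level."""
    return [a + strength * f for a, f in zip(audio, fingerprint)]


def detect(audio, fingerprint):
    """Toy detector: normalized correlation between audio and fingerprint."""
    dot = sum(a * f for a, f in zip(audio, fingerprint))
    norm_a = math.sqrt(sum(a * a for a in audio))
    norm_f = math.sqrt(sum(f * f for f in fingerprint))
    if norm_a == 0.0 or norm_f == 0.0:
        return 0.0
    return dot / (norm_a * norm_f)


def small_reverb(audio, taps, wet):
    """Stand-in for a reverb: dry signal plus a few delayed, scaled copies."""
    out = [(1.0 - wet) * a for a in audio]
    for delay, gain in taps:
        for i in range(delay, len(audio)):
            out[i] += wet * gain * audio[i - delay]
    return out


def run_demo():
    sample_rate, n = 8000, 4000
    # Stand-in "music": a single 440 Hz tone.
    tone = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(n)]
    fingerprint = make_fingerprint(n, seed=1234)
    marked = embed(tone, fingerprint, strength=0.2)
    taps = [(37, 0.6), (89, 0.4), (151, 0.25)]  # (delay in samples, gain)
    reverbed = small_reverb(marked, taps, wet=0.5)
    return (
        detect(tone, fingerprint),      # no fingerprint present
        detect(marked, fingerprint),    # fingerprint present, untouched
        detect(reverbed, fingerprint),  # fingerprint present, after reverb
    )


if __name__ == "__main__":
    clean, marked, reverbed = run_demo()
    print(f"clean={clean:.3f} marked={marked:.3f} reverbed={reverbed:.3f}")
```

In this toy setup the score drops after the reverb but doesn't go to zero, since the dry part of the signal still carries the sequence. Whether a real learned detector degrades the same way is a separate question that this sketch can't answer.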
nonhaver•21h ago