this is great. i think extensions that detect generated music, speech, video, or text will become really important. i'm curious how lightweight and performant these detection models can get. maybe a single extension could handle multiple media types.
one concern (speaking as someone who doesn't know what these internal pipelines look like) is that suno/udio could tweak their model weights just enough to change the fingerprint, making a detector obsolete with each new release. or even simpler: they could just apply post-processing. i'd imagine a small reverb could diffuse the content enough to make the fingerprint difficult to detect. that turns it into a cat-and-mouse game. if it's cheaper for them to mutate models or tweak post-processing than for others to train new detectors, they could spin up a new fingerprint every day.
qosmo•6mo ago
How large a tweak has to be to break detection is still an open question. According to the paper, the detector does generalize a bit across different models, but different architectures, at least, require retraining for coverage.