yladiz•8m ago
> California lawmakers are again considering A.B. 412, a bill that would require AI developers to identify and disclose copyrighted works used to train generative AI systems.
> The problem this year is the same as last year: it’s practically impossible to comply with this law. The bill demands information that often does not exist, and cannot realistically be obtained.
> Its definition of “developer” extends to anyone who makes a generative AI model available to Californians.
I get that this would burden up-and-coming companies that want to train new models, but in general I don't think it's a bad thing for a company to have to know where its training material comes from and what its copyright status is. If that's actually an impossible problem, then maybe the whole system is unworkable. Assuming that model training isn't fundamentally considered fair use, how else can you approach this problem?
ElevenLathe•1m ago
It's wild how software BoMs are taking off at the same time that an LLM BoM is being declared literally impossible. IMO the threat model is roughly the same: if you can't account for the provenance of all the text in your training set, how can you say that it hasn't been poisoned?