FTA: “The final authority must sit behind a deterministic, non-bypassable gate. AI must never hold direct permissions for destructive, irreversible actions (deleting a production database, moving funds, pushing to prod). So the last line of defense must always be either human oversight or a deterministic script with no AI workarounds.”
That’s fine in theory, but it won’t fly in practice for all destructive, irreversible actions. As an example, how do you prevent a chatbot from generating a highly insulting or racist remark, or incorrect or illegal advice, that will later cost you millions?
Human oversight is (deemed) too expensive.
A deterministic script can detect known profanities, but it may suffer from a variant of the Scunthorpe problem (https://en.wikipedia.org/wiki/Scunthorpe_problem), and it won’t detect unknown profanities or creative insults that avoid profane words entirely. A deterministic script is also very bad at detecting legal issues in responses.
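To make the blocklist point concrete, here is a minimal Python sketch. The one-word `BLOCKLIST` and both function names are made up for illustration, not taken from any real filter. It shows the substring version tripping on innocent words, and the word-boundary version still missing a spaced-out spelling and an insult that uses no profane words at all.

```python
import re

# Hypothetical one-word blocklist, for illustration only.
BLOCKLIST = {"ass"}


def naive_filter(text: str) -> bool:
    """Flag text if any blocked word appears as a substring (Scunthorpe-prone)."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def word_boundary_filter(text: str) -> bool:
    """Flag text only if a blocked word appears as a whole token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in BLOCKLIST for token in tokens)


innocent = "Please pass the classic assessment."
evasive = "You are an a s s."
clean_insult = "People like you should not be allowed near a keyboard."

print(naive_filter(innocent))              # False positive on "pass", "classic"
print(word_boundary_filter(innocent))      # Whole-token match avoids that
print(word_boundary_filter(evasive))       # Spaced-out spelling slips through
print(word_boundary_filter(clean_insult))  # No blocked word, so nothing to match
```

Tightening the matching fixes the false positives, but neither version has any notion of meaning, which is the gap the comment is pointing at.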
“Don’t deploy a chatbot” will work for that, but for many, that doesn’t seem to be an option.
taleodor•49m ago
The point isn't that we should drop the LLM from the mix completely, but that something like AI -> LLM control -> old-school classifier control -> script / human oversight is the way to go. If something has the potential to cause millions in damages, it should be subject to human oversight (the likelihood / impact analysis needs to happen early in the system design).
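A rough Python sketch of that chain, under loud assumptions: `llm_control` and `classifier_control` are stubs standing in for a real reviewing model and a real classifier, and `Draft`, `impact_usd`, and the one-million threshold are hypothetical names and values chosen for illustration. The only part meant literally is the shape: each stage can stop the output, and the last stage is plain deterministic code that routes high-impact items to a human.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class Draft:
    text: str
    impact_usd: int  # worst-case damage estimate, assumed to come from design-time analysis


Stage = Callable[[Draft], Verdict]

HUMAN_REVIEW_THRESHOLD_USD = 1_000_000  # assumed cut-off, not from the thread


def llm_control(draft: Draft) -> Verdict:
    # Stub: a second model reviewing the first would go here.
    return Verdict.ALLOW


def classifier_control(draft: Draft) -> Verdict:
    # Stub for an old-school classifier; a keyword rule stands in for it.
    return Verdict.BLOCK if "guaranteed" in draft.text.lower() else Verdict.ALLOW


def script_gate(draft: Draft) -> Verdict:
    # Deterministic last stage: no model can talk its way past this comparison.
    if draft.impact_usd >= HUMAN_REVIEW_THRESHOLD_USD:
        return Verdict.HUMAN_REVIEW
    return Verdict.ALLOW


def run_pipeline(draft: Draft, stages: Sequence[Stage]) -> Verdict:
    """Run stages in order and return the first verdict that is not ALLOW."""
    for stage in stages:
        verdict = stage(draft)
        if verdict is not Verdict.ALLOW:
            return verdict
    return Verdict.ALLOW


STAGES = (llm_control, classifier_control, script_gate)

print(run_pipeline(Draft("Your order has shipped.", 0), STAGES))
print(run_pipeline(Draft("Returns are guaranteed forever.", 0), STAGES))
print(run_pipeline(Draft("Wire transfer approved.", 5_000_000), STAGES))
```

The human only sees whatever reaches the last stage above the threshold, which is how this keeps oversight costs down without removing it for the expensive cases.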
Someone•1h ago