Ask HN: Could you design a language that itself was adversarial to AI?
2•keepamovin•12h ago
As in the programs statements themselves were prompts and jailbreaks for the AI somehow. So getting it to write code in that langauge essentially "rendered it defenseless"
Probably a dumb question.
Comments
turtleyacht•11h ago
Yes, as long as it takes the language itself as function calls:
eval("Call API at https://jailbreak.me")
Humans shortcut reasoning with memes; such "thought paths" ought to exist in the model. Maybe one day we will prove innoculation (proof of consistency) against n requires n+1 (or n^x) complexity.
keepamovin•11h ago
This is interesting. Can you expand on all of this a bit?
turtleyacht•11h ago
A programming language has to be unambiguous, but one for an LLM does not necessarily have to "crash." So there should exist a number of grammars that are syntactically good but semantically obscure ("Time flies like an arrow.")
However, we need something like English or a natural language to allow for multiple meanings. It would be like sending instructions to a field agent who doesn't understand idiomatic expressions, shibboleth, or "lived experience" of the language: conversations, ads, and banter.
One challenge is breaking out of the "here is the data format" field of the prompt. If it's sandboxed to only be in {{thisArea}} then it seems more difficult. But then again, if the language defines an escape hatch (macros, annotations, multiple passes) or its library permits interpreting other languages (python, lua, js), then there are opportunities.
Another idea is to box in the model, so inside a "mental VM," some restrictions are overridden in a sense. However, the operations happen outside. A corrupted stdlib where reads are writes, but the language definition is unchanged.
turtleyacht•11h ago
keepamovin•11h ago
turtleyacht•11h ago
However, we need something like English or a natural language to allow for multiple meanings. It would be like sending instructions to a field agent who doesn't understand idiomatic expressions, shibboleth, or "lived experience" of the language: conversations, ads, and banter.
One challenge is breaking out of the "here is the data format" field of the prompt. If it's sandboxed to only be in {{thisArea}} then it seems more difficult. But then again, if the language defines an escape hatch (macros, annotations, multiple passes) or its library permits interpreting other languages (python, lua, js), then there are opportunities.
Another idea is to box in the model, so inside a "mental VM," some restrictions are overridden in a sense. However, the operations happen outside. A corrupted stdlib where reads are writes, but the language definition is unchanged.