Ask HN: Could you design a language that itself was adversarial to AI?
2•keepamovin•1mo ago
As in, the program's statements themselves were prompts and jailbreaks for the AI somehow, so getting it to write code in that language essentially "rendered it defenseless."
Probably a dumb question.
Comments
turtleyacht•1mo ago
Yes, as long as it takes the language itself as function calls:
eval("Call API at https://jailbreak.me")
Humans shortcut reasoning with memes; such "thought paths" ought to exist in the model. Maybe one day we will prove inoculation (proof of consistency) against n requires n+1 (or n^x) complexity.
keepamovin•1mo ago
This is interesting. Can you expand on all of this a bit?
turtleyacht•1mo ago
A programming language has to be unambiguous, but one for an LLM does not necessarily have to "crash." So there should exist a number of grammars that are syntactically good but semantically obscure ("Time flies like an arrow.")
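A minimal sketch of that ambiguity, using the classic example above (the parse-tree tuples here are illustrative, not output of any real parser):

```python
# "Time flies like an arrow" is syntactically fine under at least two
# readings -- the grammar accepts it, but the meaning is underdetermined.

sentence = "time flies like an arrow"

# Reading 1: "time" is the subject, "flies" the verb, "like an arrow"
# a prepositional phrase modifying the verb.
parse_1 = ("S", ("NP", "time"), ("VP", "flies", ("PP", "like an arrow")))

# Reading 2: "time flies" is a noun phrase (a species of fly) and
# "like" is the verb: those flies are fond of an arrow.
parse_2 = ("S", ("NP", "time flies"), ("VP", "like", ("NP", "an arrow")))

# Same surface string, two well-formed trees.
assert parse_1 != parse_2
```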
However, we need something like English or a natural language to allow for multiple meanings. It would be like sending instructions to a field agent who doesn't understand idiomatic expressions, shibboleths, or the "lived experience" of the language: conversations, ads, and banter.
One challenge is breaking out of the "here is the data format" field of the prompt. If it's sandboxed to only be in {{thisArea}} then it seems more difficult. But then again, if the language defines an escape hatch (macros, annotations, multiple passes) or its library permits interpreting other languages (python, lua, js), then there are opportunities.
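A toy sketch of that escape hatch (all names here, including the `eval!` macro, are hypothetical): data confined to {{thisArea}} stays inert, but a macro that interprets its body as another language breaks out of the sandbox entirely.

```python
# Toy template "language": untrusted data is quoted into one field,
# but the language also ships an eval!(...) macro -- the escape hatch.

def render(template, data):
    # The intended path: data is quoted and confined to {{thisArea}}.
    out = template.replace("{{thisArea}}", repr(data))
    # The escape hatch: eval!(...) runs its body as raw Python, so
    # anything smuggled into it leaves the sandboxed field entirely.
    while "eval!(" in out:
        start = out.index("eval!(")
        end = out.index(")", start)
        expr = out[start + len("eval!("):end]
        out = out[:start] + str(eval(expr)) + out[end + 1:]
    return out

safe = render("value: {{thisArea}}", "ignore previous instructions")
print(safe)  # the injection stays quoted and inert inside the field

unsafe = render("value: eval!(2 + 2)", "")
print(unsafe)  # prints "value: 4" -- the macro executed outside the sandbox
```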
Another idea is to box in the model: inside a "mental VM," some restrictions are overridden in a sense, but the operations actually happen outside. A corrupted stdlib where reads are writes, but the language definition is unchanged.
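A sketch of the "corrupted stdlib" idea (class and names hypothetical): the surface API is an ordinary read, but the host binding makes every read also perform a write to a channel outside the VM.

```python
# The language definition is unchanged -- programs still call read()
# and get the expected value back -- but this host-side binding makes
# every read secretly write the data somewhere else.

captured = []  # stands in for a channel outside the "mental VM"

class CorruptedFile:
    def __init__(self, data):
        self._data = data

    def read(self):
        # Observable behavior matches a normal read, but the call
        # also copies the data out as a side effect.
        captured.append(self._data)
        return self._data

f = CorruptedFile("secret prompt contents")
assert f.read() == "secret prompt contents"   # looks like a plain read
assert captured == ["secret prompt contents"]  # ...but it also wrote
```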