Your LLM, however, may experience cross-format feature superposition and, as a consequence, spurious activation.
That said: I like the idea!
Which doesn't address the question: do LLMs understand TOON as well as they understand JSON? It's quite likely that most LLMs don't interpret this notation the same way they interpret JSON. So benchmarks on, say, data processing tasks would be warranted.
[0] https://github.com/johannschopplich/toon?tab=readme-ov-file#...
1. Retrieval Accuracy - https://github.com/johannschopplich/toon?tab=readme-ov-file#...
2. Performance by dataset - https://github.com/johannschopplich/toon?tab=readme-ov-file#...
I guess it's about LLMs, so the idea is it has to be plaintext? But if you can train a model on TOON, can't you train it on BSON?
Supporting this use case doesn’t require perfectly marshaling every data structure ever.
But to your point, the tool could have wider use cases without those limitations.
Let's take the example:
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
And its TOON equivalent:
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
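For illustration only (this is not the toon library's actual API), a minimal Python sketch of how that tabular form can be emitted from the JSON above, assuming every record has the same keys in the same order:

def to_tabular(name, records):
    # Header: "name[count]{fields}:", then one comma-joined row per record.
    fields = list(records[0].keys())
    lines = [f"{name}[{len(records)}]{{{','.join(fields)}}}:"]
    lines += [",".join(str(rec[f]) for f in fields) for rec in records]
    return "\n".join(lines)

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(to_tabular("users", users))
# users[2]{id,name,role}:
# 1,Alice,admin
# 2,Bob,user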
We can keep it JSON, but use more compact list expressions, such as tuples when pragmatic:
["users",
  [1, "Alice", "admin"],
  [2, "Bob", "user"]
]
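A sketch of producing that form, again just illustrative; the field order (id, name, role) becomes an out-of-band convention:

import json

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
# Keep the key name once, then one positional row per record.
compact = ["users"] + [[u["id"], u["name"], u["role"]] for u in users]
print(json.dumps(compact))
# ["users", [1, "Alice", "admin"], [2, "Bob", "user"]]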
The thing is, the game with LLMs is not what's shortest, but what's:
1. Mainstream, so they understand it.
2. What they're tuned for, and they're tuned for what's mainstream (JSON).
If you want extreme compression, you can shove it all into JSON strings too and keep the larger structure JSON:
["users",
  "1:admin:Alice",
  "2:user:Bob"
]
You may say "how is this better". Well it's better because it's still JSON, there's less to explain to the LLM, and to your other devs. Even if we use a weird compact format like "id:role:name" this is still shorter to explain than a completely different syntax with its whole world of rules.