Was doing a deep dive on structured generation. In JSON mode the structure is fixed, so most tokens, like the JSON keys and grammar tokens, don't even need to be predicted by the LLM. Would OpenAI still charge for these tokens under their pricing?
https://lmsys.org/blog/2024-02-05-compressed-fsm/?ref=aidancooper.co.uk
stephenlf•14h ago
The model described in your paper still uses some amount of inference to generate JSON keys. Plus, each JSON key becomes part of the expanding context window. These keys aren’t free to generate.
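
To make the distinction concrete, here is a toy sketch in Python (not SGLang's or OpenAI's actual implementation). It assumes a fixed template where structural tokens are appended without sampling and only the value slots call the model. `generate_with_template`, `fake_sampler`, and the one-token-per-list-item "tokenization" are all hypothetical, made up for illustration. The point it shows: forced tokens skip the sampling step, but they still land in the output and in the context that every later value is conditioned on.

```python
from typing import Callable, List, Tuple

Segment = Tuple[str, object]  # ("fixed", [tokens]) or ("slot", slot_name)


def generate_with_template(
    template: List[Segment],
    sample_value: Callable[[str, List[str]], List[str]],
) -> Tuple[List[str], int, int]:
    """Fill a fixed template, sampling only the value slots.

    Returns (output_tokens, forced_count, sampled_count).
    """
    output: List[str] = []
    forced = 0
    sampled = 0
    for kind, payload in template:
        if kind == "fixed":
            # Structural tokens are known in advance, so no sampling is needed.
            # They are still appended to the running context.
            output.extend(payload)
            forced += len(payload)
        else:
            # Value tokens come from the (stand-in) model, which sees the
            # whole context so far, including the forced tokens.
            value_tokens = sample_value(payload, list(output))
            output.extend(value_tokens)
            sampled += len(value_tokens)
    return output, forced, sampled


# Hypothetical stand-in for an LLM: returns canned values and records
# how many context tokens it was shown at each slot.
contexts_seen: List[int] = []


def fake_sampler(slot: str, context: List[str]) -> List[str]:
    contexts_seen.append(len(context))
    return {"name": ['"Ada"'], "age": ["36"]}[slot]


template: List[Segment] = [
    ("fixed", ["{", '"name"', ":"]),
    ("slot", "name"),
    ("fixed", [",", '"age"', ":"]),
    ("slot", "age"),
    ("fixed", ["}"]),
]

output, forced, sampled = generate_with_template(template, fake_sampler)
print("".join(output))
print(f"forced={forced} sampled={sampled} total={len(output)}")
print(f"context length at each slot: {contexts_seen}")
```

In this toy run most of the output is forced structure, which is the saving being asked about. The context lengths recorded at each slot show the other side: the keys are part of what the model reads before producing each value, so in a real system they would still have to be processed as context even if they are never sampled.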