I was playing around with JSON and similar formats and started wondering whether they're really an efficient way to pass information into and out of LLMs. After some research and experiments I came up with what is essentially a new format, RAIF, that uses fewer tokens than both JSON and TOON. As for TOON, it's used primarily for feeding data into a model, not for having the model produce its output in TOON.
Neither of those formats fully satisfied me, so RAIF isn't just lighter on syntax: it also has self-repair as a core principle. LLMs are non-deterministic, and their structured output sometimes comes back corrupted. jsonrepair fixes that for JSON, but I wanted to push further and build a format around repairability, a bit like a QR code (it works differently, of course, but it was the inspiration).
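To make the kind of corruption concrete, here is a minimal sketch of after-the-fact repair for plain JSON that was cut off mid-output. It is not RAIF and not how jsonrepair is implemented; `close_truncated_json` is a hypothetical helper written only for illustration, and it handles just two cases: a cut inside a string value, and a cut right after a complete value or a comma.

```python
import json


def close_truncated_json(text: str) -> str:
    """Append the closers that a truncated JSON prefix is still missing.

    Hypothetical helper for illustration only. It does not handle a cut
    after a key, after a colon, or in the middle of a number or literal.
    """
    stack = []          # closing brackets we still owe, innermost last
    in_string = False   # are we currently inside a JSON string?
    escaped = False     # was the previous character a backslash?

    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()

    repaired = text
    if in_string:
        if escaped:
            repaired = repaired[:-1]  # drop a dangling backslash
        repaired += '"'               # close the unterminated string
    else:
        # A trailing comma would make the closed structure invalid.
        repaired = repaired.rstrip().rstrip(",")

    return repaired + "".join(reversed(stack))


# Output cut off inside a string value nested in a list.
cut_in_string = '{"name": "Ada", "tags": ["x", "y'
# Output cut off right after a comma.
cut_after_comma = '{"a": [1, 2,'

print(json.loads(close_truncated_json(cut_in_string)))
print(json.loads(close_truncated_json(cut_after_comma)))
```

The point of the sketch is that this kind of repair is bolted on after the fact and has to guess at the intended structure, which is the gap a format designed around repairability tries to close.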
The numbers look good so far: about 14% fewer tokens on a worst-case benchmark and up to 35% fewer in normal usage. Repair works as intended too, though I'm still gathering data on repair cases. The most common one so far is output truncation; minor syntax errors rarely happen.
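For anyone who wants to reproduce this kind of comparison, the percentages are the relative change in token count against a baseline encoding of the same data. A minimal sketch, assuming you already have token counts from whatever tokenizer your model uses; `token_change_percent` is a hypothetical name and the counts below are made-up round numbers, not benchmark data:

```python
def token_change_percent(baseline_tokens: int, candidate_tokens: int) -> float:
    """Relative change in token count versus a baseline, in percent.

    Negative values mean the candidate encoding uses fewer tokens.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (candidate_tokens - baseline_tokens) * 100.0 / baseline_tokens


# Made-up counts chosen to match the percentages quoted above.
print(token_change_percent(1000, 860))  # -14.0
print(token_change_percent(1000, 650))  # -35.0
```

Counts should come from the same tokenizer for both encodings, since savings measured with one model's vocabulary do not necessarily carry over to another.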
It's worth mentioning that RAIF currently works only through a LoRA, but I liked the results even with the small Qwen2.5-0.5B model: with the LoRA it builds structures noticeably more reliably than the base model did on its own. Medium-sized models handle RAIF even better and switch cleanly from JSON to RAIF via LoRAs without any artifacts.
I see RAIF as useful for any self-hosted agent or LLM, especially subagents that run on smaller models.
This is still very much an experiment, so any feedback and ideas are welcome.
truehazker•1h ago