That said, for `createAliasMap`, don't you think you could create a deterministic two-way mapping between UUIDs and word chains? That way, no additional state would be needed. [Might require fairly long word chains...]
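To make the stateless idea concrete, here is a minimal Python sketch of such a bijection. It is not how id-agent works; the 256-entry syllable list and the function names `uuid_to_words` and `words_to_uuid` are made up for illustration, and a real version would presumably use dictionary words.

```python
import uuid

# Hypothetical word list: 16 onsets x 16 rimes = 256 unique pseudo-words,
# so each word encodes exactly one byte (8 bits).
ONSETS = ["b", "d", "f", "g", "h", "j", "k", "l",
          "m", "n", "p", "r", "s", "t", "v", "z"]
RIMES = ["a", "e", "i", "o", "u", "an", "en", "in",
         "on", "un", "ar", "er", "ir", "or", "ur", "ay"]
WORDS = [onset + rime for onset in ONSETS for rime in RIMES]
INDEX = {word: i for i, word in enumerate(WORDS)}


def uuid_to_words(u: uuid.UUID) -> str:
    """Encode the 16 bytes of a UUID as 16 hyphen-separated words."""
    return "-".join(WORDS[b] for b in u.bytes)


def words_to_uuid(chain: str) -> uuid.UUID:
    """Decode a word chain back into the original UUID."""
    return uuid.UUID(bytes=bytes(INDEX[w] for w in chain.split("-")))


u = uuid.uuid4()
chain = uuid_to_words(u)
assert words_to_uuid(chain) == u  # round trip needs no stored state
```

The catch is the length: 128 bits at 8 bits per word means 16 words per UUID, and even a 2048-word list (11 bits per word) still needs 12. That is probably why a stateful alias map with short ids is attractive despite the extra bookkeeping.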
I have built id-agent to solve the 3 major pain points with using UUIDs with LLMs:
1. Increased token usage
2. Incorrect ids in the output tokens because of hallucination
3. Poor readability when debugging prompts and traces
While building agent-qa, I encountered challenges with ids, since each test, suite, and run artifact has a UUID associated with it. The AI agents often made mistakes and referenced incorrect entities. With id-agent, I was able to achieve a token-efficient way of defining unique ids which can be easily differentiated. The prefixed id path drastically reduces the hallucination rate.
Do check out agent-qa: Open-source Agentic QA Harness with Memory https://vostride.com/
> Where UUIDs cost ~23 tokens and get hallucinated by LLMs
How does this solve the hallucination problem?
Just removing the hyphens from the example UUID takes it from 26 tokens to 18
You can use the `.from` method (https://github.com/vostride/id-agent/#idagentfrominput-opts) to convert a UUID or any text to an id-agent based id. Then do the LLM inference and convert it back to the UUID.
I sort of get the "problem", but the fact that this is even needed is stupid.
I feel like people just jam poorly specified input into LLMs and hope for the best. Then pile more tools on top when they don’t get what they want.
People call this exact process "vibe coding".
I can see this being useful when feeding raw table-dump CSVs into models. Because the mapping is an isomorphism, it's a simple pre- and post-processing step that could give you a cheap decrease in tokens and an increase in accuracy.
I guess you’re another bot
Furthermore, this could be compressed even further with a dynamic legend of every UUID in the context. So UUID@Bravo and UUID@Delta would be the actual symbols in the context but dynamically replaced when calling tools.
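A rough Python sketch of what such a legend could look like, purely as an illustration: the `Legend` class, its method names, and the short word list are all hypothetical and not part of id-agent.

```python
import re

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
ALIAS_RE = re.compile(r"UUID@[A-Za-z]+\d*")
NATO = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel"]


class Legend:
    """Per-context table mapping each UUID to a short symbol and back."""

    def __init__(self) -> None:
        self.to_alias: dict[str, str] = {}
        self.to_uuid: dict[str, str] = {}

    def alias_for(self, raw: str) -> str:
        key = raw.lower()
        if key not in self.to_alias:
            n = len(self.to_alias)
            # After the word list is exhausted, append a counter: UUID@Alpha1, ...
            suffix = "" if n < len(NATO) else str(n // len(NATO))
            alias = f"UUID@{NATO[n % len(NATO)]}{suffix}"
            self.to_alias[key] = alias
            self.to_uuid[alias] = key
        return self.to_alias[key]

    def compress(self, text: str) -> str:
        """Replace every UUID in the prompt with its symbol."""
        return UUID_RE.sub(lambda m: self.alias_for(m.group(0)), text)

    def expand(self, text: str) -> str:
        """Restore UUIDs in tool-call arguments; unknown symbols are left as-is."""
        return ALIAS_RE.sub(lambda m: self.to_uuid.get(m.group(0), m.group(0)), text)


legend = Legend()
prompt = legend.compress(
    "move 123e4567-e89b-12d3-a456-426614174000 "
    "under 123e4567-e89b-12d3-a456-426614174001"
)
print(prompt)                 # move UUID@Alpha under UUID@Bravo
print(legend.expand(prompt))  # original text with both UUIDs restored
```

Leaving unknown symbols untouched in `expand` is a deliberate choice here: a symbol the model invented then fails loudly at the tool boundary instead of silently resolving to some real entity.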
1. LLMs might lack intrinsic entropy and reuse some UUIDs much more often.
2. Referential integrity is as important as collision resistance. An LLM must be able to reuse the correct id in the correct place.
On the other hand, using a dictionary for the ids helps with readability, but depending on the model's strengths, it might also add a confounder. After all, tokens that represent real words will probably influence the attention in a different way than random numbers.
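For the referential-integrity point, part of the check can be done mechanically after inference. The Python sketch below assumes ids carry a type prefix such as `test_` or `suite_`; that format, the `Problem` record, and `check_references` are invented for illustration and may not match id-agent's actual ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    field: str
    value: str
    reason: str


def check_references(
    call: dict[str, str],
    expected_prefix: dict[str, str],
    known_ids: set[str],
) -> list[Problem]:
    """Flag ids in a model-produced call that are unknown or of the wrong type."""
    problems = []
    for field, prefix in expected_prefix.items():
        value = call.get(field, "")
        if value not in known_ids:
            problems.append(Problem(field, value, "unknown id"))
        elif not value.startswith(prefix + "_"):
            problems.append(Problem(field, value, "wrong entity type"))
    return problems


known = {"test_amber-fox", "suite_calm-river", "run_bold-lake"}
expected = {"test_id": "test", "suite_id": "suite"}

# The model swapped a suite id into the test slot and misspelled the suite id.
call = {"test_id": "suite_calm-river", "suite_id": "suite_calm-rivr"}
for problem in check_references(call, expected, known):
    print(problem)
```

This only catches invented ids and type mix-ups. A valid id of the right type that points at the wrong entity still gets through, which is where the choice of id representation matters.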