even frontier models (...) corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely
jqpabc123•13m ago
Our analysis shows that current LLMs are unreliable delegates:
Who knew that a tool that relies on probability could make such a mess?
bediger4000•1h ago
Wow, 25% corrupted seems like a lot. The abstract and the intro of this paper emphasize "documents", and it's Microsoft, so I assumed Word docs. But that's not true: they used a wide variety of things, including graphs, text files, possibly images, or some machine-readable description of textile weaving. A proofreader might not catch a 25% corrupted textile description file, or 25% corruption in a graph.
Is this "corruption" what we've all been taught to call "hallucinations" in text files?