Ran seven trials against flat summarization. UF led by 15-18pp on fact recall in every trial. One hit significance (p=0.039), the rest are directional. The interesting finding: flat summaries drop "footnote" facts (cron schedules, webhook paths) because they compete against headline facts for space. Per-cluster summaries don't have that pressure.
Code and trial logs: https://github.com/kimjune01/union-find-compaction