No discussion on problem difficulty, or on result quality besides "the Edgee run generated slightly more output tokens than the baseline".
sachamorard•1d ago
More info in the GitHub repo, in the reports folder (sorry, I'm not sure I can add the link here without being flagged).
"Codex + Edgee consumes roughly half the fresh tokens of the normal Codex baseline. Output tokens are marginally higher (+3,312, +19.5%), suggesting the Edgee scenario produces slightly more verbose responses but dramatically reduces context ingestion."
kokakiwi•1d ago
I think the problem given to Codex for the benchmark is the one in the attached video, where two Codex instances run side by side, working on a "standard" dev task.
gilles_oponono•1d ago
What part do you compress, more specifically?
kokakiwi•1d ago
For coding agents, mainly the tools' output. Tool outputs are often the heaviest "messages" sent by the user and also the noisiest (e.g. for "cargo test", Codex doesn't really care about the build part, only the test results).
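To make the "cargo test" case concrete, here is a minimal sketch of that kind of tool-output filtering. It is only an illustration under assumptions, not Edgee's actual implementation: the function name `compress_cargo_test_output` and the list of build-phase prefixes are hypothetical, and matching on line prefixes is a rough heuristic.

```python
# Hypothetical sketch: NOT Edgee's actual compression logic.
# Assumption: cargo's build-phase status lines start with one of these words.
BUILD_PREFIXES = (
    "Compiling", "Downloading", "Downloaded",
    "Updating", "Finished", "Running",
)


def compress_cargo_test_output(raw: str) -> str:
    """Drop build-phase lines from `cargo test` output, keep the test results."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Skip build noise such as "Compiling foo v0.1.0".
        # The match is case-sensitive, so "running 2 tests" is kept.
        if stripped.startswith(BUILD_PREFIXES):
            continue
        # Collapse runs of blank lines, and drop leading blank lines.
        if not stripped and (not kept or not kept[-1]):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept).strip()
```

The per-test lines, failure details and the final "test result:" summary pass through untouched, so the agent still sees everything it needs to act on.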
MallocVoidstar•1d ago
kokakiwi•1d ago