Interesting benchmark idea. I'd be curious whether the scoring separates geometry correctness from code maintainability, since generated Three.js can look right while still being hard to edit or extend.
vladgl94•5h ago
We don't really evaluate code quality at the moment, only geometry (and the geometry is far from good, huh).
vunderba•5h ago
Nice job. There’s also an interesting attempt at an LLM-to-low‑poly code generator in the form of Minebench. The idea is to give an LLM a reference manual to the generator (think Turtle graphics back in the day), and it constructs the desired model brick by brick.
Jimmy0252•6h ago
vladgl94•5h ago