On the other hand, perhaps it is just me, but I do not think this is an acceptable way to report benchmarks in this domain. TabArena actually reports multiple metrics, because Elo alone does not quantify the degree of improvement. The fact that the other metrics are not shown here should give pause. Also, the results section in the GitHub repo is a dumpster fire.
Results folder: here are some undocumented parquet files.
Definitely feels like they're hiding the ball lol.
If they had good benchmarks they'd talk about them.
Not comparing to tuned XGBoost is also a warning sign.
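To make the Elo point concrete, here is a rough sketch of why a win/loss rating hides the size of an improvement. It uses the textbook Elo update, not TabArena's actual implementation, and the error numbers, the K-factor, and the function names are all made up for illustration.

```python
def expected_score(r_a, r_b):
    """Textbook Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_after_matches(errors_a, errors_b, k=32.0, start=1000.0):
    """Treat each dataset as one match; lower error wins.

    Only the win/loss/tie outcome enters the update, never the margin.
    k and start are arbitrary illustrative choices.
    """
    r_a, r_b = start, start
    for e_a, e_b in zip(errors_a, errors_b):
        if e_a < e_b:
            s_a = 1.0
        elif e_a > e_b:
            s_a = 0.0
        else:
            s_a = 0.5
        exp_a = expected_score(r_a, r_b)
        r_a += k * (s_a - exp_a)
        r_b += k * ((1.0 - s_a) - (1.0 - exp_a))
    return r_a, r_b


def mean_relative_improvement(errors_new, errors_base):
    """Average fractional error reduction over the baseline."""
    gains = [(b - n) / b for n, b in zip(errors_new, errors_base)]
    return sum(gains) / len(gains)


# Hypothetical per-dataset errors, invented for this example.
baseline = [0.20, 0.30, 0.25, 0.40]
tiny_win = [0.199, 0.299, 0.249, 0.399]  # barely better everywhere
big_win = [0.10, 0.15, 0.12, 0.20]       # roughly half the error everywhere

elo_tiny = elo_after_matches(tiny_win, baseline)
elo_big = elo_after_matches(big_win, baseline)
gain_tiny = mean_relative_improvement(tiny_win, baseline)
gain_big = mean_relative_improvement(big_win, baseline)

print("Elo (tiny win):", elo_tiny, "mean relative gain:", gain_tiny)
print("Elo (big win): ", elo_big, "mean relative gain:", gain_big)
```

Both models end up with the same Elo because they win the same matches, while the mean relative improvement differs by a couple of orders of magnitude. That is why a leaderboard showing only Elo cannot tell you how much better a model is.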
kingjimmy•52m ago