A Long-Tail Professional Forum-Based Benchmark for LLM Evaluation

https://arxiv.org/abs/2511.06346
1 point • wslh • 2 months ago