So they use LLMs to evaluate LLMs: one LLM writes the questions, another LLM writes the country-specific answers, and yet another LLM infers the country from an answer. The only manual step seems to be "manually reviewed [questions] to remove repetitions or accidental location references."
This seems like a pretty lazy methodology: if there are LLM-specific country biases, they could be introduced at any stage of the process.
theamk•1h ago