Author here. The core idea is pretty simple: train linear probes on the model's internal state, before it generates anything, to predict whether it will succeed. Then use those predictions to route queries: send easy ones to cheap inference, hard ones to expensive reasoning.
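For the curious, here's roughly what that looks like. This is a minimal sketch, not the repo's exact code: the model choice, probe layer, and where the success labels come from are all placeholder assumptions.

    # Minimal sketch, assuming you already have prompts plus binary
    # success labels from a prior eval run. Model/layer are illustrative.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from sklearn.linear_model import LogisticRegression

    MODEL = "gpt2"  # placeholder; any causal LM exposing hidden states works
    LAYER = -1      # which layer to probe is a hyperparameter

    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
    model.eval()

    def prompt_features(prompt):
        # Hidden state at the final prompt token, before any generation.
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        return out.hidden_states[LAYER][0, -1]

    def train_probe(prompts, labels):
        # labels[i] = 1 if the model solved prompts[i] in a prior eval run.
        X = torch.stack([prompt_features(p) for p in prompts]).numpy()
        return LogisticRegression(max_iter=1000).fit(X, labels)

    def route(probe, prompt, threshold=0.5):
        # Cheap path if the probe predicts success, expensive reasoning otherwise.
        x = prompt_features(prompt).numpy().reshape(1, -1)
        p_success = probe.predict_proba(x)[0, 1]
        return "cheap" if p_success >= threshold else "expensive-reasoning"

The threshold trades cost against accuracy; in practice you'd calibrate it on a held-out set rather than hardcoding 0.5.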
Two findings that surprised us:
1. The same model has completely different internal representations of "difficulty" depending on decoding settings. What GPT-oss thinks is hard with greedy decoding ≠ what it thinks is hard with sampling (rough sketch of why after this list).
2. Model difficulty and human difficulty are orthogonal. The problems they struggle with aren't the ones we struggle with, and this gap increases with extended reasoning.
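On (1): the concrete way this shows up is that the success labels themselves change with the decoding config, so probes have to be trained per config. A toy illustration, reusing `tok`/`model` from the sketch above; the substring correctness check is crude and only for illustration:

    def success_label(prompt, answer, do_sample):
        # Generate under a given decoding config and score correctness.
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            ids = model.generate(**inputs, max_new_tokens=64, do_sample=do_sample)
        completion = tok.decode(ids[0, inputs["input_ids"].shape[1]:])
        return int(answer in completion)  # crude check, illustration only

    # The same prompt can get different labels under greedy vs. sampling:
    # greedy  = [success_label(p, a, do_sample=False) for p, a in eval_set]
    # sampled = [success_label(p, a, do_sample=True)  for p, a in eval_set]
    # so a probe trained on greedy labels need not transfer to sampled ones.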
Code: https://github.com/KabakaWilliam/llms_know_difficulty
Probes: https://huggingface.co/CoffeeGitta/pika-probes
Happy to answer questions.