In SimpleBench, gpt-oss (120B) flopped hard, so it doesn't appear particularly good at logical puzzles either.
So presumably, this comes down to...
- training technique or data
- model dimensions
- fewer large experts vs. more small experts (see the sketch after this list)
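On that last point, here's a minimal sketch of the trade-off, with entirely hypothetical configs (not the actual gpt-oss or Qwen3 layouts): two MoE FFN setups can store the same total parameter count yet activate very different amounts per token.

```python
# Minimal sketch of the few-large vs. many-small experts trade-off.
# All numbers below are hypothetical, chosen only for illustration.

def moe_params(num_experts: int, expert_hidden: int, d_model: int, top_k: int):
    """Per-layer FFN parameter counts for a MoE block (up + down projections)."""
    per_expert = 2 * d_model * expert_hidden  # W_up and W_down
    total = num_experts * per_expert          # parameters stored
    active = top_k * per_expert               # parameters used per token
    return total, active

d_model = 4096

# Config A: few large experts; Config B: many small experts.
# Chosen so total parameters match; only the active count per token differs.
configs = {
    "few large (8 x 16384, top-2)":  (8, 16384, 2),
    "many small (64 x 2048, top-8)": (64, 2048, 8),
}
for name, (n, hidden, k) in configs.items():
    total, active = moe_params(n, hidden, d_model, k)
    print(f"{name}: total={total/1e9:.2f}B, active per token={active/1e9:.2f}B")
```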
Edit: found this analysis; it's on the HN front page right now:
> this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. nothing else.
“What is the word that starts with S, ends with E, and contains A? → SAEA”
Then, when I said that’s not a word and gave it the answer (no fun), it said:
“I do not have access to confirm that word.”
"I need to find a word that starts with S, ends with E, and contains A.
Let me think through this step by step:
Starts with S
Ends with E
Contains A somewhere in the middle
One word that fits this pattern is "SAGE" - but that only has 4 letters and would be S-A-G-E, which doesn't work because it doesn't contain A in the middle. Wait, let me reconsider. The word "SAGE" actually works if I read it correctly:
Starts with S
Ends with E
Contains A
Actually, "SAGE" is S-A-G-E, so it starts with S and ends with E, but it contains A in the second position.Let me think of other possibilities:
"SQUARE" - starts with S, ends with E, contains A (in the middle)
"SALE" - starts with S, ends with E, contains A
"SAME" - starts with S, ends with E, contains A
Wait, "SAME" is S-A-M-E - that works! It starts with S, ends with E, and contains A.The word is SAME. "
So I just use Qwen3. Fast and great output. If for some reason I don't get what I need, I might use search engines or Perplexity.
I have a 10GB 3080 and a Ryzen 3600X with 32GB of RAM.
Qwen3-Coder is amazing. Best I've used so far.