With o3, reading through the agent's reasoning steps was extremely useful for improving the prompt and the surrounding workflow. If your instructions are confusing, for example asking for a malformed JSON response, the reasoning trace makes that very clear: the model expends reasoning tokens working through issues that aren't central to the task at hand.
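When the steps were still visible, even a crude scan was enough to surface prompts that send the model chasing formatting problems instead of the task. A minimal sketch, assuming you already have the reasoning steps as plain strings (the `reasoning_steps` list and keyword heuristic here are hypothetical, not anything the API guarantees):

```python
# Sketch: flag reasoning steps that look like they're spent on
# output-format confusion rather than the actual task.
# Assumes reasoning steps are available as plain strings; how you
# obtain them depends on what the API exposes for your model.

FORMAT_HINTS = ("json", "schema", "format", "escape", "quote", "bracket")

def flag_format_confusion(reasoning_steps: list[str]) -> list[tuple[int, str]]:
    """Return (index, step) pairs where the step mentions several
    formatting-related keywords, a rough sign of a confusing prompt."""
    flagged = []
    for i, step in enumerate(reasoning_steps):
        text = step.lower()
        if sum(hint in text for hint in FORMAT_HINTS) >= 2:
            flagged.append((i, step))
    return flagged

# Example with made-up trace snippets:
steps = [
    "The user wants the top three suppliers ranked by cost.",
    "The requested JSON schema has an unclosed bracket; should I fix the format or mirror it?",
]
for i, step in flag_format_confusion(steps):
    print(f"step {i}: {step}")
```

Nothing sophisticated, but that kind of signal is exactly what disappears once the trace is hidden.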
Now, all of that is hidden from you, the user. How are you supposed to verify that the model is properly following instructions, or reasoning the way you expect?
Strategically, I understand why OpenAI has done this, but it makes for a poor user experience for builders.