I asked Claude Code to validate my LLM call script against the official API spec. First, it used the Ref MCP, which I had enabled. That gathered over a dozen docs, most of which were not from OpenAI, and much of the content was useless or outdated. I then disabled the MCP to see if basic web search would do better. It failed to fetch OpenAI's official doc page directly and then started googling; it asked to read two blog posts, from DataCamp and Medium.
This is a basic task that should not be this broken. I think there is a clear need and opportunity for better tooling here. Or is it a skill issue?