Specific repro steps: set system prompt to: "Current date: 2025-09-28 Knowledge cut-off date: end of January 2025"
Then re-run all your tests through the API, eg "What happened at the 2024 Paris Olympics opening ceremony that caused controversy? Also, who won the 2024 US presidential election?" -> correct answers on opus / 4.0, incorrect answers on 3.7. This fingerprints consistently correctly, at least for me.
deepvibrations•4mo ago
MichealCodes•4mo ago
Anecdotally, I've observed both Sonnet4 and GPT5 behaving equally bad with code and sharing similar hallucinations from fresh chats. Is some sort of cross-company safety router akin to the great firewall being rolled out for AI chats?