* If you need an LLM to parse OR emit a timestamp, use RFC 3339 (e.g. 2024-03-26 10:30:00-05:00).
* Python date format also works well.
* Do NOT use Unix epoch or JavaScript Date formats.
* Smaller models and non-reasoning models still make a LOT of mistakes in time parsing / formatting.
---
There are lots of temporal reasoning benchmarks (TimeBench, TRAM, etc.), but they test whether models understand time concepts. None of them measure which datetime output format models get right most often, so we built the benchmark ourselves.
We tested 22 models across Google, Anthropic, OpenAI, Qwen, and GLM on 235 scenarios and 7 different formats.
The two that surprised us the most were JavaScript Date and Unix epoch. JavaScript Date is probably the most commonly used format, and models get it wrong ~1 in 4 times on parsing. Unix epoch drops to 40% on arithmetic tasks. If you need epoch, just have the model output a string and convert it yourself in code.
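For the "convert it yourself" step, here is a minimal Python sketch, assuming the model was asked to emit an RFC 3339 string with an explicit offset. The function name `rfc3339_to_epoch` is made up for illustration and is not part of the benchmark.

```python
from datetime import datetime


def rfc3339_to_epoch(timestamp: str) -> int:
    """Convert an RFC 3339 string (hypothetical helper) to Unix epoch seconds."""
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset for older interpreters.
    if timestamp.endswith(("Z", "z")):
        timestamp = timestamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(timestamp)
    # Reject strings without an offset: guessing a timezone would silently
    # shift the result.
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    # timestamp() returns a float; fractional seconds are truncated here.
    return int(parsed.timestamp())


print(rfc3339_to_epoch("2024-03-26 10:30:00-05:00"))  # 1711467000
```

Rejecting offset-less strings is a deliberate choice in this sketch: it surfaces a malformed model output as an error instead of quietly assuming local time.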
ishita159•43m ago
why do you think sonnet is better than opus on this?