I have been using Claude Code for the past 6 months. In that time, multiple revisions of each model have come out. I have seen some improvement with recent iterations, especially with regard to sycophancy.
However, I can't tell Sonnet's and Opus's outputs apart. To me, Sonnet seems just as capable as Opus.
Have any of y'all run real-life tests? My own results seem too random to say either way.
nawi•1h ago