- Narrator
If you had multiple people look at your PRs multiple times on different days, the results would be very similar.
Typically this means there is some ambiguity in the specification, and the model flips between alternative interpretations.
For a normal review loop, you can ask the model to report that nothing was found when that's the case, rather than invent things, and it will do a better job of exiting with nothing found.
It’s not perfect, but it usually works pretty well, and I’ve had the model come back to me with "oh, actually the test passed, the bug doesn’t exist".
As a bonus, you’ve now got a test that can detect that bug if it comes up again.
Like when you do recursive programming, have you tried providing more/better stop conditions? If you literally just say "Continue until there are no more issues" then it'll do just that, but if you scope it better, like "Only mention issues related to X, Y or that lead to Z" and so on, you'll get less noise and more focus on issues that actually matter (to you).
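To make the stop-condition point concrete, here's a rough Python sketch of a scoped review loop. Everything in it is an assumption for illustration: `review_once` and `apply_fix` are hypothetical stand-ins for however you call the model and apply its suggestions, and the category names are made up. It stops when no in-scope issue is left or when a round limit is hit, instead of running until the reviewer reports nothing at all.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    category: str      # hypothetical label, e.g. "security" or "style"
    description: str


def review_loop(review_once, apply_fix, allowed_categories, max_rounds=5):
    """Run review rounds until no in-scope issue remains or max_rounds is hit.

    review_once and apply_fix are hypothetical callables standing in for
    a model call and a fix step; they are not part of any real API.
    """
    for round_number in range(1, max_rounds + 1):
        # Scope the review: ignore anything outside the categories we care about.
        in_scope = [i for i in review_once() if i.category in allowed_categories]
        if not in_scope:
            # Explicit exit: finding nothing in scope is a valid result.
            return round_number, "clean"
        for issue in in_scope:
            apply_fix(issue)
    return max_rounds, "round limit reached"


def demo():
    # Fake reviewer that keeps reporting whatever has not been fixed yet.
    pending = [
        Issue("security", "unsanitized input"),
        Issue("style", "line too long"),
        Issue("naming", "vague variable name"),
    ]

    def fake_review():
        return list(pending)

    def fake_fix(issue):
        pending.remove(issue)

    result = review_loop(fake_review, fake_fix, {"security"})
    return result, pending
```

With only `"security"` in scope, the loop fixes one issue and exits on the second round, leaving the style and naming nits alone. An unscoped "continue until there are no more issues" version would keep going for as long as the reviewer had anything to say.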
I've had the same experience, but whenever I've reviewed what it finds it's basically right. It's pedantic, and a lot of the problems aren't things I really care about, but they definitely are real problems.
I'm not sure you can blame the AI for always finding problems if a) you asked it to, and b) there are problems to find.
Anyway, it will never match your judgement completely unless you upload your brain dump into the model.
(The fixed prices are just temporary discounts)
Tiberium•1h ago
> I upgraded to the Claude Max $200/month plan (I was previously on $100/month) to increase my Fable allowance for the remaining time until the July 7th Fablepocalypse, when even Claude Max subscribers will have to pay full API cost for the model.
I really wonder if Anthropic will stick with their decision to keep Fable on extra usage credits until they "get more compute", especially in light of GPT 5.6 very likely coming out next week (it's confirmed to have the exact same pricing as GPT 5.5).
andy_ppp•1h ago
embedding-shape•58m ago
Finally have an explanation why GPT 5.5 xhigh felt dumber and dumber these last few weeks, always the same thing when a new model release is about to come out...
toxik•33m ago
user43928•6m ago
Yet the same claim is being posted every single day, including new claims that the Fable 5 model has degraded compared to the initial release, guardrails aside.
embedding-shape•4m ago
Anyways, heard about A/B testing before? ML people tend to like it a lot. Hard to imagine that OpenAI and Anthropic aren't already deep into categorizing people into buckets and running a wild amount of A/B testing all over the place, in various ways, especially in the weeks leading up to new model releases.