Seems like a clever technique for anything that needs a strong defense against hallucination. Kind of an “average across runs”. Manually auditing the results isn’t very scalable (cf. the author notes that in some runs the LLM caught the second half of the bug, but he missed that detail when reviewing). In future an LLM could do that bit too, so the technique becomes scalable. One can imagine being handed a meta-report of what’s in all the reports produced by the runs.
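A minimal sketch of what that meta-report step could look like, assuming a hypothetical `query_model()` wrapper around whatever LLM API is in use; the prompt wording and report handling are invented for illustration and aren’t taken from the post:

```python
# Sketch: run the same audit prompt N times, then have an LLM
# aggregate the individual reports into one meta-report.
# `query_model` is a hypothetical placeholder for an actual LLM API call;
# N=100 mirrors the setup described in the linked post.

def query_model(prompt: str) -> str:
    """Placeholder for whatever LLM API the harness actually calls."""
    raise NotImplementedError

def run_audits(audit_prompt: str, n: int = 100) -> list[str]:
    # Each run is independent; a real finding should recur across runs,
    # while a hallucination is unlikely to repeat consistently.
    return [query_model(audit_prompt) for _ in range(n)]

def meta_report(reports: list[str]) -> str:
    # Hand all N reports back to the model and ask it to aggregate:
    # which findings recur, and which appear only once or twice.
    joined = "\n\n---\n\n".join(
        f"Report {i + 1}:\n{r}" for i, r in enumerate(reports)
    )
    prompt = (
        "Here are N independent audit reports of the same code. "
        "List each distinct finding, count how many reports mention it, "
        "and flag findings that appear in only one or two reports "
        "as likely hallucinations.\n\n" + joined
    )
    return query_model(prompt)
```

This just replaces the manual cross-run audit with one more model call over the collected reports; the recurrence count is what does the hallucination filtering.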
sebstefan•8mo ago
> I'll be convinced when LLMs start making valuable pull requests, catching non-obvious corner cases or fixing non-trivial bugs in mature FOSS projects
https://pivot-to-ai.com/2025/05/13/if-ai-is-so-good-at-codin...
uskasagh•8mo ago
Based on the linked article, I think Gerard would balk at this post, consider the content exactly the kind of contribution people would hate to deal with, and call the headline “CEO weasel wording”.
sebstefan•8mo ago
> sebstefan: There’s this one from yesterday
> https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-...
> David Gerard: > My experiment harness executes this N times (N=100 for this particular experiment) and saves the results
> That’s just fuzzing but vastly less efficient?
> Also, that’s not the question being asked, is it? It wasn’t “did someone use an AI for anything in open source.”
I'll try to explain it to him, but the guy seems pretty full of shit already.