Attempts to mandate that PRs always be small might get me to argue that optimal change sizes are normally distributed.
If you look at how simulated annealing (https://en.wikipedia.org/wiki/Simulated_annealing) is done, the average jump size shrinks over time in the Wikipedia annealing animation, but there's always _some_ probability of a large jump in the optimized Metropolis-Hastings process: the jumps are still normally distributed, just with a variance that shrinks over time.
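To make the jump-size point concrete, here is a minimal sketch of annealing with Gaussian proposals. It assumes a 1-D cost function and a linear schedule for both the proposal standard deviation and the temperature; the function name `anneal` and all parameter names and defaults are hypothetical, picked only for illustration, and this is not how any particular library or the Wikipedia animation implements it.

```python
import math
import random


def anneal(cost, x0, steps=5000, sigma0=2.0, sigma_min=0.05, t0=1.0, seed=0):
    """Minimise `cost` using Gaussian jumps whose spread shrinks over time.

    Returns (best_x, best_cost, jumps), where `jumps` holds the absolute
    size of every proposed jump so the distribution can be inspected.
    """
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best_x, best_fx = x, fx
    jumps = []
    for k in range(steps):
        frac = 1.0 - k / steps                 # goes from 1 down towards 0
        sigma = max(sigma_min, sigma0 * frac)  # proposal std dev shrinks (assumed linear)
        temp = max(1e-9, t0 * frac)            # temperature shrinks as well
        step = rng.gauss(0.0, sigma)           # still normally distributed at every step
        jumps.append(abs(step))
        cand = x + step
        fc = cost(cand)
        # Metropolis acceptance: always take improvements, and take
        # worse moves with probability exp(-(increase in cost) / temp).
        if fc <= fx or rng.random() < math.exp((fx - fc) / temp):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
    return best_x, best_fx, jumps
```

Calling it on a bumpy cost such as `lambda x: (x - 3.0) ** 2 + 2.0 * math.sin(5.0 * x)` and comparing `jumps[:500]` with `jumps[-500:]` shows the typical jump getting smaller, while the Gaussian tail means an occasional jump several times larger than the typical one can still turn up late in the run.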
"[I split a large PR into multiple] But also, I could not have developed the quotas feature in real life in that artificial order. The grants structure evolved as my understanding of pricing and quota enforcement evolved. The original quota semantics sucked, so I rewound back to the data structures, which affected how the pricing got imported, which changed how the quotas were stored. The code reviewers didn't have to worry about that but I did."
This is also one way LLMs are fundamentally different from prior language models, which worked by searching over parse trees top-down or bottom-up, trying to fit together independently evolved pieces. LLMs lay everything out in a large matrix of randomized weights and try to slide everything into place jointly.
This means that organizing all the pieces well in a single context window unlocks a special AI power: efficiently converging these pieces jointly so they fit better with each other (as a smart human who has loaded up on context would do). Splitting the work into multiple PRs or contexts might stymie this powerful aspect of AI.
It is a challenge, and somewhat of an art, to pack and organize the information in a context window so as to exploit the type of reasoning LLMs are made for.
BenoitEssiambre•59m ago