The underlying paper is from 2022, and the year should be indicated in the title.
jey•22m ago
This seems to be incorporated into current LLM generations already -- when code execution is enabled, both GPT-5.x and Claude 4.x seem to automatically execute Python code to help with reasoning steps.
logicprog•5m ago
Yeah, this is honestly one of the coolest developments in new models.
eric-burel•6m ago
I call that self-destructive prompting, in the sense that you use AI to output programs that replace calling the AI in the future. The paper seems to indicate that this also brings much better results. However, it's subject to attacks, as running generated code is usually unsafe. A sandbox has to be used; major agentic AI players are providing some solutions, like the LangChain sandbox released earlier this year.
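A minimal sketch of the idea, under stated assumptions: the model returns a small Python program instead of a free-text answer, and the host runs it out of process. `GENERATED_PROGRAM` and `run_generated` are hypothetical names made up for illustration, and the program text is hand-written here, not real model output. A child process with a timeout is not a real sandbox; it only contains crashes and runaway loops, so untrusted code would still need a container, VM, or similar isolation.

```python
import subprocess
import sys

# Hypothetical stand-in for model output: a program that computes the
# answer to a word problem rather than stating it in prose.
GENERATED_PROGRAM = """
apples_start = 23
apples_used = 20
apples_bought = 6
print(apples_start - apples_used + apples_bought)
"""


def run_generated(code: str, timeout_s: float = 5.0) -> str:
    """Run generated code in a separate interpreter and return its stdout.

    NOT a security boundary: the child can still read files and use the
    network. It only isolates crashes and enforces a time limit.
    """
    result = subprocess.run(
        # -I runs Python in isolated mode (ignores PYTHON* environment
        # variables and the user site-packages directory).
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired on overrun
    )
    if result.returncode != 0:
        # Surface the child's traceback instead of a silent wrong answer.
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_generated(GENERATED_PROGRAM))
```

Once a program like this exists, it can be rerun on new inputs without calling the model again, which is the "self-destructive" part.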
jhart99•24m ago