Has anyone done anything similar on their own machine? I am interested to hear others' thoughts.
General overview of the current workflow (without a lot of the finer details):
Call the Gemini API to generate Python code for a specific function/problem ->
Run the AI-generated code in a Docker container ->
Feed any runtime/syntax errors back to Gemini ->
Run hardcoded functionality tests and send the results back to Gemini ->
Repeat from the first step until the max iteration count is reached (or the tests pass)
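Roughly, the loop looks like this (a simplified sketch; generate_code, run_in_docker, and run_tests are placeholders for my own wrappers around the Gemini API, the Docker sandbox, and the hardcoded test suite):

    # Simplified sketch of the iteration loop. The three helpers are
    # placeholders for my wrappers around Gemini, Docker, and the tests.
    MAX_ITERATIONS = 5

    def solve(problem_description: str) -> str | None:
        feedback = ""
        for _ in range(MAX_ITERATIONS):
            # Ask Gemini for code, including feedback from the last attempt.
            code = generate_code(problem_description, feedback)

            # Execute inside the Docker sandbox and capture any traceback.
            run_result = run_in_docker(code)
            if run_result.returncode != 0:
                feedback = "Your code raised an error:\n" + run_result.stderr
                continue

            # Run the hardcoded functionality tests against the code.
            test_result = run_tests(code)
            if test_result.passed:
                return code
            feedback = "These tests failed:\n" + test_result.report

        return None  # max iterations hit; hand off to a human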
I had a few general questions:
1) What patterns, antipatterns, or architectures do people find best for these workflows?
2) Is using Docker considered the easiest and safest way of quarantining AI-generated code? (My current container invocation is sketched after this list.)
3) I have read a couple of posts online about “compressing” the context of previous changes into quick summaries. Vector embeddings could also be used to retrieve relevant documentation and speed up context retrieval. Does anyone have general advice here on what works and what doesn’t? (A rough sketch of what I mean follows the list.)
4) Should I have an external agent that reviews the errors from previous iterations to check whether Gemini is falling into a loop? When I use LLMs for coding, I sometimes notice they get stuck alternating between two buggy solutions. Should I just break out of the iteration loop in that case and rely on human intervention? (The cheap check I have in mind is also sketched below.)
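For question 2, this is roughly how I run the generated code at the moment (standard docker CLI flags; the image name, resource limits, and timeout are just my current choices):

    # Sandbox step: write the generated code to a temp directory, mount it
    # read-only, and run it with no network and capped memory/CPU.
    import pathlib
    import subprocess
    import tempfile

    def run_in_docker(code: str, timeout: int = 30) -> subprocess.CompletedProcess:
        with tempfile.TemporaryDirectory() as tmp:
            pathlib.Path(tmp, "solution.py").write_text(code)
            cmd = [
                "docker", "run", "--rm",
                "--network", "none",      # no outbound network access
                "--memory", "512m",       # cap memory
                "--cpus", "1",            # cap CPU
                "--read-only",            # read-only root filesystem
                "-v", f"{tmp}:/work:ro",  # mount the code read-only
                "python:3.12-slim",
                "python", "/work/solution.py",
            ]
            return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)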
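For question 3, this is what I mean by compressing and retrieving context (summarize_with_gemini and embed are placeholders for whatever model calls you use; the retrieval itself is just cosine similarity over precomputed embeddings):

    # Keep short summaries of failed attempts instead of full transcripts,
    # and pull in the top-k most relevant documentation chunks per query.
    import numpy as np

    history_summaries: list[str] = []

    def compress_iteration(code: str, error: str) -> None:
        # One- or two-sentence summary instead of the full code + traceback.
        prompt = "Summarize what this attempt tried and why it failed:\n" + code + "\n" + error
        history_summaries.append(summarize_with_gemini(prompt))

    def retrieve_docs(query: str, doc_chunks: list[str],
                      doc_vectors: np.ndarray, k: int = 3) -> list[str]:
        # doc_vectors: precomputed embeddings of the documentation chunks.
        q = embed(query)
        scores = doc_vectors @ q / (
            np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-9
        )
        return [doc_chunks[i] for i in np.argsort(scores)[::-1][:k]]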
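For question 4, the check I have in mind is much simpler than a full reviewing agent: fingerprint each iteration's error output and break if recent iterations keep repeating the same fingerprints (e.g. the model alternating between two buggy solutions):

    # Cheap loop detector: hash each iteration's (lightly normalised) stderr
    # and stop when the recent window contains repeated fingerprints.
    import hashlib

    def error_fingerprint(stderr: str) -> str:
        # Drop "File ..." traceback lines so shifting line numbers don't defeat the hash.
        lines = [l for l in stderr.splitlines() if not l.strip().startswith("File ")]
        return hashlib.sha256("\n".join(lines).encode()).hexdigest()

    def is_looping(error_history: list[str], window: int = 4) -> bool:
        if len(error_history) < window:
            return False
        recent = [error_fingerprint(e) for e in error_history[-window:]]
        return len(set(recent)) < window  # fewer distinct errors than attempts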
I am also trying to use AutoML packages (like FLAML) in this workflow, with the goal of performing “automatic” data analysis on datasets to get better predictions. Obviously, I understand this will not perform as well as a professional data scientist, but has anyone done something similar and seen positive results?
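On the FLAML side, the handoff I am aiming for is that the generated code prepares X/y and then calls something like this (standard flaml usage as far as I can tell; the dataset, time budget, and metric here are just for illustration):

    # FLAML searches models and hyperparameters within a time budget; the
    # generated code would do the preprocessing before this point.
    from flaml import AutoML
    from sklearn.datasets import load_breast_cancer
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    automl = AutoML()
    automl.fit(X_train, y_train, task="classification",
               time_budget=60, metric="roc_auc")

    print(automl.best_estimator)  # which model family won the search
    print("holdout accuracy:", accuracy_score(y_test, automl.predict(X_test)))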