In real ML work, you don't just generate code and move on. You explore data, train models, evaluate results, adjust assumptions, rerun experiments, compare metrics, generate artifacts, and iterate, often over hours or days.
Most modern coding agents already go beyond single prompts. They can plan steps, write files, run commands, and react to errors. Where things still break down is when ML workflows become long-running and feedback-heavy. Training jobs, evaluations, retries, metric comparisons, and partial failures are still treated as ephemeral side effects rather than durable state.
Once a workflow spans hours, multiple experiments, or iterative evaluation, you either babysit the agent or restart large parts of the process. Feedback exists, but it is not something the system can reliably resume from.
NEO tries to model ML work the way it actually happens.
It is an AI agent that executes end-to-end ML workflows, not just code generation. Work is broken into explicit execution steps with state, checkpoints, and intermediate results. Feedback from metrics, evaluations, or failures feeds directly into the next step instead of forcing a full restart. You can pause a run, inspect what happened, tweak assumptions, and resume from where it left off.
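To make the step/checkpoint idea concrete, here is a minimal sketch of what durable, resumable execution can look like. This is purely illustrative Python, not NEO's actual API or internals; the state file, state shape, and toy step functions are all assumptions made up for the example.

```python
import json
from pathlib import Path

STATE_FILE = Path("run_state.json")  # hypothetical: where durable run state lives

def load_state():
    """Load results of previously completed steps, if any."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"completed": {}}

def save_state(state):
    STATE_FILE.write_text(json.dumps(state, indent=2))

def run_step(state, name, fn):
    """Run a step once; on rerun, reuse the stored result instead of recomputing."""
    if name in state["completed"]:
        return state["completed"][name]
    result = fn()
    state["completed"][name] = result
    save_state(state)  # checkpoint after every step
    return result

state = load_state()
eda = run_step(state, "explore_data", lambda: {"rows": 10_000})
baseline = run_step(state, "train_baseline", lambda: {"accuracy": 0.87})
# If "train_baseline" crashed on the last run, rerunning this script
# skips "explore_data" and retries only the failed step.
```

The point of the pattern is that feedback and failures land in durable state, so a rerun picks up where the last run stopped instead of replaying everything.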
Here's a concrete example: you might ask NEO to explore a dataset, train a few baseline models, compare their performance, and generate plots and a short report. NEO will load the data, run EDA, train the models, evaluate them, notice if something underperforms or fails, adjust, and continue. If training takes an hour and one model crashes at 45 minutes, you do not start over. NEO inspects the failure, fixes it, and resumes.
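That crash-and-resume behavior depends on checkpointing inside long steps, not just between them. A toy sketch of the pattern (again, not NEO code; the checkpoint path and update rule are stand-ins):

```python
import json
from pathlib import Path

CKPT = Path("train_ckpt.json")  # hypothetical checkpoint path

def train(total_epochs=100):
    # Resume from the last checkpoint if one exists; otherwise start fresh.
    if CKPT.exists():
        ckpt = json.loads(CKPT.read_text())
        start_epoch, weights = ckpt["epoch"] + 1, ckpt["weights"]
    else:
        start_epoch, weights = 0, [0.0, 0.0]

    for epoch in range(start_epoch, total_epochs):
        weights = [w + 0.01 for w in weights]  # stand-in for a real training update
        if epoch % 10 == 0:
            # Persist progress so a crash loses at most 10 epochs of work.
            CKPT.write_text(json.dumps({"epoch": epoch, "weights": weights}))
    return weights
```

With periodic checkpoints like these, a failure 45 minutes into an hour-long job costs minutes of recomputation rather than the whole run.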
Docs for the extension: https://docs.heyneo.so/#/vscode
Happy to answer questions about NEO.