My approach is instead to define an Objective (what you want to achieve in your task), which contains a Draft and committed Versions of context (a system prompt and optional example user/assistant turns), alongside test Runs (reusable test input). Everything is managed through a clean REST API, with a granular subscription model over WebSockets for streaming run output.
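To make that shape concrete, here is a minimal Python sketch of the data model only. All names in it (`Context`, `Run`, `Objective`, `commit`, `edit_draft`, and the field names) are hypothetical illustrations rather than the tool's actual schema, and the REST and WebSocket layers are left out entirely.

```python
from dataclasses import dataclass, field, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Context:
    """A system prompt plus optional example (user, assistant) turns."""
    system_prompt: str
    example_turns: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class Run:
    """A reusable test input (hypothetical shape)."""
    name: str
    test_input: str


@dataclass
class Objective:
    """What you want to achieve, with a mutable draft and committed versions."""
    goal: str
    draft: Context
    versions: List[Context] = field(default_factory=list)
    runs: List[Run] = field(default_factory=list)

    def edit_draft(self, **changes) -> None:
        # Context is frozen, so editing builds a new draft object and
        # leaves previously committed versions untouched.
        self.draft = replace(self.draft, **changes)

    def commit(self) -> int:
        # Snapshot the current draft as a new version; return its 1-based number.
        self.versions.append(self.draft)
        return len(self.versions)
```

Making `Context` immutable is one way to guarantee that a committed Version never changes when the Draft is edited afterwards; the same Runs can then be replayed against any Version.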
I'm building a local-first natural language UI system for various technical, social, and creative projects. This tool seems to fit nicely between a UI agent and various service APIs.