Super fun idea though, I love the concept. But I’m getting the chills imagining the havoc this could cause
Yes, and the one thing that was asked about was "deterministic," not "stable to small perturbations in the input."
Ah, dang it! I was about to deploy this to my clients... /s
Otherwise, interesting concept. I can't find a use for it, but it's entertaining nevertheless and might well spawn a lot of other interesting ideas. Good job!
The utility lies in having the proper framework for a fitness function (how to decide whether the generated code is healthy or needs more iterations). As the fitness function, I used whether the code threw any interpretation-time or run-time errors, and whether it passed all of the unit tests.
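Roughly, that fitness function could look like the sketch below (the function name, scoring tiers, and test-case format are my own assumptions, not the project's actual API):

def fitness(source, func_name, cases):
    """Score a piece of generated code: 0 if it fails to parse, partial
    credit if it at least runs, and the rest scaled by the fraction of
    test cases it passes. The tier values are arbitrary assumptions."""
    try:
        compiled = compile(source, "<generated>", "exec")  # interpretation-time errors
    except SyntaxError:
        return 0.0
    namespace = {}
    try:
        exec(compiled, namespace)  # run-time errors at definition time
        func = namespace[func_name]
    except Exception:
        return 0.25
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:  # the unit tests
                passed += 1
        except Exception:
            pass  # a crashing test case counts as a failure
    return 0.5 + 0.5 * passed / len(cases)

# Example: a correct candidate scores 1.0
fitness("def add(a, b): return a + b", "add", [((1, 2), 3), ((0, 0), 0)])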
That said, I think programming will largely evolve into the senior programmer defining the strategy and LLM agents (or an intern/junior dev) implementing the tactics.
That's basically what Google wants AlphaEvolve to be: have domain experts hand out tasks that "search a space of ideas" and come up with novel things, improved algorithms, or limits/constraints on the problem space. They say they imagine a world where you "give it some tasks", come back later, and check on what it has produced.
As long as you have a definition of a broad idea and some quantifiable way to sort the results, this might work.
from autogenlib.games import doom
doom(resolution=480, use_keyboard=True, use_mouse=True)
I love it
As a joke, that doesn't feel quite so far-fetched these days. (https://xkcd.com/353/)
- Each time you import a module, the LLM generates fresh code
- You get more varied and often funnier results due to LLM hallucinations
- The same import might produce different implementations across runs
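Concretely, that means something like the following (reusing the doom joke from upthread; using importlib.reload to force regeneration is my assumption, not documented library behavior):

import importlib
import autogenlib.games             # first import: the LLM writes the module

first = autogenlib.games.doom
importlib.reload(autogenlib.games)  # with caching off, this may regenerate the code
second = autogenlib.games.doom     # possibly a different implementation

# `first` and `second` came from separate generations, so the same
# import can behave differently across runs.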
Never is a long time...
If you have a task that is easily benchmarkable (e.g. matrix multiplication or algorithm speedups), you can totally "trust" a system to non-deterministically work the problem until the results are "better" (speed, memory, etc.).
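As a sketch of what that loop might look like (generate_candidate stands in for an LLM call; none of the names here are any particular system's API):

import time

def bench(func, args, repeats=1000):
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return time.perf_counter() - start

def improve(generate_candidate, reference, args, rounds=20):
    best, best_time = reference, bench(reference, args)
    for _ in range(rounds):
        candidate = generate_candidate()      # non-deterministic proposal
        try:
            if candidate(*args) != reference(*args):
                continue                      # reject incorrect candidates
        except Exception:
            continue                          # reject crashing candidates
        t = bench(candidate, args)
        if t < best_time:                     # keep only measured improvements
            best, best_time = candidate, t
    return best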
But there's progress on many fronts here. There's been increased interest in provers (natural language to Lean, for example). There's also been progress in LLM-as-a-judge on open-ish problems. And it seems that RL can help with extracting step rewards in sparse-reward domains.
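For a sense of what the prover target looks like: a natural-language claim like "addition of naturals is commutative" compiles down to a machine-checkable statement (a trivial Lean 4 example of my own, not from any specific pipeline):

theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b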
It’s like the strong form of self-modifying code.
>>> from stackoverflow import quick_sort
>>> print(quick_sort.sort([1, 3, 2, 5, 4]))
[1, 2, 3, 4, 5]