Super fun idea though, I love the concept. But I’m getting the chills imagining the havoc this could cause
Yes, and the one thing that was asked about was "deterministic", not "stable to small perturbations in the input".
Sure, that would work.
> How would changing the hardware/other software make a difference to what comes out of the model?
Floating point arithmetic is not entirely consistent between different GPUs/TPUs/operating systems.
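A minimal illustration of why, in plain Python: float addition is not associative, so anything that changes evaluation order (different GPU kernels, different parallel reduction strategies, different compiler flags) can change the low-order bits of a result.

    # Float addition is not associative, so reordering a reduction changes the answer.
    a, b, c = 1e16, -1e16, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0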
I've actually done this, setting aside a virtual machine specifically for the purpose, trying to move a step towards a full-blown AI agent.
I think there was another, later retrospective? Can't find it now.
Like self-driving cars and human drivers, there will be a point in the future when LLM-generated code is less buggy than human-generated code.
Ah, dang it! I was about to deploy this to my clients... /s
Otherwise, interesting concept. Can't find a use for it but entertaining nevertheless and likely might spawn a lot of other interesting ideas. Good job!
The utility lies in having the proper framework for a fitness function (how to choose whether the generated code is healthy or needs more iterations). I used whether it threw any interpretation-time or run-time errors, and whether it passed all of the unit tests, as the fitness function.
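A rough sketch of that kind of fitness function (not the commenter's actual code; the use of py_compile and pytest, and the file-path arguments, are assumptions):

    import subprocess

    def fitness(candidate_path: str, test_path: str) -> int:
        """Crude score: 0 = doesn't parse, 1 = parses, 2 = runs, 3 = passes tests."""
        # Interpretation-time errors: does the candidate even compile to bytecode?
        if subprocess.run(["python", "-m", "py_compile", candidate_path]).returncode != 0:
            return 0
        # Run-time errors: does it execute without raising?
        if subprocess.run(["python", candidate_path]).returncode != 0:
            return 1
        # Unit tests: does the suite pass against it?
        if subprocess.run(["pytest", test_path]).returncode != 0:
            return 2
        return 3

Anything scoring below 3 goes back for another generation pass.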
That said, I think programming will largely evolve into the senior programmer defining a strategy and LLM agents or an intern/junior dev implementing the tactics.
That's basically what Google wants AlphaEvolve to be: have domain experts give out tasks that "search a space of ideas" and come up with either novel things, improved algorithms, or limits/constraints on the problem space. They say they imagine a world where you "give it some tasks", come back later, and check on what it has produced.
As long as you can have a definition of a broad idea and some quantifiable way to sort results, this might work.
Exactly. As always the challenge is (1) deciding what the computer should do, (2) telling the computer to do it, and (3) verifying the computer did what you meant. A perfect fitness function is a perfect specification is a perfect program.
from autogenlib.games import doom
doom(resolution=480, use_keyboard=True, use_mouse=True)
I love it
As a joke, that doesn't feel quite so far-fetched these days. (https://xkcd.com/353/)
- Each time you import a module, the LLM generates fresh code
- You get more varied and often funnier results due to LLM hallucinations
- The same import might produce different implementations across runs
Never is a long time...
If you have a task that is easily benchmarkable (e.g. matrix multiplication or an algorithm speedup), you can totally "trust" that a system can non-deterministically work the problem until the results are "better" (speed, memory, etc.).
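For a benchmarkable task like that, the fitness function can just be a correctness gate plus a stopwatch; a sketch (all names here are made up, not from any particular system):

    import time

    def reference_matmul(a, b):
        # Slow but obviously-correct baseline to check candidates against.
        n, m, p = len(a), len(b), len(b[0])
        return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

    def score(candidate, a, b):
        expected = reference_matmul(a, b)
        start = time.perf_counter()
        result = candidate(a, b)
        elapsed = time.perf_counter() - start
        if result != expected:
            return float("inf")  # wrong answers are disqualified outright
        return elapsed           # among correct candidates, faster is better

Keep whichever generated candidate has the lowest score and let the generator keep trying to beat it.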
But there's progress on many fronts here. There's been increased interest in provers (natural language to Lean, for example). There's also been progress in LLM-as-a-judge on open-ish problems. And it seems that RL can help with extracting step rewards from sparse-reward domains.
I'm not saying nothing will change. AIs may be constantly writing their own code for themselves internally in a much more fluid mixed environment, AIs may be writing into AI-specific languages built for their own quirks and preferences that make it harder for humans to follow than when AIs work in relatively human stacks, etc. I'm just saying, the concept of "code" that we could review is definitely going to stick around indefinitely, because the performance gains and reduction in resource usage are always going to be enormous. Even AIs that want to review AI work will want to review the generated and executing code, not the other AIs themselves.
AIs will always be nondeterministic by their nature (because even if you run them in some deterministic mode, you will not be able to predict their exact results anyhow, which is in practice non-determinism), but non-AI code could conceivably actually get better and more deterministic, depending on how AI software engineering ethos develop.
It’s like the strong form of self-modifying code.
Sufficiently advanced technology is indistinguishable from magic.
We're basically headed in that direction.
[1] https://archive.org/details/1958-02_IF/page/4/mode/2up?view=...
>>> from stackoverflow import quick_sort
>>> print(quick_sort.sort([1, 3, 2, 5, 4]))
[1, 2, 3, 4, 5]
"fuck, it's python!" *throws it in the garbage*
https://jfrog.com/blog/leaked-pypi-secret-token-revealed-in-...
One example, arr.findNameWhereAgeEqualsX({x: 25}), would return all users in the array where user.age == 25.
Not based on LLMs, though, but on a trap on the object that fetches the method name you're trying to call (using the then-new Proxy functionality), then parses that name and converts it to code. Deterministic, but based on rules.
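A rough Python analogue of that trick, since this thread is Python-flavoured (the original was JavaScript with a Proxy; here __getattr__ plays the trap's role, and the supported name grammar is invented):

    import re

    class MagicQuery:
        def __init__(self, rows):
            self.rows = rows

        def __getattr__(self, name):
            # Trap unknown method names like "find_where_age_equals" and parse them into a filter.
            match = re.fullmatch(r"find_where_(\w+)_equals", name)
            if not match:
                raise AttributeError(name)
            field = match.group(1)
            return lambda value: [r for r in self.rows if r.get(field) == value]

    users = MagicQuery([{"name": "Ada", "age": 25}, {"name": "Bob", "age": 30}])
    print(users.find_where_age_equals(25))  # [{'name': 'Ada', 'age': 25}]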
Apart from the fun that I got out of it, it's been there doing nothing :D
it will be WASM-containerized in the future, but still
- Import your function.
- Have your AI editor implement tests.
- Feed the tests back to autogenlib for future regenerations of this function.
The web devs tell me that fuckit's versioning scheme is confusing, and that I should use "Semitic Versioning" instead. So starting with fuckit version ה.ג.א, package versions will use Hebrew Numerals.
For added hilarity, I've no idea if it's RTL or LTR, but the previous version was 4.8.1, so I guess this is now 5.3.1. Presumably it's also impossible to have a zero component in a version.
I immediately got this. So true!
The experiment showed that independent [human] software developers make the same mistakes.
you need at least $ODD_NUMBER > 7
https://leepike.wordpress.com/2009/04/27/n-version-programmi...
The library uses Python dirty tricks, in this case inspecting the call stack: it looks for the user's calling code, gets the name of that file, and reads it.
It reads the calling code to understand the context of the call, builds a prompt, and submits it to the LLM. It only supports OpenAI.
It does not have search, yet.
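Not autogenlib's actual source, just a sketch of the trick being described: walk the call stack to find the caller's file, read it for context, and hand it to the LLM (the prompt wording and model name here are assumptions):

    import inspect
    from openai import OpenAI

    def generate_for_caller(requested_name: str) -> str:
        caller = inspect.stack()[1]          # the frame that asked for the code
        with open(caller.filename) as f:
            calling_code = f.read()          # the user's own source, used as context
        prompt = (
            f"Write a Python implementation of `{requested_name}`.\n"
            f"It will be used by this module:\n\n{calling_code}"
        )
        client = OpenAI()                    # reads OPENAI_API_KEY from the environment
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content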
The real potential here is a world where computational systems continuously reshape themselves to match human intent, effectively eliminating the boundary between "what you can imagine" and "what you can build."
This says, "trust all code coming from OpenAI".