The API is deliberately minimal. You provide what to optimize and how to measure it:
```
import gepa.optimize_anything as oa

def evaluate(candidate: str) -> tuple[float, dict]:
    result = run_my_system(candidate)
    return result.score, {"error": result.stderr, "runtime": f"{result.time_ms}ms"}

result = oa.optimize_anything(
    seed_candidate="<your artifact>",
    evaluator=evaluate,
)
```
The evaluator returns a score plus diagnostic feedback (we call it "Actionable Side Information" — stack traces, rendered images, profiler output, whatever helps diagnose failures). An LLM proposer reads this feedback during a reflection step and proposes targeted fixes, not blind mutations. Candidates are selected via a Pareto frontier across metrics/examples, so a candidate that's best at one thing survives even if its average is mediocre.
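For intuition about the Pareto selection step, here is a minimal, self-contained sketch of non-dominated selection over per-example scores. It is illustrative only, not the library's internal implementation:

```
# Illustrative only: keep every candidate that no other candidate dominates.
# "Dominates" = at least as good on every example and strictly better on one.

def pareto_frontier(candidates: dict[str, list[float]]) -> list[str]:
    def dominates(a: list[float], b: list[float]) -> bool:
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]

# A candidate that is best on even one example survives, even if its mean is low.
print(pareto_frontier({
    "specialist": [1.0, 0.2, 0.1],  # best on example 1 only -> kept
    "generalist": [0.6, 0.6, 0.6],  # kept
    "weak":       [0.5, 0.5, 0.5],  # dominated by "generalist" -> pruned
}))  # -> ['specialist', 'generalist']
```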
Two ideas distinguish this from AlphaEvolve/OpenEvolve/ShinkaEvolve-style LLM evolution: (1) diagnostic feedback is a first-class API concept rather than a framework-specific mechanism, and (2) the API unifies three optimization modes — single-task search (solve one hard problem), multi-task search (solve related problems with cross-transfer), and generalization (build artifacts that transfer to unseen inputs). Prior frameworks only express mode 1.
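For a rough sense of how the three modes differ from the evaluator's point of view, here is an illustrative sketch. The helpers and signatures below are hypothetical, not the gepa API; `run_my_system(candidate, task)` stands in for your own harness:

```
# Hypothetical sketch, not the gepa API: what the evaluator sees in each mode.

def evaluate_on_task(candidate: str, task: str) -> tuple[float, dict]:
    # Same shape as the evaluator above, but parameterized by a task instance.
    result = run_my_system(candidate, task)
    return result.score, {"error": result.stderr}

def scores_per_task(candidate: str, tasks: list[str]) -> list[float]:
    # Mode 1 (single-task): tasks holds one fixed problem instance.
    # Mode 2 (multi-task): tasks holds related instances; per-task scores let an
    #   improvement found on one instance transfer to the others.
    # Mode 3 (generalization): optimize against a training set of tasks, then
    #   judge the final candidate only on held-out tasks it never saw.
    return [evaluate_on_task(candidate, t)[0] for t in tasks]
```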
We tested across 8 domains. Selected results:
- Coding agent skills: Learned repo-specific skills push Claude Code to near-perfect task completion and make it 47% faster
- Cloud scheduling: Discovered algorithms that cut costs 40%, topping the ADRS leaderboard over expert heuristics and other LLM-evolution frameworks
- Agent architecture: Evolved a 10-line stub into a 300+ line ARC-AGI agent, improving Gemini Flash from 32.5% → 89.5%
- Circle packing (n=26): Outperforms AlphaEvolve's published solution
- Blackbox optimization: Generated problem-specific solvers matching or exceeding Optuna across 56 EvalSet problems
- CUDA kernels: 87% match or beat baseline; multi-task mode outperforms dedicated single-task runs
```
pip install gepa
```
Blog with full results and runnable code for all 8 case studies: https://gepa-ai.github.io/gepa/blog/2026/02/18/introducing-o...
GitHub: https://github.com/gepa-ai/gepa