Show HN: ARISE – Agents that create their own tools at runtime when they fail

https://github.com/abekek/arise

3•abekek•2h ago

I built a framework that lets LLM agents create their own tools at runtime. Most agent frameworks assume you'll hand-craft every tool upfront. That works until your agent hits something you didn't plan for. ARISE (Adaptive Runtime Improvement through Self-Evolution) lets agents synthesize the missing tools themselves when they detect a gap.

ARISE sits between your agent and its tool library. When the agent keeps failing at a class of tasks, ARISE analyzes what's missing, uses a cheap LLM to synthesize a new Python function, tests it in a sandbox with adversarial edge cases, and promotes it to the library if it passes. The agent picks it up on the next run. Over time, the agent accumulates tools shaped by the actual tasks it encounters, not just what you imagined at build time.
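The loop described above can be sketched roughly as follows. This is a minimal illustration, not ARISE's actual API: `ToolLibrary`, `synthesize_tool`, `passes_tests`, and the failure threshold are all hypothetical names, the LLM call is replaced by a stub that returns fixed source, and plain `exec()` stands in for a real sandbox (which it is not).

```python
from collections import Counter
from typing import Dict, List, Tuple

# All names here are hypothetical; none of this is ARISE's real interface.
TestCase = Tuple[str, str]  # (input, expected output)


def synthesize_tool(task_class: str) -> str:
    """Stand-in for the cheap LLM call: returns Python source for a new tool."""
    # A real system would prompt a model here; this stub returns fixed source.
    return "def tool(text):\n    return text.strip().lower()\n"


def passes_tests(source: str, tests: List[TestCase]) -> bool:
    """Load candidate source and check it against the test cases.

    Assumption: exec() in the current process is only a placeholder.
    It is NOT a sandbox; real isolation needs a separate process or container.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        tool = namespace["tool"]
        return all(tool(arg) == expected for arg, expected in tests)
    except Exception:
        # Any crash during loading or testing counts as a failed candidate.
        return False


class ToolLibrary:
    def __init__(self, failure_threshold: int = 3) -> None:
        self.tools: Dict[str, str] = {}      # task class -> promoted source
        self.failures: Counter = Counter()   # task class -> failure count
        self.failure_threshold = failure_threshold

    def record_failure(self, task_class: str, tests: List[TestCase]) -> bool:
        """Count a failure; past the threshold, try to synthesize and promote.

        Returns True only when a new tool was promoted.
        """
        self.failures[task_class] += 1
        if self.failures[task_class] < self.failure_threshold:
            return False
        source = synthesize_tool(task_class)
        if not passes_tests(source, tests):
            return False
        self.tools[task_class] = source
        self.failures[task_class] = 0
        return True
```

With `failure_threshold=2`, the first `record_failure` call for a task class only counts the failure, and the second one triggers synthesis and, if the tests pass, promotion.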

There's a bunch of research on this idea — VOYAGER did it in Minecraft, LATM (LLMs as Tool Makers) showed LLMs can write reusable tools, CRAFT and CREATOR explored similar directions. But none of them resulted in something you can actually pip install and use with your own agent. That's what I'm trying to build.

For safety, generated code undergoes sandboxed execution, auto-generated tests, and adversarial validation before entering the active library. Everything is versioned with rollback. I don't fully trust it yet for unsupervised production use, but it's getting there.
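To make "versioned with rollback" concrete, here is one way it could look. This is a sketch under assumptions: `VersionedSkillStore` and its methods are hypothetical, and the in-memory dict is only a stand-in for whatever storage ARISE really uses.

```python
from typing import Dict, List


class VersionedSkillStore:
    """Hypothetical store that keeps every promoted version of a skill."""

    def __init__(self) -> None:
        # skill name -> list of source versions, oldest first
        self._history: Dict[str, List[str]] = {}

    def promote(self, name: str, source: str) -> int:
        """Append a new version and return its 1-based version number."""
        versions = self._history.setdefault(name, [])
        versions.append(source)
        return len(versions)

    def active(self, name: str) -> str:
        """Return the source of the latest version."""
        return self._history[name][-1]

    def rollback(self, name: str) -> str:
        """Drop the latest version and return the one that becomes active."""
        versions = self._history[name]
        if len(versions) < 2:
            raise ValueError(f"no earlier version of {name!r} to roll back to")
        versions.pop()
        return versions[-1]
```

Keeping the full history per skill means a bad promotion can be undone without re-running synthesis.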

By default, everything runs locally with SQLite. For deployment, there's a distributed mode where the agent is stateless — it reads skills from a remote store and reports trajectories to a queue. A separate worker process picks those up and runs evolution independently. So you can scale the agent without worrying about evolution blocking your hot path. I tested this end-to-end with real infra and real LLM calls.
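The split between a stateless agent and a separate evolution worker might be pictured like this. Everything here is an assumption for illustration: `Trajectory`, `SkillStore`, `run_agent`, and `run_worker_once` are made-up names, and an in-process `queue.Queue` stands in for the remote queue and store.

```python
import queue
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Trajectory:
    """Hypothetical record of one agent run."""
    task: str
    success: bool


@dataclass
class SkillStore:
    """Stand-in for the remote skill store the agent reads from."""
    skills: Dict[str, str] = field(default_factory=dict)


def run_agent(task: str, store: SkillStore, trajectories: "queue.Queue[Trajectory]") -> bool:
    """Stateless agent step: read skills, do the task, report the trajectory.

    The agent never runs evolution itself, so its hot path stays short.
    """
    success = task in store.skills  # toy rule: succeed only if a skill exists
    trajectories.put(Trajectory(task, success))
    return success


def run_worker_once(store: SkillStore, trajectories: "queue.Queue[Trajectory]") -> int:
    """Drain the queue and 'evolve' a skill for each failed task.

    Returns the number of trajectories processed.
    """
    processed = 0
    while True:
        try:
            item = trajectories.get_nowait()
        except queue.Empty:
            break
        processed += 1
        if not item.success:
            # Placeholder for synthesis, testing, and promotion.
            store.skills[item.task] = f"# placeholder skill for {item.task}"
    return processed
```

The agent fails on the first run, the worker later picks up the trajectory and adds a skill, and the next agent run reads it from the store.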

It works with any agent that takes a task and returns a result. There's a native Strands adapter, and raw OpenAI/Anthropic function calling works too.

This is very early — just shipped it. There's a lot to improve. Would really appreciate feedback and contributions if this is interesting to you.
