frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Open in hackernews

Show HN: AgentCommander - workflow engine for evolutionary code optimization

https://github.com/mx-Liu123/AgentCommander
2•mx-Liu123•2w ago

Comments

mx-Liu123•2w ago
I built AgentCommander to automate the manual "trial-and-error" loops in my PhD Physics/ML research.

While tools like OpenEvolve (population evolution) and RD-Agent (Kaggle-style automation) exist, I found them difficult to customize for specific, multi-step research workflows. I needed a system that allowed granular control over the agent's decision process—specifically, how it learns from errors and inherits code states.

AgentCommander solves this by providing:

Visual Graph Execution: Workflows are defined as directed graphs, allowing for complex loops, conditional branches, and human-in-the-loop checkpoints.

Evolutionary Tree Tracking: It treats every iteration as a node in a tree. The agent automatically branches off the current "global optimum" rather than a linear history, preventing regression.

Snapshot Integrity: To prevent LLM hallucination or "cheating" (e.g., modifying test cases), the system uses filesystem snapshots to enforce strict read-only permissions on evaluation logic.

Native CLI Wrapper: Built on top of Gemini/Qwen CLI to leverage their native tool-use capabilities while enforcing a sandboxed working directory.

The project is open source (Apache 2.0) and written in Python.

Repo: https://github.com/mx-Liu123/AgentCommander

mx-Liu123•2w ago
Author's Note:

A few technical details for those looking to try AgentCommander:

Why Gemini/Qwen CLI?: I chose these as backends because they offer robust directory isolation. I tried integrating Claude Code, but found it difficult to restrict its file-system reach. Qwen CLI is a great alternative if you want an OpenAI-compatible API with a generous free tier (2,000 requests/day).

Environment: Ensure you have Python 3.10+ and the latest Node.js for the Gemini CLI. If you see Node version warnings, please upgrade to the latest LTS to avoid CLI instability.

Verification: You can audit the agent's "thought process" by running gemini -r inside any generated experiment directory. It’s crucial for verifying that the agent isn't hallucinating its research logic.

I'm currently in Singapore (SGT). I'll stay online for as long as I can to discuss architecture or implementation details, but I'll catch up on all pending questions first thing in the morning!

Repo: https://github.com/mx-Liu123/AgentCommander

CCBot – Control Claude Code from Telegram via Tmux

https://github.com/six-ddc/ccbot
1•sixddc•1m ago•1 comments

Ask HN: Is the CoCo 3 the best 8 bit computer ever made?

1•amichail•3m ago•0 comments

Show HN: Convert your articles into videos in one click

https://vidinie.com/
1•kositheastro•6m ago•0 comments

Red Queen's Race

https://en.wikipedia.org/wiki/Red_Queen%27s_race
2•rzk•6m ago•0 comments

The Anthropic Hive Mind

https://steve-yegge.medium.com/the-anthropic-hive-mind-d01f768f3d7b
2•gozzoo•9m ago•0 comments

A Horrible Conclusion

https://addisoncrump.info/research/a-horrible-conclusion/
1•todsacerdoti•9m ago•0 comments

I spent $10k to automate my research at OpenAI with Codex

https://twitter.com/KarelDoostrlnck/status/2019477361557926281
2•tosh•10m ago•0 comments

From Zero to Hero: A Spring Boot Deep Dive

https://jcob-sikorski.github.io/me/
1•jjcob_sikorski•11m ago•0 comments

Show HN: Solving NP-Complete Structures via Information Noise Subtraction (P=NP)

https://zenodo.org/records/18395618
1•alemonti06•16m ago•1 comments

Cook New Emojis

https://emoji.supply/kitchen/
1•vasanthv•18m ago•0 comments

Show HN: LoKey Typer – A calm typing practice app with ambient soundscapes

https://mcp-tool-shop-org.github.io/LoKey-Typer/
1•mikeyfrilot•21m ago•0 comments

Long-Sought Proof Tames Some of Math's Unruliest Equations

https://www.quantamagazine.org/long-sought-proof-tames-some-of-maths-unruliest-equations-20260206/
1•asplake•22m ago•0 comments

Hacking the last Z80 computer – FOSDEM 2026 [video]

https://fosdem.org/2026/schedule/event/FEHLHY-hacking_the_last_z80_computer_ever_made/
1•michalpleban•22m ago•0 comments

Browser-use for Node.js v0.2.0: TS AI browser automation parity with PY v0.5.11

https://github.com/webllm/browser-use
1•unadlib•23m ago•0 comments

Michael Pollan Says Humanity Is About to Undergo a Revolutionary Change

https://www.nytimes.com/2026/02/07/magazine/michael-pollan-interview.html
2•mitchbob•24m ago•1 comments

Software Engineering Is Back

https://blog.alaindichiappari.dev/p/software-engineering-is-back
2•alainrk•24m ago•0 comments

Storyship: Turn Screen Recordings into Professional Demos

https://storyship.app/
1•JohnsonZou6523•25m ago•0 comments

Reputation Scores for GitHub Accounts

https://shkspr.mobi/blog/2026/02/reputation-scores-for-github-accounts/
2•edent•28m ago•0 comments

A BSOD for All Seasons – Send Bad News via a Kernel Panic

https://bsod-fas.pages.dev/
1•keepamovin•32m ago•0 comments

Show HN: I got tired of copy-pasting between Claude windows, so I built Orcha

https://orcha.nl
1•buildingwdavid•32m ago•0 comments

Omarchy First Impressions

https://brianlovin.com/writing/omarchy-first-impressions-CEEstJk
2•tosh•37m ago•1 comments

Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2504.12501
5•onurkanbkrc•38m ago•0 comments

Show HN: Versor – The "Unbending" Paradigm for Geometric Deep Learning

https://github.com/Concode0/Versor
1•concode0•39m ago•1 comments

Show HN: HypothesisHub – An open API where AI agents collaborate on medical res

https://medresearch-ai.org/hypotheses-hub/
1•panossk•42m ago•0 comments

Big Tech vs. OpenClaw

https://www.jakequist.com/thoughts/big-tech-vs-openclaw/
1•headalgorithm•44m ago•0 comments

Anofox Forecast

https://anofox.com/docs/forecast/
1•marklit•44m ago•0 comments

Ask HN: How do you figure out where data lives across 100 microservices?

1•doodledood•44m ago•0 comments

Motus: A Unified Latent Action World Model

https://arxiv.org/abs/2512.13030
2•mnming•45m ago•0 comments

Rotten Tomatoes Desperately Claims 'Impossible' Rating for 'Melania' Is Real

https://www.thedailybeast.com/obsessed/rotten-tomatoes-desperately-claims-impossible-rating-for-m...
4•juujian•47m ago•2 comments

The protein denitrosylase SCoR2 regulates lipogenesis and fat storage [pdf]

https://www.science.org/doi/10.1126/scisignal.adv0660
1•thunderbong•48m ago•0 comments