A desktop app for isolated, parallel agentic development

100•mercat•2mo ago

Comments

ttoinou•2mo ago

Something I never see mentioned is that all agents / CLI tools seem to modify local files. That makes editing current files, working with git, asking to modify only some parts of the files, etc. cumbersome, as the agent is constantly reading and writing files that we are also accessing. This is usually solved with git worktree, but that solution requires a new folder name and branch for each new agent, and each folder has its own unique name, among other issues.

While it can be super powerful, I wish there was a quicker "in memory" agent solution where each agent keeps in its own RAM the list of file modifications (a "patch") it recommends applying to solve the current issue. Then we could apply that patch depending on what we're doing, whether we have other patches to apply first, etc.
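A minimal sketch of the in-memory patch idea above, assuming the agent proposes whole-file replacements and a diff is only rendered on request. The `PatchBuffer` class and its methods are hypothetical names for illustration, not part of any existing agent; only Python's standard `difflib` is used.

```python
import difflib


class PatchBuffer:
    """Keeps an agent's proposed edits in memory; nothing is written to disk."""

    def __init__(self):
        self.proposed = {}  # path -> proposed new file content

    def propose(self, path, new_text):
        """Record the content the agent wants `path` to have."""
        self.proposed[path] = new_text

    def as_patch(self, path, current_text):
        """Render the proposal as a unified diff against the current content."""
        diff = difflib.unified_diff(
            current_text.splitlines(keepends=True),
            self.proposed[path].splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
        return "".join(diff)


buffer = PatchBuffer()
buffer.propose("config.py", "DEBUG = False\nPORT = 8080\n")
patch = buffer.as_patch("config.py", "DEBUG = True\nPORT = 8080\n")
print(patch)
```

The resulting text could be handed to `git apply` whenever the user decides to take it. The diff is computed against whatever the file contains at that moment, so the proposal does not go stale while it waits.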

Also, even if agents can work in parallel, sometimes we only have one of them in front of us, and even if we already know the next thing we're going to ask, we still wait for the previous task to complete before sending the new prompt. I'm not sure how to improve this async problem. I guess I could launch multiple agents in parallel, but they wouldn't share chat history, and I usually work on related issues that depend on each other, so I do need some kind of global or shared context between the agents analyzing the codebase and creating patches.

Does anyone have ideas on how to improve these AI coding agent workflows? Maybe the latest versions of GitButler https://gitbutler.com/ but I'm not sure, and it does use git worktree under the hood

IanCal•2mo ago

This is a good point.

Docker? I typically want other kinds of isolation for services and things anyway - it’s got its own file system, and you could have N versions on the same branch working without conflict (at least the conflict you’d have with worktrees).

There is some more plumbing involved but…not much?

Edit - task boards are my first thought for the comms side for agents sharing info.

ttoinou•2mo ago

Docker is very heavy and more for Linux. I'm on macOS and Windows for desktop software development and can't test my software inside Docker. But yeah, I could have sandboxes inside macOS and inside Windows (virtualization, VMs, WSL etc.); I'd still need one main orchestrating agent + GUI to rule them all

netcoyote•2mo ago

Here are a couple of (open-source Apache license) projects I wrote to sandbox on Mac, which I use to run my agents, while still being able to build/run macOS apps:

Limited user account: https://github.com/webcoyote/sandvault

Virtual machine: https://github.com/webcoyote/clodpod

chrisweekly•2mo ago

Not 100% sure it will solve your complaints about Docker, but https://OrbStack.com makes working w/ docker, docker-compose, linux vms, and k8s so much better.

mbreese•2mo ago

I’m pretty sure they were referring to building macOS or Windows desktop programs, which Docker doesn’t help with.

plutonium3345•2mo ago

Yeah, I think as agents become more capable, more isolation will be necessary. Hence, I also agree that either containers or VMs will eventually be required. We can see how tools like Cursor already have a built-in browser so that the agent can "see" (probably as text for now) what component you want to modify. In the future, I believe the workflow for an agent will be something like: [make changes] -> [get user input] -> [take a screenshot] -> [process the screenshot and user input] -> [make changes].

I doubt something like this can be implemented easily in a single environment without running into client and server port issues, etc.

Jhsto•2mo ago

What about using CoW file system snapshots and then mounting it on overlayfs as the lowerdir while having the agent's working directory be the upper directory? I wonder how the agent reacts to finding some files being immutable.
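A rough sketch of the overlayfs layout described above, assuming Linux and one upper/work directory pair per agent. The function name and all paths are made up for illustration; the code only builds the `mount` arguments and does not run them, since mounting needs root.

```python
def overlay_mount_command(snapshot_dir, upper_dir, work_dir, merged_dir):
    """Build the argv for mounting an overlay filesystem (Linux, needs root)."""
    options = f"lowerdir={snapshot_dir},upperdir={upper_dir},workdir={work_dir}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged_dir]


cmd = overlay_mount_command(
    "/snapshots/repo",     # read-only snapshot shared by all agents
    "/agents/a1/upper",    # this agent's writes land here
    "/agents/a1/work",     # scratch directory overlayfs requires
    "/agents/a1/merged",   # the view the agent actually works in
)
print(" ".join(cmd))
```

With a writable upper layer the agent should not see the snapshot's files as immutable: writing to a file from the lower layer copies it up into the upper directory first, so the agent's changes end up isolated in `upperdir` while the snapshot stays untouched.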

Maxious•2mo ago

Cursor has a "Shadow workspace" option like this https://cursor.com/blog/shadow-workspace

_pdp_•2mo ago

Given that all of these agents are written in javascript I have always wondered why they cannot simply use https://isomorphic-git.org/ and do everything in memory.

adastra22•2mo ago

Why is in-memory a selling point? You want a record on disk.

ttoinou•2mo ago

To choose when to apply, and to not lose changes made by the agent

adastra22•2mo ago

Yeah, you get that with git. What am I missing?

_pdp_•2mo ago

It is still git - but in memory. Why? Because we can and it is cool ;)

shunia_huang•2mo ago

So true when I'm running multiple agents in one project with multiple terminal windows. For example, with one working on implementing tests and another working on features, the feature agents will complain that the tests are not working and need fixing, while the test agent(s) will report outdated test coverage results due to newly introduced files.

It's annoying and hilarious at the same time.

lucid-dev•2mo ago

You use git worktrees, and then merge-in. Or rebase, or 3-way merge, as necessary.

I have a local application I developed that works extremely well for this. Every thread tied to a repo creates its own worktree, makes its edits locally, and then I sync back to main. When conflicts occur, they are resolved automatically if possible (i.e. if another worktree merged into main first, those changes are kept so long as they don't conflict); if they do conflict, we get the opportunity to resolve them.

At any merge-into-main from a worktree, the "non-touched" files in the worktree are automatically re-synced to main, thus updating the worktree with any other changes from any other worktree that have been already pushed to main.
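One way to picture the re-sync rule above is as a small model over file contents, assuming files are tracked as a path-to-content mapping and the set of touched paths is known. This is a guess at the policy for illustration, not the application's actual code; `resync_worktree` is a hypothetical name.

```python
def resync_worktree(worktree, main, touched):
    """Return the worktree's files after a re-sync from main.

    worktree, main: dicts mapping path -> content
    touched: set of paths the agent edited in this worktree
    """
    synced = {}
    for path in set(worktree) | set(main):
        if path in touched:
            # The agent's own edits are kept; any overlap with main is
            # left for the merge step to resolve.
            if path in worktree:
                synced[path] = worktree[path]
        elif path in main:
            # Untouched files follow main, picking up other agents' work.
            synced[path] = main[path]
    return synced


main = {"a.py": "v2", "b.py": "v1", "c.py": "new"}
worktree = {"a.py": "v1", "b.py": "agent edit"}
result = resync_worktree(worktree, main, touched={"b.py"})
print(sorted(result.items()))
```

Here `a.py` and `c.py` come from main because the agent never touched them, while `b.py` keeps the agent's edit.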

Of course, multiple branches can also be used and then eventually merged into a single branch later.

---

Also, this is very clearly exactly the same thing OP does in their system, as per the README on their GitHub link.

CGamesPlay•2mo ago

> While it can be super powerful, I wish there was a quicker "in memory" agent solution where each agent keeps in its own RAM the list of files modifications ("patch") it recommends to apply to solve current issue. Then we could apply that patch depending on what we're doing, if we have others patches to apply before etc.

Then the changes can't be tested to even verify that they pass the compiler/linter, much less tested to confirm they actually work. The only way to fix this is to "modify local files", where "local files" is either a separate worktree (that you don't manage or need to know the location of, possibly even on a separate machine); or a hacky, vibe-coded, in-memory VFS; or somewhere in between.

ryandv•2mo ago

> or a hacky, vibe-coded, in-memory VFS

ramfs still exists, you know.

CGamesPlay•2mo ago

When you set out to implement what GP is talking about with ramfs, you're either going to use git worktrees to do it, or reinvent them.

plutonium3345•2mo ago

I personally have had a frustrating experience with GitButler, and would prefer if things don't become too complicated. For example, when you accidentally break GitButler, it can become really difficult to recover all your unpushed progress. It is also hard to find the exact location of where your code changes are stored across different branches.

My suggestion would be to keep things simple, pragmatic, and save development time. While git worktrees are not perfect, and they require extra space, it is easier for people to understand and easily locate this kind of structure while also being able to execute commands in a somewhat isolated environment. I would be happy if the app simply automates this for me and creates a new worktree, branch, and agent at that location as soon as I click 'Add agent'.
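What an 'Add agent' button might do behind the scenes could look roughly like this, assuming one branch and one sibling directory per agent. The `agent/<name>` branch scheme, the directory naming, and the function names are all assumptions for illustration; the git invocation itself is the standard `git worktree add -b <branch> <path> <base>`.

```python
import subprocess
from pathlib import Path


def worktree_add_argv(repo, agent_name, base="main"):
    """Return the git argv that gives one agent its own branch and worktree."""
    branch = f"agent/{agent_name}"
    # Sibling directory, so the worktree does not sit inside the main checkout.
    path = Path(repo).parent / f"{Path(repo).name}-{agent_name}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


def add_agent(repo, agent_name, base="main"):
    """Create the worktree; raises CalledProcessError if git refuses."""
    subprocess.run(worktree_add_argv(repo, agent_name, base), check=True)


cmd = worktree_add_argv("/src/myapp", "tests")
print(cmd)
```

Launching the agent process with its working directory set to the new path would be the remaining step, and `git worktree remove` cleans up once the branch has been merged.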

Then the only issues become merge conflicts between different branches... this is where extra dev time could be allocated, and implement agents that automatically merge branches.

videlov•2mo ago

(co-founder of gb here) I am really sorry for the frustration - the app should do better and we will do better. In the past few months we have been putting a very deliberate effort to eliminate all conditions from which such poor experience can come about.

The work is not complete but we have stability and correctness as a primary goal, and something that is a requirement for us to declare a v1.0.

ttoinou•2mo ago

Are there automatic tools / command lines to run to try to recover work? That would ease any problem

videlov•2mo ago

The app has a built-in mechanism for going back in time (an operations log) which can be used for undoing situations that should not arise in the first place. It can be accessed via the app (there's a history tab) as well as via the CLI https://docs.gitbutler.com/commands/but-oplog

NB - the CLI version of GitButler is not yet at feature parity with the graphical version of the app

alganet•2mo ago

It is very likely that agent tooling will get better at doing asynchronous things and being aware of the user interacting in parallel, without git, and probably within a single session.

It's feasible, and it makes more sense than separating and merging later (or keeping patches in memory then applying in bulk).

Why do I say it's feasible? We have the technology, right? The IDE knows which file the user has in focus, and can orient agents to use tooling that would inform them of that fact when they're running. Similarly, that same tooling could just spend a little bit of time planning focus to spread to multiple agents in a way they won't overlap.

Maybe big repos, monorepos and so on are a limitation. If we were on the previous "small-to-middle interlinked projects" era, that division would come in naturally. You only really need multiple agents in parallel on a single project if that thing is big enough to have more than one angle to work on. It's a push-and-pull that changes with the times, maybe we're heading to a more granular way of doing things.

undeveloper•2mo ago

gitbutler is neat, but doesn't separate out files -- the changes are still visible to other sessions

videlov•2mo ago

You are right - it is something we did intentionally, but I would like to learn more from your use case - what is the reason to prefer isolation of changes?

Is it the case that you wish to have multiple agents working on the same task and then picking the best implementation? Or do you have a reason to prefer multiple tasks to be implemented in complete isolation from one another?

undeveloper•2mo ago

Hey, I just wanted to say I do like the product you guys made, really love it. Ideally I want to be able to have multiple agent sessions running, and it feels odd to me to have different sessions run into each other. Ideally, I could have an agent running per stack of branches. Additionally, sometimes I'd also like to edit some code while having Claude run in the background.

brainless•2mo ago

In my coding agent, nocodo (1), I am thinking about using copy on write filesystems for cheaper multi-agent operations. But to be honest git worktree may be good enough for most use cases. nocodo checks existing worktree in the local repo and I will add creation and merge support too.

1. https://github.com/brainless/nocodo

videlov•2mo ago

(co-founder of GitButler here)

We chose not to use separate git worktrees under the hood for this functionality. Let me try to break down why, maybe there's an opportunity for me to learn more here.

In my head I separate between use cases of 1) "different tasks" and 2) "best of n, same task".

The app that we built already had the ability to separate changes into branches while in the worktree (on disk) it renders the integration of the branches. Our canonical use case back in the day was "A developer works on a feature branch and wishes to commit & publish a bugfix from a separate branch". When we learned that people were using this for running multiple parallel agents we added some additional tooling for it.

So in practice what happens when you have multiple agents coding in parallel with GitButler is that the system captures information after an agent completes an edit (via the agent hooks) and uses that to 1) stage the particular edit to a branch dedicated to the agent and 2) perform a commit into that branch (GB can have multiple staging areas, one per applied branch).

The system will not allow multiple agents to edit the same file at the same time (via a locking mechanism in the pre-edit hook), but agents do see each other's changes.
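To make the locking idea concrete, here is a small sketch of a per-file lock table that a pre-edit hook could consult and a post-edit hook could release. It is not GitButler's implementation; `EditLocks` and its methods are hypothetical, and it assumes all agents run on one machine and share the table in one process.

```python
import threading


class EditLocks:
    """Allows at most one agent to hold a given file at a time."""

    def __init__(self):
        self._owners = {}               # path -> agent id
        self._guard = threading.Lock()  # protects the table itself

    def try_acquire(self, path, agent):
        """Called before an edit; returns False if another agent holds the file."""
        with self._guard:
            owner = self._owners.setdefault(path, agent)
            return owner == agent

    def release(self, path, agent):
        """Called after the edit has been staged and committed."""
        with self._guard:
            if self._owners.get(path) == agent:
                del self._owners[path]


locks = EditLocks()
first = locks.try_acquire("src/app.py", "agent-a")
second = locks.try_acquire("src/app.py", "agent-b")
locks.release("src/app.py", "agent-a")
third = locks.try_acquire("src/app.py", "agent-b")
print(first, second, third)  # True False True
```

A real hook-based setup would need the table to live somewhere all agent processes can reach, such as a lock file or a small local service.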

In the context of the "different tasks for different agents" use case, we have found that agents seeing each other's edits has a positive effect on the outcomes. The first benefit that comes to mind is no merge conflicts. But beyond merge conflicts, we have found that there is a lower likelihood of reaching a state where code diverges semantically.

In my own usage, I have found it helpful when I am hands on programming on something and wish to have an agent do some auxiliary task, for us to share a workspace (so that I can nudge it one way or another).

Is there something I am missing here? Of course for best-of-n of the same task this doesn't exactly make sense, but with regards to different tasks, what are some additional reasons to require full isolation? (as different worktrees would provide)

ttoinou•2mo ago

Thanks! That’s interesting feedback; I’ll try GitButler again. You know more than me so I can’t answer

nojs•2mo ago

This looks very useful.

I would love something similar that lets me plug in actual Claude Code/Codex with their original agent loop, prompting etc, and just handles the multiplexing, worktrees, isolation, etc automatically (it looks like this tool doesn’t support that). Because I think a lot of the power of eg CC comes from the engineering they’ve done to the tool rather than the underlying model.

How are people doing this at the moment?

widenrun•2mo ago

I'm using conductor.build and running both Claude Code and Codex. It's taken over 95% of my workflow and I'm loving it (I'm not affiliated with them at all, genuinely enjoying it and hoping it succeeds)

scottmf•2mo ago

Same here. Codex support is a recent addition however, and it’s not clear if MCP servers and other rules apply to Codex. Also it would be nice to be able to just have a session working on the main branch as concurrent work in worktrees can get messy

CuriouslyC•2mo ago

Amazing to me all these apps duplicating well tuned Github functionality.

Coolin96•2mo ago

I’ve been using (and loving) https://github.com/raine/workmux which brings together tmux, git worktrees, and CLI agents into an opinionated workflow.

jadbox•2mo ago

Oh this looks perfect. This feels like the Linux way to keep tools separated to their primary function. I don't want my MUX tool to do AI stuff, as I have other (better) tools meant for that.

nojs•2mo ago

Thanks for posting this - just set it up and liking it so far

CuriouslyC•2mo ago

Since CC is a terminal app you can spin up a container with your project, dependencies and CC already installed easily via docker, then just ssh in and run claude remotely. Tmux if desired. Save work by doing pull requests.

ttobi•2mo ago

I have built my own tool for this: https://github.com/tobias-walle/agency It uses tmux to run the agents and offers some convenience commands that, e.g., let you merge the changes back into the original branch. I added some simple idle and change detection, so you can see which agent needs your attention. As all agents are just simple CLI commands, it is very easy to extend the config with your tool of choice.

ammario•2mo ago

Author of Mux here:

I started building it out that way but found it very challenging to create parity between the models. E.g. they have different tools, system prompts, interruption semantics, cost tracking etc. The spirit of the product is decoupling the LLM from the UI, so we went with a custom loop / tools that can perform decently across all models.

asdev•2mo ago

Made the same except you get terminal sessions instead of overriding it with a custom UI: https://github.com/built-by-as/FleetCode

Seattle3503•2mo ago

The issue I run into is with integration tests that rely on Docker. If two agents try to run tests at the same time it doesn't work. I need to manually do air traffic control on the agents' use of higher-level test suites.

adastra22•2mo ago

I don't understand. Why is docker an issue?

foreigner•2mo ago

Struggling with the same thing right now. I'm trying to make it work by generating unique ports for each instance so they don't conflict. Not quite 100% successful yet.
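A sketch of the unique-ports approach, assuming each agent's test stack is started through Docker Compose and reads its host port from an environment variable. `COMPOSE_PROJECT_NAME` is the standard Compose variable for namespacing containers and networks; `APP_PORT` and the function names are hypothetical.

```python
import socket


def free_port():
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 means "pick one for me"
        return sock.getsockname()[1]


def agent_test_env(agent_id):
    """Environment variables that keep one agent's test stack separate."""
    return {
        # Docker Compose prefixes containers and networks with the project name.
        "COMPOSE_PROJECT_NAME": f"agent-{agent_id}",
        # APP_PORT is a hypothetical variable the compose file would read.
        "APP_PORT": str(free_port()),
    }


env = agent_test_env("a1")
print(env)
```

The port is only known to be free at the moment it is probed, so another process can still grab it before the containers start; retrying on a bind failure is one way to cover that gap.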

ammario•2mo ago

Curious to hear more.. why is random port selection not working for your case? The other issue we've seen is machines tend to get overloaded with tons of agents running tests concurrently, hence the SSH remote isolation mode.

foreigner•2mo ago

It's working, just a bit fiddly to set up.

justinmayer•2mo ago

Interesting to see two similar projects with the same "cmux" name, at least until this one was renamed two weeks ago to "mux". Naming is hard.

I have not yet tried either one, but here is the other project for those who want to compare and contrast them:

https://github.com/manaflow-ai/cmux

foreigner•2mo ago

I've been using Catnip for this: https://github.com/wandb/catnip

Very similar features, Catnip is Claude Code specific and does everything in a Docker container so you can more safely run in YOLO mode and the Git worktrees don't make a mess on your host filesystem or checkout. Also is mobile responsive which is cute.

dividedcomet•2mo ago

I made a TUI to do something similar. It’s taking a backseat during parental leave, but it’s a fun project to see n number of agents iterating on the same problem and to see how they differ.

https://github.com/paradise-runner/kaleidoscope


