I've had to convince it to do things it should just be able to do but thinks it can't for some reason. Take reading a file outside the project directory: it can do it fine, but it refuses until you convince it that yes, it actually can.
It has also inserted a literal "\n" instead of a real newline on a number of occasions.
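For anyone hitting the literal "\n" problem, here is a minimal Python sketch of a post-write check. The function names and the heuristic (escaped sequences present, no real newlines at all) are my own assumptions for illustration, not anything the CLI does; a file that legitimately contains "\n" inside string literals and also has real line breaks is left alone.

```python
def looks_escaped(text: str) -> bool:
    """Heuristic (assumption): backslash-n sequences present but no real newlines."""
    return "\\n" in text and "\n" not in text


def repair_newlines(text: str) -> str:
    """Turn literal backslash-n sequences into real newlines when the heuristic fires."""
    if looks_escaped(text):
        return text.replace("\\n", "\n")
    # Leave normal files untouched so legitimate "\n" string literals survive.
    return text
```

It is deliberately conservative: a single-line file that really does want a literal "\n" would be mangled, so treat it as a warning signal rather than an automatic fix.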
I'd argue these behaviors are much more important than being able to use interactive commands.
Speaking of system instructions, Gemini always forgets them or doesn't follow them. It also still puts code comments nearly everywhere, which drives me nuts.
Codex is much better at following system instructions, but the CLI is... very bad.
Claude is better at this.
I’ve noticed the latter with several image generation refusals that I could eventually talk the model out of fairly easily (usually by mentioning fair use in a copyright/trademark context).
In a world where you have 100 options, trust is of utmost importance. The CLI’s integration with node‑pty and the ability to stream pseudo‑tty output into mini‑terminal viewports is clever, and I’d love to see that layer documented or open‑sourced so other tools can build on it. I see this feature as something you’d use for short‑lived tasks like running a quick script, checking a log, or doing a one‑off database query. For longer editing sessions I’d still use a real terminal multiplexer and editor. If Google can fix the reliability issues and make the API for interactive sessions open, that would be hella good for everyone!
If not, the model is just shooting in the dark and guessing.
Terminal serializer code: https://github.com/google-gemini/gemini-cli/blob/main/packag...
Uses @xterm/headless npm package.
I've gotten Claude to run a nested Claude instance this way. One of the dumbest laughs I've had came from "pranking" the main Claude into thinking that the child Claude had run `rm -rf` on the entire repo we were working on. The thing had a virtual panic attack.
It's a choice some teams make, presumably because _they_ see value in it (or at least think they will). The team I'm on has particular practices which I'm sure would not work on other teams, and might cause you to look at them with the same incredulity, but they work for us.
For what it's worth, the prefixes you use as examples do arise from a convention with an actual spec:
To be honest, at this point having Claude Code monitor the output of a `tmux pipe-pane` is probably going to be superior.
nacs•3h ago
It looks like they've added a layer on top of node-pty that serializes and streams the terminal contents into the mini-terminal viewports they allocate for rendering. I wonder if they're releasing that portion as open source?
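For a feel of the lowest layer involved, here is a minimal Python sketch that runs a command under a pseudo-terminal and collects what it prints, using only the standard-library `pty` module (Unix only). This is not the Gemini CLI's code and does no terminal emulation or serialization; `run_in_pty` is a hypothetical helper showing only the raw pty capture that a layer like theirs would sit on top of.

```python
import os
import pty


def run_in_pty(argv: list[str]) -> str:
    """Run argv attached to a pseudo-terminal and return everything it wrote."""
    pid, fd = pty.fork()
    if pid == 0:
        # Child process: replace ourselves with the command.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed; never fall back into the parent's code.
            os._exit(127)
    chunks = []
    while True:
        try:
            data = os.read(fd, 1024)
        except OSError:
            # On Linux the read raises EIO once the child side of the pty closes.
            break
        if not data:
            break
        chunks.append(data)
    os.close(fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Because the child sees a real tty, programs behave as they would interactively (colors, progress bars, line endings translated to "\r\n"), which is why the raw stream then needs something like a headless terminal emulator to turn it into a screen.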