Show HN: Cua Driver – background multi-cursor via macOS SkyLight.framework

2•frabonacci•1h ago

frabonacci•1h ago

Hi HN, Francesco from Cua here. I hacked this together over a weekend after getting curious about whether macOS could support real background computer-use outside a single vendor's agent product.

The first thing we are using it for is recording product demos. We used to use Screen Studio; now we ask Claude Code + cua-driver to drive the app while cua-driver recording start captures the trajectory, screenshots, actions, and click markers. We canceled our Screen Studio subscription, which started as a joke and then became true.

The problem: most GUI agents still assume the desktop has one shared cursor, one focused app, and one human who is okay being interrupted. That makes local desktop agents awkward. The agent can do the task, but it steals your screen while doing it.

cua-driver is our attempt to make background computer-use a commodity primitive for macOS: let an agent drive a real Mac app while your cursor, focus, and Space stay where they are. The default interface is a CLI, so it is easy to script, easy for coding agents to call from a shell, and still compatible with MCP clients when you want that.

You can try it on macOS 14+:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-d...)"

CLI example:

cua-driver serve &

cua-driver recording start ~/cua-trajectories/demo1

cua-driver launch_app '{"bundle_id":"com.apple.calculator"}'

cua-driver list_windows '{"pid":12345}'

cua-driver get_window_state '{"pid":12345,"window_id":67890}'

cua-driver click '{"pid":12345,"window_id":67890,"element_index":14}'

cua-driver recording stop
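
If you would rather drive those calls from a script than type them, here is a minimal Python sketch. It only assumes what the commands above show: a tool name followed by one JSON argument. The helper names driver_argv and run_driver are hypothetical, and the sketch defaults to a dry run because the output format of cua-driver is not described here.

```python
import json
import shlex
import subprocess


def driver_argv(tool, params=None):
    """Build the argv for one cua-driver call: tool name plus an optional JSON payload."""
    argv = ["cua-driver", tool]
    if params is not None:
        # Compact separators reproduce the payload style used in the CLI example.
        argv.append(json.dumps(params, separators=(",", ":")))
    return argv


def run_driver(tool, params=None, dry_run=True):
    """Return the shell-quoted command (dry run) or run it and return raw stdout."""
    argv = driver_argv(tool, params)
    if dry_run:
        return shlex.join(argv)
    # Assumption: cua-driver is on PATH. stdout is returned unparsed because
    # its format is not documented in this post.
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    print(run_driver("launch_app", {"bundle_id": "com.apple.calculator"}))
    print(run_driver("click", {"pid": 12345, "window_id": 67890, "element_index": 14}))
```

The pid and window_id values are the same placeholders as in the example above; real values come from launch_app, list_windows, and get_window_state.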

The recording command writes turn-NNNNN/ folders with the post-action app state, screenshot, action JSON, and a click.png marker overlay for click-family actions. You can replay a saved run with cua-driver replay_trajectory '{"dir":"~/cua-trajectories/demo1"}', which is useful for regression captures even when you are not trying to make a polished marketing video.
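
To inspect a saved run, something like the following Python sketch could walk the turn-NNNNN/ folders in order and flag which turns carry a click.png marker. Only the turn-NNNNN naming and click.png come from the description above; the function name list_turns is hypothetical, and the sketch deliberately does not guess the file names of the screenshot or action JSON.

```python
import re
from pathlib import Path

# Matches folder names like turn-00001; the digit count is not assumed.
TURN_RE = re.compile(r"^turn-(\d+)$")


def list_turns(trajectory_dir):
    """Return (turn_number, path, has_click_marker) tuples sorted by turn number."""
    root = Path(trajectory_dir).expanduser()
    turns = []
    for child in root.iterdir():
        match = TURN_RE.match(child.name)
        if child.is_dir() and match:
            has_marker = (child / "click.png").is_file()
            turns.append((int(match.group(1)), child, has_marker))
    return sorted(turns, key=lambda turn: turn[0])


if __name__ == "__main__":
    for number, path, has_marker in list_turns("~/cua-trajectories/demo1"):
        print(number, path.name, "click" if has_marker else "no click marker")
```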

What made this harder than expected:

- CGEventPost warps the cursor (it goes through the HID stream, same one your physical mouse uses)

- CGEvent.postToPid doesn't warp the cursor but Chromium silently drops the event at the renderer IPC boundary

- Activating the target first raises the window AND drags you across Spaces on multi-monitor setups

- Electron apps stop keeping useful AX trees alive when their windows are occluded, unless you register the observer through a private remote-aware SPI
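
For reference, the first two items in the list above can be reproduced with the public CoreGraphics calls. This is a sketch in Python via PyObjC (the pyobjc-framework-Quartz package, macOS only), not how cua-driver is implemented; the function name post_left_click is hypothetical, and the comments about cursor behavior and Chromium restate the observations above rather than anything the code verifies.

```python
def post_left_click(x, y, pid=None):
    """Post a left mouse down/up pair at global display coordinates (x, y).

    pid=None  -> CGEventPost on the HID event tap, the same stream a physical
                 mouse uses (the path described above as warping the cursor).
    pid=<int> -> CGEventPostToPid, delivered to one process only (the path
                 described above as being dropped by Chromium).
    """
    # Imported lazily so the module can still be loaded on non-macOS machines.
    import Quartz

    for event_type in (Quartz.kCGEventLeftMouseDown, Quartz.kCGEventLeftMouseUp):
        event = Quartz.CGEventCreateMouseEvent(
            None, event_type, (x, y), Quartz.kCGMouseButtonLeft
        )
        if pid is None:
            Quartz.CGEventPost(Quartz.kCGHIDEventTap, event)
        else:
            Quartz.CGEventPostToPid(pid, event)
```

Calling post_left_click(200, 300) takes the HID path and post_left_click(200, 300, pid=12345) takes the per-pid path; posting synthetic events typically requires the Accessibility permission for the calling process.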

The unlock was a private Apple framework called SkyLight. SLEventPostToPid is a sibling of the public per-pid call (CGEvent.postToPid), but it travels through a WindowServer channel that Chromium accepts as trusted. Pair it with yabai's focus-without-raise pattern (two SLPSPostEventRecordTo calls, deliberately skipping SLPSSetFrontProcessWithOptions) plus an off-screen primer click at (-1, -1) to tick Chromium's user-activation gate, and the click lands without the window ever raising.

The thing we learned while building it: the primary addressing mode should not be pixels. cua-driver exposes ax, vision, and som (set-of-marks) modes, but element-indexed AX actions are the happy path. Pixels are the fallback for canvas/WebGL/video surfaces. That makes agents much less brittle because they can click "the Send button" instead of guessing coordinates, while still having a screenshot when the AX tree is ambiguous.
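
As a rough illustration of that priority order, here is a Python sketch that builds the click payload: element-indexed when an AX element is known, pixels only as a fallback. The pid, window_id, and element_index keys match the CLI example earlier; the "x" and "y" keys for the pixel fallback are hypothetical placeholders, since the pixel-mode parameters are not shown in this post, and click_params is a made-up helper name.

```python
def click_params(pid, window_id, element_index=None, point=None):
    """Build a click payload, preferring element-indexed AX over pixel coordinates."""
    params = {"pid": pid, "window_id": window_id}
    if element_index is not None:
        # Happy path: address the element by its index in the AX tree.
        params["element_index"] = element_index
        return params
    if point is not None:
        # Fallback for canvas/WebGL/video surfaces with no useful AX element.
        # "x" and "y" are hypothetical key names, not confirmed cua-driver parameters.
        x, y = point
        params["x"] = x
        params["y"] = y
        return params
    raise ValueError("need either element_index or point")
```

When both an element index and a point are available, the sketch ignores the point, which mirrors the idea that pixels are only the fallback.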

Other things we have used it for:

- A dev-loop QA agent that reproduces a visual bug, edits code, rebuilds, and verifies the UI while my editor stays frontmost

- A personal-assistant style flow that sends a Messages reply without switching Spaces

- Pulling visual context from Chrome/Figma/Preview/YouTube windows I am not looking at

Long technical writeup: https://github.com/trycua/cua/blob/main/blog/inside-macos-wi...

I would especially like feedback from people building Mac automation, agent harnesses, MCP clients, or accessibility tooling. If you try it and it breaks on an app you care about, that is useful data.
