Show HN: Drive any macOS app in the background without stealing the cursor

https://github.com/trycua/cua
18•frabonacci•6h ago
Hi HN, Francesco from Cua here. I hacked this project together last weekend, inspired by the Codex Computer-Use release and lessons learned from deploying GUI-operating agents for our customers.

The main problem: when a UI automation process controls a desktop app today, it usually takes over the human’s session. Your cursor moves, keyboard focus gets stolen, windows jump to the front, and you have to stop working until the agent is done. That is why we have historically avoided encouraging users to run these processes directly on their host machine, instead relying on VMs or GUI containers for concurrency and background execution.

But computer-use - the tools we give agents to operate computers like humans - does not scale cleanly that way. As models get smarter, agents need to share hosts safely, run in the background, and avoid collisions with the human or other agents using the same machine.

We realized macOS has no first-class API for "drive this app without touching the cursor". CGEventPost routes through the hardware input stream, so it moves your cursor. CGEvent.postToPid avoids the cursor warp, but Chromium treats those events as untrusted and silently drops clicks at the renderer boundary. Activating the target app first raises the window and pulls focus, defeating the point of background execution.

Cua Driver is our attempt at a real fix: a background computer-use driver for macOS that lets an agent click, type, scroll, and read native apps while your cursor, frontmost app, and Space stay where they are. The default interface is a CLI, so it is easy to script or call from any coding agent shell.

Try it on macOS 14+:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-d...)"

The first internal use case was delegated demo recording. We ask Claude Code to drive an app while 'cua-driver recording start' captures the trajectory, screenshots, actions, and click markers. The result is an agent-generated, Screen Studio-inspired product demo.

Other things we have used it for:

- Replacing Vercel’s agent-browser and other browser-use CLIs. With Claude Code and Cua Driver, you do not need Chrome DevTools Protocol at all.

- A dev-loop QA agent that reproduces a visual bug, edits code, rebuilds, and verifies the UI while my editor stays frontmost.

- Personal-assistant flows that use iMessage from Claude Code, Hermes, or other general-purpose agent CLIs.

- Pulling visual context from Chrome, Figma, Preview, or YouTube windows I am not looking at, without relying on their APIs.

What made this harder than expected:

- CGEventPost warps the cursor because it goes through the HID stream.

- CGEvent.postToPid does not warp the cursor, but Chromium drops it at the renderer IPC boundary.

- Activating the target first raises the window and can drag you across Spaces.

- Without a private remote-aware SPI, Electron apps stop keeping useful AX trees alive when their windows are occluded.

The unlock was SkyLight. SLEventPostToPid is a sibling of the public per-PID call, but it travels through a WindowServer channel Chromium accepts as trusted. Pair it with yabai’s focus-without-raise pattern, plus an off-screen primer click at (-1, -1), and the click lands without the window ever raising.
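
To summarize the trade-offs described so far, here is a small illustrative Python sketch. It only encodes the drawbacks stated in this post as data and does not call any macOS API. The dictionary keys, the problem labels, and the `background_safe` helper are hypothetical names made up for illustration.

```python
# Drawbacks of each event-delivery path, as stated in the post.
# Nothing here is measured; it is a restatement of the prose as data.
DELIVERY_PATHS = {
    "CGEventPost": {"warps_cursor"},
    "CGEvent.postToPid": {"dropped_by_chromium"},
    "activate target, then post": {"raises_window", "may_switch_space"},
    "SLEventPostToPid + focus-without-raise + primer click": set(),
}


def background_safe(paths):
    """Return the names of paths with no listed drawback (hypothetical helper)."""
    return [name for name, problems in paths.items() if not problems]


if __name__ == "__main__":
    for name, problems in DELIVERY_PATHS.items():
        status = ", ".join(sorted(problems)) or "no listed drawback"
        print(f"{name}: {status}")
```

Under these assumptions, only the SkyLight-based combination comes back from `background_safe(DELIVERY_PATHS)`.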

One thing we learned: the right addressing mode depends on the app. Native macOS apps usually have rich AX trees, Chromium-family apps often need a hybrid of AX and screenshots, and apps like Blender or CAD tools may expose almost no useful AX surface. The mistake is defaulting to pixels everywhere - or defaulting to AX everywhere.
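
As a rough sketch of that per-app decision, the Python below picks an addressing mode from two summary numbers about an app's AX tree. The function name, the two inputs, and both thresholds are assumptions chosen for illustration, not how Cua Driver actually decides.

```python
from enum import Enum


class Addressing(Enum):
    AX = "ax"                    # address elements through the accessibility tree
    HYBRID = "ax+screenshot"     # combine AX with screenshots
    PIXELS = "screenshot"        # fall back to pixel coordinates


# Hypothetical thresholds, chosen only to make the example concrete.
MIN_ACTIONABLE_NODES = 5
MIN_LABELLED_FRACTION = 0.8


def choose_addressing(actionable_nodes: int, labelled_fraction: float) -> Addressing:
    """Pick an addressing mode from a summary of the app's AX tree.

    actionable_nodes: number of AX elements that expose at least one action.
    labelled_fraction: share of those elements with a usable title or label.
    """
    if actionable_nodes < MIN_ACTIONABLE_NODES:
        # Almost no AX surface (the post cites Blender or CAD tools).
        return Addressing.PIXELS
    if labelled_fraction >= MIN_LABELLED_FRACTION:
        # Rich, well-labelled tree (typical of native macOS apps per the post).
        return Addressing.AX
    # AX tree exists but is patchy (the post cites Chromium-family apps).
    return Addressing.HYBRID
```

For example, `choose_addressing(200, 0.95)` selects AX, while a tree with only two actionable nodes falls back to pixels regardless of labelling.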

Long technical writeup: https://github.com/trycua/cua/blob/main/blog/inside-macos-wi...

I would like feedback from people building Mac automation, agent harnesses, or accessibility tooling. If it breaks on a macOS app you care about, that is useful data for us.

Comments

LatencyKills•5h ago
Ex-Apple engineer here. I really like your implementation. A few years ago I built a similar tool to help me automate the testing of some of my native macOS apps. Being able to run multiple UI automation tests simultaneously was the big win in my case.

My only criticism is enabling telemetry by default. I'm a fan of having people opt-in.

frabonacci•5h ago
Fair criticism. We took a similar approach to established dev tools like Homebrew, with anonymous, opt-out telemetry to understand install issues, crashes, and high-level usage. For cua-driver specifically, telemetry is limited to command/tool-level events and basic environment metadata. We don’t send screenshots, recordings, app contents, prompts, typed text, file paths, or tool arguments. That said, we should make the opt-out path clearer.
kveykva•1h ago
Would you be open to sharing what you built for running the automation tests? I could really use this right now.
frabonacci•39m ago
We don't have a specific testing framework yet. cua-driver is closer to an automation interface than a test runner. That said, you could definitely build one on top of it. For reference, these are some of our integration tests: https://github.com/trycua/cua/tree/main/libs/cua-driver/Test...

One useful trick is to use cua-driver's 'launch_app' instead of the default 'open' or other osascript, since it can start the app without raising or focusing it, so the tests don't disturb your active desktop while they run.

jorvi•1h ago
The problem with opt-in telemetry is that 95% of users don't change defaults, and the 5% who do are your power users. They're not representative of the average user. And only a subset of them will turn it on.

Ironically enough the opposite happens with opt-out telemetry, for the same reason: a lot of power users will turn off telemetry, thus you will never see their usage patterns and will have to infer them. Dogfooding helps.

crazygringo•43m ago
I'm confused.

You claim power users opt in to telemetry, and then immediately say power users opt out.

pnw_throwaway•35m ago
The problem with opt-in telemetry is that 95% of users are sick and tired of being spied on with every little thing they do.
davey2wavey•1h ago
It's looking great.

The audit trail question is interesting and I haven't seen it come up much. When an agent clicks through an ERP or edits a file, you've got logs, but how do you explain the "why" behind each decision to, say, a compliance team?

Curious if that's something you're thinking about or if it's too early.

krackers•54m ago
Nice! Thanks for the technical writeup, ~2 weeks from me wondering how it's implemented [1] to being able to play with a replicated version!

[1] https://news.ycombinator.com/item?id=47799128

frabonacci•51m ago
Thanks for starting that thread, I definitely drew some inspiration from it. But ultimately the secret sauce for the background click came from discovering yabai's window_manager_focus_window_without_raise https://github.com/asmvik/yabai/blob/f17ef88116b0d988b834bb2...
alsetmusic•48m ago
I tried out their Lume VM software a couple of months back. Worked well, fwiw. I'm not using it anymore because I decided to just give agents direct (supervised) access to my devices.
frabonacci•19m ago
Thanks for trying out Lume! We definitely haven't given up on the idea of sandboxing GUI agents in local macOS VMs. Cua Driver is aimed at a different use case though: letting coding agents and general agents use the Mac you're already on, asynchronously and in the background. That also makes the economics better, since multiple agents can share the same machine instead of each needing its own VM.
dtran•5m ago
This is one of the coolest hacks I've seen recently. Having done some much less involved macOS hacking, I can't help but wonder if we may finally see momentum behind some flavor of agent-friendly Linux/Android if Apple doesn't give us more ways to let agents interact with our machines.
