Show HN: Drive any macOS app in the background without stealing the cursor

https://github.com/trycua/cua
4•frabonacci•1h ago
Hi HN, Francesco from Cua here. I hacked this project together last weekend, inspired by the Codex Computer-Use release and lessons learned from deploying GUI-operating agents for our customers.

The main problem: when a UI automation process controls a desktop app today, it usually takes over the human’s session. Your cursor moves, keyboard focus gets stolen, windows jump to the front, and you have to stop working until the agent is done. That is why we have historically avoided encouraging users to run these processes directly on their host machine, instead relying on VMs or GUI containers for concurrency and background execution.

But computer-use - the tools we give agents to operate computers like humans - does not scale cleanly that way. As models get smarter, agents need to share hosts safely, run in the background, and avoid collisions with the human or other agents using the same machine.

We realized macOS has no first-class API for "drive this app without touching the cursor". CGEventPost routes through the hardware input stream, so it moves your cursor. CGEvent.postToPid avoids the cursor warp, but Chromium treats those events as untrusted and silently drops clicks at the renderer boundary. Activating the target app first raises the window and pulls focus, defeating the point of background execution.

Cua Driver is our attempt at a real fix: a background computer-use driver for macOS that lets an agent click, type, scroll, and read native apps while your cursor, frontmost app, and Space stay where they are. The default interface is a CLI, so it is easy to script or call from any coding agent shell.

Try it on macOS 14+:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-d...)"

The first internal use case was delegated demo recording. We ask Claude Code to drive an app while 'cua-driver recording start' captures the trajectory, screenshots, actions, and click markers. The result is an agent-generated product demo in the style of Screen Studio.

Other things we have used it for:

- Replacing Vercel’s agent-browser and other browser-use CLIs. With Claude Code and Cua Driver, you do not need Chrome DevTools Protocol at all.

- A dev-loop QA agent that reproduces a visual bug, edits code, rebuilds, and verifies the UI while my editor stays frontmost.

- Personal-assistant flows that use iMessage from Claude Code, Hermes, or other general-purpose agent CLIs.

- Pulling visual context from Chrome, Figma, Preview, or YouTube windows I am not looking at, without relying on their APIs.

What made this harder than expected:

- CGEventPost warps the cursor because it goes through the HID stream.

- CGEvent.postToPid does not warp the cursor, but Chromium drops it at the renderer IPC boundary.

- Activating the target first raises the window and can drag you across Spaces.

- Unless you use a private remote-aware SPI, Electron apps stop keeping useful AX trees alive once their windows are occluded.
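To make the first two bullets concrete, here is a minimal sketch of the two public posting paths using the PyObjC Quartz bindings. It assumes pyobjc-framework-Quartz is installed and that the calling process has the required Accessibility permission. The helper names `click_phases` and `post_click` are made up for illustration and are not part of Cua Driver. It shows only the public calls discussed above, not the SkyLight route.

```python
from typing import List, Optional, Tuple

# Quartz constant names for the two events that make up one left click.
CLICK_PHASES = ("kCGEventLeftMouseDown", "kCGEventLeftMouseUp")


def click_phases(x: float, y: float) -> List[Tuple[str, Tuple[float, float]]]:
    """Pure helper: (Quartz event type name, point) pairs for one left click."""
    return [(name, (x, y)) for name in CLICK_PHASES]


def post_click(x: float, y: float, pid: Optional[int] = None) -> None:
    """Post a left click at (x, y) in global display coordinates.

    pid=None   -> CGEventPost on the HID tap (the path that moves the cursor).
    pid=<int>  -> CGEventPostToPid (no cursor warp, but per the post above,
                  Chromium drops these clicks).
    """
    # Imported lazily so this module still loads on non-macOS hosts.
    import Quartz

    for name, point in click_phases(x, y):
        event = Quartz.CGEventCreateMouseEvent(
            None,                      # no explicit event source
            getattr(Quartz, name),     # mouse down, then mouse up
            point,
            Quartz.kCGMouseButtonLeft,
        )
        if pid is None:
            Quartz.CGEventPost(Quartz.kCGHIDEventTap, event)
        else:
            Quartz.CGEventPostToPid(pid, event)
```

Calling `post_click(200, 300)` goes through the HID tap, while `post_click(200, 300, pid=target_pid)` uses the per-PID path, where `target_pid` is whatever process ID you looked up for the target app.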

The unlock was SkyLight. SLEventPostToPid is a sibling of the public per-PID call, but it travels through a WindowServer channel Chromium accepts as trusted. Pair it with yabai’s focus-without-raise pattern, plus an off-screen primer click at (-1, -1), and the click lands without the window ever raising.

One thing we learned: the right addressing mode depends on the app. Native macOS apps usually have rich AX trees, Chromium-family apps often need a hybrid of AX and screenshots, and apps like Blender or CAD tools may expose almost no useful AX surface. The mistake is defaulting to pixels everywhere - or defaulting to AX everywhere.
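One way to picture the per-app choice is a small heuristic over a snapshot of the app's AX tree. This is a rough sketch, not how Cua Driver decides: the `AXSnapshot` fields, the `choose_addressing` function, and both thresholds are assumptions invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Addressing(Enum):
    AX = "ax"                  # address elements through the accessibility tree
    HYBRID = "ax+screenshot"   # combine AX with screenshots
    PIXELS = "screenshot"      # fall back to pixel coordinates


@dataclass
class AXSnapshot:
    total_elements: int      # nodes found in the app's accessibility tree
    labeled_actionable: int  # nodes that are both actionable and labeled


def choose_addressing(
    snap: AXSnapshot,
    rich_ratio: float = 0.5,   # hypothetical threshold for a "rich" tree
    min_elements: int = 20,    # hypothetical floor below which AX is useless
) -> Addressing:
    """Pick an addressing mode from how much usable AX surface the app exposes."""
    if snap.total_elements < min_elements or snap.labeled_actionable == 0:
        return Addressing.PIXELS
    ratio = snap.labeled_actionable / snap.total_elements
    return Addressing.AX if ratio >= rich_ratio else Addressing.HYBRID
```

Under these made-up thresholds, a native app with a dense, well-labeled tree maps to AX, a Chromium-style app with a large but sparsely labeled tree maps to the hybrid mode, and an app with almost no AX surface falls back to pixels.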

Long technical writeup: https://github.com/trycua/cua/blob/main/blog/inside-macos-wi...

I would like feedback from people building Mac automation, agent harnesses, or accessibility tooling. If it breaks on a macOS app you care about, that is useful data for us.

Comments

LatencyKills•1h ago
Ex-Apple engineer here. I really like your implementation. A few years ago I built a similar tool to help me automate the testing of some of my native macOS apps. Being able to run multiple UI automation tests simultaneously was the big win in my case.

My only criticism is enabling telemetry by default. I'm a fan of having people opt in.
