Windows-Use: an AI agent that interacts with Windows at GUI layer

https://github.com/CursorTouch/Windows-Use

46•djhu9•3d ago

Comments

yodon•2h ago

Very cool - does anyone know of an OSX equivalent?

Preferably one that is similarly able to understand and interact with web page elements, in addition to app elements and system elements.

CharlesW•1h ago

There are MCPs that work with the macOS Accessibility stack, like https://github.com/steipete/macos-automator-mcp, https://github.com/ashwwwin/automation-mcp, https://github.com/mediar-ai/MacosUseSDK, and https://github.com/baryhuang/mcp-remote-macos-use.

For web page elements, you could drive the browser via `do JavaScript` or use a dedicated browser MCP (Chrome DevTools MCP, Playwright MCP).

philfreo•2h ago

Cool. Reminds me of using SendKeys() in Visual Basic 6 in the 90s

https://learn.microsoft.com/en-us/dotnet/api/microsoft.visua...

kh9000•1h ago

Using the UIA tree as the currency for LLMs to reason over always made more sense to me than computer vision, screenshot based approaches. It’s true that not all software exposes itself correctly via UIA, but almost all the important stuff does. VS code is one notable exception (but you can turn on accessibility support in the settings)

freedomben•42m ago

Agreed. I've noticed ChatGPT when parsing screenshots writes out some Python code to parse it, and at least in the tests I've done (with things like, "what is the RGB value of the bullet points in the list" or similar) it ends up writing and rewriting the script five or so times and then gives up. I haven't tried others so I don't know if their approach is unique or not, but it definitely feels really fragile and slow to me

electroly•57m ago

Looks awesome. I've attempted my own implementation, but I never got it to work particularly well. "Open Notepad and type Hello World" was a triumph for me. I landed on the UIA tree + annotated screenshot combination, too, but mine was too primitive, and I tried to use GPT which isn't as good at image tasks as Gemini as used here. Great job!

tiahura•16m ago

LLM’s do a pretty good job of using pywin32 for programs that support COM like office.

Many hard LeetCode problems are easy constraint problems

The treasury is expanding the Patriot Act to attack Bitcoin self custody

3D modeling with paper

Advanced Scheme Techniques (2004) [pdf]

Vector database that can index 1B vectors in 48M

Windows-Use: an AI agent that interacts with Windows at GUI layer

Qwen3-Next

Doom-ada: Doom Emacs Ada language module with syntax, LSP and Alire support

Oq: Terminal OpenAPI Spec Viewer

A beginner's guide to extending Emacs

Humanely Dealing with Humungus Crawlers

Show HN: DWS OS, a Plan 9 Inspired Web "OS"

Building a Deep Research Agent Using MCP-Agent

Racintosh Plus – Rackmount Mac Plus

VaultGemma: The most capable differentially private LLM

Chat Control faces blocking minority in the EU

Show HN: An MCP Gateway to block the lethal trifecta

OpenAI Grove

Why our website looks like an operating system

Float Exposed

Crates.io phishing attempt

Astrophysics Source Code Library

Debian 13, Postgres, and the US time zones

Introduction to Nyquist and Lisp Programming

Over 100 ships have sailed with fake insurance from the Norwegian Ro Marine

Show HN: I made a generative online drum machine with ClojureScript

Ankit Gupta Joins YC as General Partner

Classic GTK1 GUI Library

Top model scores may be skewed by Git history leaks in SWE-bench

Lumina-DiMOO: An open-source discrete multimodal diffusion model

Windows-Use: an AI agent that interacts with Windows at GUI layer

Comments

Many hard LeetCode problems are easy constraint problems

The treasury is expanding the Patriot Act to attack Bitcoin self custody

3D modeling with paper

Advanced Scheme Techniques (2004) [pdf]

Vector database that can index 1B vectors in 48M

Windows-Use: an AI agent that interacts with Windows at GUI layer

Qwen3-Next

Doom-ada: Doom Emacs Ada language module with syntax, LSP and Alire support

Oq: Terminal OpenAPI Spec Viewer

A beginner's guide to extending Emacs

Humanely Dealing with Humungus Crawlers

Show HN: DWS OS, a Plan 9 Inspired Web "OS"

Building a Deep Research Agent Using MCP-Agent

Racintosh Plus – Rackmount Mac Plus

VaultGemma: The most capable differentially private LLM

Chat Control faces blocking minority in the EU

Show HN: An MCP Gateway to block the lethal trifecta

OpenAI Grove

Why our website looks like an operating system

Float Exposed

Crates.io phishing attempt

Astrophysics Source Code Library

Debian 13, Postgres, and the US time zones

Introduction to Nyquist and Lisp Programming

Over 100 ships have sailed with fake insurance from the Norwegian Ro Marine

Show HN: I made a generative online drum machine with ClojureScript

Ankit Gupta Joins YC as General Partner

Classic GTK1 GUI Library

Top model scores may be skewed by Git history leaks in SWE-bench

Lumina-DiMOO: An open-source discrete multimodal diffusion model