Hey HN! I built this after weeks of frustration with Claude's computer use for insurance workflow automation.
The problems I kept hitting:
1. Scrolling kills you — To read a multi-page doc, the AI screenshots, scrolls, screenshots again. Each scroll burns an API call. A 10-page document eats half your context window before you've done anything useful.
2. Environment mismatch — My customers are insurance brokers on corporate VPNs, with passwords saved in their browsers and MFA on their phones. None of that exists in a cloud VM.
3. Reliability — Asked it to download 20 files. After two, it decided there must be a faster way. There wasn't. It downloaded 3, hallucinated the rest, and reported success.
For browsers, this is getting solved — tools like Stagehand use the accessibility tree instead of screenshots. Full page structure in one call, click by name, no pixel hunting.
But for desktop apps? Nothing.
agent-rdp fills that gap: Windows automation over Remote Desktop with full UI Automation access. Runs locally so VPNs and MFA just work. CLI-first so AI agents can write scripts instead of fumbling through screenshots.
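To make the "full structure in one call, click by name" idea concrete, here's a minimal toy sketch. To be clear, this is not agent-rdp's API or the real Windows UI Automation interface: the `Node` class, the `find_by_name` and `dump` helpers, and the "Policy Manager" window are all made up for illustration. It just shows why a tree of named elements is easier for an agent to script against than a pile of screenshots.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """One element in a toy accessibility tree (hypothetical shape)."""
    role: str
    name: str
    children: list["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_by_name(root: Node, name: str, role: Optional[str] = None) -> Optional[Node]:
    """Return the first element with a matching name (and role, if given)."""
    for node in walk(root):
        if node.name == name and (role is None or node.role == role):
            return node
    return None


def dump(root: Node, depth: int = 0) -> str:
    """Render the whole tree as indented text: the 'one call' snapshot."""
    lines = [f"{'  ' * depth}{root.role}: {root.name}"]
    for child in root.children:
        lines.append(dump(child, depth + 1))
    return "\n".join(lines)


# Hypothetical window contents, invented for this example.
window = Node("window", "Policy Manager", [
    Node("toolbar", "Main", [Node("button", "Download"), Node("button", "Print")]),
    Node("list", "Documents", [Node("listitem", f"policy_{i}.pdf") for i in range(1, 4)]),
])

if __name__ == "__main__":
    print(dump(window))                               # whole structure, no scrolling
    target = find_by_name(window, "Download", role="button")
    print("found:", target.role, target.name)         # addressed by name, not by pixel
```

The point of the sketch: the agent gets every document in the list as text up front, and a missing element comes back as `None` instead of a confident guess about what a screenshot showed.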
Would love feedback, especially from anyone else building desktop automation for AI agents.
Happy to chat — email in my profile, or nick@workflowly.ai
thisnick•2h ago