Launch HN: Cyberdesk (YC S25) – Automate Windows legacy desktop apps

28•mahmoud-almadi•2h ago

Hi HN, we’re Mahmoud and Alan, building Cyberdesk (https://www.cyberdesk.io/), a deterministic computer use agent for automating Windows desktop applications. Developers use us to automate repetitive tasks in legacy software in healthcare, accounting, construction, and more, by executing clicks and keystrokes directly on the desktop.

Here are a couple of demos of Cyberdesk’s computer use agent:

A fast file import automation into a legacy desktop app: https://youtu.be/H_lRzrCCN0E

Working on a monster of a Windows monolith called OpenDental (also showcases the agent's learning process): https://youtu.be/nXiJDebOJD0

Filing a W-2 tax form: https://youtu.be/6VNEzHdc8mc

Many industries are stuck with legacy Windows desktop applications, with staff plagued by repetitive tasks that are incredibly time-consuming. Vendors offering automations for these end up writing brittle Robotic Process Automation (RPA) scripts or hiring off-shore teams for manual task execution. RPA often breaks due to inevitable UI changes or unexpected popups like a Windows update or a random in-app notification. Off-shore teams are often unreliable and costlier than software, plus they’re not always an option for regulated industries.

I previously built RPA scripts impacting 20K+ employees at a Fortune 100 company, where I experienced RPA’s brittleness and inflexibility firsthand. It was obvious to me that this was a band-aid solution to an unsolved problem. Alan was building a computer use agent for his previous startup and realized its huge potential to automate a ton of manual computer tasks across many industries, so we started working on Cyberdesk.

Computer use models can struggle with abstract, long-horizon tasks, but they excel at making context-aware decisions on a screen-by-screen basis, so they’re a good fit for automating these desktop apps.

The key to reliability is crafting prompts that are highly specific and well thought out. Much like with ChatGPT, vague or ambiguous prompts won’t get you the results you want. This is especially true in computer use because the model is processing nearly an entire desktop screen’s worth of extra visual information; without precise instructions, it doesn’t know which details to focus on or how to act.

Unlike RPA, Cyberdesk’s agents don’t blindly replay clicks. They read the screen state before every action and self-correct when flows drift (pop-ups, latency, UI changes). Unlike off-the-shelf computer use AIs, Cyberdesk runs deterministically in production: the agent primarily follows the steps it has learned and only falls back to reasoning when anomalies occur. Cyberdesk learns workflows from natural-language instructions, capturing nuance and handling dynamic tasks - far beyond what a simple screen recording of a few runs can encode.

This approach is good for both reliability and cost: reliability, because we fall back to a computer use model in unexpected situations; and cost, because computer use models are expensive and we only use them when we need to. Otherwise we leverage faster, more affordable visual LLMs for checking the screen state step-by-step during deterministic runs. Our agents are also equipped with tools like failsafes, data extraction, and screen evaluation to handle dynamic and sensitive situations.

How it works: you install our open source driver on any Windows machine (https://github.com/cyberdesk-hq/cyberdriver). It communicates with our backend to receive commands (click, type, scroll, screenshot) and sends back data (screenshots, API responses, etc). You give our computer use agent a detailed natural language description of the process for a given task, just like an SOP for an employee learning a new task for the first time. The agent then leverages computer use AI models to learn the steps and memorizes them by saving each screenshot alongside its action (click on these coordinates, type XYZ, wait for page to load, etc).
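To make the "screenshot alongside its action" idea concrete, here is a minimal sketch of what a memorized workflow could look like as a data structure. This is an illustration only, not Cyberdesk's actual code: the names `Action`, `MemorizedStep`, `Workflow`, and `record` are all hypothetical, and the real storage format is not described in this post.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional

# Hypothetical action vocabulary, loosely mirroring the examples in the post
# (click on coordinates, type text, wait for a page to load).
ActionKind = Literal["click", "type", "scroll", "wait"]


@dataclass(frozen=True)
class Action:
    kind: ActionKind
    x: Optional[int] = None      # screen coordinates, used by "click"
    y: Optional[int] = None
    text: Optional[str] = None   # text to enter, used by "type"


@dataclass(frozen=True)
class MemorizedStep:
    screenshot: bytes            # the screen as it looked just before the action
    action: Action               # what the agent did on that screen


@dataclass
class Workflow:
    instructions: str            # the natural-language, SOP-style description
    steps: List[MemorizedStep] = field(default_factory=list)

    def record(self, screenshot: bytes, action: Action) -> None:
        """Save one (screenshot, action) pair during the learning run."""
        self.steps.append(MemorizedStep(screenshot, action))
```

During a learning run, each action chosen by the computer use model would be passed to `record` together with the screenshot it was chosen on, for example `workflow.record(png_bytes, Action("click", x=120, y=340))`.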

The agent deterministically runs through these steps to run fast and predictably. In order to account for popups and UI changes, our agent checks the live screen state against the memorized state to determine whether it’s safe to proceed with the memorized step. If no major changes prevent safe execution of the memorized step, it proceeds; otherwise, it falls back to a computer use model with context on past actions and the remaining task.
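The replay-then-fallback loop described above can be sketched roughly as follows. Again, this is an assumption-laden illustration rather than our implementation: `run_workflow` and its callback parameters (`take_screenshot`, `is_safe`, `execute`, `fallback`) are hypothetical names, and the sketch assumes the fallback model takes over the rest of the task once it is invoked.

```python
from typing import Callable, List, Sequence, Tuple

# A memorized step here is simply (expected screenshot, action description).
Step = Tuple[bytes, str]


def run_workflow(
    steps: Sequence[Step],
    take_screenshot: Callable[[], bytes],
    is_safe: Callable[[bytes, bytes], bool],
    execute: Callable[[str], None],
    fallback: Callable[[bytes, List[str], Sequence[Step]], None],
) -> List[str]:
    """Replay memorized steps, handing over to a fallback when the screen drifts."""
    log: List[str] = []
    for index, (expected, action) in enumerate(steps):
        live = take_screenshot()
        if is_safe(expected, live):
            # Screen matches closely enough: replay the memorized action.
            execute(action)
            log.append(f"replayed:{action}")
        else:
            # Popup, UI change, etc.: give the fallback the live screen,
            # the actions taken so far, and the steps that remain.
            fallback(live, list(log), steps[index:])
            log.append(f"fallback:{action}")
            break  # assumption: the fallback finishes the remaining task
    return log
```

In this sketch `is_safe` stands in for the cheaper visual check of live versus memorized screen state, and `fallback` stands in for the more expensive computer use model.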

Customers are currently using us for manual tasks like importing and exporting files from legacy desktop applications, booking appointments for patients on a desktop PMS, and data entry for filling out forms such as patient profiles in an EMR.

We don't have a self-serve option yet but we'd love to onboard you manually. Book a demo here to learn more! (https://www.cyberdesk.io/) If you’d rather wait for the self-serve option a little later down the line, please do submit your email here (https://forms.gle/HfQLxMXKcv9Eh8Gs8) so you can be notified as soon as that’s ready. You can also check out our docs here: https://docs.cyberdesk.io/.

We’d absolutely love to hear your thoughts on our approach and on desktop automation for legacy industries!

Comments

throw03172019•2h ago

Looks great. For the EMR use cases, do you sign BAAs? Which CUA models are being used? No data retention?

mahmoud-almadi•1h ago

We sign BAAs with all our healthcare customers + all our vendors. Currently using Claude computer-use. Zero-data retention signed with both Anthropic and OpenAI, so none of the information getting sent to their LLMs ever gets retained.

hermitcrab•53m ago

>none of the information getting sent to their LLMs ever gets retained

Is it possible to verify that?

sgtwompwomp•52m ago

Yup! We have signed certificates that explicitly state this, with all LLM providers we use.

herval•35m ago

I’m guessing OP is asking if it’s possible to verify they’re honoring the contract and deleting the data?

feisty0630•26m ago

That's not "verification" by any definition of the word.

rkagerer•1h ago

Personally I think this approach is flawed because it runs in the cloud. If it were an agent I could run locally I'd be much more interested.

mahmoud-almadi•1h ago

Are you referring to the LLM being used or where the actions (click, type, etc) are being executed? The actions can be executed on any Windows machine, so the execution itself can take place locally on your device. The LLMs we're using right now are cloud LLMs. We haven't done an LLM self-hosting option yet. Can I ask what reservations you have about running in the cloud? We have zero-data retention signed with our LLM vendors, so none of the data getting sent to them ever gets retained.
