Launch HN: Cyberdesk (YC S25) – Automate Windows legacy desktop apps

27•mahmoud-almadi•2h ago

Hi HN, we’re Mahmoud and Alan, building Cyberdesk (https://www.cyberdesk.io/), a deterministic computer use agent for automating Windows desktop applications. Developers use us to automate repetitive tasks in legacy software in healthcare, accounting, construction, and more, by executing clicks and keystrokes directly on the desktop.

Here are a couple of demos of Cyberdesk’s computer use agent:

A fast file import automation into a legacy desktop app: https://youtu.be/H_lRzrCCN0E

Working on a monster of a Windows monolith called OpenDental (also showcases the agent's learning process): https://youtu.be/nXiJDebOJD0

Filing a W-2 tax form: https://youtu.be/6VNEzHdc8mc

Many industries are stuck with legacy Windows desktop applications, with staff plagued by repetitive tasks that are incredibly time-consuming. Vendors offering automations for these end up writing brittle Robotic Process Automation (RPA) scripts or hiring offshore teams for manual task execution. RPA often breaks due to inevitable UI changes or unexpected popups like a Windows update or a random in-app notification. Offshore teams are often unreliable and costlier than software, plus they’re not always an option for regulated industries.

I previously built RPA scripts impacting 20K+ employees at a Fortune 100 company, where I experienced RPA’s brittleness and inflexibility firsthand. It was obvious to me that this was a band-aid solution to an unsolved problem. Alan was building a computer use agent for his previous startup and realized its huge potential to automate a ton of manual computer tasks across many industries, so we started working on Cyberdesk.

Computer use models can struggle with abstract, long-horizon tasks, but they excel at making context-aware decisions on a screen-by-screen basis, so they’re a good fit for automating these desktop apps.

The key to reliability is crafting prompts that are highly specific and well thought out. Much like with ChatGPT, vague or ambiguous prompts won’t get you the results you want. This is especially true in computer use because the model is processing nearly an entire desktop screen’s worth of extra visual information; without precise instructions, it doesn’t know which details to focus on or how to act.

Unlike RPA, Cyberdesk’s agents don’t blindly replay clicks. They read the screen state before every action and self-correct when flows drift (pop-ups, latency, UI changes). Unlike off-the-shelf computer use AIs, Cyberdesk runs deterministically in production: the agent primarily follows the steps it has learned and only falls back to reasoning when anomalies occur. Cyberdesk learns workflows from natural-language instructions, capturing nuance and handling dynamic tasks, which goes far beyond what a simple screen recording of a few runs can encode.

This approach is good for both reliability and cost: reliability, because we fall back to a computer use model in unexpected situations; and cost, because the computer use models are expensive and we only use them when we need to. Otherwise we leverage faster, more affordable visual LLMs for checking the screen state step-by-step during deterministic runs. Our agents are also equipped with tools like failsafes, data extraction, and screen evaluation to handle dynamic and sensitive situations.
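To make the cost argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not our actual pricing or billing logic: the function name, the per-step cost model, and all numbers are hypothetical, chosen only to show how a cheap per-step check plus an occasional expensive fallback compares with calling the computer use model on every step.

```python
def expected_run_cost(n_steps: int, check_cost: float,
                      cua_cost: float, fallback_rate: float) -> float:
    """Expected cost of one run, in arbitrary cost units.

    Assumed model (hypothetical): every step pays for one cheap
    screen-state check, and a fraction `fallback_rate` of steps
    additionally pays for one computer use model call.
    """
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    return n_steps * (check_cost + fallback_rate * cua_cost)


def always_cua_cost(n_steps: int, cua_cost: float) -> float:
    """Baseline: the computer use model is called on every step."""
    return n_steps * cua_cost
```

With these made-up inputs, `expected_run_cost(10, 1.0, 8.0, 0.25)` can be compared against `always_cua_cost(10, 8.0)`; the gap grows as the fallback rate shrinks.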

How it works: you install our open source driver on any Windows machine (https://github.com/cyberdesk-hq/cyberdriver). It communicates with our backend to receive commands (click, type, scroll, screenshot) and sends back data (screenshots, API responses, etc). You give our computer use agent a detailed natural language description of the process for a given task, just like an SOP for an employee learning a new task for the first time. The agent then leverages computer use AI models to learn the steps and memorizes them by saving each screenshot alongside its action (click on these coordinates, type XYZ, wait for page to load, etc).
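As an illustration of the "screenshot alongside its action" idea, here is a minimal Python sketch of what a memorized workflow could look like as data. This is not Cyberdriver's real schema or API: the class names (`Action`, `MemorizedStep`, `Workflow`) and their fields are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Action:
    """One command sent to the desktop (hypothetical shape)."""
    kind: str                   # e.g. "click", "type", "scroll", "wait"
    x: Optional[int] = None     # screen coordinates, for clicks
    y: Optional[int] = None
    text: Optional[str] = None  # text to enter, for "type"


@dataclass(frozen=True)
class MemorizedStep:
    """The screen as it looked right before the action, plus the action."""
    screenshot_png: bytes
    action: Action


@dataclass
class Workflow:
    """A natural-language SOP and the steps learned from it."""
    instructions: str
    steps: List[MemorizedStep] = field(default_factory=list)

    def record(self, screenshot_png: bytes, action: Action) -> None:
        # Called during the learning run, once per action the model takes.
        self.steps.append(MemorizedStep(screenshot_png, action))
```

During a learning run, each action the computer use model takes would be stored with `record(...)`, so that later runs have a reference screenshot to compare against before replaying that action.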

At run time, the agent replays these steps deterministically, which keeps runs fast and predictable. In order to account for popups and UI changes, our agent checks the live screen state against the memorized state to determine whether it’s safe to proceed with the memorized step. If no major changes prevent safe execution of the memorized step, it proceeds; otherwise, it falls back to a computer use model with context on past actions and the remaining task.
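The check-then-replay loop described above can be sketched roughly as follows. This is a simplified illustration, not our production code: `run_workflow` and its callback parameters (`capture`, `is_safe`, `execute`, `fallback`) are hypothetical names, and for simplicity the sketch hands the whole remainder of the task to the fallback once a mismatch is detected.

```python
from typing import Callable, List, Sequence, Tuple

# (memorized screenshot, action description) -- hypothetical step shape
Step = Tuple[bytes, str]


def run_workflow(
    steps: Sequence[Step],
    capture: Callable[[], bytes],
    is_safe: Callable[[bytes, bytes], bool],
    execute: Callable[[str], None],
    fallback: Callable[[bytes, List[str], List[str]], None],
) -> str:
    """Replay memorized steps; hand off to a fallback on a screen mismatch."""
    done: List[str] = []
    for i, (expected, action) in enumerate(steps):
        live = capture()  # screenshot of the current screen
        if is_safe(expected, live):
            # Screen matches closely enough: replay the memorized action.
            execute(action)
            done.append(action)
        else:
            # Unexpected popup or UI change: give the computer use model
            # the live screen, the past actions, and the remaining task.
            remaining = [a for _, a in steps[i:]]
            fallback(live, done, remaining)
            return "fallback"
    return "deterministic"
```

In this sketch `is_safe` stands in for the cheaper visual LLM check and `fallback` for the computer use model; both are injected as plain callables so the loop can be exercised with stubs.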

Customers are currently using us for manual tasks like importing and exporting files from legacy desktop applications, booking appointments for patients on a desktop PMS, and data entry for filling out forms such as patient profiles in an EMR.

We don't have a self-serve option yet but we'd love to onboard you manually. Book a demo here to learn more! (https://www.cyberdesk.io/) If you’d rather wait for the self-serve option a little later down the line, please do submit your email here (https://forms.gle/HfQLxMXKcv9Eh8Gs8) so you can be notified as soon as that’s ready. You can also check out our docs here: https://docs.cyberdesk.io/.

We’d absolutely love to hear your thoughts on our approach and on desktop automation for legacy industries!

Comments

throw03172019•1h ago

Looks great. For the EMR use cases, do you sign BAAs? Which CUA models are being used? No data retention?

mahmoud-almadi•1h ago

We sign BAAs with all our healthcare customers + all our vendors. Currently using Claude computer-use. Zero-data retention signed with both Anthropic and OpenAI, so none of the information getting sent to their LLMs ever gets retained.

hermitcrab•38m ago

>none of the information getting sent to their LLMs ever gets retained

Is it possible to verify that?

sgtwompwomp•37m ago

Yup! We have signed certificates that explicitly state this, with all LLM providers we use.

herval•20m ago

I’m guessing OP is asking if it’s possible to verify they’re honoring the contract and deleting the data?

feisty0630•11m ago

That's not "verification" by any definition of the word.

rkagerer•1h ago

Personally I think this approach is flawed because it runs in the cloud. If it were an agent I could run locally I'd be much more interested.

mahmoud-almadi•1h ago

Are you referring to the LLM being used or where the actions (click, type, etc.) are being executed? The actions can be executed on any Windows machine, so execution can take place locally on your device. The LLMs we're using right now are cloud LLMs. We haven't done an LLM self-hosting option yet. Can I ask what reservations you have about running in the cloud? We have zero-data retention signed with our LLM vendors, so none of the data getting sent to them ever gets retained.
