
Show HN: Understudy – Teach a desktop agent by demonstrating a task once

https://github.com/understudy-ai/understudy
29•bayes-song•1h ago
I built Understudy because a lot of real work still spans native desktop apps, browser tabs, terminals, and chat tools. Most current agents live in only one of those surfaces.

Understudy is a local-first desktop agent runtime that can operate GUI apps, browsers, shell tools, files, and messaging in one session. The part I'm most interested in feedback on is teach-by-demonstration: you do a task once, the agent records screen video + semantic events, extracts the intent rather than coordinates, and turns it into a reusable skill.

Demo video: https://www.youtube.com/watch?v=3d5cRGnlb_0

In the demo I teach it: Google Image search -> download a photo -> remove background in Pixelmator Pro -> export -> send via Telegram. Then I ask it to do the same for Elon Musk. The replay isn't a brittle macro: the published skill stores intent steps, route options, and GUI hints only as a fallback. In this example it can also prefer faster routes when they are available instead of repeating every GUI step.
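As a rough illustration of that structure, here is a minimal TypeScript sketch of a skill that stores intent steps, route options, and a GUI hint used only as a fallback. All type, field, and function names (`Skill`, `IntentStep`, `Route`, `guiHint`, `pickRoute`) are hypothetical and are not taken from the Understudy codebase, and the example routes are invented for illustration.

```typescript
// Hypothetical shapes for illustration; not Understudy's actual skill format.
type RouteKind = "shell" | "browser" | "gui";

interface Route {
  kind: RouteKind;
  description: string;
}

interface IntentStep {
  intent: string;    // what the step should achieve, not where to click
  routes: Route[];   // candidate ways to achieve it
  guiHint?: string;  // replay hint, meant only as a last resort
}

interface Skill {
  name: string;
  steps: IntentStep[];
}

// Pick a usable route, preferring non-GUI routes and falling back to GUI last.
function pickRoute(step: IntentStep, available: Set<RouteKind>): Route | undefined {
  const usable = step.routes.filter((r) => available.has(r.kind));
  return usable.find((r) => r.kind !== "gui") ?? usable.find((r) => r.kind === "gui");
}

// Invented example step: two candidate routes plus a GUI hint.
const removeBackground: IntentStep = {
  intent: "remove the background from the downloaded photo",
  routes: [
    { kind: "shell", description: "run a command-line background remover" },
    { kind: "gui", description: "use the image editor's remove-background action" },
  ],
  guiHint: "toolbar button labelled 'Remove Background'",
};

const demoSkill: Skill = { name: "photo-cutout", steps: [removeBackground] };
```

With both route kinds available, `pickRoute` returns the shell route. With only `"gui"` available it returns the GUI route, which is the "faster route when available" behavior described above in miniature.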

Current state: macOS only. Layers 1-2 are working today; Layers 3-4 are partial and still early.

    npm install -g @understudy-ai/understudy
    understudy wizard

GitHub: https://github.com/understudy-ai/understudy

Happy to answer questions about the architecture, teach-by-demonstration, or the limits of the current implementation.

Comments

sukhdeepprashut•42m ago
2026 and we still pretend to not understand how llms work huh
wuweiaxin•36m ago
The demonstration-based approach is interesting for the handoff problem. The hardest part of agentic automation isn't the first run -- it's making the agent robust to the cases the demonstrator never showed it. How do you handle edge cases or failures mid-task? Does it fall back to asking the user, or does it have some recovery heuristic? Asking because we found that the failure-mode surface matters more than happy-path coverage when you actually deploy these in production.
bayes-song•17m ago
That’s exactly the hard part, and I agree it matters more than the happy path.

A few concrete things we do today:

1. It’s fully agentic rather than a fixed replay script. The model is prompted to treat GUI as one route among several, to prefer simpler / more reliable routes when available, and to switch routes or replan after repeated failures instead of brute-forcing the same path. In practice, we’ve also seen cases where, after GUI interaction becomes unreliable, the agent pivots to macOS-native scripting / AppleScript-style operations. I wouldn’t overclaim that path though: it works much better on native macOS surfaces than on arbitrary third-party apps.

2. GUI grounding has an explicit validation-and-retry path. Each action is grounded from a fresh screenshot, not stored coordinates. In the higher-risk path, the runtime does prediction, optional refinement, a simulated action overlay, and then validation; if validation rejects the candidate, that rejection feeds the next retry round. And if the target still can’t be grounded confidently, the runtime returns a structured `not_found` rather than pretending success.

3. The taught artifact has some built-in generalization. What gets published is not a coordinate recording but a three-layer abstraction: intent-level procedure, route options, and GUI replay hints as a last resort. The execution policy is adaptive by default, so the demonstration is evidence for the task, not the only valid tool sequence.
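The validate-and-retry loop from point 2 can be sketched roughly as follows. This is a minimal TypeScript illustration under stated assumptions, not Understudy's API: `groundTarget`, `predict`, `validate`, the `maxRounds` parameter, and the result shapes are hypothetical, and the screenshot is reduced to a plain string. The refinement and simulated-overlay stages are omitted.

```typescript
// Hypothetical types; a real runtime would work on screenshots, not strings.
interface Candidate { x: number; y: number; }

type GroundResult =
  | { status: "ok"; candidate: Candidate; rounds: number }
  | { status: "not_found"; rejections: string[] };

// Proposes a candidate from the current screenshot and earlier rejections.
type Predict = (screenshot: string, target: string, feedback: string[]) => Candidate | null;
// Returns null when the candidate is accepted, otherwise a rejection reason.
type Validate = (screenshot: string, target: string, c: Candidate) => string | null;

function groundTarget(
  takeScreenshot: () => string,
  target: string,
  predict: Predict,
  validate: Validate,
  maxRounds = 3,
): GroundResult {
  const rejections: string[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    const shot = takeScreenshot(); // fresh screenshot every round, no stored coordinates
    const candidate = predict(shot, target, rejections);
    if (candidate === null) {
      rejections.push(`round ${round}: no candidate`);
      continue;
    }
    const reason = validate(shot, target, candidate);
    if (reason === null) return { status: "ok", candidate, rounds: round };
    rejections.push(`round ${round}: ${reason}`); // rejection feeds the next round
  }
  // Report failure explicitly instead of pretending success.
  return { status: "not_found", rejections };
}
```

The caller gets either an accepted candidate or a structured `not_found` carrying the rejection history, which a planner could use to switch routes rather than retry the same path.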

In practice, when things go wrong today, the system often gets much slower: it re-grounds, retries, and sometimes replans quite aggressively, and we definitely can’t guarantee that it will always recover to the correct end state. That’s also exactly the motivation for Layer 3 in the design: when the system does find a route / grounding pattern / recovery path that works, we want to remember that and reuse it later instead of rediscovering it from scratch every time.
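One way to picture that kind of reuse is a small success cache keyed by intent. The TypeScript sketch below is purely illustrative and makes no claim about how Layer 3 is actually built: `RouteMemory`, `recordSuccess`, and `rank` are hypothetical names, and routes are reduced to plain strings.

```typescript
// Hypothetical sketch of remembering routes that worked; names are illustrative only.
class RouteMemory {
  // intent -> (route -> number of recorded successes)
  private readonly successes = new Map<string, Map<string, number>>();

  // Record that `route` completed `intent` successfully.
  recordSuccess(intent: string, route: string): void {
    const counts = this.successes.get(intent) ?? new Map<string, number>();
    counts.set(route, (counts.get(route) ?? 0) + 1);
    this.successes.set(intent, counts);
  }

  // Order candidate routes so previously successful ones are tried first.
  // With a stable sort (ES2019+), ties keep their original relative order.
  rank(intent: string, candidates: string[]): string[] {
    const counts = this.successes.get(intent);
    if (!counts) return [...candidates];
    return [...candidates].sort((a, b) => (counts.get(b) ?? 0) - (counts.get(a) ?? 0));
  }
}
```

An agent using something like this would still need to verify the end state on each run, since a route that worked before may not work again.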

abraxas•18m ago
One more tool targeting OSX only. That platform is overserved with desktop agents already while others are underserved, especially Linux.
bayes-song•14m ago
Fair point that Linux is underserved.

My own view is that the bigger long-term opportunity is actually Windows, simply because more desktop software and more professional workflows still live there. macOS-first here is mostly an implementation / iteration choice, not the thesis.

renewiltord•12m ago
That's mostly because macOS users make tools that solve their own problems, while Linux users go online to complain that no one has solved their problem and that, if someone did, they'd want it to be free.
jedreckoning•1m ago
cool idea. good idea doing a demo as well.
