Show HN: Ourguide – OS wide task guidance system that shows you where to click

52•eshaangulati•1w ago

Hey! I'm Eshaan and I'm building Ourguide, an on-screen task guidance system that shows you where to click, step by step, when you need help.

I started building this because whenever I didn’t know how to do something on my computer, I found myself constantly tabbing between chatbots and the app, pasting screenshots, and asking “what do I do next?” Ourguide solves this with two modes. In Guide mode, the app overlays your screen and highlights the specific element to click next, so you never have to leave your current window. In Ask mode, a vision-integrated chat captures your screen context (which you can toggle on and off anytime), so you can ask, "How do I fix this error?" without having to explain what "this" is.

It’s an Electron app that works OS-wide, is vision-based, and isn't restricted to the browser.

Figuring out how to show the user where to click was the hardest part of the process. I originally trained a computer vision model on 2,300 screenshots to identify and segment all UI elements on a screen, then used a VLM to pick the correct element to highlight. This worked extremely well (better than SOTA grounding models like UI-TARS), but the latency was just too high. I'll be making that CV+VLM pipeline OSS soon, but for now, I’ve resorted to a simpler implementation that achieves <1s latency.
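For readers who want a concrete picture of a two-stage pipeline like the one described above, here is a minimal sketch. It is not Ourguide's code: `UIElement`, `detect_elements_stub`, `choose_element`, and `highlight_rect` are hypothetical names, the detector is a hard-coded stub, and the "VLM" is any callable that returns an index. The point it illustrates is that the chooser only picks among detected candidates, so it never has to produce pixel coordinates itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class UIElement:
    """One detected UI element (hypothetical structure)."""
    label: str                      # description attached by the detector
    box: Tuple[int, int, int, int]  # (x, y, width, height) in screenshot pixels


def detect_elements_stub() -> List[UIElement]:
    """Stage 1 stand-in: a real system would run a CV model on a screenshot."""
    return [
        UIElement("Search", (20, 10, 300, 32)),
        UIElement("Create bucket", (200, 100, 120, 40)),
        UIElement("Delete", (340, 100, 80, 40)),
    ]


def choose_element(
    elements: Sequence[UIElement],
    pick: Callable[[List[str]], int],
) -> UIElement:
    """Stage 2: ask a chooser (stand-in for a VLM) for the index to highlight."""
    index = pick([e.label for e in elements])
    if not 0 <= index < len(elements):
        raise ValueError(f"chooser returned out-of-range index {index}")
    return elements[index]


def highlight_rect(
    element: UIElement,
    scale_factor: float = 1.0,
    padding: float = 4,
) -> Tuple[float, float, float, float]:
    """Convert a screenshot-pixel box to overlay coordinates with padding.

    Assumes the screenshot is `scale_factor` times larger than the overlay's
    logical coordinate space (as on a high-DPI display).
    """
    x, y, w, h = element.box
    return (
        x / scale_factor - padding,
        y / scale_factor - padding,
        w / scale_factor + 2 * padding,
        h / scale_factor + 2 * padding,
    )


if __name__ == "__main__":
    candidates = detect_elements_stub()
    # Toy chooser: pick the candidate whose label matches the next step.
    target = choose_element(candidates, lambda labels: labels.index("Create bucket"))
    print(target.label, highlight_rect(target, scale_factor=2.0))
```

In a setup like this, latency is roughly the detector time plus one VLM call per step, which is one plausible reason a full segmentation pass would be hard to fit under a one-second budget.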

You may ask: if I can show you where to click, why can't I just click too? While trying to build computer-use agents during my job in Palo Alto, I hit the core limitation of today’s computer-use models: success rates on benchmarks like OSWorld hover in the mid-50% range. VLMs often know what to do but not what it looks like; without reliable visual grounding, agents misclick and stall. So, I built computer use, without the "use." It provides the visual grounding of an agent but keeps the human in the loop for the actual execution to prevent misclicks.

I personally use it for the AWS Console's "treasure hunt" UI, like creating a public S3 bucket with specific CORS rules. It’s also been surprisingly helpful for non-technical tasks, like navigating obscure settings in Gradescope or Spotify. Ourguide really works for any task when you’re stuck or don't know what to do.

You can download and test Ourguide here: https://ourguide.ai/downloads

The project is still very early, and I’d love your feedback on where it fails, where you think it worked well, and which specific niches you think Ourguide would be most helpful for.

Comments

culopatin•1w ago

What data do you extract from interactions?

eshaangulati•1w ago

Ourguide only takes a screenshot when the user asks for the next step. We run a PII filter first, then process the image via Tinfoil.sh in secure hardware enclaves (TEEs). This ensures the data remains private from everyone, including us. Tinfoil is open-source and fully verifiable.

DontBreakAlex•1w ago

Looks cool, I think you should try to target it towards the elderly. My 99-year-old grandpa is capable of using a computer and browsing the web, but struggles whenever he gets out of the "usual flow" (he accidentally removes the Chrome icon from his taskbar, or the crappy web-based email he insists on using over Thunderbird moves the add attachment button). I end up having to use TeamViewer to show him what I can't explain over the phone. He would very much use an assistant that shows him what to do, especially if he can speak to it.

eshaangulati•1w ago

Hey! Thanks for the suggestion. I visited a lot of old age homes in SF today, but the recurring issue I saw was that most of the people there didn't use laptops, or even phones - so I'm not sure how I would market it to them. Any suggestions?

iosguyryan•1w ago

Nicely conceived! This is the kind of feature Apple ought to have already delivered with on-device models and Private Cloud Compute.

Sending many whole screenshots to an indie mystery box, though, should be a non-starter for anyone without the skills to verify what any given update to this app is doing. Your website's featured use case highlights the risks (to you and users) unintentionally well: "How do I export my passwords?" (I did a double take: was this performance art from The Onion?) If a user opens a plain text file of secrets without closing this app/the help task, what gets captured, sent over the network, and saved to disk? What protections exist for, say, a computer-challenged elderly person's banking details?

A suggestion about the FAQ ...

"Where is my task history stored? Is it private? Your privacy is our top priority. Your task history is stored securely and encrypted on your local machine by default. You have full control over your data."

... This invites unanswered questions about what exactly from the screenshots is stored, for how long, and what design backs the "securely" claim. Being up front about this would invite trust and helpful developer feedback.

eshaangulati•1w ago

Thanks for your comment, Ryan!

To answer your question regarding screenshots, I am quoting a previous answer: "Ourguide only takes a screenshot when the user asks for the next step. We run a PII filter first, then process the image via Tinfoil.sh in secure hardware enclaves (TEEs). This ensures the data remains private from everyone, including us. Tinfoil is open-source and fully verifiable." Screenshots are not stored.

gyanchawdhary•1w ago

Check out https://techcrunch.com/2016/05/02/google-acquires-synergyse-...

Google acquired these guys back in 2016 to help users learn how to use Google Cloud products via interactive tutorials with step-by-step guidance / walkthroughs (the user had to install a Chrome extension).

One of the best use cases would be edtech … think of interactive labs where your product can guide learners / students to complete a task and hand hold them ..

eshaangulati•1w ago

Appreciate the link! I’d love for you to elaborate on the edtech use case; I'm not sure I understand it completely.

gyanchawdhary•1w ago

Edtech companies that offer learn-by-doing curricula (look at acloudguru, acquired by Pluralsight) offer labs where they expect students/learners to open their dull tutorials or instructions in a separate browser tab and then follow those instructions to achieve a learning objective .. your product could fit there to hand hold and educate users on all the different cloud technologies .. in short you can fuck a number of boomer edtech companies by making your product like an education copilot and letting learners use it to learn stuff interactively .. learn how to secure S3 buckets in AWS .. learn how to configure backups or whatever ..

linkdead•1w ago

Good idea. I hope the AI can automatically learn from the documentation for newer versions. Yesterday I used ChatGPT for "how to xxxxx in Blender?". Adding screenshots manually is bothersome, and the biggest problem is that ChatGPT doesn't have knowledge of Blender 5.

eshaangulati•1w ago

Exactly! Ourguide has live web search capabilities, and can help you across any software for any task you need help with.

davelradindra•1w ago

Really interesting approach. Having a human in the loop seems like the right tradeoff given where computer-use models are today. One thing that came to mind is that this can be a new interface for software learning. If it works reliably, I could see it replacing static docs and videos!

aventus-tech•1w ago

So sick

ecto•1w ago

Can I put it on my mom's computer yet?

eshaangulati•1w ago

Sure! Ourguide is live for macOS and free to use.

ryannampham•1w ago

I like how it actually shows an image of your screen and where to place your cursor. This is honestly pretty cool.

thefz•1w ago

Man, in the clearly forged "trusted by professionals" part you have names that do not match the faces, unless Anika is a middle eastern looking guy and Elena really has a baseball hat and a beard.

pshirshov•1w ago

Clown world vision (c)

Incipient•1w ago

Cool idea - but realistically, how much would this cost to run for a typical "show me how to do X"?

pshirshov•1w ago

I wonder why we can't just make better UIs.

wlswo•1w ago

Interesting approach. It’s essentially a Just-in-Time documentation layer

protocolture•1w ago

"How do I prevent the AI features in Windows 11 from activating"

... Thinking

Click on Google Chrome and search for Debian.
