Show HN: Ourguide – OS wide task guidance system that shows you where to click

52•eshaangulati•1w ago

Hey! I'm Eshaan and I'm building Ourguide, an on-screen task guidance system that shows you where to click, step by step, when you need help.

I started building this because whenever I didn’t know how to do something on my computer, I found myself constantly tabbing between chatbots and the app, pasting screenshots, and asking “what do I do next?” Ourguide solves this with two modes. In Guide mode, the app overlays your screen and highlights the specific element to click next, so you never have to leave your current window. There is also Ask mode, a vision-integrated chat that captures your screen context (you can toggle capture on and off anytime), so you can ask, "How do I fix this error?" without having to explain what "this" is.

It’s an Electron app that works OS-wide, is vision-based, and isn't restricted to the browser.

Figuring out how to show the user where to click was the hardest part of the process. I originally trained a computer vision model on 2,300 screenshots to identify and segment all UI elements on a screen, and used a VLM to find the correct icon to highlight. While this worked extremely well (better than SOTA grounding models like UI-TARS), the latency was just too high. I'll be making that CV+VLM pipeline OSS soon, but for now, I’ve resorted to a simpler implementation that achieves <1s latency.
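
For readers who want a concrete picture of the two-stage idea described above (detect UI elements, then pick one to highlight), here is a minimal TypeScript sketch. It is not Ourguide's code: the `UIElement` type, the `Chooser` callback, the `keywordChooser` stand-in for a VLM, and the `scaleFactor` and `padding` parameters are all assumptions made for illustration.

```typescript
// Hypothetical types for illustration; none of these names come from Ourguide.
interface Box { x: number; y: number; width: number; height: number }

interface UIElement {
  id: number;
  label: string; // text or icon description produced by a detector
  box: Box;      // bounding box in screenshot pixels
}

// Stage 2 stand-in: a real system would ask a VLM which element matches the
// instruction. Here the chooser is injected so the sketch stays runnable.
type Chooser = (instruction: string, elements: UIElement[]) => number | null;

// Convert a box from screenshot pixels to overlay coordinates by dividing by
// the display scale factor, then grow it by `padding` on every side.
function toOverlay(box: Box, scaleFactor: number, padding: number = 4): Box {
  return {
    x: box.x / scaleFactor - padding,
    y: box.y / scaleFactor - padding,
    width: box.width / scaleFactor + 2 * padding,
    height: box.height / scaleFactor + 2 * padding,
  };
}

// Pick the target element and return the rectangle to highlight, or null if
// the chooser found nothing (so the overlay draws nothing rather than guessing).
function nextHighlight(
  instruction: string,
  elements: UIElement[],
  choose: Chooser,
  scaleFactor: number,
): Box | null {
  const id = choose(instruction, elements);
  const target = elements.find((e) => e.id === id);
  return target ? toOverlay(target.box, scaleFactor) : null;
}

// Naive keyword matcher used only as a placeholder for the VLM step.
const keywordChooser: Chooser = (instruction, elements) => {
  const words = instruction.toLowerCase().split(/\s+/);
  const hit = elements.find((e) => words.includes(e.label.toLowerCase()));
  return hit ? hit.id : null;
};
```

Keeping the chooser as a parameter means the slow part (the model call) can be swapped or timed independently of the coordinate math, which is where the latency tradeoff mentioned above would show up.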

You may ask: if I can show you where to click, why can't I just click too? While trying to build computer-use agents during my job in Palo Alto, I hit the core limitation of today’s computer-use models: their scores on benchmarks like OSWorld hover in the mid-50% range. VLMs often know what to do but not what it looks like; without reliable visual grounding, agents misclick and stall. So, I built computer use, without the "use." It provides the visual grounding of an agent but keeps the human in the loop for the actual execution to prevent misclicks.

I personally use it for the AWS Console's "treasure hunt" UI, like creating a public S3 bucket with specific CORS rules. It’s also been surprisingly helpful for non-technical tasks, like navigating obscure settings in Gradescope or Spotify. Ourguide really works for any task when you’re stuck or don't know what to do.

You can download and test Ourguide here: https://ourguide.ai/downloads

The project is still very early, and I’d love your feedback on where it fails, where you think it worked well, and which specific niches you think Ourguide would be most helpful for.

Comments

culopatin•1w ago

What data do you extract from interactions?

eshaangulati•1w ago

Ourguide only takes a screenshot when the user asks for the next step. We run a PII filter first, then process the image via Tinfoil.sh in secure hardware enclaves (TEEs). This ensures the data remains private from everyone, including us. Tinfoil is open-source and fully verifiable.

DontBreakAlex•1w ago

Looks cool, I think you should try to target it towards the elderly. My 99-year-old grandpa is capable of using a computer and browsing the web, but struggles whenever he gets out of the "usual flow" (he accidentally removes the Chrome icon from his taskbar, or the crappy web-based email he insists on using over Thunderbird moves the add attachment button). I end up having to use TeamViewer to show him what I can't explain over the phone. He would very much use an assistant that shows him what to do, especially if he can speak to it.

eshaangulati•1w ago

Hey! Thanks for the suggestion. I visited a lot of old age homes in SF today, but the recurring issue I saw was that most of the people there didn't use laptops, or even phones - so I'm not sure how I would market it to them. Any suggestions?

iosguyryan•1w ago

Nicely conceived! This is the kind of feature Apple ought to have already delivered with on-device models and Private Cloud Compute.

Sending many whole screenshots to an indie mystery box, though, should be a non-starter for anyone without the skills to verify what any given update to this app is doing. Your website's featured use case highlights the risks (to you and users) unintentionally well: "How do I export my passwords?" (I did a double take: was this performance art from The Onion?) If a user opens a plain text file of secrets without closing this app/the help task, what gets captured, sent over the network, and saved to disk? What protections exist for, say, a computer-challenged elderly person's banking details?

A suggestion about the FAQ ...

"Where is my task history stored? Is it private? Your privacy is our top priority. Your task history is stored securely and encrypted on your local machine by default. You have full control over your data."

... This invites unanswered questions about what exactly from the screenshots is stored, for how long, and what design backs the "securely" claim. Being up front about this would invite trust and helpful developer feedback.

eshaangulati•1w ago

Thanks for your comment, Ryan!

To answer your question regarding screenshots, I am quoting a previous answer: "Ourguide only takes a screenshot when the user asks for the next step. We run a PII filter first, then process the image via Tinfoil.sh in secure hardware enclaves (TEEs). This ensures the data remains private from everyone, including us. Tinfoil is open-source and fully verifiable." Screenshots are not stored.

gyanchawdhary•1w ago

Check out https://techcrunch.com/2016/05/02/google-acquires-synergyse-...

Google acquired these guys back in 2016 to help users learn how to use Google Cloud products via interactive tutorials with step-by-step guidance and walkthroughs (the user had to install a Chrome extension).

One of the best use cases would be edtech … think of interactive labs where your product can guide learners / students to complete a task and hand-hold them.

eshaangulati•1w ago

Appreciate the link! I’d love for you to elaborate on the edtech use case; I'm not sure I understand it completely.

gyanchawdhary•1w ago

Edtech companies that offer learn-by-doing curricula (look at acloudguru, acquired by Pluralsight) offer labs where they expect students/learners to open their dull tutorials or instructions in a separate browser tab and then follow those instructions to achieve a learning objective .. your product could fit there to hand-hold and educate users on all the different cloud technologies .. in short you can fuck a number of boomer edtech companies by making your product like an education copilot and letting learners use it to learn stuff interactively .. learn how to secure S3 buckets in AWS .. learn how to configure backups or whatever ..

linkdead•1w ago

Good idea. I hope AI can automatically learn from the documentation for newer versions. Yesterday I asked ChatGPT "how to xxxxx in Blender?". Pasting screenshots manually is bothersome, and the biggest problem is that ChatGPT doesn't have knowledge of Blender 5.

eshaangulati•1w ago

Exactly! Ourguide has live web search capabilities, and can help you across any software for any task you need help with.

davelradindra•1w ago

Really interesting approach. Having a human in the loop seems like the right tradeoff given where computer-use models are today. One thing that came to mind is that this can be a new interface for software learning. If it works reliably, I could see it replacing static docs and videos!

aventus-tech•1w ago

So sick

ecto•1w ago

Can I put it on my mom's computer yet?

eshaangulati•1w ago

Sure! Ourguide is live for macOS and free to use.

ryannampham•1w ago

I like how it actually shows an image of your screen and where to place your cursor. This is honestly pretty cool.

thefz•1w ago

Man, in the clearly forged "trusted by professionals" part you have names that do not match the faces, unless Anika is a Middle Eastern-looking guy and Elena really has a baseball hat and a beard.

pshirshov•1w ago

Clown world vision (c)

Incipient•1w ago

Cool idea, but realistically how much would this cost to run for a typical "show me how to do X"?

pshirshov•1w ago

I wonder why we can't just make better UIs.

wlswo•1w ago

Interesting approach. It’s essentially a just-in-time documentation layer.

protocolture•1w ago

"How do I prevent the AI features in Windows 11 from activating"

... Thinking

Click on Google Chrome and search for Debian.
