Show HN: Ourguide – OS wide task guidance system that shows you where to click

https://ourguide.ai
12•eshaangulati•4h ago
Hey! I'm eshaan and I'm building Ourguide, an on-screen task guidance system that can show you where to click step-by-step when you need help.

I started building this because whenever I didn’t know how to do something on my computer, I found myself constantly tabbing between chatbots and the app, pasting screenshots, and asking “what do I do next?” Ourguide solves this with two modes. In Guide mode, the app overlays your screen and highlights the specific element to click next, eliminating the need to leave your current window. There is also Ask mode, a vision-integrated chat that captures your screen context (which you can toggle on and off anytime), so you can ask, "How do I fix this error?" without having to explain what "this" is.

It’s an Electron app that works OS-wide, is vision-based, and isn't restricted to the browser.

Figuring out how to show the user where to click was the hardest part of the process. I originally trained a computer vision model with 2300 screenshots to identify and segment all UI elements on a screen and used a VLM to find the correct icon to highlight. While this worked extremely well (better than SOTA grounding models like UI-TARS), the latency was just too high. I'll be making that CV+VLM pipeline OSS soon, but for now, I’ve resorted to a simpler implementation that achieves <1s latency.
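
For readers who want a concrete picture of the two-stage idea (segment the screen into elements, then select one to highlight), here is a minimal Python sketch. It is not Ourguide's code: `UIElement`, `highlight_rect`, `ground_step`, and `choose_by_label` are hypothetical names, the element list stands in for the output of a segmentation model, and the selector is a toy text match standing in for a VLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class UIElement:
    """One detected on-screen element (hypothetical output of a segmentation step)."""
    label: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int


def highlight_rect(el: UIElement, screen_w: int, screen_h: int,
                   pad: int = 4) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a padded box, clamped to the screen."""
    left = max(0, el.x - pad)
    top = max(0, el.y - pad)
    right = min(screen_w, el.x + el.width + pad)
    bottom = min(screen_h, el.y + el.height + pad)
    return left, top, right, bottom


def ground_step(elements: Sequence[UIElement],
                choose: Callable[[Sequence[UIElement]], int],
                screen_w: int, screen_h: int) -> Tuple[int, int, int, int]:
    """Ask `choose` (a stand-in for the VLM) for an index, then box that element."""
    if not elements:
        raise ValueError("no elements detected")
    idx = choose(elements)
    if not 0 <= idx < len(elements):
        raise IndexError("selector did not return a valid element index")
    return highlight_rect(elements[idx], screen_w, screen_h)


def choose_by_label(target: str) -> Callable[[Sequence[UIElement]], int]:
    """Toy selector matching on label text; a real system would query a model here."""
    def _choose(elements: Sequence[UIElement]) -> int:
        for i, el in enumerate(elements):
            if target.lower() in el.label.lower():
                return i
        return -1  # signals "nothing matched"; ground_step rejects it
    return _choose


if __name__ == "__main__":
    detected = [
        UIElement("File menu", 0, 0, 40, 20),
        UIElement("Create bucket", 1900, 100, 30, 30),
    ]
    print(ground_step(detected, choose_by_label("create bucket"), 1920, 1080))
```

Keeping the selector as a plain callable means the slow part (the model call) can be swapped or cached without touching the overlay geometry, which is one plausible place to attack the latency problem described above.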

You may ask: if I can show you where to click, why can't I just click too? While trying to build computer-use agents during my job in Palo Alto, I hit the core limitation of today’s computer-use models where benchmarks hover in the mid-50% range (OSWorld). VLMs often know what to do but not what it looks like; without reliable visual grounding, agents misclick and stall. So, I built computer use—without the "use." It provides the visual grounding of an agent but keeps the human in the loop for the actual execution to prevent misclicks.

I personally use it for the AWS Console's "treasure hunt" UI, like creating a public S3 bucket with specific CORS rules. It’s also been surprisingly helpful for non-technical tasks, like navigating obscure settings in Gradescope or Spotify. Ourguide really works for any task when you’re stuck or don't know what to do.

You can download and test Ourguide here: https://ourguide.ai/downloads

The project is still very early, and I’d love your feedback on where it fails, where you think it worked well, and which specific niches you think Ourguide would be most helpful for.

Comments

culopatin•1h ago
What data do you extract from interactions?
DontBreakAlex•47m ago
Looks cool, I think you should try to target it towards the elderly. My 99-year-old grandpa is capable of using a computer and browsing the web, but struggles whenever he gets out of the "usual flow" (he accidentally removes the Chrome icon from his taskbar, or the crappy web-based email he insists on using over Thunderbird moves the add attachment button). I end up having to use TeamViewer to show him what I can't explain over the phone. He would very much use an assistant that shows him what to do, especially if he can speak to it.
iosguyryan•34m ago
Nicely conceived! This is the kind of feature Apple ought to have already delivered with on device models and private cloud compute.

Sending many whole screenshots to an indie mystery box, though, should be a non-starter for anyone without the skills to verify what any given update to this app is doing. Your website's featured use case highlights the risks (to you and users) unintentionally well: "How do I export my passwords?" (I did a double take: was this performance art from The Onion?) If a user opens a plain text file of secrets without closing this app/the help task, what gets captured, sent over the network, and saved to disk? What protections exist for, say, a computer-challenged elderly person's banking details?

A suggestion about the FAQ ...

"Where is my task history stored? Is it private? Your privacy is our top priority. Your task history is stored securely and encrypted on your local machine by default. You have full control over your data."

... This invites unanswered questions about what exactly from the screenshots is stored, for how long, and what design backs the "securely" claim. Being up front about this would invite trust and helpful developer feedback.

gyanchawdhary•29m ago
Check out https://techcrunch.com/2016/05/02/google-acquires-synergyse-...

Google acquired these guys back in 2016 to help users learn how to use Google cloud products via interactive tutorials using a step by step guidance / walkthrough (the user had to install a chrome extension)

One of the best use cases would be edtech … think of interactive labs where your product can guide learners / students to complete a task and hand hold them ..

Television is 100 years old today

https://diamondgeezer.blogspot.com/2026/01/tv100.html
366•qassiov•7h ago•117 comments

The Hidden Engineering of Runways

https://practical.engineering/blog/2026/1/20/the-hidden-engineering-of-runways
78•crescit_eundo•6d ago•11 comments

State of the Windows: What is going on with Windows 11?

https://ntdotdev.wordpress.com/2026/01/25/state-of-the-windows-what-is-going-on-with-windows-11/
10•xd1936•20m ago•1 comments

JuiceSSH – Give me my pro features back

https://nproject.io/blog/juicessh-give-me-back-my-pro-features/
162•jandeboevrie•4h ago•61 comments

There is an AI code review bubble

https://www.greptile.com/blog/ai-code-review-bubble
106•dakshgupta•6h ago•78 comments

RIP Low-Code 2014-2025

https://www.zackliscio.com/posts/rip-low-code-2014-2025/
70•zackliscio•6h ago•36 comments

Dithering – Part 2: The Ordered Dithering

https://visualrambling.space/dithering-part-2/
75•ChrisArchitect•3h ago•7 comments

Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus

https://tetrisbench.com/tetrisbench/
51•ykhli•3h ago•22 comments

Qwen3-Max-Thinking

https://qwen.ai/blog?id=qwen3-max-thinking
382•vinhnx•7h ago•340 comments

Fedora Asahi Remix is now working on Apple M3

https://bsky.app/profile/did:plc:okydh7e54e2nok65kjxdklvd/post/3mdd55paffk2o
353•todsacerdoti•4h ago•126 comments

MapLibre Tile: a modern and efficient vector tile format

https://maplibre.org/news/2026-01-23-mlt-release/
377•todsacerdoti•12h ago•74 comments

When AI 'builds a browser,' check the repo before believing the hype

https://www.theregister.com/2026/01/26/cursor_opinion/
171•CrankyBear•3h ago•76 comments

Show HN: Ourguide – OS wide task guidance system that shows you where to click

https://ourguide.ai
12•eshaangulati•4h ago•4 comments

Not all Chess960 positions are equally complex

https://arxiv.org/abs/2512.14319
41•MaysonL•3d ago•16 comments

Show HN: SF Microclimates

https://github.com/solo-founders/sf-microclimates
14•weisser•20h ago•23 comments

ChatGPT Containers can now run bash, pip/npm install packages and download files

https://simonwillison.net/2026/Jan/26/chatgpt-containers/
60•simonw•3h ago•51 comments

OpenFlexure Microscope

https://openflexure.org/projects/microscope/
28•o4c•5d ago•4 comments

Google AI Overviews cite YouTube more than any medical site for health queries

https://www.theguardian.com/technology/2026/jan/24/google-ai-overviews-youtube-medical-citations-...
327•bookofjoe•7h ago•175 comments

Google Books removed all search functions for any books with previews

https://old.reddit.com/r/google/comments/1qn1hk1/google_has_seemingly_entirely_removed_search/
150•adamnemecek•4h ago•52 comments

San Francisco Graffiti

https://walzr.com/sf-graffiti
125•walz•12h ago•120 comments

Find 'Abbey Road when type 'Beatles abbey rd': Fuzzy/Semantic search in Postgres

https://rendiment.io/postgresql/2026/01/21/pgtrgm-pgvector-music.html
59•nethalo•5d ago•16 comments

Things I've learned in my 10 years as an engineering manager

https://www.jampa.dev/p/lessons-learned-after-10-years-as
502•jampa•5d ago•128 comments

The mountain that weighed the Earth

https://signoregalilei.com/2026/01/18/the-mountain-that-weighed-the-earth/
74•surprisetalk•5h ago•11 comments

OSS ChatGPT WebUI – 530 Models, MCP, Tools, Gemini RAG, Image/Audio Gen

https://llmspy.org/docs/v3
104•mythz•7h ago•24 comments

Show HN: Only 1 LLM can fly a drone

https://github.com/kxzk/snapbench
123•beigebrucewayne•11h ago•75 comments

Refusing to Use Twitter

https://blog.korny.info/2026/01/25/refusing-to-use-twitter
34•pavel_lishin•56m ago•5 comments

The Holy Grail of Linux Binary Compatibility: Musl and Dlopen

https://github.com/quaadgras/graphics.gd/discussions/242
198•Splizard•14h ago•168 comments

Taming P99s in OpenFGA: How we built a self-tuning strategy planner

https://auth0.com/blog/self-tuning-strategy-planner-openfga/
13•elbuo•4d ago•1 comments

The browser is the sandbox

https://simonwillison.net/2026/Jan/25/the-browser-is-the-sandbox/
319•enos_feedler•17h ago•172 comments

Text Is King

https://www.experimental-history.com/p/text-is-king
151•zdw•6d ago•69 comments