Show HN: Ourguide – OS wide task guidance system that shows you where to click

https://ourguide.ai
16•eshaangulati•4h ago
Hey! I'm Eshaan and I'm building Ourguide, an on-screen task guidance system that shows you where to click, step by step, when you need help.

I started building this because whenever I didn’t know how to do something on my computer, I found myself constantly tabbing between chatbots and the app, pasting screenshots, and asking “what do I do next?” Ourguide solves this with two modes. In Guide mode, the app overlays your screen and highlights the specific element to click next, so you never have to leave your current window. Ask mode is a vision-integrated chat that captures your screen context (you can toggle the capture on and off at any time), so you can ask, "How do I fix this error?" without having to explain what "this" is.

It’s an Electron app that works OS-wide, is vision-based, and isn't restricted to the browser.

Figuring out how to show the user where to click was the hardest part of the process. I originally trained a computer vision model on 2,300 screenshots to identify and segment all UI elements on a screen, then used a VLM to pick the correct element to highlight. While this worked extremely well (better than SOTA grounding models like UI-TARS), the latency was just too high. I'll be making that CV+VLM pipeline OSS soon, but for now, I’ve resorted to a simpler implementation that achieves <1s latency.
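
To make the two-stage idea above concrete, here is a minimal TypeScript sketch. It is not Ourguide's actual code: every name in it (`UiElement`, `Chooser`, `keywordChooser`, `highlightPoint`, `nextClickTarget`) is hypothetical. It assumes stage one (the CV model) has already produced labeled bounding boxes in screenshot pixels, and it swaps the VLM for a trivial keyword matcher so the example runs on its own. It also assumes the screenshot was captured at `scaleFactor` times the overlay's logical resolution.

```typescript
// Hypothetical types for illustration; not taken from Ourguide.
interface Box { x: number; y: number; width: number; height: number }
interface UiElement { label: string; box: Box } // assumed output of the CV stage
interface Point { x: number; y: number }

// Stage two picks one candidate for an instruction. In the pipeline described
// above this would be a VLM call; here it is a pluggable function.
type Chooser = (instruction: string, candidates: UiElement[]) => UiElement | undefined;

// Placeholder chooser (NOT a VLM): picks the candidate whose label shares
// the most words with the instruction, or nothing if no word matches.
const keywordChooser: Chooser = (instruction, candidates) => {
  const words = new Set(instruction.toLowerCase().split(/\W+/).filter(Boolean));
  let best: UiElement | undefined;
  let bestScore = 0;
  for (const el of candidates) {
    const score = el.label.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length;
    if (score > bestScore) {
      best = el;
      bestScore = score;
    }
  }
  return best;
};

// Convert the center of a box in screenshot pixels into overlay coordinates.
// Assumption: screenshot pixels = logical pixels * scaleFactor.
function highlightPoint(box: Box, scaleFactor: number): Point {
  return {
    x: (box.x + box.width / 2) / scaleFactor,
    y: (box.y + box.height / 2) / scaleFactor,
  };
}

// Full step: choose an element, then compute where to draw the highlight.
function nextClickTarget(
  instruction: string,
  candidates: UiElement[],
  choose: Chooser,
  scaleFactor: number,
): Point | undefined {
  const el = choose(instruction, candidates);
  return el ? highlightPoint(el.box, scaleFactor) : undefined;
}

// Usage with made-up detections on a 2x display.
const demoElements: UiElement[] = [
  { label: "Create bucket", box: { x: 200, y: 100, width: 100, height: 40 } },
  { label: "Delete", box: { x: 400, y: 100, width: 80, height: 40 } },
];
console.log(nextClickTarget("Click Create bucket", demoElements, keywordChooser, 2));
```

Keeping the chooser as a parameter means the slow part (the model call) can be swapped or cached without touching the coordinate math, which is where the latency trade-off described above would show up.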

You may ask: if I can show you where to click, why can't I just click too? While trying to build computer-use agents during my job in Palo Alto, I hit the core limitation of today’s computer-use models where benchmarks hover in the mid-50% range (OSWorld). VLMs often know what to do but not what it looks like; without reliable visual grounding, agents misclick and stall. So, I built computer use—without the "use." It provides the visual grounding of an agent but keeps the human in the loop for the actual execution to prevent misclicks.

I personally use it for the AWS Console's "treasure hunt" UI, like creating a public S3 bucket with specific CORS rules. It’s also been surprisingly helpful for non-technical tasks, like navigating obscure settings in Gradescope or Spotify. Ourguide really works for any task when you’re stuck or don't know what to do.

You can download and test Ourguide here: https://ourguide.ai/downloads

The project is still very early, and I’d love your feedback on where it fails, where you think it worked well, and which specific niches you think Ourguide would be most helpful for.

Comments

culopatin•2h ago
What data do you extract from interactions?
DontBreakAlex•1h ago
Looks cool. I think you should try to target it towards the elderly. My 99-year-old grandpa is capable of using a computer and browsing the web, but struggles whenever he gets out of the "usual flow" (he accidentally removes the Chrome icon from his taskbar, or the crappy web-based email he insists on using over Thunderbird moves the add-attachment button). I end up having to use TeamViewer to show him what I can't explain over the phone. He would very much use an assistant that shows him what to do, especially if he can speak to it.
iosguyryan•1h ago
Nicely conceived! This is the kind of feature Apple ought to have already delivered with on-device models and Private Cloud Compute.

Sending many whole screenshots to an indie mystery box, though, should be a non-starter for anyone without the skills to verify what any given update to this app is doing. Your website's featured use case highlights the risks (to you and users) unintentionally well: "How do I export my passwords?" (I did a double take: was this performance art from The Onion?) If a user opens a plain text file of secrets without closing this app/the help task, what gets captured, sent over the network, and saved to disk? What protections exist for, say, a computer-challenged elderly person's banking details?

A suggestion about the FAQ ...

"Where is my task history stored? Is it private? Your privacy is our top priority. Your task history is stored securely and encrypted on your local machine by default. You have full control over your data."

... This invites unanswered questions about what exactly from the screenshots is stored, for how long, and what design backs the "securely" claim. Being up front about this would invite trust and helpful developer feedback.

gyanchawdhary•1h ago
Check out https://techcrunch.com/2016/05/02/google-acquires-synergyse-...

Google acquired these guys back in 2016 to help users learn how to use Google Cloud products via interactive, step-by-step walkthroughs (the user had to install a Chrome extension).

One of the best use cases would be edtech: think of interactive labs where your product guides learners and students through a task and holds their hand along the way.
