Show HN: PageAgent, A GUI agent that lives inside your web app

https://alibaba.github.io/page-agent/
43•simon_luv_pho•2h ago

Hi HN,

I'm building PageAgent, an open-source (MIT) library that embeds an AI agent directly into your frontend.

I built this because I believe there's a massive design space for deploying general agents natively inside the web apps we already use, rather than treating the web merely as a dumb target for isolated bots.

Currently, most AI agents operate from external clients or server-side programs, effectively leaving web development out of the AI ecosystem. I'm experimenting with an "inside-out" paradigm instead. By dropping the library into a page, you get a client-side agent that interacts natively with the live DOM tree and inherits the user's active session out of the box, which works perfectly for SPAs.

To handle cross-page tasks, I built an optional browser extension that acts as a "bridge". This allows the web-page agent to control the entire browser with explicit user authorization. Instead of a desktop app controlling your browser, your web app is empowered to act as a general agent that can navigate the broader web.

I'd love to start a conversation about the viability of this architecture, and what you all think about the future of in-app general agents. Happy to answer any questions!

Comments

simon_luv_pho•2h ago
This is highly experimental right now, but here are some quick links for anyone wanting to dig deeper:

- GitHub: https://github.com/alibaba/page-agent

- Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)

- Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...

I'd be really interested in feedback on the security model of giving client-side agents extension-bridge access, and I'm happy to take questions on the implementation!

jauntywundrkind•2h ago
Not exactly the same, but I'd also point to Paul Kinlan's FolioLM as a very interesting project in this space. It's a very nice browser extension:

> Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.

https://github.com/PaulKinlan/NotebookLM-Chrome

https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...

simon_luv_pho•1h ago
Thanks for sharing! We need more projects like this in the JS ecosystem.
klueinc•1h ago
I've been trying to arrive at something like this with my own sidepanel extension called Klue, but it's more of a user notes + web page context approach. Nice to see another take on this! https://chromewebstore.google.com/detail/cackjmmgcmnkjnffabk...
pscanf•1h ago
Very cool!

I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!

simon_luv_pho•1h ago
Thanks!

Bookmarklets are such an underrated feature. It's super convenient to inject and test scripts on any page. Seemed like the perfect low-friction entry point for people to try it out.

Spent some time on that UX because the concept is a bit hard to explain. Glad it worked!
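
For readers who, like pscanf, have not run into bookmarklets before: a bookmarklet is a bookmark whose URL starts with `javascript:`, so clicking it runs code on the current page. Below is a minimal sketch of how such a URL could be built to inject a script. The `makeBookmarklet` helper and the `https://example.com/agent.js` URL are hypothetical placeholders, not PageAgent's actual code or bundle location.

```typescript
// Hypothetical helper: builds a bookmarklet URL that injects a script
// into whatever page is open when the bookmark is clicked.
function makeBookmarklet(scriptUrl: string): string {
  // JSON.stringify yields a safely quoted JavaScript string literal.
  const src = JSON.stringify(scriptUrl);
  const body =
    `(function(){var s=document.createElement('script');` +
    `s.src=${src};document.head.appendChild(s);})()`;
  // Percent-encode the body so it survives being stored as a bookmark URL.
  return "javascript:" + encodeURIComponent(body);
}

// Placeholder script URL, used only for illustration.
const bookmarklet = makeBookmarklet("https://example.com/agent.js");
console.log(bookmarklet);
```

Dragging a link with this `href` onto the bookmarks bar installs it. Note that pages with a strict Content Security Policy may refuse to load the injected script.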

MeteorMarc•1h ago
Confusing name because of the existence of pageant, the putty agent.
kirth_gersen•1h ago
Came here to say missed opportunity to call it "PAgent". Rolls off the tongue better than Page Agent.
simon_luv_pho•57m ago
Darn. Pageant would've been a nice name though. Maybe `page-agent.js` is more relevant in the web dev community.
mmarian•40s ago
I think page agent is good. I've never heard of PuTTY's Pageant. And I think it's better to distinguish it from the general meaning of pageant (as in a beauty pageant).
coreylane•1h ago
Looks cool! Are you open to adding AWS Bedrock or LiteLLM support?
simon_luv_pho•24m ago
Thanks!

It supports any OpenAI-compatible API out of the box, so AWS Bedrock, LiteLLM, Ollama, etc. should all work. The free testing LLM is just there for a quick demo. Please bring your own LLM for long-term use.
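
To make "OpenAI-compatible" concrete: such servers accept a POST to `<base URL>/chat/completions` with a `model` and a `messages` array, authenticated by a bearer token. The sketch below only builds that request; the `LlmConfig` shape and `buildChatRequest` helper are hypothetical and are not PageAgent's configuration API. The base URL assumes a default local Ollama install, and the model name is a placeholder.

```typescript
// Hypothetical config shape for illustration; not PageAgent's actual options.
interface LlmConfig {
  baseURL: string; // root of an OpenAI-compatible API
  apiKey: string;
  model: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds the URL and fetch options for a chat completions call.
function buildChatRequest(config: LlmConfig, messages: ChatMessage[]) {
  // Strip trailing slashes so the path is joined cleanly.
  const url = config.baseURL.replace(/\/+$/, "") + "/chat/completions";
  const init = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${config.apiKey}`,
    },
    body: JSON.stringify({ model: config.model, messages }),
  };
  return { url, init };
}

// Assumed local setup: Ollama's default port and a placeholder model name.
const request = buildChatRequest(
  { baseURL: "http://localhost:11434/v1/", apiKey: "ollama", model: "qwen2.5" },
  [{ role: "user", content: "Click the login button" }],
);
console.log(request.url);
```

Sending it is then `fetch(request.url, request.init)`. Switching providers should only mean changing `baseURL`, `apiKey`, and `model`, provided the provider really exposes this endpoint shape.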

dzink•1h ago
Is this affiliated with the Chinese company Alibaba? Any chance data goes there too?
simon_luv_pho•31m ago
Full transparency: I work at Alibaba and published this under Alibaba's open-source org. I maintain it during work hours, so yes, Alibaba technically pays me for it. That said, this is my project — it's MIT-licensed, includes no backend service, and is open for anyone to audit.

The free testing LLM endpoint is hosted on Alibaba Cloud because I happen to have some company quota to spend, but it's not part of the library. Bring your own LLM and there is zero data transmission to Alibaba or anywhere else you haven't configured yourself.

I highly recommend using it with a local Ollama setup.

mentalgear•55m ago
> Data processed via servers in Mainland China

Appreciate the transparency, but maybe you could add some (preferably European) alternatives?

simon_luv_pho•46m ago
Please use your own LLM API instead!

The free testing LLM is Qwen hosted by Aliyun. Qwen and DeepSeek are the only ones I can afford to offer for free. It's just there to lower the try-out barrier; please DO NOT rely on it.

The library itself does NOT include any backend service. Your data only goes to the LLM API you configured.

I tested it on local Ollama models and it works fine.

simon_luv_pho•29m ago
I'm looking into a European testing endpoint. The problem is I don't have enough resources to figure out all the legal and compliance requirements, and persuading my company to pay for that infrastructure is gonna be a tough sell.
general_reveal•36m ago
I’ve been thinking about something like this. If it’s just a one line script import, how the heck are you trusting natural language to translate to commands for an arbitrary ui?

The only thing I can think of is you had the AI rewrite and embed selectors on the entire build file and work with that?

Mnexium•33m ago
Curious - how does it perform with captchas and other "are you human" stuff on the web?
simon_luv_pho•16m ago
I added in the system prompt that it should skip CAPTCHAs and hand control back to the user. Currently working on a proper human-in-the-loop feature. That's actually one of the key advantages of running the agent inside your own browser.
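
As a rough illustration of the hand-back idea (not the author's implementation, which is prompt-based today with a proper feature still in progress), an agent loop can treat "ask the human" as just another action the model may return. All type and function names below are hypothetical.

```typescript
// Hypothetical action types; the real library's action schema may differ.
type AgentAction =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "handoff"; reason: string };

type StepResult =
  | { status: "continue" }
  | { status: "paused"; message: string };

// Executes one action, or pauses the loop when the model asks for the user.
function runStep(
  action: AgentAction,
  perform: (a: AgentAction) => void,
): StepResult {
  if (action.kind === "handoff") {
    // Nothing is performed; control goes back to the person at the keyboard.
    return { status: "paused", message: `Waiting for user: ${action.reason}` };
  }
  perform(action);
  return { status: "continue" };
}

// Record which actions were actually performed.
const performed: string[] = [];
const record = (a: AgentAction) => {
  performed.push(a.kind);
};

const first = runStep({ kind: "click", target: "#submit" }, record);
const second = runStep({ kind: "handoff", reason: "CAPTCHA detected" }, record);
console.log(first.status, second.status, performed);
```

Because the agent runs in the user's own tab, pausing like this leaves the page exactly where the user can take over and solve the challenge.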
popalchemist•22m ago
Does it support long-click / click-and-drag?
simon_luv_pho•12m ago
Not yet. Currently focused on the more common interaction patterns. PRs welcome though!
popalchemist•8m ago
Gotcha. Still very cool! Congrats on the release.
