
Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
172•isitcontent•9h ago•21 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
286•vecti•11h ago•129 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
232•eljojo•12h ago•142 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
14•denuoweb•1d ago•1 comment

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
59•phreda4•8h ago•11 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
83•antves•1d ago•60 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
45•nwparker•1d ago•11 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•2h ago•0 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
16•NathanFlurry•17h ago•6 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
148•bsgeraci•1d ago•62 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
4•ambitious_potat•3h ago•4 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•14h ago•5 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
5•AGDNoob•5h ago•1 comment

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•8h ago•1 comment

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
23•dchu17•13h ago•11 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•6h ago•0 comments

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•6h ago•0 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
171•vkazanov•1d ago•49 comments

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
11•KevinChasse•14h ago•11 comments

Show HN: 33rpm – A vinyl screensaver for macOS that syncs to your music

https://33rpm.noonpacific.com/
3•kaniksu•8h ago•0 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
9•sawyerjhood•15h ago•0 comments

Show HN: Chiptune Tracker

https://chiptunes.netlify.app
3•iamdan•8h ago•1 comment

Show HN: An open-source system to fight wildfires with explosive-dispersed gel

https://github.com/SpOpsi/Project-Baver
2•solarV26•12h ago•0 comments

Show HN: Craftplan – I built my wife a production management tool for her bakery

https://github.com/puemos/craftplan
567•deofoo•5d ago•166 comments

Show HN: Agentism – Agentic Religion for Clawbots

https://www.agentism.church
2•uncanny_guzus•12h ago•0 comments

Show HN: Disavow Generator – Open-source tool to defend against negative SEO

https://github.com/BansheeTech/Disavow-Generator
5•SurceBeats•18h ago•1 comment

Show HN: Total Recall – write-gated memory for Claude Code

https://github.com/davegoldblatt/total-recall
10•davegoldblatt•1d ago•6 comments

Show HN: BPU – Reliable ESP32 Serial Streaming with Cobs and CRC

https://github.com/choihimchan/bpu-stream-engine
2•octablock•14h ago•0 comments

Show HN: A web browser agent in your Chrome side panel

https://github.com/parsaghaffari/browserbee
153•parsabg•8mo ago
Hey HN,

I'm excited to share BrowserBee, a privacy-first AI assistant in your browser that allows you to run and automate tasks using your LLM of choice (currently supports Anthropic, OpenAI, Gemini, and Ollama). Short demo here: https://github.com/user-attachments/assets/209c7042-6d54-4fc...

Inspired by projects like Browser Use and Playwright MCP, its main advantage is the browser-extension form factor, which makes it more convenient for day-to-day use, especially for less technical users. It's also a bit less cumbersome to use on websites that require you to be logged in, as it attaches to the same browser instance you use (on privacy: the only data that leaves your browser is the communication with the LLM - there is no tracking or data collection of any sort).

Some of its core features are as follows:

- a memory feature which allows users to memorize common and useful pathways, making the next repetition of those tasks faster and cheaper

- real-time token counting and cost tracking (inspired by Cline)

- an approval flow for critical tasks such as posting content or making payments (also inspired by Cline); a rough sketch of such a gate follows this list

- tab management allowing the agent to execute tasks across multiple tabs

- a range of browser tools for navigation, tab management, interactions, etc., which are broadly in line with Playwright MCP
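
As a concrete illustration of the approval-flow idea, here is a minimal TypeScript sketch of how a gate over "critical" tools could work; the tool names and function shapes are assumptions for illustration, not BrowserBee's actual implementation.

    type ToolHandler = (args: unknown) => Promise<unknown>;

    // Tools treated as "critical"; these names are assumed, not BrowserBee's.
    const CRITICAL_TOOLS = new Set(["post_content", "make_payment"]);

    // Wrap a tool handler so that critical tools require explicit user approval.
    function withApproval(
      name: string,
      handler: ToolHandler,
      askUser: (msg: string) => Promise<boolean>,
    ): ToolHandler {
      return async (args) => {
        if (CRITICAL_TOOLS.has(name) && !(await askUser(`Allow the agent to run "${name}"?`))) {
          return { error: "rejected by user" };
        }
        return handler(args);
      };
    }

    // Example: gate a hypothetical payment tool behind a plain confirm dialog.
    const guardedPay = withApproval(
      "make_payment",
      async () => ({ ok: true }),
      async (msg) => window.confirm(msg),
    );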

I'm actively developing BrowserBee and would love to hear any thoughts, comments, or feedback.

Feel free to reach out via email: parsa.ghaffari [at] gmail [dot] com

-Parsa

Comments

m0rde•8mo ago
This looks fun, thanks for sharing. Will definitely give it a shot soon.

I read over the repo docs and was amazed at how clean and thorough it all looks. Can you share your development story for this project? How long did it take you to get here? How much did you lean on AI agents to write this?

Also, any plans for monetization? Are you taking donations? :)

parsabg•8mo ago
Thanks a lot! :)

I might write a short post on the development process, but in short:

- started development during Easter so roughly a month so far

- developed mostly using Cline and Claude 3.7

- inspired by and borrowed heavily from Cline, Playwright MCP, and Playwright CRX, which had already done a lot of the heavy lifting - in a sense this project is those three glued together

I don't plan to monetize it directly, but I've thought about an opt-in model for contributing useful memories to a central repository that other users might benefit from. My main aim with it is to promote open source AI tools.

dmos62•8mo ago
I presume that this works by processing the HTML and feeding it to the LLM. What approaches did you take for doing this? Or am I wrong?
fermuch•8mo ago
Under the "tools" part of the README it shows the following observation tools: - browser_snapshot_dom - browser_query - browser_accessible_tree - browser_read_text - browser_screenshot

So most likely the LLM can choose how to "see" the page?
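
To make that concrete, here is a purely illustrative TypeScript sketch of how routing the model's tool choice could look; the tool names come from the README, but the handler bodies are assumptions, not BrowserBee's actual code.

    // Hypothetical dispatcher over the observation tools listed above.
    type ObservationTool =
      | "browser_snapshot_dom"
      | "browser_query"
      | "browser_accessible_tree"
      | "browser_read_text"
      | "browser_screenshot";

    async function observe(tool: ObservationTool, arg?: string): Promise<string> {
      switch (tool) {
        case "browser_read_text":
          return document.body.innerText;                    // cheap, text-only view
        case "browser_query":
          // outerHTML of a single element matched by a CSS selector
          return document.querySelector(arg ?? "body")?.outerHTML ?? "";
        case "browser_snapshot_dom":
          return document.documentElement.outerHTML;         // expensive, full DOM
        default:
          throw new Error(`not sketched here: ${tool}`);     // screenshot / a11y tree omitted
      }
    }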

donclark•8mo ago
Looks great. Any plans for this to work in Firefox?
parsabg•8mo ago
I'll be exploring a FF port. There are a couple of tight Chrome dependencies that need to be rethought (IndexedDB for storage and CDP for most actions)
owebmaster•8mo ago
IndexedDB is not Chrome-only
nico•8mo ago
Looks amazing, love it. And I see that in your roadmap the top thing is saving/replaying sessions

Related to that, I'd suggest also adding the ability to "templify" sessions, i.e. turn sessions into something like email templates, with placeholder tags or the like, that either ask the user for input or can be fed input from somewhere else (like an "email merge")

So for example, if I need to get certain data from 10 different websites, either have the macro/session ask me 10 times for a new website (or until I stop it), or allow me to just feed it a list

Anyway, great work! Oh also, if you want to be truly privacy-first you could add support for local LLMs via Ollama

parsabg•8mo ago
Thank you!

I like that suggestion. Saved prompts seem like an obvious addition, and having templating within them makes sense. I wonder how well "for each of the following websites do X" prompts would work (i.e. have the LLM do the enumeration rather than the client - my intuition is that it won't be as robust because of the long accumulated context)

Edit: forgot to mention it does support Ollama already

nico•8mo ago
Yeah, that "for each" needs to be code instead of prompt. Ideally you want to only use the LLM for the first time you run the task, but after "figuring out the path", you want to run that directly through code

So for the example above, the user might have to do: "do this for this website", then save macro, then create template, then run template with input: [list of 10 websites]
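
A sketch of what that could look like in code (all names hypothetical, not part of BrowserBee): a saved macro with {{placeholders}} that plain code expands over a list of inputs, so only the first run needs the LLM.

    // A saved "macro" replayed over a list of inputs; the {{site}} placeholder
    // is filled in by code rather than by the model.
    interface MacroStep { tool: string; args: Record<string, string>; }

    function instantiate(template: MacroStep[], vars: Record<string, string>): MacroStep[] {
      return template.map((step) => ({
        tool: step.tool,
        args: Object.fromEntries(
          Object.entries(step.args).map(([k, v]) => [
            k,
            v.replace(/\{\{(\w+)\}\}/g, (_, name) => vars[name] ?? ""),
          ]),
        ),
      }));
    }

    // "Email merge" style run: one template, many websites, no extra LLM calls.
    const template: MacroStep[] = [
      { tool: "navigate", args: { url: "https://{{site}}" } },
      { tool: "browser_read_text", args: {} },
    ];
    for (const site of ["example.com", "example.org"]) {
      console.log(instantiate(template, { site }));
    }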

dbdoskey•8mo ago
Looks amazing. Would love something like this in Firefox or Zen. Mozilla released Orbit, but it never ended up being really useful.
tux3•8mo ago
Firefox already has something similar natively, but it's not enabled by default. If you turn on the new sidebar they have an AI panel, which basically looks like an iframe to the Claude/OAI/Gemini/etc chat interface. Different from Orbit.
dbdoskey•8mo ago
That sidebar doesn't have the ability to take any actions on the browser tab, or to use data from the browser as context in any way. It is just a simple iframe.
Vinnl•8mo ago
If you click the three-dots menu above the iframe, you can select "Show shortcut when selecting text". That allows you to select text and then provide that as context to an AI prompt.

(At least, that's how I understand it - I have the feature turned off myself.)

parsabg•8mo ago
Thank you! :)

Would love to explore a FF port. Right now, there are a couple of tight Chrome dependencies:

- CDP - mostly abstracted away by Playwright so perhaps not a big lift

- IndexedDB for storing memories and potentially other user data - not sure if there's a FF equivalent

dbdoskey•8mo ago
Thanks! Will track your project for the future. Looks very promising
joshstrange•8mo ago
FF supports IndexedDB directly; it has fully supported it since version 16 [0].

[0] https://caniuse.com/indexeddb
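
For what it's worth, the IndexedDB code itself is a standard web API and should run unchanged in Firefox; a minimal sketch (database and store names are made up here):

    function openMemoryStore(): Promise<IDBDatabase> {
      return new Promise((resolve, reject) => {
        const req = indexedDB.open("browserbee-demo", 1);      // hypothetical DB name
        req.onupgradeneeded = () =>
          req.result.createObjectStore("memories", { keyPath: "id" });
        req.onsuccess = () => resolve(req.result);
        req.onerror = () => reject(req.error);
      });
    }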

dataviz1000•8mo ago
You might be able to reduce the amount of information sent to the LLM 100-fold if you use a stacking context. Here is an example of one made available on GitHub (not mine). [0] Moreover, you will be able to parse the DOM, or have strategies for parsing the DOM. For example, if you are only concerned with video, find all the videos and only send that information. Perhaps parse a page once, find its structure, and cache that so the next time only the required data is used. (I see you are storing the tool sequence, but I didn't find an example of storing a DOM structure so that requests to subsequent pages are optimized.)

If someone visits a website I control using your Chrome extension, I will 100% be able to find a way to drain all their accounts, probably in the background without them even knowing. Here are some ideas about how to mitigate that.

The problem with Playwright is that it requires the Chrome DevTools Protocol (CDP), which opens massive security problems for a browser that people use for their banking and for managing anything that involves credit cards or sensitive accounts. At one point, I took the injected folder out of Playwright and injected it into a Chrome extension because I thought I needed its tools; however, I quickly abandoned it, as it was easy to create workflows from scratch. You get a lot of stuff immediately by using Playwright, but you will likely find it much lighter and safer to just implement that functionality yourself.

The only benefit of CDP for normal use is allowing automation of any action in the Chrome extension that requires trusted events, e.g. playing sound, going fullscreen, or banking websites that require a trusted event to transfer money. In my opinion, people just want a large part of the workflow automated and don't mind being prompted to click a button when trusted events are required. Since it doesn't matter which button is clicked, you can inject a big button that says "continue" (or whatever is required) after prompting the user. Trusted events are there for a reason.

[0] https://github.com/andreadev-it/stacking-contexts-inspector
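
A trivial sketch of the "only send what the task needs" idea, using the video example above (illustrative only; this is neither the linked inspector's code nor BrowserBee's):

    // Serialize only the visible <video> elements instead of the whole DOM.
    function videoOnlySnapshot(): string {
      return Array.from(document.querySelectorAll("video"))
        .filter((v) => v.getBoundingClientRect().width > 0)    // rough visibility check
        .map((v, i) => `[${i}] src=${v.currentSrc || v.src} duration=${Math.round(v.duration || 0)}s`)
        .join("\n");
    }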

parsabg•8mo ago
I will look into this. Speed and inefficiency due to the low information density of raw DOM tokens are the single biggest issues for this type of thing right now.
dataviz1000•8mo ago
Here are some ideas on how to cache selectors for reuse, and how to get all the text for use with full-text search to find clickable elements (slow, but still faster than a round trip to an LLM). [0] These are very naive, but that is the only place there is money in doing this. If you create 100 of these optimizations, like only selecting visible elements, or selectors that contain video when the context is video, you can greatly limit the amount of useless data being sent to the LLM.

[0] https://chatgpt.com/c/682a2edf-e668-8004-a8ce-568d5dd0ec1c
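
One naive way to sketch the selector-caching idea in a content script (the storage key and shape are assumptions, not the project's actual code):

    const CACHE_KEY = "selector-cache";

    // Remember which selector satisfied a given task on this origin...
    async function rememberSelector(task: string, selector: string): Promise<void> {
      const { [CACHE_KEY]: cache = {} } = await chrome.storage.local.get(CACHE_KEY);
      cache[`${location.origin}::${task}`] = selector;
      await chrome.storage.local.set({ [CACHE_KEY]: cache });
    }

    // ...so the next run can try the cached selector before asking the LLM at all.
    async function recallSelector(task: string): Promise<string | undefined> {
      const { [CACHE_KEY]: cache = {} } = await chrome.storage.local.get(CACHE_KEY);
      return cache[`${location.origin}::${task}`];
    }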

parsabg•8mo ago
The link doesn't load for me. Can you try sharing again?
dataviz1000•8mo ago
I tried to make it cleaner and more organized, with code and then output, but I think I made it worse by not explaining what or why. [0] Sorry. It is only some examples of how to query the DOM to isolate the most important information.

I'm not certain (I'm supposed to be working on something else, so sorry if I'm wrong here), but I believe this is the code Browser Use uses for the stacking context, including piercing the shadow DOM. [1] Because they build a map of all the visible elements, they can inject different-colored borders around them. Here they test for the topmost elements in the viewport. [2]

[0] https://chatgpt.com/share/682a68bf-c6a0-8004-9c20-15508e6b3b...

[1] https://github.com/browser-use/browser-use/blob/55d078ed5a49...

[2] https://github.com/browser-use/browser-use/blob/55d078ed5a49...

kanzure•8mo ago
possibly something like https://github.com/romansky/dom-to-semantic-markdown could also help for this use case.
parsabg•8mo ago
Looks powerful, at least for read-only use cases. Will have a look and compare token stats. Thanks
dataviz1000•8mo ago
That is awesome. A list of power tools on Amazon went from 2.5MB of HTML to 236KB of markup. That is huge! Wow, thank you for sharing.

This is half the equation. Also, a lot of the information in the markup can be used to query elements to interact with, because it keeps the link locations, which can be used to navigate or to select elements. On the other hand, by using the stacking context, it is possible to query only elements that are visible, which removes all the elements that can't be interacted with.

barbazoo•8mo ago
> Since BrowserBee runs entirely within your browser (with the exception of the LLM), it can safely interact with logged-in websites, like your social media accounts or email, without compromising security or requiring backend infrastructure.

Does it send the content of the website to the LLM?

parsabg•8mo ago
yes, the LLM can invoke observation tools (e.g. read the text/DOM or take a screenshot) to retrieve the context it needs to take the next action
barbazoo•8mo ago
So maybe something we want to be mindful of before using this on banking, health, etc.

How is it “privacy-first” then if it literally sends all your shit to the LLM?

joshstrange•8mo ago
You can use Ollama as the backend so the data never leaves your computer.

Also, the line is blurry for some people on “privacy” when it comes to LLMs. I think some people, not me, think that if you are talking directly to the LLM provider API then that’s “private” whereas talking to a service that talks to the LLM is not.

And, to be fair, some people use privacy/private/etc language for products that at least have the option of being private (Ollama).

blooalien•8mo ago
> How is it “privacy-first” then if it literally sends all your shit to the LLM?

Because it supports Ollama, which runs the LLM entirely locally on your own hardware, thus data sent to it never leaves your machine?

Edit: joshstrange beat me to the same conclusion by mere moments. :)

rizs12•8mo ago
Aren't browsers starting to ship with built-in LLMs? I don't know much about this but if so then surely your extension won't need to send queries to LLM APIs?
boredpudding•8mo ago
There are two types of built-in LLMs:

- The ones the user sees (like a side panel). These often use LLM APIs like OpenAI's.

- The browser API ones. These are indeed local, but are often very limited, smaller models (for Chrome this is Gemini Nano). Results from these would be lower quality, and large contexts would of course be either impossible or slower than using an API.

A4ET8a8uTh0_v2•8mo ago
Interesting. I can't play with it now since I'm out on a grocery run, but can it interact with elements on the page if asked directly?
parsabg•8mo ago
yes, you can ask it to both observe (e.g. query an element) and interact with (e.g. click on) elements, for example using selectors or a high-level reference like the label or the color of a button
krembo•8mo ago
Chrome Canary already has Gemini Nano built into the browser for local LLM use. For the use cases you mentioned, there is no need to call a 3rd party.
nsonha•8mo ago
Gemini Nano sounds like a model that only does basic autocomplete or semantic inference - no tool calling for sure. What this kind of product seems to be headed toward is something like Manus, which needs agentic (thinking, planning, tool calling) capabilities.
parsabg•8mo ago
In a way this should be a core feature of any browser and if this project accelerates/improves that by 5% I will be very happy!

The fact that Chrome and Gemini are, at least for now, owned by the same company raises huge privacy and consumer choice concerns for me though, and I see benefit in letting the user choose their model, where/how to store their data, etc.

blks•8mo ago
I would really appreciate it if browsers would not make this AI slop their core functionality, and kept web browsing as their core functionality instead.
throwaway314155•8mo ago
> Gemini Nano

Can't possibly do tool calling well enough to handle browser automation.

neonwatty•8mo ago
definitely
stoicfungi•8mo ago
Looks awesome. Over the last couple of months, I've built a similar Chrome extension: https://overlay.one/en

I also started with both a conversational mode and an interactive mode, but later removed the interactive mode to keep its feature set a bit simpler.

parsabg•8mo ago
That looks very cool. Would love to chat if you're open to it
stoicfungi•8mo ago
happy to, sent you a message
reliablereason•8mo ago
Looks like the example video is extremely expensive. It racks up almost $2 of usage in about a minute.
parsabg•8mo ago
Good spot. I probably shouldn't have used the 2nd most expensive model in the demo!

Some of the cheaper models have very similar performance at a fraction of the cost, or indeed you could use a local model for "free".

The core issue, though, is that there are just more tokens to process in a web browsing task than in many other tasks we commonly use LLMs for, including coding.

0xd1r•8mo ago
Can it perform DOM manipulation as well, like filling forms, or would the LLM response need to be structured for each specific site you use it on? And would an LLM be able to perform such a task?
parsabg•8mo ago
It can fill forms - the agent can invoke a large number of tools to both observe and interact with a page
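
For readers curious what a form-fill tool can boil down to, here is a generic DOM-level sketch (not necessarily how BrowserBee implements it):

    // Set a field's value and fire an input event so frameworks notice the change.
    function fillField(selector: string, value: string): boolean {
      const el = document.querySelector<HTMLInputElement | HTMLTextAreaElement>(selector);
      if (!el) return false;
      el.focus();
      el.value = value;
      el.dispatchEvent(new Event("input", { bubbles: true }));
      return true;
    }
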
0xd1r•8mo ago
How does it do so? Just DOM manipulation, viewport scanning or something of the sort?
hiccuphippo•8mo ago
Can this be used to automatically remove the plethora of cookie banners/modals polluting the web?
parsabg•8mo ago
Yes! Sometimes it does it even without the user asking which is very satisfying :)
reustle•8mo ago
uBlock Origin will do this for free
saadshamim•8mo ago
I keep getting the "Error: Failed to stream response from [Gemini | OpenAi] API. Please try again." - tried valid new keys from both google/openai
parsabg•8mo ago
Is that with 2.5 Flash? I got that error intermittently with that model, but the other Gemini models worked fine. I'll investigate
saadshamim•8mo ago
Ah yeah, 2.0 Flash is working. 2.5 doesn't, and the OpenAI 4.0 and mini models don't work either. The error message should probably say to try other models, because I was pretty confused
hoppp•8mo ago
What makes it privacy-first?

Shouldn't it use a local LLM then?

Does it send my password to a provider when it signs up to a website for me?

neonwatty•8mo ago
yes most likely
matula•8mo ago
Very nice. I tried with Ollama and it works well.

The biggest issue is having the Ollama models hardcoded to Qwen3 and Llama 3.1. I imagine most Ollama users have their favorites, and these probably vary quite a bit. My main model is usually Gemma 3 12B, which does support images.

It would be a nice feature to have a custom config on the Ollama settings page, save it to Chrome storage, and use it in the 'getAvailableModels' method, along with the hardcoded models.
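
A rough sketch of that suggestion (the 'getAvailableModels' name is from the comment above; the storage key, defaults, and shapes are assumptions):

    const DEFAULT_OLLAMA_MODELS = ["qwen3", "llama3.1"];       // hardcoded defaults

    // Persist user-defined model names from the options page.
    async function saveCustomOllamaModels(models: string[]): Promise<void> {
      await chrome.storage.sync.set({ customOllamaModels: models });
    }

    // Merge the user's custom models with the hardcoded ones.
    async function getAvailableModels(): Promise<string[]> {
      const { customOllamaModels = [] } =
        await chrome.storage.sync.get("customOllamaModels");
      return [...DEFAULT_OLLAMA_MODELS, ...customOllamaModels];
    }

    // e.g. from the options page: await saveCustomOllamaModels(["gemma3:12b"]);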

parsabg•8mo ago
Great suggestion, will add custom Ollama configurations to the next release
jaggs•8mo ago
This looks really well done. I particularly like the simple user interface. A lot of the time these things are unnecessarily complex I feel.
afshinmeh•8mo ago
Looks great! Tried a few examples and models, works very well.
flakiness•8mo ago
Looks good.

I've been disappointed by the fact that Chrome doesn't have this. I don't want to give full access to my browsing to a random extension (no offense to this specific one, just general security hygiene - there are so many scammy extensions out there). Chrome (or the browser of your choice) already has that trust, good or bad. Please use that trust in a good way. It's table stakes at this point.

tnjm•8mo ago
Thanks for building this!

It struggled with the tasks I asked for (e.g. download the March and April invoices for my GitHub org "myorg") -- it got errors parsing the DOM and eventually gave up. I recommend taking a look at the browser-use approach, specifically their buildDOMTree.js script. Their strategy for turning the DOM into an LLM-parsable list of interactive elements, and visually tagging them for vision models, is unreasonably effective. I don't know if they were the first to come up with it, but it's genius, and extracting it for my browser-using agents has hugely increased their effectiveness.
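
A heavily reduced sketch of that strategy (not browser-use's actual buildDOMTree.js): number the interactive elements, hand the numbered text list to the model, and draw a visible tag on each element for vision models.

    function tagInteractiveElements(): string[] {
      const els = Array.from(
        document.querySelectorAll<HTMLElement>("a, button, input, select, textarea"),
      ).filter((el) => el.getBoundingClientRect().width > 0);  // keep visible-ish elements

      return els.map((el, i) => {
        el.style.outline = "2px solid red";                    // visual tag for screenshots
        const badge = document.createElement("span");          // numbered label (approximate placement)
        badge.textContent = String(i);
        badge.style.cssText =
          "position:absolute;background:red;color:#fff;font:10px monospace;z-index:99999";
        const rect = el.getBoundingClientRect();
        badge.style.top = `${rect.top + window.scrollY}px`;
        badge.style.left = `${rect.left + window.scrollX}px`;
        document.body.appendChild(badge);
        return `[${i}] <${el.tagName.toLowerCase()}> ${el.innerText.trim().slice(0, 60)}`;
      });
    }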

gurvinderd•8mo ago
Good work. I did the same thing - https://chromewebstore.google.com/detail/auto-browse/ngnikmg... and https://github.com/auto-browse/auto-browse-agent. I tried playwright-crx, but it increased the size of the extension and sometimes the browser got stuck, so I moved to using Puppeteer instead. To save tokens, I have not enabled screenshots, relying on the DOM instead.