Show HN: Vibium – Browser automation for AI and humans, by Selenium's creator

443•hugs•1mo ago

i started the selenium project 21 years ago. vibium is what i'd build if i started over today with ai agents in mind. go binary under the hood (handles browser, bidi, mcp) but devs never see it. just npm install vibium. python/java coming. for claude code: `claude mcp add vibium -- npx -y vibium`. v1 ships today. ama.

Comments

christophilus•1mo ago

Nice. I was just thinking of building this very thing. Glad to see I won’t have to. I’ll check it out after the holidays.

hugs•1mo ago

what specific things were you looking for?

christophilus•1mo ago

My use case is mainly to make it easier to show Claude Code a problem with an SPA as I develop it. Claude’s decent at traditional server-rendered stuff, since it can curl and reason a bit about the responses, but SPAs require something more like your tool here.

xnx•1mo ago

You might try Google Antigravity since it is natively designed to test in the browser as it codes.

anamexis•1mo ago

My number one question would be how it compares to Playwright -- differences in design goals, capabilities, advantages and disadvantages.

hugs•1mo ago

it's a good question! i partially addressed this in the "why vibium" section of the v1 announcement: https://github.com/VibiumDev/vibium/blob/main/docs/updates/2...

to save a click, i'll post it here, too:

-----------

why vibium?

there are dozens of "ai-powered browser" tools now. so why this one?

the selenium ecosystem is massive: millions of tests, thousands of companies, decades of investment. but there's no obvious bridge to the ai future. many have moved to playwright — and for good reason: it's fast, easy to use, has popular features like auto-waiting, integrated video recording, and a ton of other batteries included.

vibium takes the same approach. batteries included. great dx. but built for where the industry is going: ai agents that need to drive browsers.

when i did those interviews in september, the response wasn't just "cool idea." it was relief. the community trusts us to build this bridge because we built the last two: selenium in 2004, appium in 2012.

community and ecosystem are the moat.

anamexis•1mo ago

Thanks! I don't think it really answers my question though.

AFAIK Playwright also takes the approach of batteries included, great dx, and has a lot of good integration with AI agents.

Basically, what sets Vibium apart?

therunninglight•1mo ago

vibium is hardly 2 days old. the 5-yr plan is grand. quoting hugs "goal is to embrace what playwright has done well, then extend what's possible".

anamexis•1mo ago

I appreciate that it's brand new, but I'm still very interested in knowing about what sets it apart, even if that is all just vision at this point. What does "extending what's possible" mean?

hugs•1mo ago

i appreciate the persistence in getting an answer. :-)

i was being a little too cute using "embrace" and "extend" in a previous comment (look up "embrace, extend, extinguish"). sorry about that.

the big idea with vibium in v2 and beyond is to bring to test automation something old and boring in robotics: the "sense - think - act" loop. sensors observe the world, a brain makes decisions, and actuators carry them out.

right now most browser tools extend what's possible at the "act" layer. they make it easier for an llm to click, type, and observe the browser.

that's useful, but it mostly enables one-off demos. every run starts from scratch. there's no accumulated understanding of the app, and long workflows are navigated by guessing and retries.

what vibium is trying to extend is not just action, but the loop.

vibium v1 is just the "act" part, which i'm calling clicker. it clicks buttons, types, and navigates the browser.

retina and cortex are coming in v2. retina turns real interaction into durable signal (manual exploration, existing tests, production usage). cortex builds on that signal to create a navigable model of workflows that an llm can plan through, instead of reasoning from raw html each time.

clicker is the execution layer. playwright mcp largely lives here. vibium clicker overlaps in scope, but is designed from the start to feed sensing and planning rather than being the whole system.

so yes, playwright mcp covers part of this. what's missing today is first-class sense and think. that's the gap vibium is exploring, even if v1 only ships the act layer.

tl;dr:

sense -> retina (v2)

think -> cortex (v2)

act -> clicker (v1)
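To make the loop concrete, here is a toy sketch in Python. It is not vibium code: `FakeBrowser`, `Observation`, `sense`, `think`, `act`, and the tiny page map are all made up for illustration, and since retina and cortex have not shipped, nothing here reflects their real design. It only shows the shape of the cycle: observe, decide, carry out, repeat.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    """What the 'sense' step reports about the (fake) page."""
    url: str
    visible_buttons: List[str]


@dataclass
class FakeBrowser:
    """Stand-in for a real browser: a tiny state machine of pages."""
    url: str = "/login"
    # page -> {button label -> next page}; a purely hypothetical app model
    pages: Dict[str, Dict[str, str]] = field(default_factory=lambda: {
        "/login": {"Sign in": "/home"},
        "/home": {"Reports": "/reports"},
        "/reports": {},
    })


def sense(browser: FakeBrowser) -> Observation:
    """Sense: observe the current state of the world."""
    return Observation(browser.url, list(browser.pages[browser.url]))


def think(obs: Observation, goal_url: str) -> Optional[str]:
    """Think: pick the next button to click, or None when done or stuck."""
    if obs.url == goal_url or not obs.visible_buttons:
        return None
    return obs.visible_buttons[0]  # trivial policy: click the first button


def act(browser: FakeBrowser, button: str) -> None:
    """Act: carry out the decision (the layer a 'clicker' would own)."""
    browser.url = browser.pages[browser.url][button]


def run_loop(browser: FakeBrowser, goal_url: str, max_steps: int = 10) -> List[str]:
    """Repeat sense -> think -> act and return the buttons clicked."""
    clicked = []
    for _ in range(max_steps):
        action = think(sense(browser), goal_url)
        if action is None:
            break
        act(browser, action)
        clicked.append(action)
    return clicked
```

In this toy, `think` is the only piece that would change if it planned over an accumulated model of the app rather than picking the first visible button.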

i've spent the past few months talking about applying the "sense - think - act" loop to browser automation, but at some point i realized i needed to "talk less, ship more". :-) i'm looking forward to shipping retina and cortex so we can see whether the full loop is actually a step change beyond what playwright or playwright+mcp can do.

happy to dig deeper if helpful.

anamexis•1mo ago

extremely cool, thank you!!

suchintan•1mo ago

This is very cool. We were thinking about doing something very similar with Skyvern

What was the reason you went down this path instead of extending selenium with AI features?

hugs•1mo ago

i partially addressed this in the "why vibium" section of the v1 announcement: https://github.com/VibiumDev/vibium/blob/main/docs/updates/2...

but why a new thing vs extending selenium? it's a little complicated, but neither selenium nor playwright were designed with ai in mind from day 1. with vibium, i'm optimizing for "vibe coding" and ai-driven workflows first.

suchintan•1mo ago

This makes sense. I guess I wanted to understand why starting from scratch was better than "fixing" selenium, but perhaps "fixing" selenium isn't an option?

hugs•1mo ago

for the entire testing tools industry, in some ways, selenium was the "final boss" to beat. every new tool had to trash selenium in their marketing. eventually those "hit points" added up. "fixing selenium" is as much of a branding problem as it is a technical problem. "oh, there's a new version of selenium? i heard selenium sucks!" is actually a problem that has to be dealt with. an entire new generation of coders only knows "playwright rules, selenium drools".

of course, i have a new host of problems by going all in with "vibium"... i'm making a huge bet that "vibe coding" is a trend, not a fad. (it could still be a fad! we'll see if this post ages well soon enough!)

suchintan•1mo ago

That makes a lot of sense. Sometimes it's easier to leave the baggage behind. It's too bad... selenium is a masterpiece. Thanks for sharing it with the world

gsnedders•1mo ago

Also, as someone on the periphery of Selenium (mostly via WebDriver), some of the challenge is that Selenium has a huge amount of test code already written for it — and making radical API changes would break every test already written for it, and at that point you’re effectively a new library.

It’s gonna be very interesting to watch exactly how the adoption of WebDriver BiDi goes with Selenium, especially once WebDriver Classic starts to go away, and how API stability is balanced with exposing more and more async capabilities.

moss_dog•1mo ago

I'd love to be able to lock down the browser to only allow certain URLs (e.g. localhost) so I can give Claude (and other tools) carte blanche to use browser automation (rather than manually approving each command). Is this something on your radar / roadmap?

ramoz•1mo ago

If using Claude Code, a simple hook can govern `browser_navigate` (mcp)

A custom sh script or something for whitelists would take ~5min to setup.

For more robust governance (many policies), you can write Rego using https://github.com/eqtylab/cupcake

https://code.claude.com/docs/en/hooks#mcp-tool-naming
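A rough sketch of what such a hook script could look like, in Python rather than sh. Assumptions to check before relying on it: that it is registered as a PreToolUse hook with a matcher along the lines of `mcp__vibium__browser_navigate` (the server name depends on how the MCP server was added), and that the navigate tool passes its target as `tool_input["url"]`. The allowlist values are placeholders. Claude Code hooks receive the tool call as JSON on stdin, and exit code 2 blocks the call.

```python
import json
import sys
from urllib.parse import urlparse

# Hosts the agent may visit (placeholder values; edit to taste).
ALLOWED_HOSTS = frozenset({"localhost", "127.0.0.1"})


def is_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose exact host is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.hostname in allowed_hosts


def decide(payload: dict) -> int:
    """Map a hook payload to an exit code: 0 = allow, 2 = block."""
    # Assumption: the navigate tool's argument is named "url".
    url = str((payload.get("tool_input") or {}).get("url", ""))
    return 0 if is_allowed(url) else 2


def main(stream=None) -> int:
    """Read the hook payload from stdin (or a given stream) and decide."""
    payload = json.load(stream or sys.stdin)
    code = decide(payload)
    if code != 0:
        # stderr from a blocking hook is reported back to the model.
        print("navigation blocked: host is not on the allowlist", file=sys.stderr)
    return code

# When saved as a hook script, end the file with: raise SystemExit(main())
```

The check matches the exact hostname, so something like `localhost.evil.example` is rejected, and a missing URL is blocked rather than allowed.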

moss_dog•1mo ago

Thank you for the links / info! I'm looking forward to digging into this.

hugs•1mo ago

fully aware of the "blast radius" risk of using claude to do stuff. i'm doing all my vibium dev in a vm using UTM (and you should, too!). wonder if there are some network rules we can add.

i did post a v2 roadmap on the github repo. might be time to start the draft for v3!

falcor84•1mo ago

As I see it, the only real solution is to put it into a container that has a firewall with a short whitelist.

moss_dog•1mo ago

I was looking into this earlier -- presumably you'd also need to allowlist Claude itself (whatever endpoints it hits to run inference etc). VM firewall gets a little trickier with Claude's web search tool, too.

The solution I landed on recently was to locally modify the Chrome devtools MCP to launch the browser instance with strict network restrictions. I believe the implementation used `--host-resolver-rules`, blocking all URLs by default with an environment variable to control the allowlist (which, in hindsight, Claude can easily work around if it needs to -- I should probably just hard-code the allowlist).
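For reference, a minimal sketch of building that flag with a hard-coded allowlist. This is not the actual patch to the Chrome devtools MCP; the helper names and the host list are made up for illustration. It assumes Chromium's rule syntax, where `MAP * ~NOTFOUND` makes every hostname fail to resolve and each `EXCLUDE` entry carves out an exception.

```python
# Hard-coded on purpose: an environment variable could be changed by the agent.
ALLOWED_HOSTS = ("localhost", "127.0.0.1")


def host_resolver_rules(allowed_hosts=ALLOWED_HOSTS) -> str:
    """Build the value for Chromium's --host-resolver-rules flag."""
    rules = ["MAP * ~NOTFOUND"] + [f"EXCLUDE {host}" for host in allowed_hosts]
    return ", ".join(rules)


def chrome_args(allowed_hosts=ALLOWED_HOSTS) -> list:
    """Return the extra launch argument as a list, ready for subprocess."""
    return [f"--host-resolver-rules={host_resolver_rules(allowed_hosts)}"]
```

This only restricts name resolution inside that browser instance, so it is one layer rather than a full sandbox.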

falcor84•1mo ago

> you'd also need to allowlist Claude itself

This is Anthropic's recommended setup for devcontainers:

https://github.com/anthropics/claude-code/blob/main/.devcont...

You may want to adapt it and particularly to remove the GitHub and VS Code stuff.

michelb•1mo ago

Maybe this is something: https://github.com/vibheksoni/stealth-browser-mcp

mannanj•1mo ago

Hi, this looks really valuable, thanks for developing and sharing. Would you share some use cases and how you or your users use it personally? I would love to see some examples and feel the aha "That's how I'd like to use it too!" It would help me see the problems I have as being solvable by this too, rather than seeing a tool/solution looking for a problem. (not implying you're that, but without examples/use cases that's the default way I think)

hugs•1mo ago

lots of people have already been posting examples of how they used vibium on linkedin. (code's only been available for a day or two, so we're just getting started!)

we also have a new discord server for the project that we just spun up and will be opening up more widely soon. discord could be a good place to share use cases and experiments until we set up a more formal website structure.

rancar2•1mo ago

I wasn’t able to gather the future state plans beyond what’s noted in the V2 plans:

https://github.com/VibiumDev/vibium/blob/main/V2-ROADMAP.md

What do the next 5 years look like, given that you are very good at building long-term projects that last and evolve through time? And for a very specific example, what’s the plan for incorporating new standards like Agent Skills as they quickly evolve and launch?

hugs•1mo ago

short term: yeah, we should totally add agent skills asap! new year's eve goal?

as far as long term plans go, i like the tim o'reilly quote: "create more value than you capture".

with selenium, we created an entire ecosystem of tools, users, companies, and economic activity. (literally billions of usd -- it's a story frequently ignored by the tech press when looking for "open source success stories".) i hope to do the same with vibium. there will likely be a hosted "vibium.cloud" service. i also hope there will be lots of them. in a similar way, there weren't many "hosted selenium" services when i started sauce labs. now there's a bunch: browserstack, lambdatest, etc.

it was also not really an accident we did that with selenium. there is a lot of behind-the-scenes consensus building that happens to make things like a w3c webdriver standard happen. (fun fact: vibium relies on the new! w3c standard "webdriver bidi" protocol, heavily inspired by the chrome devtools protocol used by playwright. tl;dr: it's just json over websockets.)
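To show what "json over websockets" means in practice, here is a small sketch that builds WebDriver BiDi command frames without opening a connection. The `BidiCommandBuilder` class and `is_reply_to` helper are illustrative names, not part of vibium or any client library, and the context id used with them would be a placeholder. The message shape (an `id`, a `method` such as `browsingContext.navigate`, and `params`) follows the W3C spec.

```python
import itertools
import json


class BidiCommandBuilder:
    """Builds WebDriver BiDi command frames (JSON text) with unique ids."""

    def __init__(self):
        self._ids = itertools.count(1)

    def command(self, method: str, params: dict) -> str:
        """Serialize one command; the id lets the reply be matched up later."""
        return json.dumps({"id": next(self._ids), "method": method, "params": params})

    def navigate(self, context: str, url: str) -> str:
        """browsingContext.navigate, waiting until the load completes."""
        return self.command(
            "browsingContext.navigate",
            {"context": context, "url": url, "wait": "complete"},
        )


def is_reply_to(frame: str, command_id: int) -> bool:
    """True if an incoming frame is the success/error reply to command_id."""
    msg = json.loads(frame)
    return msg.get("type") in ("success", "error") and msg.get("id") == command_id
```

Each string returned by `command` would be sent as one websocket text frame. Replies echo the command's `id`, while events arrive with `"type": "event"` and no `id`, which is why the builder keeps a counter.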

i'm betting on industry cooperation, standards, and shared prosperity. that's my 5 year plan!

hcoura•1mo ago

How does it handle context bloat between the browser and the llm?

Any plans of exposing more of the browser? For instance playwright is able to store tracing files the agent may decide to read to understand some requests / payloads…

Any plans on allowing the agent to run an arbitrary js script?

hugs•1mo ago

i definitely have plans to expose more of the browser! at the moment, it's very limited. i'm not sure if anyone has completely nailed the context bloat problem -- it's worth more study and benchmarking. i suspect the long term answer is "don't use mcp". but mcp (warts and all) felt like a table-stakes feature for a v1 release.

also need to clarify: there are two apis exposed right now: the mcp server and a "plain old" js/ts api. the js/ts api does have the ability to run arbitrary js. theoretically, you could ask an agent to write a vibium script with the js/ts library, and have the ai run that... (which ironically? is also a way to deal with the issue of context bloat)

michelb•1mo ago

Interesting, I've been using this skill https://github.com/SawyerHood/dev-browser to save on context and get some more speed. Will try this out!

hugs•1mo ago

yeah, looking to play more with (and support) skills with vibium soon.

chews•1mo ago

big virtual hugs for @hugs... thank you for the Christmas gift of fewer keystrokes :-)

nivekney•1mo ago

Aside from the project itself, I am learning a lot just from reading the commits. Mostly about the process when one knows how they'd do it.

https://github.com/VibiumDev/vibium/commits/main/?after=ffc3...

therunninglight•1mo ago

likewise, watching it take shape in real time is fascinating

hugs•1mo ago

thanks. if ai-assisted development is the future of software (and i think it is), it was important i put my money where my mouth is and develop it by doing exactly that.

ripped_britches•1mo ago

What is the benefit of using this instead of playwright?

hugs•1mo ago

it will be more obvious in v2.

v1 is about getting to a base-line of functionality.

things get interesting in v2: https://github.com/VibiumDev/vibium/blob/main/V2-ROADMAP.md

badlogic•1mo ago

Neat. Any reason why the MCP server doesn't expose a JavaScript/eval tool? Current models excel at writing JS to drive and inspect the DOM. They aren't great at driving browsers via screenshots.

hugs•1mo ago

> why the MCP server doesn't expose a JavaScript/eval tool?

no reason other than my #1 goal was "ship something". i only started the actual coding on dec 11. it's been a bit of a sprint the last two weeks!

though "image-based" vs "dom-based" testing approaches is a very big topic! (look forward to researching that more in the future.)

v1 announcement: https://github.com/VibiumDev/vibium/blob/main/docs/updates/2...

coty•1mo ago

FWIW, if you have Claude Code or the like, you can quickly prompt your way to an eval function in MCP. It already exists in clicker and the client API. You can use it to get the accessibility tree, for example, and use that to find what to fill out and click.

999900000999•1mo ago

As someone who's made a good living primarily in UI automation for over a decade, thank you.

It's been an interesting journey. I do think Playwright is the de facto standard now, but Selenium was the original browser driver.

Anyway, how does Vibium compare to Playwright? Playwright's main advantage is it has official support for multiple languages.

hugs•1mo ago

> I do think Playwright is the de facto standard now

i'll politely push back a little. i think it's safe (at this moment in time) to say: playwright wins the first derivative, but selenium wins the "area under the curve". selenium is very entrenched in many parts of the world, especially outside of SF/USA. part of the inbound interest i've been getting for vibium is from those selenium users who want some kind of bridge to the future, but didn't have an obvious path forward beyond "dump selenium, adopt playwright"...

part of my plan with vibium post-v1 is to give that massive (and it truly is massive, i'm not bragging) installed base of selenium users an upgrade path to more agentic coding options.

steve_adams_86•1mo ago

Selenium is distinctly more popular among scientists in my experience. I've only seen playwright at startups.

therunninglight•1mo ago

same in my experience.

999900000999•1mo ago

I've personally implemented Playwright in a large enterprise company. Puppeteer before that.

Generally if you have a lot of legacy selenium scripts it's probably not worth it to switch everything over, but if you're creating a new UI automation framework I've just never seen selenium as a first choice for that.

Don't get me wrong it's still solid technology though.

therunninglight•1mo ago

yes, i've noticed some tendency for [agentic] qa services to go the puppeteer and then playwright route (sometimes either or). it's almost too easy to get running with pw. and, hence, enticing for any startup that wants to get off the ground asap and break even. seems vibium may tap into that startup market as it matures.

legacy selenium suites are a strong contender for vibium adoption. i think hugs has been surveying a ton of folks, he may have a better bird's eye view of the potential user base.

as for academic use of selenium, we have boni garcia - maker/popularizer of selenium webdriver manager, teaching at a uni in spain. (maybe an isolated example, but he's rather known in the community)

zenmac•1mo ago

Isn't Selenium vs Playwright more a Java vs JS/ES/TS thing?

hugs•1mo ago

i started the selenium project as just a js and python thing. but it got really popular in java circles.

vibium will also be a big tent project and support ts/js, python, java, and as many other languages as we can support, too. but i started with ts/js because that's what's extremely popular right now. (and i like js!)

(side-note: i f'ing love nim, we will be supporting nim, too. getting nim on the tiobe top 20 is on my bucket list.)

999900000999•1mo ago

Are you solo developing vibium ?

Playwright really simplifies getting set up. It won't work for everyone, but within 30 seconds Playwright will download its needed browsers along with a test runner.

I also find the documentation is much better/consolidated.

Definitely open to helping you out if I can be of assistance.

hugs•1mo ago

"npm install vibium" installs the needed browser on install.

right now, code-wise -- for the code you see in github at the moment -- it's just me and my ai pal, claude. but there's a growing cast of (human!) characters also helping with all the other things we need to do to run a successful project. patches and tokens welcome!

999900000999•1mo ago

Would you suggest Vibium as ready for production use?

On second thought not being controlled by Microsoft might be good enough to differentiate it from playwright. It's not a good idea for a single company to control so much.

I'm thinking there needs to be an easy way to sandbox vibe automation. I don't want to accidentally click an ad and vibe test an unrelated website.

hugs•1mo ago

it's just v1. good for experimenting, not for production (yet!).

and yes, it's perhaps impolite and silently taboo for me to say it out loud, but "not being controlled by microsoft" is on the top 10 list of "why vibium and not playwright". most of the sf world has gone all-in on playwright; i'm betting on web standards. i hope people will notice that distinction. however, i realize vibium can't win just as the "not microsoft" option; it will also need to win on the merits.

microsoft is already incredibly well-positioned to own the whole dev stack. from their investment in openai to vscode, github, and playwright... they are in a powerful position. i'm old enough to remember the last time ms had massive power over the stack (see: internet explorer 6).

999900000999•1mo ago

Not to mention they own NPM( GitHub owns NPM).

C# is fantastic and Playwright's C# support is extremely good. I've been in the C# ecosystem for years, but I'm still a bit worried about having a single company control so much of my livelihood. Let's just say I have a little bit of experience with Microsoft, they don't particularly pay well so I'm a little confused as to if they can maintain so many frameworks indefinitely.

Are you going to keep Vibium a community project, or are you hoping to raise capital/bootstrap?

Don't get me wrong, I absolutely love what you're doing but I don't know if one person can create something so complex. The good news is Chrome is the only browser most normal people use, which reduces the amount of testing you have to actually do.

At the vast majority of places I've worked at we basically just test on Chrome and then if it doesn't work on other browsers ohh well. Every now and then you would get a project manager who would suggest adding Firefox/Safari testing. But it's never been a priority.

the_gipsy•1mo ago

So it's vibecoded?

hugs•1mo ago

it's named vibium for a reason.

jjmarr•1mo ago

Selenium was a part of my degree. I had a course involving it.

maxloh•1mo ago

Any idea how Puppeteer lost the race?

999900000999•1mo ago

The Puppeteer team moved from Google to Microsoft and started Playwright.

https://blog.logrocket.com/playwright-vs-puppeteer/

hugs•1mo ago

google fell victim to one of the classic blunders. the most famous of which is inventing the transformer behind gpt but not productizing it first. but only slightly less well-known is this: letting puppeteer go without a plan.

mbrochh•1mo ago

You might also want to look into Stagewright.

starik36•1mo ago

How do you install it into Claude Desktop? I tried the following, but it fails.

    "vibium": {
      "command": "npx",
      "args": [
        "-y",
        "@vibium/mcp@latest"
      ]
    }

therunninglight•1mo ago

    "vibium": {
      "command": "npx",
      "args": [
        "-y",
        "vibium"
      ]
    }

source: https://www.linkedin.com/posts/apzal-bahin_ai-mcp-browseraut...

starik36•1mo ago

That didn't work - generates errors. But there is a PR in progress that should fix it at some point soon.

jeff4f5da2•1mo ago

Since it's in go, wouldn't it be great if it also exposed a go api?

hugs•1mo ago

yes, yes it would!

captainregex•1mo ago

entirely possible I’m just really bad at this stuff but I can’t get browser agents to do simple report pulls without running into a captcha or a dropdown menu that breaks its brain. hopefully this is the one!

hugs•1mo ago

good security will always be the eternal enemy of easy automation.

therunninglight•1mo ago

the realm of bots vs bots

rukuu001•1mo ago

Hey man, just wanted to say thanks for Selenium - it was a game changer and had a big impact on my professional life.

I’m interested in checking out Vibium - I’ve been a reluctant adopter of Playwright and hopeful for a new approach.

hugs•1mo ago

playwright got a lot of things right. one of the big ones was a fast websockets+json way to drive the browser. (vibium is using the w3c standard equivalent - webdriver bidi). but they also raised the bar on usability and developer experience. i hope to get to the level of "click, click, awesome" out-of-the-box experience that playwright did so well.

pkiv•1mo ago

If you're already using Playwright, I'd love for you to give Stagehand a try (https://github.com/browserbase/stagehand) - it has compatible-ish API, but built for automation, not testing.

hugs•1mo ago

stagehand is good stuff!

zenmac•1mo ago

Hmmm so it is basically using https://www.director.ai for the AI natural language stuff right?

brianjking•1mo ago

Director.ai uses Stagehand, which is made by Browserbase.

irjustin•1mo ago

> I’ve been a reluctant adopter of Playwright and hopeful for a new approach.

Out of curiosity, why?

Personally, I'm a massive lover of playwright. Flakiness has been so much lower for us.

m00dy•1mo ago

I think the future is mobile when it comes to AI agents. If you are also looking for a new approach, I would suggest DeepWalker. It is mobile-first automation and currently works on android.

[0]: https://deepwalker.xyz

dmd•1mo ago

Does it allow you to inject js, modify the DOM, and most crucially monitor/modify network requests? I do those things in probably 95-99% of the time I reach for playwright mcp in claude, and from the "For Agents" part of the README, it seems like all this can do is click/type/screenshot?

hugs•1mo ago

> inject js, modify the DOM, and most crucially monitor/modify network requests

not yet. definitely on the roadmap, though. goal is to embrace what playwright has done well, then extend what's possible...

dmd•1mo ago

Thanks. I would love to understand what people are doing with Playwright that doesn't involve those things. I really can't recall ever using it where that wasn't what I was doing. I use it letting Claude fix things. You can't fix what you can't see! What else are people using it for? Obviously there must be a (very popular!) use case for "just clicking", but I can't seem to imagine it.

hugs•1mo ago

don't underestimate the "just clicking" use case!

therunninglight•1mo ago

hugs built an entire career on the "click" case (just making a button work). no wonder the vibium go binary is called "clicker".

VoidWhisperer•1mo ago

In my experience, we've used playwright significantly for unit/integration tests combining it with react-testing-library to verify individual components and also whole (mocked, we used something else that I can't seem to remember for E2E tests) flows within that React application

Robdel12•1mo ago

To me, doing network interception in browser-driven tests is a smell, unless you’re running against a fully mocked server (like MSW).

I’m a big fan of testing exactly like a user. Users don’t use network intercepts, timeouts, etc. All of my most reliable tests assert on DOM state. If the user doesn’t see it, don’t assert on it.

dmd•1mo ago

Almost nothing I do has to do with what users actually see though. It’s all things like “why didn’t the SSO flow work”.

mewpmewp2•1mo ago

I guess the issue is that the real world does smell terrible. I wish I could just have the perfect world like my side projects always have, but that's not the case with the commercial ones making money.

doctorpangloss•1mo ago

all i want is monitored network requests, because flutter + amazon appsync apps are so radioactive

hugs•1mo ago

good feedback. thank you!

j2kun•1mo ago

Is this something you use to generate static browser tests that no longer use the LLM? Or would you need to use the LLM every time you run the tests?

therunninglight•1mo ago

- No LLM: Getting Started with Vibium (beginner-friendly): https://github.com/VibiumDev/vibium/blob/main/docs/tutorials...

- MCP option (where tokens will eventually get burned): Getting Started with Vibium MCP: https://github.com/VibiumDev/vibium/blob/main/docs/tutorials...

rahimnathwani•1mo ago

If an agent gets a copy of the screen using browser_screenshot and then wants to click somewhere on that screen, how is it meant to find the right css selector to pass to browser_click?

There's a browser_find method, but that assumes you already know what type of element it is. But I can't always tell what type of element something is just by looking at a screenshot.

What have I missed or misunderstood?

coty•1mo ago

For right now, the MCP server doesn’t expose quite enough to navigate on its own.

I’ve added a browser_evaluate tool in my fork—though I haven’t committed or pushed a PR yet. With that, the agent can call JavaScript to get the accessibility tree and then use that to navigate via browser_find.

This and much more will be coming soon. See the V2 roadmap for more insight: https://github.com/VibiumDev/vibium/blob/main/V2-ROADMAP.md

hugs•1mo ago

one of the wild things about vibe coding is... i want to add that feature, but i'm slightly more interested in using the prompt/spec you might have used to create it, not the patch itself.

rahimnathwani•1mo ago

Sometimes an AI-written spec based on the code is better than the spec/prompts used to create the patch.

coty•1mo ago

Yeah. Let me see if I can find or reconstitute that prompt. Ultimately I wanted to have a system for automagically keeping Java up-to-date with JavaScript.

OutOfHere•1mo ago

I will wait for full Python and Go support.

therunninglight•1mo ago

python client coming soon to PyPI

hugs•1mo ago

have plans for new year's eve?

jstummbillig•1mo ago

Cool. Can this currently be used with codex in the same way?

hugs•1mo ago

not today. but the plan is to be very "big tent" and support as many options as possible.

michaelsbradley•1mo ago

Anyone attempting something similar for Qt/QML based apps?

rubymamis•1mo ago

I thought about it as well. Might do it at some point.

pryelluw•1mo ago

Any plans to support local models through llama.cpp or similar?

hugs•1mo ago

100% yes. favorites?

pryelluw•1mo ago

I daily drive llama.cpp so that please.

hugs•1mo ago

which local models? (e.g. qwen, llama, mistral?)

pryelluw•1mo ago

Oh qwen mostly. The smaller ones (< 10B). Happy to be a tester! Email in profile

didip•1mo ago

And this handles login sessions, cookies, etc.? So much of the modern web is now hidden behind login sessions.

hugs•1mo ago

there's generally only 3 ways to make money in browser automation:

1) test automation (my specialty)

2) data scraping / crawling

3) business/robotic process automation (e.g. back-office data entry, processing invoices, etc.)

when it comes to handling login sessions, cookies, etc. test automation is the easiest. (you create disposable test logins and use them in each test. it's mostly a solved problem.)

handling logins is a way gnarlier problem in data scraping and business process automation. i'm focused on test automation in v1. (i'm hoping experts in data scraping and process automation can help me improve vibium in this regard.)

mlrtime•1mo ago

Thank you for helping me understand this. I'm trying to use various tools to automate downloading my energy usage from my provider. They seem to be going to great lengths to try and prevent this. It sucks because I want to conserve and reconcile usage.

entergy.com ... I'm hoping your tool or playwright will help me get it into home assistant.

palidanx•1mo ago

Sorry, I kind of have a dumb question here. So we have a bunch of legacy selenium scripts that do end-to-end user testing, and occasionally they break (either because of a network error, or devs committed something that breaks a test).

We were looking at seeing if a model could look at the screenshot of the failure, some of the original website source code, and try to fix the failing test.

My question is: with vibium, would it make more sense to port the legacy tests over to vibium and, if a test fails, use its capabilities to try to self-heal?

hugs•1mo ago

i apologize, but i'll answer your question with a metaphor.

i want to build an island resort and a bridge from the mainland to get there. do i build the island resort first or the bridge first?

here's my thinking: if the resort is popular and a fun place to be, there will be a huge incentive to build the bridge next. but we might also find out that building the bridge will ultimately be economically impractical and we should just stick to using ferry boats. at least we'll have a cool island resort to go to, though!

so for now, i'm just focusing on building the island resort. but i really, really want to build that bridge, too, asap.

eth0up•1mo ago

I recently conducted a little research project involving YouTube comments, where selenium made it possible. Quite cool to see the legend here, still active.

Thanks, from a very tiny human.

hugs•1mo ago

thanks for using it! <heart-emoji/>

i try to say this often, but it never feels like enough: yes, i started the project, but it's a relay race. i ran the first few laps, but the project has been going for 21 years now. there's dozens (hundreds?) of people to thank at this point for the success and impact that the selenium project has achieved.

jaredwy•1mo ago

Hey! You really helped my career. We chatted a lot maybe 13-15 years ago and really helped atlassian scale their selenium tests at the time.

So glad to see you are still in this space!

saikatsg•1mo ago

The good old days of browser automation! Thanks a lot for Selenium and all the best for Vibium :)

hugs•1mo ago

thanks!

mintflow•1mo ago

great project. i just added the mcp and tried it in claude code; it automatically helped me browse a local forum and gave me a summary of today's hot posts.

this is the second MCP i've added, quite impressive.

Merry Christmas!

xpe•1mo ago

I'm thinking about various security models. When it comes to browser integration, I'm particularly interested in defense-in-depth rather than trusting the shIP activities to the captAIn.

Bad puns aside, this is an important area! Many of us want to know what people are building (or what should be built) to put security front and center -- or at least integrated -- rather than an afterthought. Components might include: sandboxing, access rules, logging, honey-pot mode, perhaps even read-only access for a "protector" agent. (Another common approach here is wishful thinking such as "this ship is unsinkable", but that ship has sailed for me.)

Putting on my dark humor hat, if all else fails, there could be a "time to panic" mode triggered by certain criteria (e.g. a regex matching "your bank account balance is $0").
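As a rough illustration of that kind of tripwire, here is a minimal Python sketch. The names `PANIC_PATTERNS`, `PanicTriggered`, and `check_page_text` are hypothetical and not part of Vibium or any other tool; the sketch assumes the automation layer can hand the visible page text to a local check before the agent acts on it.

```python
import re

# Hypothetical tripwire patterns; real criteria would be tuned per deployment.
# The lookahead avoids firing on amounts such as $0.50 or $05.
PANIC_PATTERNS = [
    re.compile(
        r"your bank account balance is \$0(?:\.00)?(?!\.?\d)",
        re.IGNORECASE,
    ),
]


class PanicTriggered(Exception):
    """Raised when page text matches a tripwire pattern."""


def check_page_text(text: str) -> None:
    """Raise PanicTriggered if any tripwire pattern matches the page text."""
    for pattern in PANIC_PATTERNS:
        if pattern.search(text):
            raise PanicTriggered(f"tripwire matched: {pattern.pattern}")
```

The check runs as ordinary local code, so it behaves like the reflex in the analogy: it can halt the session regardless of what the model decided to do.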

What can biology teach us? When you think about defense-in-depth for "insider threats" in the human body, what comes to mind? There are many; here is one: reflexes. Your motor planning neurons might send your hand towards a hot surface and succeed, but they will be quickly countermanded [1] by a reflex arc [2].

P.S. Please don't interpret my style as a lack of seriousness. If used carelessly, this technology opens up some impressive botnet potential. Luckily, with the benefit of wishful thinking or just flat-out ignorance, we can trust humans and AIs to be adequately trustworthy. [3] [4]

[1]: maybe overruled is a better term?

[2]: https://en.wikipedia.org/wiki/Reflex_arc

[3]: https://www.schneier.com/blog/archives/2007/02/the_psycholog...

[4]: https://www.anthropic.com/research/agentic-misalignment

gsnedders•1mo ago

Is there any plan about how to deal with indirect prompt injection attacks that could trivially be lurking in malicious web pages, given the agent can navigate to an arbitrary URL?

hugs•1mo ago

short-term mitigation: always, always, always run it in a virtual machine with as few credentials as possible.

password-app•1mo ago

Congrats on shipping v1. The "sense-think-act" architecture is exactly what's needed for agentic workflows.

Re: the login handling discussion upthread—I've been using browser-use for automated password rotation (breach response use case). Two patterns that might be relevant to Vibium's roadmap:

Credential injection: Instead of putting passwords in the prompt, pass them via a sensitive_data parameter. The agent calls enter_password() without the value ever appearing in LLM context. Solves the "blast radius" concern several people raised.
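A minimal sketch of the idea, in plain Python rather than any particular library's API: `SecretStore`, `build_prompt`, and the `type_text` callback are hypothetical names used for illustration, and this `enter_password` is a stand-in, not browser-use's actual signature. The model sees secret names only; the value is resolved locally when the tool runs.

```python
from typing import Callable, Dict, List


class SecretStore:
    """Keeps secret values local; the model only ever sees the key names."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = dict(secrets)

    def names(self) -> List[str]:
        # Safe to include in a prompt: names only, never values.
        return sorted(self._secrets)

    def resolve(self, name: str) -> str:
        return self._secrets[name]


def build_prompt(task: str, store: SecretStore) -> str:
    """Build the text sent to the model, listing secret names but not values."""
    return f"{task}\nAvailable secrets (by name): {', '.join(store.names())}"


def enter_password(
    name: str, store: SecretStore, type_text: Callable[[str], None]
) -> None:
    """Tool the agent calls with a secret name; the value is filled in locally."""
    type_text(store.resolve(name))
```

In use, `type_text` would be whatever function types into the focused field in the chosen automation framework; the secret value passes through that call and never through the prompt.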

Deterministic 2FA handling: When email verification is required, open Gmail in a new tab, but extract OTPs with local regex—not AI. The LLM orchestrates navigation; code extraction stays local. Handles ~90% of email 2FA automatically.
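The local extraction step might look like the sketch below. It assumes a 6-digit code that appears shortly after a word such as "code" or "OTP"; real verification emails vary, so `OTP_PATTERN` and `extract_otp` are illustrative names and the pattern is a starting point, not a general solution.

```python
import re
from typing import Optional

# Assumption: a 6-digit code within 40 non-digit characters of a keyword.
OTP_PATTERN = re.compile(
    r"(?:code|otp|passcode)\D{0,40}?(\d{6})(?!\d)",
    re.IGNORECASE,
)


def extract_otp(email_text: str) -> Optional[str]:
    """Return the first 6-digit code following a keyword, or None if absent."""
    match = OTP_PATTERN.search(email_text)
    return match.group(1) if match else None
```

The code is returned as a string so that leading zeros survive, and a `None` result gives the orchestrating code a clear signal to fall back to a human instead of guessing.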

These patterns should work with any browser automation framework. Built a Mac app around this: https://thepassword.app

Would love to see Vibium add first-class support for credential injection in the API—it's the missing piece for any security-sensitive automation.

rule2025•1mo ago

What can this mainly do? How is it different from Chrome devtools MCP?
