Show HN: Open-source alternative to ChatGPT Agents for browsing

https://github.com/trymeka/agent
104•ElasticBottle•6mo ago
Hey HN,

We are Winston, Edward, and James, and we built Meka Agent, an open-source framework that lets vision-based LLMs execute tasks directly on a computer, just like a person would.

Backstory:

In the last few months, we've been building computer-use agents that various teams have used for QA testing, but we realized that the underlying browsing frameworks aren't quite good enough yet.

As such, we've been working on a browsing agent.

We achieved 72.7% on WebArena compared to the previous state of the art set by OpenAI's new ChatGPT agent at 65.4%. You can read more about it here: https://github.com/trymeka/webarena_evals.

Today, we are open sourcing Meka, our state of the art agent, to allow anyone to build their own powerful, vision-based agents from scratch. We provide the groundwork for the hard parts, so you don't have to:

* True vision-based control: Meka doesn't just read HTML. It looks at the screen, identifies interactive elements, and decides where to click, type, and scroll.

* Full computer access: It's not sandboxed in a browser. Meka operates with OS-level controls, allowing it to handle system dialogues, file uploads, and other interactions that browser-only automation tools can't.

* Extensible by design: We've made it easy to plug in your own LLMs and computer providers.

* State-of-the-art performance: 72.7% on WebArena
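
As a rough illustration of the first three points, here is a minimal Python sketch of a screenshot-driven control loop with a pluggable model and computer provider. This is not Meka's actual API: `Action`, `ComputerProvider`, `VisionModel`, `run_task`, `FakeComputer`, and `ScriptedModel` are all hypothetical names invented for this example, and the real framework may be structured quite differently.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Action:
    """One step chosen by the model (hypothetical shape)."""
    kind: str          # "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class ComputerProvider(Protocol):
    """Anything that can show the screen and accept OS-level input."""
    def screenshot(self) -> bytes: ...
    def perform(self, action: Action) -> None: ...


class VisionModel(Protocol):
    """Anything that can pick the next action from a screenshot."""
    def next_action(self, task: str, screenshot: bytes,
                    history: List[Action]) -> Action: ...


def run_task(task: str, model: VisionModel, computer: ComputerProvider,
             max_steps: int = 20) -> List[Action]:
    """Look at the screen, ask the model what to do, do it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        shot = computer.screenshot()
        action = model.next_action(task, shot, history)
        history.append(action)
        if action.kind == "done":
            break
        computer.perform(action)
    return history


# Stand-ins so the sketch runs without a real VM or LLM.
class FakeComputer:
    def __init__(self) -> None:
        self.performed: List[Action] = []

    def screenshot(self) -> bytes:
        return b"fake-png-bytes"

    def perform(self, action: Action) -> None:
        self.performed.append(action)


class ScriptedModel:
    """Replays a fixed list of actions, then reports "done"."""
    def __init__(self, script: List[Action]) -> None:
        self._script = list(script)

    def next_action(self, task: str, screenshot: bytes,
                    history: List[Action]) -> Action:
        if len(history) < len(self._script):
            return self._script[len(history)]
        return Action("done")


computer = FakeComputer()
model = ScriptedModel([Action("click", x=120, y=48), Action("type", text="hello")])
history = run_task("say hello", model, computer)
print([a.kind for a in history])  # ['click', 'type', 'done']
```

Because the loop only depends on the two protocols, swapping in a different model or a different VM backend means writing a new class with the same methods, without touching `run_task`.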

Our goal is to enable developers to create repeatable, robust tasks on any computer just by prompting an agent, without worrying about the implementation details.

We’d love to get your feedback on how this tool could fit into your automation workflows. Try it out and let us know what you think.

You can find the repo on GitHub and get started quickly with our hosted platform, https://app.withmeka.com/.

Thanks, Winston, Edward, and James

Comments

cahoodle•6mo ago
James here from the team! Let us know if you have feedback on either our cloud or open source repo. We want to push the frontiers for computer-use so that people can do less repetitive work.
hugs•6mo ago
that yc app deadline is just around the corner, isn't it? :)
cahoodle•6mo ago
Didn't even realize, maybe we'll put in an app!

I did YC back in S16 and was just reminiscing with a friend about how startups felt so different back then.

phsource•6mo ago
These are pretty impressive results given that this is not from one of the major AI labs. Congrats: https://blog.withmeka.com/meka-achieves-state-of-the-art-per...

Out of curiosity, what do you think contributed to this working better than even OpenAI agent or some of the other tools out there?

I'm not that familiar with how OpenAI and other agents like Browser Use currently work, but is this, in your opinion, the most important factor?

> An infrastructure provider that exposes OS-level controls, not just a browser layer with Playwright screenshots. This is important for performance as a number of common web elements are rendered at the system level, invisible to the browser page

tcwd•6mo ago
Thanks! Quite a few factors, here's a detailed post on the architecture: https://blog.withmeka.com/introducing-meka-an-open-source-fr...

IMO, the combination of having an "evaluator model" at the end to verify whether the intent of the task was completed, and using multiple models that look over each other's work at every step, was helpful. There are lots of human organization analogies there, like "trust but verify" and pair programming. Memory management was also key.
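
The pattern described above (one model proposes, another reviews each step, and a final evaluator checks the outcome) could be sketched roughly as follows. This is an assumption-laden toy, not the code from the Meka repo: `run_with_verification` and the `toy_*` functions are hypothetical, and plain Python functions stand in for the LLM calls.

```python
from typing import Callable, List, Optional, Tuple

# (task, accepted steps so far, reviewer feedback) -> proposed next step
Proposer = Callable[[str, List[str], List[str]], str]
# (task, candidate step) -> objection text, or None if the step looks fine
Reviewer = Callable[[str, str], Optional[str]]
# (task, accepted steps) -> did the run satisfy the task's intent?
Evaluator = Callable[[str, List[str]], bool]


def run_with_verification(task: str, propose: Proposer, review: Reviewer,
                          evaluate: Evaluator, max_steps: int = 10,
                          max_retries: int = 2) -> Tuple[List[str], bool]:
    accepted: List[str] = []
    for _ in range(max_steps):
        feedback: List[str] = []
        step = None
        # Let the proposer retry with the reviewer's objections in hand.
        for _ in range(max_retries + 1):
            candidate = propose(task, accepted, feedback)
            objection = review(task, candidate)
            if objection is None:
                step = candidate
                break
            feedback.append(objection)
        if step is None:
            return accepted, False  # reviewer never approved a step
        accepted.append(step)
        if step == "done":
            break
    # Final check on the whole run, separate from per-step review.
    return accepted, evaluate(task, accepted)


# Toy stand-ins for the three model roles.
def toy_propose(task: str, accepted: List[str], feedback: List[str]) -> str:
    if not accepted:
        return "click submit" if feedback else "click cancel"
    return "done"


def toy_review(task: str, candidate: str) -> Optional[str]:
    if candidate == "click cancel":
        return "cancel would abandon the form"
    return None


def toy_evaluate(task: str, accepted: List[str]) -> bool:
    return "click submit" in accepted


steps, ok = run_with_verification("submit the form", toy_propose,
                                  toy_review, toy_evaluate)
print(steps, ok)  # ['click submit', 'done'] True
```

Keeping the per-step reviewer and the end-of-run evaluator as separate callables mirrors the "trust but verify" split: one catches bad individual actions early, the other judges whether the overall intent was met.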

anonymousiam•6mo ago
"* Full computer access: It's not sandboxed in a browser. Meka operates with OS-level controls, allowing it to handle system dialogues, file uploads, and other interactions that browser-only automation tools can't."

This seems pretty scary. Just recently an AI wiped a company database: https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-d...

xnx•6mo ago
Power and risk go hand in hand. Best approach is probably to run in a VM.
ElasticBottle•6mo ago
Yep, that's exactly where the agents run.
tcwd•6mo ago
Hi there, I'm Edward, one of the co-founders. The OS that the agent operates in is a fresh confined environment, and not a company or personal computer.
dotancohen•6mo ago
So how can it do anything useful without access to actual data?

Can it be installed on a conventional (personal or work) desktop?

wsycharles0o•6mo ago
I would assume this capability is meant to be used in a Docker container?
tcwd•6mo ago
We explored using a containerized VM that exposed agentic controls in the open source version, but generally found that the cloud-based solutions were much faster to get started and easier to work with. Our repo contains adapters that work with several of the most popular cloud-hosted VM-as-a-service infra providers.

We'd definitely be happy to be proven wrong if we missed something here!

chhxdjsj•6mo ago
Hi, great work congrats!

Does it use OpenRouter for model selection? Which models did you achieve the WebArena result with? Are there any open source models that are any good for this?

tcwd•6mo ago
For the WebArena result, we actually used a mixture of models checking each other's work and evaluating in real time. We found the verifications to be really effective in producing accurate results. Feel free to take a look at our architectural blog post to learn more in detail: https://blog.withmeka.com/introducing-meka-an-open-source-fr...

Unfortunately, we didn't try it out with open source models, but you are welcome to pull the repo and try with any model that has good visual grounding! (I heard UI-TARS and the latest Qwen visual model are quite good)

armanj•6mo ago
Nice job. It's exciting that the quality is approaching human level, but still I think we are spending way too many tokens, and the automation speed-up isn't really worth the total token price yet (unless you have very high-end gpus and you don't care about the completion speed of your tasks)
tcwd•6mo ago
Thanks! I agree with your sentiment for a lot of basic mundane tasks, but there are a number of tasks today that are very high value yet still mundane and require manual work.

Examples include form filling, sales prospecting, lead enrichment, or even just keeping track of prices of important things.

Over time, we do expect the cost of tokens on these models to decrease drastically. Powerful vision models are still relatively new compared to general-purpose text LLMs. There is definitely a lot of room for optimizations that we expect will come quickly!

esafak•6mo ago
I wish AI products competed on token efficiency.
h23bhati•6mo ago
This is awesome, biggest open-source browser agent?
nsonha•6mo ago
Tested a few agentic browsers such as genspark, fellou and comet. I found the vision approach less effective compared to the DOM-based approach, and it seems quite a bit slower too. Does it need a reasoning step to type a URL into the address bar?
ElasticBottle•6mo ago
I see 3 fundamental pillars:

1. Accuracy (does it do what we want)

2. Reliability (does it consistently do what we want)

3. Speed (does it do what we want fast)

We're mostly focused on solving 1 and maybe in some capacity 2.

The belief here is that models are going to get better. With that, smaller models will become more capable, which will result in speed-ups automatically.

So yes, I will concur that speed is probably not the main strength of our framework right now, but I believe that we will get there with time.

gotten09•6mo ago
this is pretty awesome, on the cloud env though I got the error: Error: AIProviderError: AI provider failed to generate text. Timeout while downloading https://playmatic-screenshots.s3.us-west-2.amazonaws.com

Also, for the task I gave it, this was the result:

I was unable to retrieve any live fare data because both airline sites became unworkable in the remote session (xxxx selectors would not stay open; xxxxsearch could not be completed before the session ended). Below is a blank comparison table you can fill in once you gather the prices manually:

is that the current state of best-in-class computer-use agents? or is it more a case of "we need to modify it until it is good for our use case"?

trying to provide helpful feedback and honest curiosity, this is awesome work

pants2•6mo ago
This is great. Will it solve the three biggest issues with ChatGPT agent?

1. Proxy support for sites that block the user

2. Browser extensions support for uBlock, password managers, etc.

3. CAPTCHA solving

ElasticBottle•6mo ago
All good questions. This is the second piece, aside from the agent.

1. We have proxy support right now, and most traffic is already being proxied today. We might allow fine-tuning of this over time.

2. We have plans to allow this, but it is not currently available.

3. We are leveraging some anti-bot/CAPTCHA solving, but I do believe this will be a never-ending problem in some sense.
