
Show HN: Web-eval-agent – Let the coding agent debug itself

https://github.com/Operative-Sh/web-eval-agent
84•neversettles•9mo ago
Hey HN! We’ve been building an MCP server to help AI-assisted web app developers by using browser agents to test whether changes made by an AI inside an editor actually work. We've been testing it on scenarios like verifying new flows in a UI, or checking that sending a chat request triggers a response. The idea is to let your coding agent both code and evaluate if what it did was correct. Here’s a short demo with Cursor: https://www.youtube.com/watch?v=_AoQK-bwR0w

When building apps, we found the hardest part of AI-assisted coding isn’t the coding—it’s tedious point-and-click testing to see if things work. We got tired of this loop: open the app, click through flows, stare at the network tab, copy console errors to the editor, repeat. It felt obvious this should be AI-assisted too. If you can vibe-code, you should be able to vibe-test!

Some agents like Cline and Windsurf have browser integrations, but Cline’s (via Anthropic Computer Use) felt slow and only reported console logs, and Windsurf’s didn’t work reliably yet. We got so tired of manually testing that we decided to fix it.

Our MCP server sits between your IDE agent (Cursor/Windsurf/Cline/Continue) and a Playwright-powered browser-use agent. It spins up the browser, navigates your app per instructions from the IDE agent, and sends back steps, console events, and network events so the IDE agent can assess the app’s state.

We proxy Browser-use’s original Claude calls and swap in Gemini 2.0 Flash, cutting latency from ~8s to ~3s per step. We also cap console/network logs at 10,000 characters to stay within context limits, and filter out irrelevant logs (e.g., noisy XHR requests).
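As a rough illustration of the log handling described above, here is a minimal Python sketch. It is not the server's actual code: the function name `filter_and_cap`, the `NOISY_MARKERS` list, and the choice to keep the most recent lines are all assumptions made for the example. Only the 10,000-character cap comes from the post.

```python
from typing import Iterable, Sequence

MAX_LOG_CHARS = 10_000  # the cap mentioned in the post

# Hypothetical noise markers; the real filter rules are not described in the post.
NOISY_MARKERS = ("/hot-update", "/sockjs-node", "favicon.ico")


def filter_and_cap(lines: Iterable[str],
                   limit: int = MAX_LOG_CHARS,
                   noisy: Sequence[str] = NOISY_MARKERS) -> str:
    """Drop noisy lines, then keep the most recent lines that fit in `limit` characters."""
    kept = [ln for ln in lines if not any(marker in ln for marker in noisy)]

    out = []
    used = 0
    # Walk backwards so the newest events survive when the budget runs out.
    for ln in reversed(kept):
        cost = len(ln) + (1 if out else 0)  # +1 for the joining newline
        if used + cost > limit:
            break
        out.append(ln)
        used += cost

    if not out and kept and limit > 0:
        # A single oversized line: keep only its first `limit` characters.
        return kept[-1][:limit]
    return "\n".join(reversed(out))
```

Keeping the tail rather than the head is a design choice for this sketch: the latest console and network events are usually the ones that explain a failure.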

At the end, the browser agent outputs a summary like:

  Web Evaluation Report for http://localhost:5173 
  Task: delete an API key and evaluate UX
  Steps: Home → Login → API Keys → Create Key → Delete Key
  Flow tested successfully; UX had problems X, Y, Z...
  Console (8)...   Network (13)...   Timeline of events (57) …

This lets the coding agent recognize console and network errors, or any issues with clicking around, and fix them before returning to the user. (There’s a longer example in the README at https://github.com/Operative-Sh/web-eval-agent.)
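For readers who want to picture the shape of that summary, here is a small Python sketch that renders a report in the same layout. The `EvalReport` class and its field names are hypothetical and chosen for the example; the real report structure is documented in the README, not here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvalReport:
    """Hypothetical container for one browser-agent run (field names are assumptions)."""
    url: str
    task: str
    steps: List[str]
    verdict: str
    console: List[str] = field(default_factory=list)
    network: List[str] = field(default_factory=list)
    timeline: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the short text summary handed back to the IDE agent."""
        return "\n".join([
            f"Web Evaluation Report for {self.url}",
            f"Task: {self.task}",
            "Steps: " + " → ".join(self.steps),
            self.verdict,
            f"Console ({len(self.console)})...   "
            f"Network ({len(self.network)})...   "
            f"Timeline of events ({len(self.timeline)})...",
        ])
```

The counts in the last line give the coding agent a cheap signal of how much detail is available before it reads the full logs.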

Try it in Cursor / Cline / Windsurf / Claude Desktop (macOS/Linux):

  curl -LSf https://operative.sh/install.sh -o install.sh
  less -N install.sh   # inspect if you’d like
  bash install.sh      # installs uv + jq + Playwright + server
  # then in Cursor/Cline/Windsurf/Continue: craft a prompt using the web_eval_agent tool

(For Windows, there’s a 4-line manual install in the README.)

What we want to do next: pause/go for OAuth screens; save/load browser auth states; Playwright step recording for automated and regression test creation; and support for Lovable / v0 / Bolt.new sites via a web version.

We’d love to hear your feedback, especially if you’ve felt the pain of manually testing your web app after making changes from inside your IDE, or if you’ve tried alternative MCP tools for this that worked well.

Try it out if you feel it’d be helpful for your workflow: https://github.com/Operative-Sh/web-eval-agent. (Note: the server hits our operative.sh proxy to cover Gemini tokens. The MCP server itself is OSS; Anthropic base-URL support is coming soon. Free tier included; heavy users can grab the $10 plan to offset our model bill.)

Let us know what you think! Thanks for reading!

Comments

GreenGames•9mo ago
This is very cool! Does your MCP server preserve cookies/localStorage between steps, or would developers need to manually script auth handshakes?
neversettles•9mo ago
Between steps it preserves cookies, but atm when the Playwright browser launches, it starts with a fresh browser state, so you'd have to go through OAuth to log in each time.

We're adding browser state persistence soon, so that once you sign in with Google, you stay signed in on your local machine.
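For anyone who needs this before it ships, Playwright itself can save and reload cookies and localStorage through its `storage_state` API. The sketch below uses the Python sync API; the file name `auth_state.json` and both function names are hypothetical, and this is not how web-eval-agent implements (or will implement) persistence.

```python
from pathlib import Path

STATE_FILE = Path("auth_state.json")  # hypothetical location for the saved login state


def context_options(state_file: Path = STATE_FILE) -> dict:
    """Return kwargs for browser.new_context(): reuse saved state when the file exists."""
    return {"storage_state": str(state_file)} if state_file.exists() else {}


def run_with_saved_login(url: str, state_file: Path = STATE_FILE) -> None:
    # Imported lazily so context_options() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        # First run: no file yet, so this is a fresh context.
        # Later runs: cookies and localStorage are restored from the file.
        context = browser.new_context(**context_options(state_file))
        page = context.new_page()
        page.goto(url)
        # ... sign in by hand on the first run, or let the agent drive the page ...
        context.storage_state(path=str(state_file))  # persist cookies + localStorage
        browser.close()
```

Treat the state file like a credential and keep it out of version control, since it contains session cookies.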

GreenGames•9mo ago
Oh okay thanks - that would be fire tbh
esafak•9mo ago
Is there a benchmark for this? If not, you ought to (crowd?)start one for everybody's sake.
neversettles•9mo ago
We started with using browser-use because they had the best evals: https://browser-use.com/posts/sota-technical-report

- but we found that Laminar came out with a better browser agent (& a better eval): https://www.lmnr.ai/ so we're looking to migrate over soon!

nico•9mo ago
Looks amazing. Congrats on the release

How does this compare to browser mcp (https://browsermcp.io/)?

neversettles•9mo ago
In Browser MCP, it looks like Cursor controls each action along the way, but what we wanted was a single browser agent with high-quality evals that could perform all the actions independently (browser-use).
proc0•9mo ago
Interesting. I see from the video example it took a lot of steps and there is a lot of output for a simple task. I'm thinking this probably doesn't scale very well and more complex tasks might have performance challenges. I do think it's the right direction for AI coding.
neversettles•9mo ago
Yeah, I suppose to esafak's point, perhaps a benchmark for browser agent QA testing would be needed.
klntsky•9mo ago
I told Windsurf to install Playwright, identify the crucial workflows of the app, and add tests for them. Not without my input, but I got what I wanted without getting my hands dirty.

Does this thing add much on top?

neversettles•9mo ago
The power here is that the coding agent can test visually, like a human would. So if a button isn't visible, the browser agent uses vision to detect that it's missing.

It sorta tests 'just like a human would' to make sure the implemented flow works as expected.

gitroom•9mo ago
Gotta say, getting rid of all the clicking and checking just sounds like a huge win. I hate wasting time on all that.