Start all of your commands with a comma

https://rhodesmill.org/brandon/2009/commands-with-comma/
167•theblazehen•2d ago•49 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
674•klaussilveira•14h ago•202 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
951•xnx•20h ago•552 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
24•kaonwarb•3d ago•20 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
123•matheusalmeida•2d ago•33 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
58•videotopia•4d ago•2 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
232•isitcontent•14h ago•25 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
225•dmpetrov•15h ago•118 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
332•vecti•16h ago•145 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
495•todsacerdoti•22h ago•243 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
383•ostacke•20h ago•95 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
360•aktau•21h ago•182 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
289•eljojo•17h ago•176 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
34•jesperordrup•4h ago•16 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
413•lstoll•21h ago•279 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
20•bikenaga•3d ago•9 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
18•speckx•3d ago•8 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
64•kmm•5d ago•8 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
91•quibono•4d ago•21 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
258•i5heu•17h ago•197 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
32•romes•4d ago•3 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
44•helloplanets•4d ago•42 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
60•gfortaine•12h ago•26 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1070•cdrnsf•1d ago•446 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
36•gmays•9h ago•12 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
150•vmatsiiako•19h ago•70 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
289•surprisetalk•3d ago•43 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
150•SerCe•10h ago•143 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
186•limoce•3d ago•101 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
73•phreda4•14h ago•14 comments

Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously

https://app.propolis.tech/#/launch
116•mpapazian•3mo ago
Hi HN, we're Marc and Matt, and we're building Propolis (app.propolis.tech/#/launch). We use browser agents to simulate users in order to report bugs and write e2e tests. Today, you can launch tens to hundreds of agents that collaboratively explore a website, report back on pain points, and propose e2e tests that can run as part of your CI.

You can try an initial run (two-minute setup) for free to get a feel for the product: app.propolis.tech/#/launch. Or watch our demo video: https://www.tella.tv/video/autonomous-qa-system-walkthrough-...

The Problem

Both Matt and I have been thinking about software quality for the last 10 years. While at Airtable, Matt worked on the infrastructure team responsible for deploys and thought a lot about how to catch bugs before users did. Deterministic tests are incredibly effective at ensuring pre-defined behavior continues to function, but it's hard to get meaningful coverage, and it's easy to stub/mock so much that the tests are no longer representative of real usage.

I like to pitch what we're building now as a set of “users” you can treat like a canary group without worrying about impacting real users.

What we do: Propolis runs "swarms" of browser agents that collaborate to come up with user journeys, flag points of friction, and propose e2e tests that can then be run more cheaply on any trigger you'd like. We have customers, from public companies to startups, running swarms regularly to massively increase the breadth of their automated testing, and running the produced tests as part of their CI pipeline to ensure that specific flows stay working without needing to worry about updating Playwright/Selenium tests.

One thing that really excites me about this approach is how flexible "checks" can be, since they're evaluated partially via LLM. For example, we've caught bugs related to the quality of non-deterministic output (think of a shopping assistant recommending a product that the user then searches for and can't find).
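To make that concrete, here's a minimal sketch of the shape such a check can take: the assertion is a rubric handed to a judge rather than an exact match. The names (checkRecommendation, the toy substring judge standing in for a real model call) are invented for illustration, not Propolis's API.

    // Sketch: an LLM-evaluated check over non-deterministic output.
    interface Verdict {
      pass: boolean;
      reasoning: string;
    }

    type Judge = (rubric: string, observation: string) => Promise<Verdict>;

    async function checkRecommendation(
      judge: Judge,
      recommendation: string,
      searchResults: string[],
    ): Promise<Verdict> {
      // The check is a rubric plus an observation; the judge decides.
      return judge(
        "The recommended product must appear in the site's search results.",
        `Recommended: ${recommendation}\nSearch returned: ${searchResults.join(", ")}`,
      );
    }

    // Toy judge: a real one would call an LLM and parse a structured verdict.
    const substringJudge: Judge = async (_rubric, observation) => {
      const [rec, results] = observation.split("\n");
      const pass = results.includes(rec.replace("Recommended: ", ""));
      return { pass, reasoning: pass ? "found in results" : "missing from results" };
    };

    checkRecommendation(substringJudge, "Blue Widget", ["Red Widget", "Green Widget"])
      .then((v) => console.log(v)); // { pass: false, reasoning: "missing from results" }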

Pricing and Availability

It's production-ready today at $1000/month for unlimited use plus active support, for early users willing to give feedback and request features. We're also happy to work with you on capped-use / hobby plans at lower prices if you'd like to use it for smaller or personal projects.

We'd love to hear from the HN community. We're especially curious whether folks have thoughts on what else autonomous agents could validate beyond bugs and functional correctness. Try it out and let us know what you think!

Comments

ttamslam•3mo ago
Hey I'm Matt! Really excited to answer any questions.

To elaborate a little bit on the "canary" comment --

For a while at Airtable I was on the infra team that managed the deploy (basically: click run, then sit and triage issues for a day). One of my first contributions on the team was adding a new canary-analysis framework that made it easier to catch and roll back bugs automatically. Two things always bothered me about the standard canary release process:

1) It necessarily treats some users as lower value, and thus more acceptable to risk exposing bugs to (this makes sense for things like the free tier, but the more you segment out, the less representative and thus less effective your canary is). When every customer interaction matters, as is the case for so many types of businesses, this approach is harder to justify.

2) Low-frequency / high-impact bugs are really difficult to catch in canary analysis. It's easy to write checks that catch glaring drops or spikes in metrics, but more subtle high-impact regressions are much harder and often require user reports (which we did not factor into our canary). Example: how do you write a canary metric that automatically rolls back when an enterprise account owner (a small % of overall users) logs in and a broken modal prevents them from interacting with your website? The sketch below makes the arithmetic concrete.
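A back-of-the-envelope sketch of why that kind of regression hides in an aggregate canary metric (numbers invented for illustration):

    // Every enterprise-owner session fails (the broken modal); everyone else
    // is fine. Simplified: assume those sessions would otherwise all succeed.
    const totalSessions = 100_000;
    const enterpriseOwnerShare = 0.005; // 0.5% of sessions
    const baselineSuccessRate = 0.99;

    const failingSessions = totalSessions * enterpriseOwnerShare;
    const observedSuccessRate =
      (totalSessions * baselineSuccessRate - failingSessions) / totalSessions;

    console.log(observedSuccessRate); // 0.985: a 0.5-point dip, easily lost in noise
    // A rollback rule like `observed < baseline - 0.02` never fires, even
    // though an entire customer segment is completely broken.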

I view what we're building at Propolis as an answer to both of these things. I envision a deploy process (very soon) that lets us roll out to simulated traffic and canary on THAT before you actually hit real users (and then do a traditional staged release, etc.).

bfeynman•3mo ago
Seems like you are misappropriating what canaries are useful for... they are designed to be lightweight and shallow, hence the name and the whole analogy. Canaries were never meant to determine whether a mine was structurally unsafe, etc.
svnt•3mo ago
I don’t see how they have it wrong?

Canaries are lightweight and shallow once they exist. Building a canary from the ground up is still beyond us, but if you don't want to kill an actual bird, that is pretty much the only way to go.

mritchie712•3mo ago
I've been looking for this exact thing. A couple questions:

Are your agents good at testing other agents? e.g. I want your agent to ask our agent a few questions and complete a few UI interactions with the results.

How do you handle testing onboarding flows? e.g. I want your agent to create a new account in our app (https://www.definite.app/) and go through the onboarding flow (e.g. add Stripe and HubSpot as integrations).

ttamslam•3mo ago
> Are your agents good at testing other agents? e.g. I want your agent to ask our agent a few questions and complete a few UI interactions with the results.

I'd say this is one of our strong suits: chat UIs tend to be easy for browser agents to navigate, and the LLM-as-a-judge offers pretty good feedback on chat quality that can inform later actions. (I'd be remiss not to mention, though, that a good LLM eval framework like Braintrust is probably the best first line of defense.)

> How do you handle testing onboarding flows?

We can step through most onboarding flows if you start from a logged-out state and give the agent the context it'll need (e.g. a Stripe test card). That said, setting up integrations that require multi-page hops is still a pain point in our system and leaves a lot to be desired.
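For illustration, the context handed to a swarm for a run like that might look something like the object below. The field names are invented (this is not Propolis's schema); the card number is Stripe's standard public test card.

    // Hypothetical shape of the context an onboarding run would need.
    const onboardingContext = {
      startUrl: "https://example.com/signup",
      credentials: null, // start from a logged-out state
      hints: {
        stripeTestCard: "4242 4242 4242 4242", // Stripe's standard test card
        expiry: "12/34",
        cvc: "123",
      },
    };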

Would love to talk more about your specific case and see if we can help! founders@propolis.tech

tommy18•2mo ago
How do you compare with Braintrust, then? Aren't they doing the same thing for agents?
kodefreeze•3mo ago
I just ran this test with our web QA agent (kodefreeze.com); it was able to test creating an account until it reached the screen that requires email confirmation.

Support for receiving email and custom actions is on our roadmap, but we'd love to see if getting this far would be valuable to you. The test used the email test@kodefreeze.com.

8note•3mo ago
Fraud/abuse/compliance is a good use case for this kind of thing - an abuse vector is kind of like a bug, except that the system does what it's expected to do.

I've always found testing for abuse stuff quite difficult: to work well, you need to both create some real resources so you can delete/clean them up, and create a new test identity, since your abuse detection system should be deny-listing found bad actors. The difficulty is that those sessions probably want to stay open for something like a week, so they can process both payments and refunds.

can the agents check their email? other notification methods?

ttamslam•3mo ago
This is interesting. We've shied away a bit from security-ish use cases since they're outside our personal core competencies. Do you have examples of what tools exist today for catching things like that? Or is it totally ad hoc?

> can the agents check their email? other notification methods?

Yes to email (for paying customers, agents spin up with unique addresses); no to other notifications, but as soon as a paying customer has a use case for SMS, etc., we'll build it.

dfsegoat•3mo ago
OTP-protected flow verification
dfsegoat•3mo ago
Really good callout re: email and other 'side flows' - hopefully there's an integration with something like Mailosaur.

https://mailosaur.com/email-testing
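For reference, pulling a confirmation code out of a Mailosaur test inbox looks roughly like the sketch below. The call shapes approximate Mailosaur's Node SDK; treat the exact names as assumptions and check their docs before relying on them.

    import MailosaurClient from "mailosaur";

    const client = new MailosaurClient(process.env.MAILOSAUR_API_KEY!);
    const serverId = process.env.MAILOSAUR_SERVER_ID!;

    // Waits for the next message delivered to `sentTo` on the test server,
    // then pulls a 6-digit code out of the plain-text body.
    async function fetchOtp(sentTo: string): Promise<string> {
      const message = await client.messages.get(serverId, { sentTo });
      const match = message.text?.body?.match(/\b\d{6}\b/);
      if (!match) throw new Error("no 6-digit code found in email body");
      return match[0];
    }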

rvz•3mo ago
Looks like a great idea. Does this fully automate QA testing of websites, including removing the human in the loop during testing?

Once again, great product.

mpapazian•3mo ago
Great question! The swarm takes a first pass to generate tests and can continuously add more as it runs again over time.

On the off chance it misses specific tests, we have tools to let you build them directly with AI support, either by giving the agents objectives or by dropping in a video of the actions you're taking!

mhb•3mo ago
Good video, but it looks like it plays twice. Should be ~3.5 minutes...
mpapazian•3mo ago
Ah, good catch! Fixed :)
ekarabeg•3mo ago
This seems really interesting. I tried running a swarm on my landing page but didn't get a completion email. I'll try it again, though!
mpapazian•3mo ago
Hi! Looking at your swarm results, you might not have given the swarm login credentials to use, which is why most of the runs are failing out. Please feel free to try it again and give them access.
GeorgyM•3mo ago
Sounds interesting. Does this handle mobile as well?
mpapazian•3mo ago
We don't handle mobile yet, but we might get to it at a future date!
anamhira•3mo ago
If you are interested in mobile, check out revyl.ai
cloudflare728•3mo ago
Can it find broken UI?

A human can find and report broken UI easily by using common sense. But even though it's simple for a human, a computer has no common sense. I'm a machine learning expert, and I tried and mostly failed to build a broken-UI detector at my previous company. They had an automated plugin-upgrade process that periodically broke the UI.

I tried to detect it by taking a long screenshot, letting you select an image as the working version, and then diffing the two images. It kind of worked, but not satisfactorily.
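For readers who want to try the approach described above, the usual off-the-shelf building blocks are pngjs and pixelmatch; a minimal sketch of the baseline-vs-current diff:

    import * as fs from "fs";
    import { PNG } from "pngjs";
    import pixelmatch from "pixelmatch";

    // Compare a known-good baseline against a fresh capture and count
    // differing pixels. Both images must have the same dimensions.
    const baseline = PNG.sync.read(fs.readFileSync("baseline.png"));
    const current = PNG.sync.read(fs.readFileSync("current.png"));
    const { width, height } = baseline;
    const diff = new PNG({ width, height });

    const changedPixels = pixelmatch(
      baseline.data, current.data, diff.data, width, height,
      { threshold: 0.1 }, // per-pixel color tolerance
    );

    fs.writeFileSync("diff.png", PNG.sync.write(diff));
    console.log(`${changedPixels} pixels differ`);
    // The hard part, as noted above, is the cutoff: a threshold that flags
    // broken layouts without firing on every banner rotation or font change.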

mpapazian•3mo ago
The agents can definitely detect when something is off, given they're using VLMs. They don't necessarily compare it to previous versions; rather, they have opinionated takes on whether something looks broken or off. So - yes!
orliesaurus•3mo ago
Does it output playwright scripts?
ttamslam•3mo ago
We use Playwright for interacting with the browser, so while it's not available by default, we do support bulk-exporting tests as Playwright, whether to move off our platform or for customers who want to run deterministic versions of the tests on their own infra (you can also run them on ours!).
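For a sense of what a deterministic export looks like, here's a test in the shape Playwright expects. The app, selectors, and flow are invented for illustration; this is not actual Propolis output.

    import { test, expect } from "@playwright/test";

    test("logged-in user can create a project", async ({ page }) => {
      await page.goto("https://example.com/login");
      await page.getByLabel("Email").fill("qa@example.com");
      await page.getByLabel("Password").fill(process.env.QA_PASSWORD!);
      await page.getByRole("button", { name: "Sign in" }).click();

      await page.getByRole("button", { name: "New project" }).click();
      await page.getByLabel("Project name").fill("Smoke test project");
      await page.getByRole("button", { name: "Create" }).click();

      // Deterministic assertion: no LLM judging needed on replay.
      await expect(page.getByText("Smoke test project")).toBeVisible();
    });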
plasma•3mo ago
Neat! How do you handle state changes during tests? For example, in a todo app the agents are (likely) working on the same account in parallel, or on a subsequent run some test data has been left behind, or the data a test expects is no longer set up.

I'm curious whether you'd also move into API testing using the same discovery/attempt approach.

mpapazian•3mo ago
This is one of our biggest challenges, you're spot on! What we're building to tackle this includes a memory layer that agents have access to, so state changes become part of their knowledge and are accounted for while conducting a test.

They're also smart enough not to be frazzled by things having changed; they still have their objectives and will work out whether the functionality is there or not. The beauty of non-determinism!
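On the deterministic-test side, one common way to dodge the shared-account problem plasma describes is a fixture that gives every test a throwaway account. This is a generic Playwright sketch, not Propolis's memory layer; createAccount/deleteAccount are hypothetical helpers against your app's seeding or admin API.

    import { test as base } from "@playwright/test";

    type Account = { email: string; password: string };

    // Hypothetical helpers: call your app's seeding/admin API here.
    async function createAccount(email: string): Promise<Account> {
      return { email, password: "generated-password" };
    }
    async function deleteAccount(_account: Account): Promise<void> {}

    export const test = base.extend<{ account: Account }>({
      account: async ({}, use) => {
        const account = await createAccount(`qa+${Date.now()}@example.com`);
        await use(account); // the test body runs here with a fresh account
        await deleteAccount(account); // teardown runs even if the test failed
      },
    });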

not-chatgpt•3mo ago
Been looking for a solution exactly like this, but I struggle to see how it's different from spinning up 10 Atlas tabs with a two-sentence prompt.
mpapazian•3mo ago
You could definitely do that and get some good results! But if you want a repeatable process with detailed bug reports (including logs, reasoning, etc.), a large enough search area, and agents that can continuously build an understanding of your app - that's us :)

let's chat - founders@propolis.tech

tarasyarema•3mo ago
Looks interesting! How would it work in terms of latency? Also, would you be able to run it in CI, or even with ephemeral envs? In my experience, it's key to be able to run it on each change.
webprofusion•3mo ago
This is a great idea and definitely useful. In particular, it would make sense if it could take the build artifacts of recent GitHub Actions builds and test them comprehensively, particularly to green-light a release. You can already pretty much do this using the standard agent tools and a set of test prompts, so anything that makes it easier and repeatable is good.

The pricing sounds quite enterprisey; the risk there is that people will tend toward building their own.

_bfjg•3mo ago
Great to see others working on the problem of validating UI.

We are also building a web QA agent at https://kodefreeze.com. We are focused on small and medium-sized companies and are offering free usage during our trial period!

nkmnz•3mo ago
Will certainly try this out! FYI: the pricing table is difficult to parse when reading it on mobile.
kodefreeze•3mo ago
Thanks for letting us know, we'll fix it.

Please let us know if you have any feedback!

MrThoughtful•3mo ago
When running a test, all I see is:

    Error loading video
    Please try refreshing the page
No matter how often I refresh, that is all I get.

Maybe you need more QA?

When I open my browser console, I see this:

    Capturing error: Error: WHEP request failed: 500 - {"message":"\"message\" is required!","error":"Server Error"}
kodefreeze•3mo ago
Sorry you're running into this error. Are you seeing it on the marketing website, or somewhere in the app?
wyatthoran•3mo ago
Oh this is sick! I've been complaining about this exact problem for years. The "canary without real users" idea is brilliant - way better than just throwing your free tier users under the bus and hoping for the best.

The thing that really got me was catching bugs in non-deterministic output. We've been struggling with this on LLM features where traditional assertions just don't work. Having agents actually judge quality instead of looking for exact matches is such an obvious solution in hindsight.

Quick question though - how do you handle auth flows with MFA or OAuth redirects?
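The thread doesn't answer this, but for context: the usual browser-automation answer for TOTP-based MFA is to enroll the test account with a known shared secret and mint codes on demand, while OAuth redirects are typically handled with pre-authorized test accounts. A sketch using otplib:

    import { authenticator } from "otplib";

    // The shared secret captured when the QA account enrolled in MFA.
    const TOTP_SECRET = process.env.QA_TOTP_SECRET!;

    // Derives the same 6-digit code the authenticator app would show.
    function currentMfaCode(): string {
      return authenticator.generate(TOTP_SECRET);
    }

    // e.g. in a Playwright flow:
    //   await page.getByLabel("Code").fill(currentMfaCode());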

mhb•3mo ago
I did a trial run with a poorly chosen URL and wanted to rerun with a better one. When I click "Launch Swarm", nothing happens. Am I done? Maybe there should be a message displayed?