frontpage.

I Stored a Website in a Favicon

https://www.timwehrle.de/blog/i-stored-a-website-in-a-favicon/
121•theanonymousone•4h ago•38 comments

Where to Find the Colors Your Screen Can't Show You

https://moultano.wordpress.com/2026/06/19/where-to-find-the-colors-your-screen-cant-show-you/
100•moultano•5h ago•26 comments

Data Compression Explained (2012)

https://mattmahoney.net/dc/dce.html
118•mtdewcmu•3d ago•15 comments

There are no instances in ATProto

https://overreacted.io/there-are-no-instances-in-atproto/
439•danabramov•18h ago•226 comments

Can you see three trees?

https://www.not-ship.com/can-you-see-three-trees/
110•Pamar•2d ago•52 comments

The discovery that changed how scientists think about memory

https://www.ibm.com/think/news/discovery-changed-how-scientists-think-about-memory-kavli-prize
52•rbanffy•2d ago•11 comments

Surprising economics of load-balanced systems

https://brooker.co.za/blog/2020/08/06/erlang.html
108•KraftyOne•13h ago•27 comments

Hyundai buys Boston Dynamics

https://startupfortune.com/hyundai-takes-full-control-of-boston-dynamics-as-softbank-exits-for-32...
825•ck2•17h ago•362 comments

GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2

https://arrowtsx.dev/bigger-models/
144•oshrimpton•17h ago•38 comments

How many of the 170k English words do you know?

https://vocabowl-870366514258.us-west1.run.app/
365•abnry•19h ago•448 comments

Norway imposes near ban on AI in elementary school

https://www.reuters.com/technology/norway-imposes-near-ban-ai-elementary-school-2026-06-19/
660•ilreb•17h ago•456 comments

Soccer Arcade Games Through the Years

https://arcadeheroes.com/2026/06/13/world-cup-2026-soccer-arcade/
17•speckx•3d ago•2 comments

Project Valhalla, Explained: How a Decade of Work Arrives in JDK 28

https://www.jvm-weekly.com/p/project-valhalla-explained-how-a
593•philonoist•1d ago•368 comments

Bobby Prince, composer for Doom, Wolfenstein 3D, and Duke Nukem 3D, has died

https://www.legacy.com/legacy/robert-bobby-prince-lll
371•pgrote•14h ago•40 comments

Satellite reveals immense scale of GPS signal tampering

https://www.space.com/space-exploration/satellites/its-quite-a-bit-more-than-we-expected-satellit...
75•y1n0•5h ago•26 comments

A 1969 camera operators' strike created Upstairs Downstairs multiverse

https://ironicsans.ghost.io/the-color-strike/
20•ohjeez•3d ago•4 comments

A Perceptron in Age of Empires II

https://adewynter.github.io/notes/aoe2-circuits
74•EvgeniyZh•2d ago•31 comments

Egyptian Fractions (2006)

https://blog.plover.com/math/egyptian-fractions.html
94•luu•4d ago•8 comments

AURpocalypse now: a look at the recent AUR attacks

https://lwn.net/SubscriberLink/1077619/f7b07c5489fdd43a/
81•jwilk•16h ago•52 comments

Ask HN: Will programmers write more efficient code during the memory shortage?

110•amichail•10h ago•178 comments

John Jumper to join Anthropic

https://twitter.com/JohnJumperSci/status/2068001285173834106
131•artninja1988•15h ago•96 comments

Digital Printing of Arabic: explaining the problem

https://digitalorientalist.com/2017/08/21/digital-printing-of-arabic-explaining-the-problem/
56•a_t48•3d ago•23 comments

Zen and the Art of Machine Learning Research

https://blog.jxmo.io/p/zen-and-the-art-of-machine-learning
257•jxmorris12•4d ago•92 comments

Court Records Should Be Free

https://www.eff.org/deeplinks/2026/06/court-records-should-be-free
363•hn_acker•16h ago•80 comments

Building a robotics research setup that lives next to my desk

https://dfdxlabs.com/research/2026/robotics-setup/
148•mplappert•1d ago•51 comments

Telescope Ranchers

https://kottke.org/26/06/telescope-ranchers
124•bookofjoe•3d ago•47 comments

Show HN: Metiq: a real time 3D globe for 100 public datasets

https://metiq.space
123•rakeda•3d ago•32 comments

Big Banana Car

https://bigbananacar.com/
151•Bender•15h ago•76 comments

Designing a backyard deck for my house

https://blog.cosmin.cloud/posts/diy-deck.html
15•spycraft•4h ago•4 comments

Ten years of ClickHouse in open source

https://clickhouse.com/blog/open-source-10
302•saisrirampur•4d ago•84 comments

Launch HN: TesterArmy (YC P26) – Agents that test web and mobile apps

https://tester.army
125•okwasniewski•1d ago
Hey HN - we’re Oskar, Szymon, and Piotr, and we’re building TesterArmy (https://tester.army). TesterArmy is an agentic testing platform that runs end-to-end checks before deployment and in production. Instead of wasting hours on manual testing or maintaining static scripts, we let you specify your tests in natural language and handle everything in between. We've built the platform fully around agents. Our agent will reliably execute the tests, but your coding agent can manage everything in our platform, from defining tests in natural language to running them on your behalf.

Check out our demo video: https://www.youtube.com/watch?v=291IkUbPrlk.

We started TesterArmy because testing is still far too painful. AI coding tools have made it dramatically faster to write and ship code, but testing is still a bottleneck. Traditional E2E tests are slow to set up and expensive to maintain. Managing auth and test users is painful. Setting up staging environments is painful. Running tests reliably is painful.

We think most teams do not actually want to spend their time writing selectors or maintaining test infrastructure. They just want confidence that their core flows work. With TesterArmy, an engineer can sign up, give an agent our CLI, and let it handle creating tests and running them on schedule or on GitHub.

When something breaks, TesterArmy alerts your team through Slack or Discord.

Over the past few months, we scaled from 0 to 30+ teams using our product every day. We caught bugs in critical flows, including onboarding, checkout, and AI chat. Many of our customers have migrated to us from established competitors because of the quality and reliability of our agents.

Here are a few of the recent bugs that our agent found (there were quite a lot of them!):

1) A timezone bug in the booking flow of one of our clients' apps. The dashboard was very complex, and the bug was hard for a human to catch.
2) A regression in agent orchestration that left a sandboxed environment stuck on loading. Thanks to TesterArmy, the team resolved it before it hit production.
3) An incorrectly calculated order amount in a complex dashboard flow with checkout. Thanks to TesterArmy, the team resolved it before it affected revenue.
4) A regression in an AI chat flow where broken tool calling would have left users unable to retrieve their data.

And many more, mostly related to some incorrect API calls, 404s, unhandled errors, etc.

If this sounds useful, we would love your feedback at https://tester.army. We have a bunch of free test runs for you to try. And don’t worry, we won’t make you do sales calls, and we don’t have long onboarding or annoying setup. Our goal is an it-just-works experience.

If you're looking for an end-to-end testing solution, we'd love to hear your feedback!

Comments

yohguy•1d ago
Does it work for native mobile applications, or Expo apps that have native modules?

Pricing question: the usage on the plans seems low. In the demo you said you have 25 tests per PR, which would mean you get only 10 PRs per month on the hobby plan?

okwasniewski•1d ago
Yes, it works for any framework. We just get the built native binary and run it in the cloud.

Regarding pricing, the self-serve options are currently only for lower usage. We will add more plans further down the line. Currently the most popular one is the startup plan. If you need more usage, I’m happy to discuss it on a call!

msencenb•1d ago
Have you been able to nail down a loop where your tool can take an open PR, guess the code path, and do some testing?

We use Cypress heavily for our core flows, and it has a similar AI prompt feature, but it’s not quite ad hoc enough for smaller fixes, which is where the bottleneck still comes in for us.

okwasniewski•1d ago
Yes! We spent quite a lot of time on this, and we are currently creating a test plan based on PR changes and sending an agent to verify it. We have some customers who are only using this feature.
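The PR-driven flow described here (build a test plan from a PR's changes, then send an agent to verify it) needs some way to map changed files to the user flows they can break. Below is a minimal sketch, assuming a hand-maintained table from path prefixes to flow names. `FLOW_MAP` and `plan_for_pr` are hypothetical names, and the thread does not say how TesterArmy actually derives its plans.

```python
# Hypothetical mapping from source path prefixes to the user flows they affect.
FLOW_MAP: dict[str, list[str]] = {
    "src/checkout/": ["checkout", "order summary"],
    "src/auth/": ["sign up", "log in"],
    "src/chat/": ["AI chat"],
}


def plan_for_pr(changed_files: list[str]) -> list[str]:
    """Return the flows to verify for a PR, in first-seen order, without duplicates."""
    flows: list[str] = []
    for path in changed_files:
        for prefix, affected in FLOW_MAP.items():
            if path.startswith(prefix):
                for flow in affected:
                    if flow not in flows:
                        flows.append(flow)
    return flows


print(plan_for_pr(["src/auth/session.ts", "src/checkout/cart.ts", "README.md"]))
# ['sign up', 'log in', 'checkout', 'order summary']
```

Files that match no prefix (like `README.md`) produce no flows, so a docs-only PR yields an empty plan.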
dbbk•1d ago
"Traditional E2E tests are slow to set up and expensive to maintain." I don't really understand this. If I'm already using Opus to write the code, surely it would know best what E2E tests to write to be able to verify its own output? This seems like an unnecessary external step.
okwasniewski•1d ago
Unfortunately, from our experience, tests don’t scale as well as code. First of all, static tests are very brittle: you rely on selectors, need wait times, and can’t really test a lot of dynamic content (think AI chats/interactions). Then it’s all the infrastructure around it: solving captchas, handling auth, handling email OTP (each of our agents has access to its own inbox) and handling video recording and screenshots. So with the traditional testing approach you end up mocking a lot of services. I highly recommend giving it a try!
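The "need wait times" complaint usually refers to fixed sleeps that are either too short (flaky) or too long (slow). The common mitigation is to poll a condition until a deadline, which browser frameworks such as Playwright build into their auto-waiting. Here is a plain-Python sketch of that idea with no browser involved. `wait_until` is a hypothetical helper, and `element_visible` stands in for a real "element has rendered" check.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds and False on timeout,
    instead of sleeping for a fixed, guessed duration.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


start = time.monotonic()


def element_visible() -> bool:
    # Stand-in for "the element has rendered": becomes true after about 0.2 s.
    return time.monotonic() - start > 0.2


print(wait_until(element_visible, timeout=1.0))  # True
```

Polling fixes timing but not selector brittleness. That part is what the agent-driven approach in this thread is trying to address.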
Obertr•1d ago
I would respectfully disagree on this. Right now, to write tests I ask Claude/Codex to create an eval, and it just spins up a background LLM agent worker that verifies the tests in the sandbox/internally.

So I would say that at the moment, in-house testing is easier than external testing for us.

iknownthing•1d ago
.army?
okwasniewski•1d ago
We are thinking about whether to change this. We also have testerarmy.com and testerarmy.ai.
thih9•1d ago
Change it now to .com or get stuck there for years, suffering anti spam filters, potential renewal problems and more in the meantime.
tootubular•1d ago
You have the .com? That's a no-brainer imo. I have a domain for a saas where the .com is squatted so I settled for .ai (and other surrounding TLDs / host permutations) and right out of the gate ran into some issues with firewall vendors in corpo environments.
okwasniewski•1d ago
Yeah, we have all of them. I saw it too where in bigger companies our emails were going straight to spam. Will migrate to it soon
rpunkfu•1d ago
Congratulations on the launch! I’ve been tracking your progress since you were accepted into the spring batch.

Always happy to see cool products from Poland! :)

okwasniewski•1d ago
Thank you!
Laurel1234•1d ago
Seems interesting, but I wonder about this

> Traditional E2E tests are slow to set up and expensive to maintain.

Isn't this just using agents to create e2e tests or is there some better new approach I'm missing?

okwasniewski•1d ago
We use agents to navigate the app, making real-time decisions based on its state. I prefer to compare it more to a manual QA engineer than to static e2e tests. We spent a lot of time on the harness to make sure the results are reliable. This allows you to assert on dynamic content like AI-generated content. We also support validation of email flows since the agent can read its own email.
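Reading its own email lets an agent get through OTP and magic-link flows. The retrieval side (IMAP, a mail provider's API) depends on the setup, but the extraction step can be sketched with Python's standard `email` package and a regex. This assumes the code is the first standalone 6-digit number in the plain-text body. `extract_otp`, `plain_text_body`, and the sample message are illustrative and not part of TesterArmy.

```python
import re
from email import message_from_string
from email.message import Message
from typing import Optional

# Assumption: the code is the first standalone 6-digit number in the plain-text body.
OTP_PATTERN = re.compile(r"\b(\d{6})\b")


def plain_text_body(msg: Message) -> str:
    """Return the first text/plain part of a (possibly multipart) message."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def extract_otp(raw_email: str) -> Optional[str]:
    """Parse a raw email message and return the one-time code, if any."""
    match = OTP_PATTERN.search(plain_text_body(message_from_string(raw_email)))
    return match.group(1) if match else None


raw = (
    "From: noreply@example.com\n"
    "To: agent-1@example.test\n"
    "Subject: Your login code\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Your verification code is 482913. It expires in 10 minutes.\n"
)
print(extract_otp(raw))  # 482913
```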
jaggederest•1d ago
Fable (rip) is absurdly good at this, so it's a great time to build a product around it. You definitely need the harness, but it feels like it just turned the corner and can now do really in-depth and edge-case work.

Do you handle heterogeneous environments and network connectivity simulation as well? I am working on a mobile app, and occasionally users losing just a request or two can put the state machine into unusual modes.

okwasniewski•1d ago
I feel like new AI model releases will only allow our agents to do more in-depth testing; the space still has a lot of room to grow. Quality assurance is way more complicated than just clicking around a UI.

Regarding the other question: not yet. For now, we have Chromium, iOS, and Android (latest versions of each), but we are working on adding more. Regarding network connectivity, it's coming soon (I have an open PR).
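Until a platform supports connectivity simulation, lost requests can still be explored at the unit level by wrapping the transport so it drops calls with a configurable probability. Seeding the random generator makes a failing run replayable. This is a minimal sketch under that assumption. `lossy`, `send_with_retry`, and `RequestDropped` are hypothetical names, and the approach does not replace device-level network conditioning.

```python
import random
from typing import Callable

Send = Callable[[str], str]


class RequestDropped(Exception):
    """Raised when the simulated network loses a request."""


def lossy(send: Send, drop_rate: float, rng: random.Random) -> Send:
    """Wrap `send` so that each call is lost with probability `drop_rate`."""
    def wrapped(request: str) -> str:
        if rng.random() < drop_rate:
            raise RequestDropped(request)
        return send(request)
    return wrapped


def send_with_retry(send: Send, request: str, attempts: int = 3) -> str:
    """Retry dropped requests up to `attempts` times, then re-raise the last drop."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return send(request)
        except RequestDropped:
            if attempt == attempts - 1:
                raise
    raise AssertionError("unreachable")


# Seeded RNG so that a failing run can be replayed exactly.
rng = random.Random(42)
flaky_send = lossy(lambda req: f"ok:{req}", drop_rate=0.3, rng=rng)
results = []
for i in range(20):
    try:
        results.append(send_with_retry(flaky_send, f"sync-{i}"))
    except RequestDropped:
        results.append("gave up")  # the state a real client must handle
print(results.count("gave up"), "of 20 requests failed after retries")
```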

Laurel1234•20h ago
> We use agents to navigate the app, making real-time decisions based on its state.

This still leads me to my original question of how though. If you're not using locators are you just passing page contents to the LLM? Or using a multi modal model and say screenshotting? My experience with that has been pretty poor and worse than proper e2e scripts, and is fairly expensive to boot.

Sorry for the insistence haha, just interested because it could be pretty groundbreaking if done well.

RayFitzgerald•1d ago
Love your approach to product. It feels like TesterArmy will become the "Vercel for testing". Refreshing stuff!
okwasniewski•1d ago
Thank you! That's the goal
poisonborz•1d ago
E2E tests are now quick to write thanks to LLMs, and they are deterministic AND cheap to run. How does this compare to the token cost of running an agent for the whole duration of each test? How do you make sure results stay stable despite the agent's nondeterministic nature? Do customers still need to create test cases? Is there any way to import them from a test case management system? Customers who already have those could generate E2E tests locally from them.
okwasniewski•1d ago
Unfortunately, from our experience, tests don’t scale as well as code. First of all, static tests are very brittle: you rely on selectors, need wait times, and can’t really test a lot of dynamic content (think AI chats/interactions). Then it’s all the infrastructure around it: solving captchas, handling auth, handling email OTP (each of our agents has access to its own inbox) and handling video recording and screenshots.

To ensure stable results, we do a lot of harness engineering: we inject trajectories of previous tests to keep runs stable, and splitting tests into smaller steps helps prevent context overload and decision fatigue.

Regarding test case management, our customers have used our CLI to migrate their existing test cases from whatever system they were using before.

ai_slop_hater•1d ago
Why can't you test AI chats?
tcoff91•1d ago
I'm curious how your mobile testing compares to https://revyl.com

I've been experimenting with Revyl and it's really nice. I think this agent-driven testing is the future.

okwasniewski•1d ago
We support both web and mobile, which is what a lot of companies prefer, just one agent for both. Also, I'm pretty sure Revyl relies only on vision models, which tend to be slower. We built the platform around a hybrid approach that combines vision and accessibility APIs, which is much faster.

Would love to hear your feedback after you try it out!
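A "hybrid of vision and accessibility APIs" could take several forms. One plausible shape is to try a cheap lookup in the accessibility tree first and pay for a screenshot-plus-vision-model call only when the element isn't exposed there (canvas content, unlabeled icons). Here is a minimal sketch under that assumption. `A11yNode`, `locate`, and `fake_vision` are hypothetical, and the thread doesn't say how TesterArmy orders or combines the two.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Point = tuple[int, int]


@dataclass
class A11yNode:
    role: str  # e.g. "button", "textbox"
    name: str  # accessible name, e.g. the button label
    x: int
    y: int


def locate(
    role: str,
    name: str,
    tree: list[A11yNode],
    vision_fallback: Callable[[str], Optional[Point]],
) -> Optional[Point]:
    """Find an element via the accessibility tree first; ask a vision model only if that fails."""
    for node in tree:
        if node.role == role and node.name.strip().lower() == name.strip().lower():
            return (node.x, node.y)
    # Element is not exposed in the tree: fall back to the slower vision path.
    return vision_fallback(f"the {role} labelled '{name}'")


vision_calls: list[str] = []


def fake_vision(description: str) -> Optional[Point]:
    vision_calls.append(description)  # stand-in for a screenshot + vision-model call
    return (640, 400)


tree = [A11yNode("button", "Checkout", 120, 560), A11yNode("textbox", "Email", 120, 300)]
print(locate("button", "checkout", tree, fake_vision))      # (120, 560), no vision call
print(locate("button", "Apply coupon", tree, fake_vision))  # (640, 400) via the fallback
print(len(vision_calls))  # 1
```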

pranshuchittora•1d ago
Try the OSS alternative - https://github.com/vostride/agent-qa
defied•14h ago
Curious how they frame testing on real devices: “Real devices on demand: spin up iOS simulators and Android emulators”. So those are not real/physical devices?

Physical devices for AI agents is something we at TestingBot do provide: https://testingbot.com/support/ai/mcp

zuzululu•1d ago
Not sure the pain point you mentioned resonates. With LLMs it's very easy to do E2E testing. Also, I feel uneasy about outsourcing this part, given all the security issues these days.
okwasniewski•1d ago
Unfortunately, from our experience, tests don’t scale as well as code.

First of all, static tests are very brittle: you rely on selectors, need wait times, and can’t really test a lot of dynamic content (think AI chats/interactions). Then it’s all the infrastructure around it: solving captchas, handling auth, handling email OTP (each of our agents has access to its own inbox), spinning up simulators and handling video recording and screenshots.

To ensure stable results, we do a lot of harness engineering: we inject trajectories of previous tests to keep runs stable, and splitting tests into smaller steps helps prevent context overload and decision fatigue.

Regarding the security part, the product can operate without any access to the codebase: you can just give us a URL or a mobile app build, and we will do the testing.

skinfaxi•1d ago
Goodness I really didn't expect such lazy copy-pasting of responses for a YC company.
AlexeyBelov•2h ago
Really? You didn't expect low-effort hustler-like behaviour from a YC company? :)
j0sip•1d ago
I wonder how it compares to mobileboost.io, which has been used by companies like Duolingo?
okwasniewski•1d ago
Our approach is heavily focused on agents, both for executing tests and for managing the platform. We want to provide the best and simplest way to conduct agentic testing, with a strong focus on details. It looks like their platform also requires a sales call.
antifarben•1d ago
What are people using to test mobile apps on self hosted infrastructure nowadays? Is there a solution that's not super heavy and/or slow?
pranshuchittora•1d ago
Try the OSS alternative - https://github.com/vostride/agent-qa
jkman•19h ago
Jesus man, stop spamming your project in this thread
_pdp_•1d ago
Great presentation

On a slight tangent, since we are all here...

Does anyone still believe there is a long-term future in traditional UI/UX?

It feels like a lot of attention is still going into landing pages, dashboards, and CRUD apps, while overlooking a bigger shift where fewer people will actually need to interact with those interfaces directly when the same tools can perform the underlying tasks automatically, without much UI at all.

So the bigger question is does UI/UX evolve into something else, or does a large part of it simply disappear?

I might be a bit too early. I recently started a project, decided to skip all of that, and focused on making it friendly to AI agents. Frankly, so far it has been great, both in terms of user experience and in what it delivers.

altmanaltman•1d ago
How can it perform tasks automatically? It's not magic; there has to be a UI/UX for interacting with it. The question is whether that UI/UX will be more optimized and easier to use. Would you prefer saying "close window, computer", pressing Alt+F4, or just clicking the little cross (or equivalent)? Why are we assuming AI automagic UI/UX will be better for all tasks?
_pdp_•1d ago
AI agents can perfectly do a lot of the data entry tasks and build dashboards. You practically need to build none of that when you can ask an AI agent to pull the data and build a chart or provide a file or a paste to insert into a database.

Basically that.

If the app requires a mouse, then it should have a UI; if not, and unless it's critical, it can be driven by an agent.

That's my point.

collingreen•1d ago
Perfectly is a wild word to use here
Eridrus•1d ago
Is there a long term future in hand-crafted UI/UX? Maybe not.

Is there a future where we still have traditional UX? Absolutely.

I don't want to write a whole dissertation on this topic, so I'm just going to mention that we tried to build AI voice assistants for a decade, and while LLMs have basically solved understanding, they have not solved the UX portion.

ed_mercer•21h ago
Aside from entertainment/marketing purposes, I think UI will become useless. Why should I interact with UI when an agent can (and will) interact with it?
Eridrus•8h ago
Because you do not know everything and information (possible inputs, actual outputs) still needs to be presented to you. Because text is not the best way to present everything. Because consistency of presentation makes it easier to absorb. Because everything is social and standardization is useful.
Lionga•1d ago
The flakiest tests possible, as a service. Everyone knows that no tests are better than unreliable tests.
pranshuchittora•1d ago
Try the OSS alternative - https://github.com/vostride/agent-qa
peterspath•1d ago
Does it support testing on all Apple platforms (macOS, iOS, iPadOS, watchOS, tvOS, and visionOS)?
okwasniewski•1d ago
Currently, only iOS, but we can add iPadOS too!
pensono•1d ago
Love using tester army to validate PRs against my preview environment. Skips the manual check much of the time and helps me ship more confidently.
okwasniewski•1d ago
Happy to hear that!
pranshuchittora•1d ago
Hey, I just gave it a try and ran a quick test on booking.com. It took ~3 mins for a basic test. Do you cache the test steps so that future runs are faster and they don't call LLMs for the subsequent runs?

Also your current pricing is $300 for 1K tests which means $0.3 for each test. We tried out playwright mcp and it easily consumes 1M+ tokens for a test with ~20 steps (including image input). So with this pricing are you guys default alive?

Also, is there a benchmark you ran to prove the efficacy of your testing agent? At this stage it's a "trust me bro" kind of thing.
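The pricing question reduces to a short break-even calculation: the per-test price divided by tokens per test (in millions) gives the highest blended model price per million tokens that the plan can absorb. This sketch uses only the two figures quoted in the comment ($300 per 1K tests, about 1M tokens per test). It ignores infrastructure costs (browsers, simulators, video) and any caching, and the 1M-token figure is the commenter's Playwright MCP measurement, not a TesterArmy number.

```python
def price_per_test(plan_price: float, tests_in_plan: int) -> float:
    """Dollar price the customer pays for a single test run."""
    return plan_price / tests_in_plan


def break_even_price_per_mtok(per_test_budget: float, tokens_per_test: float) -> float:
    """Highest blended $/1M-token price at which model spend alone fits in the per-test budget."""
    return per_test_budget / (tokens_per_test / 1_000_000)


budget = price_per_test(300, 1_000)  # $300 for 1K tests, as quoted above
print(f"${budget:.2f} per test")  # $0.30 per test
print(f"${break_even_price_per_mtok(budget, 1_000_000):.2f} per 1M tokens")  # $0.30 per 1M tokens
```

If caching halved the tokens per run to 500K, the break-even price would double to $0.60 per 1M tokens.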

okwasniewski•1d ago
We've been doing quite a lot of context engineering and optimizations to make sure it's not as expensive. The subsequent runs are faster because we cache the trajectory of the agent (not the whole test run yet, as we want to keep the agent in the loop, more like a manual QA engineer, not a test script).

We currently do not have any benchmarks; much of the experience depends on the test plan. We've mostly been focusing on the customer experience, not benchmarking.
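Caching "the trajectory of the agent" while keeping the agent in the loop suggests a cache that replays a past action only when the run reaches the same step in the same state, and calls the agent otherwise. Here is a minimal sketch of that idea. It assumes the page state is available as a string and uses an exact SHA-256 fingerprint, which real pages (timestamps, random IDs) would need to normalize first. `TrajectoryCache` and `fake_agent` are hypothetical, not TesterArmy's harness.

```python
import hashlib
from typing import Callable

Agent = Callable[[str, str], str]  # (step instruction, page state) -> action


def fingerprint(page_state: str) -> str:
    return hashlib.sha256(page_state.encode("utf-8")).hexdigest()


class TrajectoryCache:
    """Hypothetical cache of (test, step, state fingerprint) -> action taken by the agent."""

    def __init__(self) -> None:
        self._actions: dict[tuple[str, int, str], str] = {}
        self.agent_calls = 0

    def act(self, test_id: str, step: int, instruction: str, page_state: str, agent: Agent) -> str:
        key = (test_id, step, fingerprint(page_state))
        if key in self._actions:
            return self._actions[key]  # same step, same screen: replay without an LLM call
        self.agent_calls += 1
        action = agent(instruction, page_state)  # unseen state: the agent decides
        self._actions[key] = action
        return action


def fake_agent(instruction: str, page_state: str) -> str:
    return f"click:{instruction}"


cache = TrajectoryCache()
cache.act("checkout", 0, "Add to cart", "<cart empty>", fake_agent)
cache.act("checkout", 0, "Add to cart", "<cart empty>", fake_agent)    # replayed
cache.act("checkout", 0, "Add to cart", "<promo banner>", fake_agent)  # new state
print(cache.agent_calls)  # 2
```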

pranshuchittora•1d ago
Some digging:
FAST_MODEL = "google/gemini-3-flash" (fast mode primary)
DEEP_MODEL = "openai/gpt-5.4" (deep mode primary)
VISION_CLICK_MODEL = "openai/gpt-5.4" (the visual grounder)

fast: gemini-3-flash, falls back to gpt-5.4, 15-min run timeout, max 2 visual calls/step.
deep: gpt-5.4, 15-min timeout, max 3 visual calls/step.

Why such a hard timeout, and why not latest models?
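The per-mode settings dug up above map naturally onto a small typed config with a primary-then-fallback call path. This is a sketch of that shape only. The model IDs, the 15-minute timeouts, and the visual-call limits are taken from the comment. The structure, `ModelUnavailable`, `complete`, and `flaky_client` are hypothetical. No fallback was listed for deep mode, so it is `None` here, and timeout enforcement is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ModelUnavailable(Exception):
    """Hypothetical error raised by a model client when a call fails."""


@dataclass(frozen=True)
class ModeConfig:
    primary_model: str
    fallback_model: Optional[str]
    run_timeout_s: int
    max_visual_calls_per_step: int


# Values as reported in the comment above; the structure is a hypothetical sketch.
MODES = {
    "fast": ModeConfig("google/gemini-3-flash", "openai/gpt-5.4", 15 * 60, 2),
    "deep": ModeConfig("openai/gpt-5.4", None, 15 * 60, 3),  # no fallback was listed
}


def complete(mode: str, prompt: str, call_model: Callable[[str, str], str]) -> str:
    """Call the mode's primary model; on failure, retry once on its fallback if it has one."""
    cfg = MODES[mode]
    try:
        return call_model(cfg.primary_model, prompt)
    except ModelUnavailable:
        if cfg.fallback_model is None:
            raise
        return call_model(cfg.fallback_model, prompt)


def flaky_client(model: str, prompt: str) -> str:
    # Stand-in client in which the fast-mode primary is down.
    if model == "google/gemini-3-flash":
        raise ModelUnavailable(model)
    return f"{model} -> {prompt}"


print(complete("fast", "click the Checkout button", flaky_client))
# openai/gpt-5.4 -> click the Checkout button
```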

okwasniewski•1d ago
We found Gemini 3 Flash to be the best model right now when it comes to bang for the buck. GPT-5.5 is also a bit more expensive than 5.4, and if we run tests at scale, it has to be affordable. Once a newer model that fits these criteria is released, we will update it!
Eridrus•1d ago
Has anyone tried to build their own version of this?

It's cool, but I'm not super excited about using some 3rd party SaaS as a critical part of my testing.

pranshuchittora•1d ago
I did. Checkout the OSS alternative - https://github.com/vostride/agent-qa
Eridrus•8h ago
Nice!

I will say though, I don't really want to "write tests in natural language", I want something to crawl my app and figure out what's there and what's currently broken and then write its own regression tests.
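The core of "crawl my app and figure out what's there and what's broken" is a breadth-first traversal that records every reachable page and flags error responses. Here is a minimal sketch over an in-memory link map and status codes, which stand in for real browser navigation. `crawl` and the example site are hypothetical, and generating regression tests from the result is not shown.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]], status: dict[str, int]) -> tuple[list[str], list[str]]:
    """Breadth-first crawl; returns (pages in visit order, pages with status >= 400)."""
    seen = {start}
    order: list[str] = []
    broken: list[str] = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        if status.get(page, 404) >= 400:  # unknown pages count as missing
            broken.append(page)
            continue  # don't follow links out of an error page
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order, broken


links = {"/": ["/pricing", "/login"], "/login": ["/dashboard"], "/dashboard": ["/settings", "/"]}
status = {"/": 200, "/pricing": 200, "/login": 200, "/dashboard": 200, "/settings": 500}
print(crawl("/", links, status))
# (['/', '/pricing', '/login', '/dashboard', '/settings'], ['/settings'])
```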

mogili•1d ago
This is a solved problem, there are many that do this. Can't believe YC would fund this in 2026.
pranshuchittora•1d ago
Try the OSS alternative - https://github.com/vostride/agent-qa
anaschouhan475•1d ago
System requirements please
negamax•1d ago
Was writing E2E tests ever a problem that needed automation? Also, E2E tests need to be updated every time a new feature is added. TesterArmy sounds great, but the config overhead and potential security leaks make it a no-go.
pranshuchittora•1d ago
Try the OSS alternative - https://github.com/vostride/agent-qa
negamax•21h ago
Another repo that shouldn't exist. Do you know anything about how E2E tests are actually written and run?
pranshuchittora•21h ago
Yes, a lot. With the acceleration of AI in software engineering, features are being shipped faster, but that causes regressions. The only ways to verify them are to write tests with AI and spend hours reviewing them, or to do manual QA. agent-qa aims to solve the latter. First your product should work for the end user; later you can write clean tests, etc.
radku•20h ago
Congrats on the launch!

Will this solution work with services protected by Cloudflare Turnstile or captchas? Does this involve a human in the loop?

fny•17h ago
How do you differ from other companies doing automated testing?

Rainforest QA, for example, has been in this space for a while and also happens to be a YC company.