Can we test it? Yes, was can [video]

https://www.youtube.com/watch?v=MqC3tudPH6w

43•zdw•3d ago

Comments

webdevver•8h ago

i thought that's what customers were for?

somewhereoutth•8h ago

My feeling on testing:

- If it is used by a machine, then it can be tested by a machine.

- If it is used by a human, then it must be tested by a human.

alex_smart•7h ago

But testing by human is expensive

Anonbrit•7h ago

and horribly unreliable even when done by competent and motivated humans, let alone most IT workers

karolinepauls•3h ago

I've heard of good experiences using a 3rd party QA company (for frontend-heavy changes) and had pretty OK experiences doing it in-house with subject experts (though testing backend changes, so pretty much "either it works or it doesn't" in my case).

Retric•7h ago

Expensive isn’t unnecessary.

Developer written tests can’t tell you if your UI is intuitive for novice users.

sfn42•6h ago

They can however tell you whether the button/form/page/whatever is working and continues working.

dylan604•5h ago

Do those developer created tests include weird and random text that users will unwittingly break your form with? More than once has a form been broken because devs didn’t test for emoji or other glyphs in the text. Just because a button click calls a form with as expected doesn’t not mean the form will behave as expected with unexpected user input

sfn42•1h ago

Playwright is essentially a programmable headless browser. It sends a request to a page, waits for the response to load and then for things like React rendering, secondary requests etc, then once the page is ready you can do whatever you want. You can interact with the DOM, fill in forms, scroll, click things, get a div and check its size and location, anything you want. Testing special symbols and such should be no problem at all, provided that you actually think to write that test.

sfn42•5h ago

Tools like playwright allow pretty nice web UI testing. You can make sure things are working properly and continue working as things change.

Doesn't replace human testing but it does ease the human load and help catch problems and regressions before they get to the human testers.

Ygg2•5h ago

> Tools like playwright allow pretty nice web UI testing.

Can they test for color blindness and myopia?

dylan604•5h ago

“Doesn't replace human testing”

It’s like you stopped reading to try to score internet points or something. The answer to your question was one more sentence from where you stopped reading

sfn42•1h ago

There are tools that can analyze a page for accessibility. I don't have experience with them but I wouldn't be surprised if you could include them in a CI/CD pipeline. I don't know whether they specifically control for color blindness and myopia but I wouldn't be surprised if they do. I know they check for contrast and stuff like that.

You could also check things like colors etc using Playwright but I would say it's probably the wrong tool for that job. It's more about testing functionality - make sure a page has the right content and works correctly on a technical level.

Without automated tools this type of thing can take a lot of time - in order to ensure quality you would basically have to click through the entire application for every release. Otherwise you might end up with some minor changes to one page breaking a different page and you'd never know until a tester checked it out or a user complained. With Playwright and similar tools you can catch regressions like this automatically.

__MatrixMan__•4h ago

User acceptance testing is a good idea but doing it in cases where you can get away with cheaper testing is not.

diggan•8h ago

Seems like blog spam, the actual content (presentation/talk) is at: https://www.youtube.com/watch?v=MqC3tudPH6w

tomhow•8h ago

Thanks, we updated the URL from https://antithesis.com/blog/2025/bugbash_2025/mitchell_hashi...

aspenmayer•7h ago

When this happens, how do you determine who gets the karma? Is it right and just and logical for OP to get karma for submitting a URL that HN readers didn't visit after being updated by mods, or for OP to get karma previously for a URL that was deemed lacking with regards to the guidelines? It seems like they should get one or the other, but not both.

Just some food for thought. The reason I mention it, is that a person who has been commented upon by me previously for using scripts submitted this before OP, and if precedent holds, they should get the karma, not OP. But they have been commented upon by mods for having used scripts, but somehow haven't been banned for doing so, because dang has supposedly interacted with them/spoken with them, as if that could justify botting. But I digress.

To wit:

https://news.ycombinator.com/item?id=44449650

sebastianmestre•7h ago

They're fake internet points, it's no big deal

aspenmayer•7h ago

Then why don't you email mods to delete your account?

Your post adds nothing to the discussion I made with my reply, so what are you even doing here?

thunderbong•7h ago

Because it's possible they didn't know about it? Because it's possible they might post something of more value another time?

Let's assume good faith.

aspenmayer•7h ago

Who didn't know about what now?

This is HN inside baseball. If you or they don't know, they should ask somebody or lurk more, to be blunt. To phrase this in a better way, they should post better if they want a better response.

tomhow•7h ago

It's an imperfect system and some human judgment about fairness applies.

In this case, the original URL submitted had the YouTube video prominently embedded, along with some commentary. It's no big deal to do that, as sometimes the commentary adds something. In this case nobody seems to think it does so I updated the URL to the primary source, but there's no need to penalize the submitter.

If the primary/best source for a topic has been submitted by multiple people, all being equal we'll promote the first-submitted one to the front page and mark later ones as dupes.

But things aren't always equal, and if the first submission was from a user who submits a lot and gets many big front page hits, we don't feel the need to just hand them more karma and will happily give the points to a less karma-rich user who submitted it later, especially if theirs is already on the front page.

aspenmayer•7h ago

> But things aren't always equal, and if the first submission was from a user who submits a lot and gets many big front page hits, we don't feel the need to just hand them more karma and will happily give the points to a less karma-rich user who submitted it later, especially if theirs is already on the front page.

I know dang has said that generated comments and bots are against HN guidelines previously, but should I read what you're not saying between the lines, and should I interpret it as consistent with HN policy to use scripts or bots to post to HN? Because that seems to be happening in this case, and keeps coming up, because it keeps happening. After a certain point, inaction is a choice and a policy by proxy.

inb4 someone asks what it is and what it is that is happening; if you don't already know, ask somebody else, and if you're not a mod on HN, I'll likely consider your response a concern troll if it isn't on topic for this thread or attempts to derail it, which I fully expect to happen, but I am constantly willing to be surprised, and often am on this site.

As the previous construction was rather strained, I'll say it plainly:

Is it okay, as in, consistent with and affirming the HN guidelines, to use scripts/bots to post to HN, or not? My reading tells me no.

tomhow•6h ago

We ban bots that submit a lot of spam or other content that’s a bad fit for HN. We ban LLM-comment bots because we want the discussions to be between humans sharing original human ideas.

If someone has written a script that finds and submits articles that are good for HN, I don’t see why we should ban them. We can use human judgment to decide which of their posts should be rewarded or downranked; we’re doing manual curation all the time anyway.

aspenmayer•6h ago

> We ban bots that submit a lot of spam or other content that’s a bad fit for HN. We ban LLM-comment bots because we want the discussions to be between humans sharing original human ideas.

> If someone has written a script that finds and submits articles that are good for HN, I don’t see why we should ban them. We can use human judgment to decide which of their posts should be rewarded or downranked; we’re doing manual curation all the time anyway.

You should ban them for the same reason generated comments are banned.

This is not a great outcome for HN, so I don't expect this to actually occur, mind you!

I just think that the status quo unfairly advantages those who have already demonstrated that they're actively and successfully gaming the system. If the points don't matter, then script users' contributions matter even less than a human-initiated post, so why not run the script in-house under an official username at that point. This arm's length scripted behavior leaves a bad taste after Digg and every other site that has done this, or allowed others to do it. Either the content is user-submitted, or it isn't. Bots aren't people.

tomhow•6h ago

I don’t know what kind of game is going on when you quote my whole comment like this. That’s not a norm on HN.

We don’t treat them the same because they don’t have the same effects and aren’t the same thing. They’d be the same thing if someone made a bot to write whole articles and post them as submissions. Of course we’d ban that, and indeed we have banned bots like that, or at least banned sites whose contents are mostly LLM-generated, along with the accounts that submitted them.

If a user’s script finds and submits good articles that attract many upvotes and stimulate good discussions, it would be a net loss for HN to ban them.

aspenmayer•6h ago

It's not a game, but many people try to derail the conversation when I bring this up. I want to have a record of what was said when I reply when I think it's worth quoting, so as there is no ambiguity as to what was said or what I was replying to.

> If a user’s script finds and submits good articles that attract many upvotes and stimulate good discussions, it would be a net loss for HN to ban them.

I agree. So why not run that script in-house, so that we have transparency about the account being scripted? Or, the script user could say something to that effect in their bio. Or, they could use a dedicated scripting account. Anything would be better than this, because it's a bad look for HN, and I'm tired of talking about it, but it's an issue with values to allow scripted submissions as long as they're URLs, but not if they're comments. It's a distinction without a difference to my view.

That being said, I can't disagree that they find good content. I am fine with it being a quirk of the rules that scripts and bots are allowed. I don't think it's what's best for HN, and I don't think that it's the status quo, but as you say, you do a lot of manual intervention. If a script user is making posts that are good, that is reducing your workload, so I think you may be close enough to the situation to care much more than I do, and so I take your view to heart and trust your judgement that it's not a problem to you or HN in your view, but I think differently, and I don't know what you know. If I did know what you do know, I'm willing to believe I would think as you do, so I don't mean to accuse or blame, or find fault.

I like the topic because after a certain point, generated comments and posts may be indistinguishable from other HN posts, which would be weird, but I would be okay with that as long as the humans remain and are willing to curiously engage with each other. I'm not really anti-AI at all, I just find the guidelines rather helpful, and yet I hate to be a scold. Please don't interpret this thread as critical of HN, but rather bemused by HN.

tomhow•6h ago

The thing is you just don’t seem to be able to convey why it is that it’s such an important issue. Nobody else cares about it. But you take us to task about it again and again with these lengthy comments but not a clear statement of what the fundamental problem is.

For what it’s worth we have systems and methods for finding good new articles, like the second chance pool. We wouldn’t ban other people’s scripts for the same reason there’s always room in the marketplace for different variants of a product; someone else’s variant may be better than ours at least in some ways.

Ultimately there’s just no need for us to spend a whole lot of time thinking about it because it doesn’t cause problems (that we can’t address with routine human oversight).

aspenmayer•5h ago

> The thing is you just don’t seem to be able to convey why it is that it’s such an important issue.

I have conveyed why it's important to me. Whether or not you find my exhortation convincing or not is likely not due to my lack of attempts to convince you why I feel the way I do. Of all the things you could find lacking, I don't think it's unclear. Scripted behavior isn't authentic. Coordinated inauthentic behavior is problematic to me, because I work in security amongst other hats I wear, and I have another name for coordinated inauthentic behavior.

> Nobody else cares about it.

Tomte cares, and posted in this and the other thread? I'm sure other people would care if they saw the thread. Funny how people only care about what they're aware of.

> For what it’s worth we have systems and methods for finding good new articles, like the second chance pool. We wouldn’t ban other people’s scripts for the same reason there’s always room in the marketplace for different variants of a product; someone else’s variant may be better than ours at least in some ways.

> Ultimately there’s just no need for us to spend a whole lot of time thinking about it because it doesn’t cause problems (that we can’t address with routine human oversight).

You don't have to spend any time to ask script users to mention it in their bio! If they don't, they don't. Rule breakers are not an indictment of the concept of making rules, or following ones that already exist, or closing gaps in the rules once identified.

If there was nothing I or anyone else could say to change your mind, perhaps the failure of communication is on your end, and may even be willful. I come to HN to interact with human beings making user-submitted posts and comments. That's what HN is to me, and this announcement is a departure from all prior communications, because you've laid bare what I drew attention to last time this came up, where Tomte also posted. Apparently people are scripting submissions and farming posts on HN. I don't see how that isn't a problem on its face. The fact that you know and do nothing because perfect detection and enforcement is impossible makes me wonder if the reason you allow it is because it is expedient to moderating HN, and not what is necessarily best for HN as determined by HN users.

> But you take us to task about it again and again with these lengthy comments but not a clear statement of what the fundamental problem is.

And yet the problem has been identified, and it remains signposted by me because the problem has been denied to exist in favor of criticisms of the length of my posts. What even is the issue? Should I post fewer, more convincing words? I am honestly at a loss as to how to continue this thread, so I will rest and await any reply from you or anyone else.

tomhow•5h ago

Ok, I definitely need to step away from this thread because where I am it’s nearly midnight on a Sunday night and I’m emotionally exhausted after dealing with some family illness this evening. That’s what’s exacerbated my impatience with the topic and duration of the discussion. It’s just not how I expected/hoped to be spending my the last hours of my Sunday night when I just posted a routine mod comment about a url change. Sorry for letting that get the better of me.

That Tomte sees it as a problem is interesting, because I wouldn’t have been surprised to find they also used some kind of scripts to find articles to post; indeed I just casually assumed they did, at least to some extent. I mean that not as an accusation, just an impression I’d picked up from observing their posts over many years.

Ok, point taken about how it makes you feel about HN. I’ll think more about it as we continue to work on ways to improve everything.

aspenmayer•5h ago

Thank you for your efforts. You and the entire mod team make this site worth visiting, so something you're doing must be working, and likely many things I don't need to know about or trouble myself with. Most sites don't even have named mods, so the fact that you post is here is just as important to me as anything else HN and you all have done to make this place what it is.

I trust you know what you're doing, or I wouldn't be here.

I hope you can rest and recharge. Nothing I said today or probably ever on HN is more important than the people in our lives, which is why I think preserving a place for humans is worth it, even if it's not perfect. I appreciate all you do, even if I have a strange way of showing it.

Tomte•6h ago

I disagree with aspenmeyer violently on all those fairness issues that came up a few weeks ago in the similar discussion, but I think he‘s right with pushing back on this in general.

I am a very active HN user and was totally surprised by the declaration that submission bots are fine with you. It goes against pretty much all earlier communication (which, in fairness, was usually about comment bots), but I think in the past my submission behavior was repeatedly ruled okay when challenged by other users, because I‘m submitting manually.

I do feel I‘m losing interest a bit when we‘re all just firing scripts. Manual submission at least makes you care enough to spend those seconds, bot submissions mean nobody cares anymore because you can just fling shit and see what sticks. And maybe we high-volume submitters should even be reigned in more.

(Also it feels unfriendly towards lobste.rs, when HN is effectively just bulk copying their submissions.)

aspenmayer•6h ago

Thanks for this. I think you make good points, and I apologize if I came off as hostile toward you directly or indirectly in that other thread. The closest I have come to scripted submissions is using the HN provided bookmarklet, and even that felt like scripting to me. I'm not a purist about this, but the inconsistency feels strange to me too, and I would rather not have scripts be the primary posting or comment method, but please keep the faith with me here on HN.

If we don't make an effort and intention to care and stay here despite bad calls by refs, we'll just have to take our ball and go home, but for many, they don't have another home like HN, so that would be a net loss for them. We owe it to ourselves and each other to show up where we want to effect change that wouldn't happen without our presence and involvement. That's what user generated content is all about!

cloogshicer•7h ago

I think what people really mean when they say "This can't be tested" is:

"The cost of writing these tests outweighs the benefit", which often is a valid argument, especially if you have to do major refactors that make the system overall more difficult to understand.

I do not agree with test zealots that argue that a more testable system is always also easier to understand, my experience has been the opposite.

Of course there are cases where this is still worth the trade-off, but it requires careful consideration.

MoreQARespect•6h ago

It's often shorthand for "this cant be unit tested" or "this isnt dependency injected" even though integration tests are perfectly capable of testing non-DI code.

The author's claims that we should isolate code under test better and rely more on snapshot testing are spot on.

diggan•1h ago

> rely more on snapshot testing are spot on

Never quite liked "snapshot testing" which I think has a better name under "golden master testing" or similar anyways.

Reason for the dislike, is that it's basically a codified "Trust me bro, it's correct" without actually making clear what you are asserting with that test. I haven't found any team that used snapshot testing and didn't also need to change the snapshots for every little change, which obviously defeats the purpose.

The only things snapshot testing seems to be good for, is when you've written something and you know it'll never change again, for any reason. Beyond that, unit tests and functional/integration tests are much easier to structure in a way so you don't waste so much time reviewing changes.

MoreQARespect•31m ago

>I haven't found any team that used snapshot testing and didn't also need to change the snapshots for every little change, which obviously defeats the purpose

I dont see how this even defeats the point, let alone obviously.

If a UI changes I appreciate being notified. If a REST API response changes I like to see a diff.

If somebody changes some CSS and it changes 50 snapshots, it isnt a huge burden to approve them all and sometimes it highlights a bug.

ChrisMarshallNY•6h ago

This is the case.

I did a lot of work on hardware drivers and control software, and true testing would often require designing a mock that could cost a million, easy.

I've had issues, with "easy mocks"[0].

A good testing mock needs to be of at least the same Quality level as a shipping device.

[0] https://littlegreenviper.com/concrete-galoshes/#story_time

fleventynine•2h ago

I've had a lot of success writing driver test cases against the hardware's RTL running in a simulation environment like verilator. Quick to setup and very accurate, the only downside is the time it takes to run.

And if you want to spend the time to write a faster "expensive mock" in software, you can run your tests in a "side-by-side" environment to fix any differences (including timing) between the implementations.

rhizome31•4h ago

Testing is a skill. The more you do it, the less expensive it becomes.

cloogshicer•4h ago

The main cost isn't writing the tests themselves but the increased overall system complexity. And that never goes down.

diggan•1h ago

> but the increased overall system complexity

I think this happens because people don't treat the testing code as "production code" but something else. You can have senior engineers spending days on building the perfect architecture/design, but when it comes to testing, they behave like a junior and just writes whatever comes to mind first, and never refactor things like they would "production code", so it grows and grows and grows.

If people could spend some brain-power on how to structure things and what to test, you'd see the cost of the overall complexity go way down.

vlovich123•1h ago

Sounds like you’re responding to the title without listening to the presentation. He literally says this in the intro.

j_w•5h ago

My takeaway from this is that when you have a system or feature that "can't be tested" that you should try to isolate the "untestable" portions to increase what you can test.

The "untestable" portions of a code base often gobble up perfectly testable functionality, growing the problem. Write interfaces for those portions so you can mock them.

gblargg•1h ago

This often comes naturally for portable code, where the OS-specific things are separated from the core logic that just processes data. His keyboard handling example illustrates this.

testthetest•4m ago

The ROI on unit tests, as well as the answer to "Can we test it?" is changing fast in the age of AI.

1. AI is making unit tests nearly free. It's a no-brainer to ask Copilot/Cursor/insert-your-tool-here to include tests with your code. The bonus is that it forces better habits like dependency injection just to make the AI's job possible. This craters the "cost" side of the equation for basic coverage.

2. At the same time, software is increasingly complex: a system of a frontend, backend, 3rd-party APIs, mobile clients, etc. A million passing unit tests and 100% test coverage mean nothing in a world when a tiny contract change breaks the whole app. In our experience the thing that gives us the most confidence is black-box, end-to-end testing that tests things exactly as a real user would see them.

Show HN: I wrote a "web OS" based on the Apple Lisa's UI, with 1-bit graphics

Functions Are Vectors (2023)

Get the location of the ISS using DNS

Hannah Cairo has solved the Mizohata-Takeuchi conjecture

Cool People [pdf]

Collatz's Ant and Σ(n)

Lessons from 863 episodes of This American Life

Toys/Lag: Jerk Monitor

Metriport (YC S22) is hiring engineers to improve healthcare data exchange

Overthinking GIS (2024)

Luigi Lineri, the Man Who Collects and Categorizes Stones (2024)

Claude Code Pro Limit? Hack It While You Sleep

Serving 200M requests per day with a CGI-bin

Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths

Hidden interface controls that affect usability

What a Hacker Stole from Me

How to get started with Old English poetry

Eastern Baltic cod grow much smaller than they did due to overfishing

Take Two: Eshell

Reinforcement Learning from Human Feedback (RLHF) in Notebooks

'Shit in, shit out', AI is coming for agriculture, but farmers aren’t convinced

1945 TV Console Showed Two Programs at Once

July 5, 1687: When Newton explained why you don't float away

Micro Common Lisp

Show HN: Pixel Art Generator Using Genetic Algorithm

Meet Bionode

How to Network as an Introvert

Development of a transputer ISA board

The force-feeding of AI features on an unwilling public

Local-first software (2019)

Show HN: I wrote a "web OS" based on the Apple Lisa's UI, with 1-bit graphics

Functions Are Vectors (2023)

Get the location of the ISS using DNS

Hannah Cairo has solved the Mizohata-Takeuchi conjecture

Cool People [pdf]

Collatz's Ant and Σ(n)

Lessons from 863 episodes of This American Life

Toys/Lag: Jerk Monitor

Metriport (YC S22) is hiring engineers to improve healthcare data exchange

Overthinking GIS (2024)

Luigi Lineri, the Man Who Collects and Categorizes Stones (2024)

Claude Code Pro Limit? Hack It While You Sleep

Serving 200M requests per day with a CGI-bin

Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths

Hidden interface controls that affect usability

What a Hacker Stole from Me

How to get started with Old English poetry

Eastern Baltic cod grow much smaller than they did due to overfishing

Take Two: Eshell

Reinforcement Learning from Human Feedback (RLHF) in Notebooks

'Shit in, shit out', AI is coming for agriculture, but farmers aren’t convinced

1945 TV Console Showed Two Programs at Once

July 5, 1687: When Newton explained why you don't float away

Micro Common Lisp

Show HN: Pixel Art Generator Using Genetic Algorithm

Meet Bionode

How to Network as an Introvert

Development of a transputer ISA board

The force-feeding of AI features on an unwilling public

Local-first software (2019)

Can we test it? Yes, was can [video]

Comments