I think this is getting a bit carried away. I don't have any argument against the observation that the average of a p95 is not something that makes sense mathematically, but if you actually understand what it is, it is absolutely still meaningful. With time series data there is always some time denominator, so it really means (say) "the p95 per minute, averaged over the last hour", which can be meaningful and useful at a glance.
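To make the distinction concrete, here is a minimal sketch (plain Python/NumPy, my own illustration rather than anything from the talk) comparing the per-minute p95 averaged over an hour with the true p95 over that hour. The two generally differ, which is the mathematical objection, but the averaged number is still a perfectly readable summary of how the per-minute p95 has been trending:

    import numpy as np

    rng = np.random.default_rng(0)

    # Fake one hour of latencies (ms): 60 one-minute buckets of 1000 requests,
    # with a couple of slow minutes thrown in.
    minutes = [rng.lognormal(mean=3.0, sigma=0.5, size=1000) for _ in range(60)]
    for bad in (10, 42):
        minutes[bad] = minutes[bad] * 5

    per_minute_p95 = [np.percentile(m, 95) for m in minutes]

    avg_of_p95s = np.mean(per_minute_p95)                   # what the dashboard shows
    true_p95 = np.percentile(np.concatenate(minutes), 95)   # the actual hourly p95

    print(f"average of per-minute p95s: {avg_of_p95s:.1f} ms")
    print(f"p95 over the whole hour:    {true_p95:.1f} ms")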
Also, the claim that "[o]nly looking at the 95th percentile is what you do when you want to hide all the bad stuff" is very context dependent. As long as you understand what it actually means, I don't see the harm in it. The author's point is that, because loading a single webpage results in 40 or so requests, you are much more likely to hit a p99 latency than the percentile alone suggests, so you should really care about p99 and up (the arithmetic is sketched below). More power to you - if that's contextually appropriate, then that is absolutely right - but it really only applies to a webserver serving webpage assets, which is only one kind of software you might be writing. I think it is definitely important to know, for one given "eyeball" waiting on your service to respond, what the actual flow is - whether it's just one request, or multiple concurrent requests, or some kind of dependency graph of calls to your service all needed in sequence - but I don't really think that challenges the commonsense notion of latency, does it?
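For what it's worth, the arithmetic behind the "40 requests" point is easy to check (my own back-of-the-envelope sketch, assuming independent requests):

    # Probability that a page load of n independent requests contains at least
    # one request slower than the per-request p99 / p95.
    n = 40
    p_hit_p99 = 1 - 0.99 ** n   # ~0.33
    p_hit_p95 = 1 - 0.95 ** n   # ~0.87

    print(f"P(page load sees a >p99 request): {p_hit_p99:.0%}")
    print(f"P(page load sees a >p95 request): {p_hit_p95:.0%}")

So with 40 requests per page, roughly a third of page loads see at least one request worse than the per-request p99, which is the basis for the "care about p99 and up" advice.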
Source: I’ve done this a lot
Good load testing tools have modes that send requests at a fixed rate regardless of how the other requests are going, precisely to handle coordinated omission. k6, for instance, calls these modes "open" and "closed": https://grafana.com/docs/k6/latest/using-k6/scenarios/concep.... They do mention the term "coordinated omission" on that page, though I feel they could have given Gil a nod for coining the term.
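To illustrate the difference (a hand-rolled conceptual sketch in Python with asyncio, not k6's API or configuration): in a closed model the next request only goes out after the previous response arrives, so a slow server quietly throttles the offered load and hides its own latency; in an open model requests are fired on a fixed schedule no matter what, so queueing delay shows up in the numbers.

    import asyncio
    import time

    async def fake_request():
        # Stand-in for an HTTP call; pretend the server occasionally stalls.
        await asyncio.sleep(0.5 if int(time.monotonic()) % 10 == 0 else 0.01)

    async def closed_model(duration_s=5.0):
        # Closed model: one request in flight at a time; a slow response delays
        # the next request, so the generated load coordinates with the server.
        latencies = []
        end = time.monotonic() + duration_s
        while time.monotonic() < end:
            start = time.monotonic()
            await fake_request()
            latencies.append(time.monotonic() - start)
        return latencies

    async def open_model(rate_per_s=100, duration_s=5.0):
        # Open model: requests start on a fixed schedule regardless of whether
        # earlier ones have finished, so queueing delay is included in latency.
        async def timed():
            start = time.monotonic()
            await fake_request()
            return time.monotonic() - start

        tasks = []
        interval = 1.0 / rate_per_s
        next_fire = time.monotonic()
        end = next_fire + duration_s
        while next_fire < end:
            await asyncio.sleep(max(0.0, next_fire - time.monotonic()))
            tasks.append(asyncio.create_task(timed()))
            next_fire += interval
        return await asyncio.gather(*tasks)

    def p95(xs):
        return sorted(xs)[int(0.95 * len(xs))]

    async def main():
        closed = await closed_model()
        opened = await open_model()
        print(f"closed-model p95: {p95(closed):.3f}s, open-model p95: {p95(opened):.3f}s")

    asyncio.run(main())

(k6's actual executors are documented at the link above; the point here is just the scheduling difference, not the tool.)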
A summary of how not to measure latency - https://news.ycombinator.com/item?id=10732469 - Dec 2015 (3 comments)