One nuance we've been seeing in practice is that the "utility" of a token isn't purely semantic: some tokens carry behavioral constraints (negations, numeric bounds, formatting rules, safety instructions) and their removal can cause discrete failures rather than smooth degradation.
And yes: since API cost scales linearly with input tokens, trimming the prompt (more precisely, the whole context) reduces both spend and latency.
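A back-of-envelope sketch of that linear relationship, with a made-up per-token price (not any provider's actual rate):

```python
# Illustrative only: the price below is a hypothetical placeholder.
PRICE_PER_INPUT_TOKEN = 3e-6  # assumed $/token

def monthly_input_cost(tokens_per_request: int, requests_per_month: int) -> float:
    """Cost is linear in input tokens, so savings scale directly with context size."""
    return tokens_per_request * requests_per_month * PRICE_PER_INPUT_TOKEN

before = monthly_input_cost(8_000, 1_000_000)  # 8k-token prompts
after = monthly_input_cost(2_000, 1_000_000)   # compressed to 2k tokens
print(f"before: ${before:,.0f}, after: ${after:,.0f}, saved: ${before - after:,.0f}")
# -> before: $24,000, after: $6,000, saved: $18,000
```

Cutting the context 4x cuts the input-token bill 4x; latency gains are real too, though prefill time is not perfectly linear in practice.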