Tl;dr: the HTML of a YouTube video page, for example, contains the video description, views, likes, etc. in its first 600KB; the remaining 900KB are of no use to me, but I pay my proxies by the gigabyte.
My crawler receives the response packet by packet, and as soon as it has everything it needs it resets the request, so I only pay for what I actually crawled.
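To illustrate the idea, here is a minimal sketch in Python using `requests` streaming (my crawler works at the packet level, so this is an approximation, not the actual implementation; the marker string and byte budget are made-up placeholders):

```python
import requests

# Hypothetical marker that signals we already have the part we care about.
NEEDED_MARKER = b'"videoDetails"'

def fetch_until_enough(url, max_bytes=600 * 1024):
    """Stream the response and stop as soon as we have what we need."""
    buf = bytearray()
    with requests.get(url, stream=True, timeout=10) as resp:
        for chunk in resp.iter_content(chunk_size=16 * 1024):
            buf += chunk
            # Stop once the marker is present or the byte budget is hit;
            # leaving the `with` block closes the connection, so no further
            # bytes flow through the per-gigabyte-billed proxy.
            if NEEDED_MARKER in buf or len(buf) >= max_bytes:
                break
    return bytes(buf)
```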
Early reset is also potentially useful for large-scale crawling operations where duplicates matter: you could compute a SimHash on the fly and reset the request before downloading the entire document (again).
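A hedged sketch of that idea, with a toy 64-bit SimHash over whitespace tokens (the distance threshold and `seen_hashes` set are assumptions, not values from my crawler):

```python
import hashlib

def simhash(tokens, bits=64):
    """Toy SimHash: weight each hash bit by token votes, keep the sign."""
    v = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    return bin(a ^ b).count("1")

def is_near_duplicate(prefix_text, seen_hashes, threshold=3):
    """True if the streamed prefix already looks like a previously crawled page,
    in which case the request can be reset before the rest is downloaded."""
    h = simhash(prefix_text.split())
    return any(hamming(h, s) <= threshold for s in seen_hashes)
```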