sluongng•1h ago
https://github.com/googleapis/googleapis/blob/master/google/... is a more complete version of this. It supports resumable uploads, and downloads can start from an offset within the file, so you can fetch just part of a file instead of the whole thing.
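If that truncated link is the ByteStream API (an assumption on my part), reading from an offset looks roughly like this in Go, sketched and untested:

    import (
        "context"
        "io"

        "google.golang.org/genproto/googleapis/bytestream"
        "google.golang.org/grpc"
    )

    // readFrom fetches a resource starting at offset, so an interrupted
    // download can resume instead of starting over from byte zero.
    func readFrom(ctx context.Context, conn *grpc.ClientConn, name string, offset int64) ([]byte, error) {
        stream, err := bytestream.NewByteStreamClient(conn).Read(ctx, &bytestream.ReadRequest{
            ResourceName: name,
            ReadOffset:   offset, // skip the bytes we already have
        })
        if err != nil {
            return nil, err
        }
        var buf []byte
        for {
            resp, err := stream.Recv()
            if err == io.EOF {
                return buf, nil
            }
            if err != nil {
                return nil, err
            }
            buf = append(buf, resp.Data...)
        }
    }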
Another variation is to use gRPC to communicate only the "metadata" of the file, and then "side-load" the file itself over a side channel such as HTTP (or some other lightweight copy mechanism). GitLab uses this to transfer Git packfiles and serve git fetch requests, iirc: https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/sidec...
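The rough shape of that pattern (GetDownloadInfo and its messages are hypothetical stand-ins for illustration, not Gitaly's actual sidechannel protocol):

    import (
        "context"
        "io"
        "net/http"
    )

    // "pb" stands for your generated protobuf package (hypothetical).
    // Small structured metadata travels over gRPC; the bulk bytes don't.
    func fetch(ctx context.Context, c pb.FilesClient, id string, out io.Writer) error {
        info, err := c.GetDownloadInfo(ctx, &pb.GetDownloadInfoRequest{Id: id})
        if err != nil {
            return err
        }
        resp, err := http.Get(info.Url) // payload goes over the HTTP side channel
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        _, err = io.Copy(out, resp.Body) // real code would also verify a checksum from info
        return err
    }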
pipo234•52m ago
I understand some of the appeal of gRPC, but resumable uploads and download offsets have long been part of plain HTTP (e.g. RFC 7233 range requests).
Relying on HTTP has the advantage that you can leverage commodity infrastructure like caching proxies and CDNs.
Why push protobuf over http when all you need is present in http already?
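For example, resuming from an offset is a single header in plain HTTP (a minimal Go sketch; the URL and offset are made up):

    import "net/http"

    func tail(url string) (*http.Response, error) {
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            return nil, err
        }
        req.Header.Set("Range", "bytes=1048576-") // everything after the first MiB
        // a compliant server answers 206 Partial Content with just that slice;
        // a server that ignores Range answers 200 with the whole file
        return http.DefaultClient.Do(req)
    }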
avianlyric•46m ago
Because you may already have robust and sensible gRPC infrastructure set up and working, and setting up the correct HTTP infrastructure to take advantage of all the benefits that plain old HTTP provides may not be worth it.
If moving big files around is a major part of the system you’re building, then it’s worth the effort. But if you’re only occasionally moving big files around, then reusing your existing gRPC infrastructure is likely preferable. It keeps your systems nice and uniform, which makes them easier to understand later once you’ve forgotten what you originally implemented.
a-dub•34m ago
this.
also, http/s compatibility falls off in the long tail of functionality. i've seen cache layers fail to properly implement restartable http.
that said, making long transfers actually restartable, robust and reliable is a lot more work than is presented here.
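e.g. even a naive resume loop like the go sketch below (names made up) is missing pieces real robustness needs: an If-Range/ETag check so you don't stitch together two different versions of the file, a 206-status check so a server that ignores Range doesn't corrupt the output, backoff between retries, and a final length/checksum validation.

    import (
        "fmt"
        "io"
        "net/http"
        "os"
    )

    // resume re-requests from the last byte written until the body completes.
    func resume(url string, out *os.File, maxRetries int) error {
        var written int64
        for attempt := 0; attempt < maxRetries; attempt++ {
            req, err := http.NewRequest("GET", url, nil)
            if err != nil {
                return err
            }
            req.Header.Set("Range", fmt.Sprintf("bytes=%d-", written))
            resp, err := http.DefaultClient.Do(req)
            if err != nil {
                continue // transient failure: try again from `written`
            }
            n, copyErr := io.Copy(out, resp.Body)
            resp.Body.Close()
            written += n // io.Copy reports bytes copied even on error
            if copyErr == nil {
                return nil
            }
        }
        return fmt.Errorf("gave up after %d attempts", maxRetries)
    }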
pipo234•25m ago
Simplicity makes sense, of course. I just hadn't considered a gRPC-only world. But I guess that makes sense in today's Kubernetes/node/python/llm world, where gRPC is the glue that SOAP (or even CORBA) once was.
Still, stateful protocols have a tendency to bite when you scale up. HTTP is specifically designed to be stateless, so you get scalability for free as long as you stick with plain GET requests...
ithkuil•29m ago
I like implementing this standard gRPC interface (if I already have a gRPC-based project) because it allows me to reuse a troubleshooting utility I wrote that uses it:
https://github.com/mkmik/byter