You can also run multiple instances of rsync, the problem seems how to efficiently divide the set of files.
It turns out, fpart does just that! Fpart is a Filesystem partitioner. It helps you sort file trees and pack them into bags (called "partitions"). It is developed in C and available under the BSD license.
It comes with an rsync wrapper, fpsync. Now I'd like to see a benchmark of that vs rclone! via https://unix.stackexchange.com/q/189878/#688469 via https://stackoverflow.com/q/24058544/#comment93435424_255320...
SSH was never really meant to be a high performance data transfer tool, and it shows. For example, it has a hardcoded maximum receive buffer of 2MiB (separate from the TCP one), which drastically limits transfer speed over high BDP links (even a fast local link, like the 10gbps one the author has). The encryption can also be a bottleneck. hpn-ssh [1] aims to solve this issue but I'm not so sure about running an ssh fork on important systems.
There's gotta be a less antisocial way though. I'd say using BBR and increasing the buffer sizes to 64 MiB does the trick in most cases.
Depending on what you're doing it can be faster to leave your files in a solid archive that is less likely to be fragmented and get contiguous reads.
Source: Been in big tech for roughly ten years now trying to get servers to move packets faster
> MPLS ECMP hashing you over a single path
This is kinda like the traffic shaping I was talking about though, but fair enough. It's not an inherent limitation of a single stream, just a consequence of how your network is designed.
> a single loss event with a high BDP
I thought BBR mitigates this. Even if it doesn't, I'd still count that as a TCP stack issue.
At a large enough scale I'd say you are correct that multiple streams is inherently easier to optimize throughput for. But probably not a single 1-10gb link though.
The issue is the serialization of operations. There is overhead for each operation which translates into dead time between transfers.
However there are issues that can cause singular streams to underperform multiple streams in the real world once you reach a certain scale or face problems like packet loss.
My goal is to smooth out some of the operational rough edges I've seen companies deal with when using the tool:
- Team workspaces with role-based access control
- Event notifications & webhooks – Alerts on transfer failure or resource changes via Slack, Teams, Discord, etc.
- Centralized log storage
- Vault integrations – Connect 1Password, Doppler, or Infisical for zero-knowledge credential handling (no more plain text files with credentials)
- 10 Gbps connected infrastructure (Pro tier) – High-throughput Linux systems for large transfersI've adjusted threads and the various other controls rclone offers but I still feel like I'm not see it's true potential because the second it hits a rate limit I can all but guarantee that job will have to be restarted with new settings.
With rsync, you upload hashes of what you have, then the source has to do all the hashing work to figure out what to send you. It's slightly more efficient, but If you are supporting even 10s of downloads it's a lot of work for the source.
The other option is to send just a diff, which I believe e.g. Google Chrome does. Google invented Courgette and Zucchini which partially decompile binaries then recompile them on the other end to reduce the size of diffs. These only work for exact known previous versions, though.
I wonder if the ideas of Courgette and Zucchini can be incorporated into zsync's hashes so that you get the minimal diff, but the flexibility of not having a perfect previous version to work from.
Edit: oh I see, delta transfer only sends the changed parts of files?
>In fact, some compression modes would actually slow things down as my energy-efficient NAS is running on some slower Arm cores
Depending on the number/type of devices in the setup and usage patterns, it can be effective sometimes to have a single more powerful router and then use it directly as a hop for security or compression (or both) to a set of lower power devices. Like, I know it's not E2EE the same way to send unencrypted data to one OPNsense router, Wireguard (or Nebula or whatever tunnel you prefer) to another over the internet, and then from there to a NAS. But if the NAS is in the same physically secure rack directly attached by hardline to the router (or via isolated switch), I don't think in practice it's significantly enough less secure at the private service level to matter. If the router is a pretty important lynchpin anyone, it can be favorable to lean more heavily on that so one can go cheaper and lower power elsewhere. Not that more efficiency, hardware acceleration etc are at all bad, and conversely sometimes might make sense to have a powerful NAS/other servers and a low power router, but there are good degrees of freedom there. Handier then ever in the current crazy times where sometimes hardware that was formerly easily and cheaply available is now a king's ransom or gone and one has to improvise.
baal80spam•1h ago
digiown•1h ago
plasticsoprano•38m ago