This release makes 1‑second replication frequency practical for small incrementals, and even sub‑second frequency possible in constrained setups (low RTT, few datasets, daemon mode).
v1.13.0 focuses on cutting per‑iteration latency — the enemy of high‑frequency replication at fleet scale:
- SSH reuse across datasets and on startup: fewer handshakes and fewer round‑trips, which is where small incremental sends spend much of their time.
- Earlier stream start: estimate "bytes to send" in parallel so the data path can open sooner instead of blocking on preflight.
- Smarter caching: faster snapshot list hashing and shorter cache paths to reduce repeated ZFS queries in tight loops.
- More resilient connects: retry the SSH control path briefly before failing, to smooth over transient blips.
- Cleaner ops: normalized exit codes; suppress “Broken pipe” noise when a user kills a pipeline.
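For intuition about why connection reuse pays off, here is roughly what persistent SSH multiplexing looks like with plain OpenSSH (bzfs manages its own control connections internally; the host, dataset, and socket path below are made up):

```bash
# First call pays the TCP + SSH handshake and leaves a control socket behind.
ssh -o ControlMaster=auto -o ControlPath=~/.ssh/cm-%r@%h:%p -o ControlPersist=60s \
    user@host true

# Subsequent calls multiplex over the existing session: no new handshake, just exec.
ssh -o ControlPath=~/.ssh/cm-%r@%h:%p user@host \
    zfs list -t snapshot -o name tank/src/ds
```

Small incremental sends issue many such short remote commands, so removing the handshake from each one is where much of the latency win comes from.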
Why this matters
- At 1s cadence, fixed costs (session setup, snapshot enumeration) dominate. Shaving RTTs and redundant `zfs list` calls yields bigger wins than raw throughput.
- For fleets, the tail matters: reducing per‑job jitter and startup overhead improves end‑to‑end staleness when multiplied by N×M jobs.
1‑second (and sub‑second) replication
- Use daemon mode to avoid per‑process startup costs; keep the process hot and loop at `--daemon-replication-frequency` (e.g., `1s`, or even `100ms` for constrained cases).
- Reuse SSH connections (now the default) to avoid handshakes even for new processes.
- Keep per‑dataset snapshot counts low and prune aggressively; fewer entries make `zfs list -t snapshot` faster.
- Limit scope to only the datasets that truly need the cadence, using filters like `--exclude-dataset` and `--skip-parent` (see the sketch after this list).
- In fleets, add small jitter to avoid thundering herds, and cap workers to match CPU, I/O, and link RTT.
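A minimal sketch of scope limiting, assuming the usual `--recursive` tree‑replication flag alongside the two filters named above (dataset names are invented; check the README for exact flag spellings and filter value syntax):

```bash
# Pull-replicate only the children of tank/vm that need the 1s cadence:
# skip the parent dataset itself and exclude a scratch child.
bzfs user@host:tank/vm tank/backup/vm \
  --recursive --skip-parent \
  --exclude-dataset scratch
```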
How it works (in a nutshell)
- Incremental sends start from the latest common snapshot; bookmarks are supported for safety and reduced state (see the plain‑ZFS sketch after this list).
- Persistent SSH sessions are reused across datasets/zpools and across runs to avoid handshake/exec overhead.
- Snapshot enumeration uses a cache to avoid re‑scanning when nothing has changed.
- Job orchestration happens via bzfs_jobrunner: the same config file runs on all hosts; add jitter to avoid thundering herds; set worker counts and timeouts for scale.
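For intuition only, one incremental step is roughly equivalent to the following plain‑ZFS pipeline (snapshot names are placeholders; bzfs discovers the latest common snapshot and manages the pipe, caching, and retries for you):

```bash
# Send only what changed since the newest snapshot both sides have in common,
# pulling from the remote source into the local backup dataset.
ssh user@host zfs send -i tank/src/ds@common tank/src/ds@latest \
  | zfs receive -u pool/backup/ds
```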
High‑frequency tips
- Prune at a frequency proportional to snapshot creation to keep enumerations fast.
- Use daemon mode; split snapshotting, replication, and pruning into dedicated loops.
- Add small random start jitter across hosts to reduce cross‑fleet contention (see the sketch below).
- Tune jobrunner `--workers` and per‑worker timeouts for your I/O and RTT envelope.
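One way to add start jitter is a thin wrapper around whatever launches the job (purely illustrative; the jobconfig script name is hypothetical, and bzfs_jobrunner's built‑in jitter options per its README are the cleaner route):

```bash
# Sleep a random 0-0.9s so hosts don't all hit the source at the same instant,
# then hand off to the shared jobconfig script (hypothetical name).
sleep "0.$(( RANDOM % 10 ))"
exec ./bzfs_job_config.sh --replicate
```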
Quick examples
- Local replication: `bzfs pool/src/ds pool/backup/ds`
- Pull from a remote host: `bzfs user@host:pool/src/ds pool/backup/ds`
- Jobrunner (periodic): run the shared jobconfig in daemon mode for a 1s cadence: `... --replicate --daemon-replication-frequency 1s` (sub‑second frequencies like `100ms` are possible in constrained setups). Use separate daemons for `--create-src-snapshots`, `--replicate`, and `--prune-` (see the sketch below).
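A hedged sketch of the split‑loop setup (the jobconfig script name is hypothetical; the exact daemon/frequency flags for the snapshot and prune loops are in the jobrunner README):

```bash
# One long-lived daemon per concern, all driven by the same shared jobconfig.
./bzfs_job_config.sh --create-src-snapshots &                          # snapshot loop
./bzfs_job_config.sh --replicate --daemon-replication-frequency 1s &   # 1s replication loop
# ...plus a pruning loop with the corresponding prune flags (see the README).
```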
Links
- Code and docs: https://github.com/whoschek/bzfs
- README: quickstart, filters, safety flags, examples
- Jobrunner README: multi‑host orchestration, jitter, daemon mode, frequencies
- 1.13.0 diff: https://github.com/whoschek/bzfs/compare/v1.12.0...v1.13.0
Notes
- Standard tooling only (ZFS/Unix and Python); no extra runtime deps.
I’d love performance feedback from folks running 1s or sub‑second replication across multiple datasets/hosts: per‑iteration wall time, number/size of incremental snapshots, dataset counts, and link RTTs help contextualize results.
Happy to answer questions!