frontpage.

Start all of your commands with a comma (2009)

https://rhodesmill.org/brandon/2009/commands-with-comma/
232•theblazehen•2d ago•67 comments

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
694•klaussilveira•15h ago•206 comments

Hoot: Scheme on WebAssembly

https://www.spritely.institute/hoot/
6•AlexeyBrin•59m ago•0 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
962•xnx•20h ago•554 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
130•matheusalmeida•2d ago•35 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
67•videotopia•4d ago•6 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
53•jesperordrup•5h ago•24 comments

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
36•kaonwarb•3d ago•27 comments

ga68, the GNU Algol 68 Compiler – FOSDEM 2026 [video]

https://fosdem.org/2026/schedule/event/PEXRTN-ga68-intro/
10•matt_d•3d ago•2 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
236•isitcontent•15h ago•26 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
233•dmpetrov•16h ago•124 comments

Where did all the starships go?

https://www.datawrapper.de/blog/science-fiction-decline
32•speckx•3d ago•21 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
335•vecti•17h ago•147 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
502•todsacerdoti•23h ago•244 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
386•ostacke•21h ago•97 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
300•eljojo•18h ago•186 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
361•aktau•22h ago•185 comments

UK infants ill after drinking contaminated baby formula of Nestle and Danone

https://www.bbc.com/news/articles/c931rxnwn3lo
8•__natty__•3h ago•0 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
424•lstoll•21h ago•282 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
68•kmm•5d ago•10 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
96•quibono•4d ago•22 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
21•bikenaga•3d ago•11 comments

The AI boom is causing shortages everywhere else

https://www.washingtonpost.com/technology/2026/02/07/ai-spending-economy-shortages/
19•1vuio0pswjnm7•1h ago•5 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
264•i5heu•18h ago•216 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
33•romes•4d ago•3 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
64•gfortaine•13h ago•28 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1076•cdrnsf•1d ago•460 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
39•gmays•10h ago•13 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
298•surprisetalk•3d ago•44 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
154•vmatsiiako•20h ago•72 comments

Rsync's defaults are not always enough

https://rachelbythebay.com/w/2025/05/31/sync/
38•rcarmo•8mo ago

Comments

baobun•8mo ago
Are rsync defaults ever enough?

I find "-apHAX" a sane default for most use and memorable enough. (think wardriving)

Very common contextuals:

-c (as mentioned, when you care about integrity)

-n (dryrun)

-v (verbose)

-z (compress when remoting)

Where it applies I usually do the actual syncing without -c and then follow up with -cnv to see if anything is off.
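
A minimal sketch of that kind of workflow, assuming a local source directory and a remote destination (host and paths are illustrative, not from the comment):

    # initial sync: archive mode plus hardlinks, ACLs and xattrs, compressed over the wire
    rsync -aHAXz /srv/data/ backuphost:/backups/data/

    # follow-up verification pass: checksum, dry-run, verbose; prints anything that differs
    rsync -aHAXcnv /srv/data/ backuphost:/backups/data/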

pabs3•8mo ago
-A implies -p btw
imtringued•8mo ago
ap is an acronym for access point
pabs3•8mo ago
Also, you are missing -S to preserve sparseness btw.
wodenokoto•8mo ago
Author doesn’t explain what happened or why the proposed flags will solve the problem.
stevekemp•8mo ago
Author explained that, by default, if two files are the same size and have the same modification date/time, rsync will assume they're identical, WITHOUT CHECKING THAT.

Author clarifies there are flags to change that behaviour, to make it actually compare file contents, and then names them.

It seems like you didn't read the article.
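
To see the quick-check behaviour for yourself, here is a small sketch (file names and contents are made up) that creates two same-sized files with identical mtimes and compares the default run with a checksum run:

    mkdir -p src dest
    printf 'AAAA' > src/file.bin           # two files, same size...
    printf 'BBBB' > dest/file.bin          # ...different contents
    touch -r src/file.bin dest/file.bin    # copy src's mtime so the metadata matches

    rsync -avn src/ dest/     # quick check: size+mtime match, so file.bin is not transferred
    rsync -avcn src/ dest/    # with --checksum: contents differ, so file.bin is listed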

XorNot•8mo ago
They also have to have the same name though. The actual chances of this situation happening and persisting long enough to matter are pretty damn small.
rob_c•8mo ago
Genuinely, that's a feature, not a bug. If you didn't RT"friendly"M, the problem explicitly exists between keyboard/VR-headset and chair/standing-desk.

This should never be a surprise to people unless this is their first time using Unix.

wodenokoto•8mo ago
Short of checking every single byte against each other, you need some sort of shorthand.

“Assume these two files are the same; check whether either system says it has modified them, or whether the size has changed, and call it a day” is a pretty fair assumption, and something even I knew rsync is doing, and I've only used it once in a project 10 years ago. I am sure Rachel also knows this.

So, what is the problem? Is data not being synced? Is data being synced too often? And why do these assumptions lead to either happening? What horrors is the author expecting the reader to see when running the suggested command?

That is what is not explained in the article.

mustache_kimono•8mo ago
> Author doesn’t explain what happened or why the proposed flags will solve the problem.

Probably because she/he doesn't know. Could be lots of things, because FYI mtime can be modified by the user. Go `touch` a file.

In all likelihood, it happens because of a package installation, where a package install sets the same mtime on a file which has the same size but different file contents. That's where I usually see it.

`httm` allows one to dedup snapshot versions by size, then hash the contents of identical sized versions for this very reason.

    --dedup-by[=<DEDUP_BY>] comparing file versions solely on the basis of size and modify time (the default "metadata" behavior) may return what appear to be "false positives".  This is because metadata, specifically modify time and size, is not a precise measure of whether a file has actually changed. A program might overwrite a file with the same contents, and/or a user can simply update the modify time via 'touch'. When specified with the "contents" option, httm compares the actual file contents of same-sized file versions, overriding the default "metadata" only behavior...
account42•8mo ago
It is worth noting that rsync doesn't compare just by size and mtime but also by (relative) path - i.e. it normally compares an old copy of a file with the current version of the same file. So the likelihood of "collisions" is much smaller than with a file de-duplicating tool that compares random files.
mustache_kimono•8mo ago
I think you may misunderstand what httm does. httm prints the size, date and corresponding locations of available unique versions of files residing on snapshots.

And -- this makes it quite effective at proving how often this happens:

    > httm -n --dedup-by=contents /usr/bin/ounce | wc -l
    3
    > httm -n --dedup-by=metadata /usr/bin/ounce | wc -l
    30
benmmurphy•8mo ago
there is also some weirdness, or used to be some weirdness, with linux and modifying shared libraries. for example, if a process is using a shared library and the contents of the file are modified (same inode), then what behaviour is expected? i think there are two main problems:

1) pages from the shared library are lazily loaded into memory, so if you try to access a new page you are going to get it from the new binary, which is likely to cause problems

2) pages from the shared library might be 'swapped' back to disk due to memory pressure. not sure whether the pager will just throw the page away and try to swap back in from the new file contents, or if it will notice the page is dirty and use swap for write-back to preserve the original page.

also, i remember it used to be possible to trigger some error if you tried to open a shared library for writing while it was in use, but I can't seem to trigger that error anymore.

reisse•8mo ago
Checksumming, if not run on spinning rust, should've been the default for years now.
throw93849494•8mo ago
Rsync also does not work with mounted filesystems, docker, snaps...
account42•8mo ago
What is that supposed to mean? It absolutely can work both restricted to a single file system or across file systems.
dedicate•8mo ago
Man, reading this just makes me wonder again why -c (checksum) isn't the default in rsync by now, especially with SSDs everywhere. Is it really just about that tiny bit of speed?
0rzech•8mo ago
Checksums are slow on HDDs, and using SSDs for backups is a bad idea, particularly for cold backups, due to data-longevity concerns and the tendency of SSDs to experience sudden total failures instead of gradual degradation.
Aachen•8mo ago
The -c makes it do more reads, though, not unnecessary writes

Degradation won't get worse except when a file changed without metadata having been modified, but then that's exactly what you want

0rzech•8mo ago
Yes, but it's orthogonal to my comment.
kissgyorgy•8mo ago
It's not a tiny bit at all. For a checksum, the whole file needs to be read beforehand, while the metadata is just a quick lookup. It's orders of magnitude slower and eats I/O too.
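
For a rough sense of scale (illustrative numbers, not from the thread): checksumming a 1 TB backup set at ~200 MB/s of sequential read is roughly 85 minutes of pure I/O on each side, while a quick-check pass over the same tree is only a stat() per file and typically finishes in seconds to minutes.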
account42•8mo ago
Because changing the defaults of command-line tools is generally a shitty thing to do.

And the speed difference is only tiny for tiny files.

nithssh•8mo ago
There are a lot of reasons why just copying the files you need to another FS is not sufficient as a backup; clearly this is one of them. We need more checks to ensure integrity and robustness.

BorgBackup is clearly quite good as an option.
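
A minimal sketch of using Borg with an explicit verification pass (repository path and archive name are placeholders):

    # create a deduplicated, compressed archive of the home directory
    borg create --stats /backups/repo::home-{now} ~/

    # periodically verify repository metadata and the actual chunk data
    borg check --verify-data /backups/repo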

mustache_kimono•8mo ago
> BorgBackup is clearly quite good as an option.

Once one enables checksums in rsync, doesn't Borg have the same issue? I believe Borg now needs to do the same rolling checksum over all the data as well?

ZFS sounds like the better option -- just take the last local snapshot transaction, then compare to the transaction of the last sent snapshot, and send everything in between.

And the problem re: Borg and rsync isn't just the cost of reading back and checksumming the data -- for 100,000s of small files (1000s of home directories on spinning rust), it is the speed of those many metadata ops too.

fpoling•8mo ago
As with rsync, Borg does not read files if their timestamp/length have not changed since the last backup. And for a million files on a modern SSD it takes just a few seconds to read their metadata.
mustache_kimono•8mo ago
> As with rsync borg does not read files if their timestamp/length do not change since the last backup.

...but isn't that the problem described in the article? If that is the case, Borg would seem to be the worst of all possible worlds, because now one can't count on its checksums?

fpoling•8mo ago
If one worries about bitrot, backup tools are not a good place to detect that. Using a filesystem with native checksums is the way to go.

If one worries about silent file modifications that alter content but keep timestamp and length, then this sounds like malware and, as such, backup tools are not the right tool to deal with that.
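
On ZFS, for example, the bitrot check is a scrub rather than a backup-tool pass (the pool name is illustrative):

    zpool scrub tank        # re-reads every block and verifies its checksum
    zpool status -v tank    # lists any files affected by checksum errors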

zamadatix•8mo ago
The latter case is what the article is talking about, though. At the same time, as the article also discusses, it's unlikely to have actually been caused by malware vs. something like a poorly packaged update.

Backup tools should deal with file changes that lack corresponding metadata changes, even though it's more convenient to say the system should just always work ideally. At the end of the day, the goal of a backup tool is to back up the data, not to skip some of it because that's faster.

rob_c•8mo ago
Amen!
mustache_kimono•8mo ago
> If one worries about bitrot, the backup tools are not good place to detect that. Using a filesystem with native checksums is the way to go.

Agreed. But I think that elides the point of the article which was "I worry about backing up all my data with my userspace tool."

As noted above, Borg and rsync seem to fail here, because it's wild how much the metadata can screw with you.

> If one worries about silent file modifications that alters content but keep timestamp and length, then this sounds like malware and, as such, the backup tools are not the right tool to deal with that.

Seen this happen all the time in non-malware situations, in what we might call broken software situations, where your packaging software or your update app tinkers with mtimes.

I develop an app, httm, which prints the size, date and corresponding locations of available unique versions of files residing on snapshots. And -- this makes it quite effective at proving how often this can happen on Ubuntu/Debian:

    > httm -n --dedup-by=contents /usr/bin/ounce | wc -l
    3
    > httm -n --dedup-by=metadata /usr/bin/ounce | wc -l
    30
kissgyorgy•8mo ago
Maybe NixOS spoiled me, but I would not copy system files at all. The only data worth backing up is app state and your files usually under /home or /var/lib/whatever.
porridgeraisin•8mo ago
Yeah. But the system files add so little to the backup size that I just backup everything except tmpfs and xdg cache anyways. I also reduce backup size by simply not backing up ~/{Pictures,Videos}. I manually put those in the google photos account my phone backs up to since they very rarely change.
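
A hedged sketch of that kind of exclusion list (the paths are examples, not the commenter's actual configuration):

    rsync -aHAX \
        --exclude='/proc/' --exclude='/sys/' --exclude='/dev/' --exclude='/run/' --exclude='/tmp/' \
        --exclude='/home/*/.cache/' \
        --exclude='/home/*/Pictures/' --exclude='/home/*/Videos/' \
        / backuphost:/backups/rootfs/
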
password4321•8mo ago
What are today's best options for programmatically determining the difference between user and system files for normal Windows users?
teekert•8mo ago
Everything is system files except for /home and var/lib/whatever in WSL ;) ;)

Or, since installing is such a pain, perhaps better consider everything user files ;) ;)

dezgeg•8mo ago
I wish checksumming filesystems had some interface to expose the internal checksums. Maybe it wouldn't be useful for rsync, though, as filesystems should have the freedom to pick the best algorithm (so the filesystem checksum of a file on different machines would be allowed to differ, e.g. if the filesystem block size was different). But so that e.g. git and build systems could use it to tell 'these 2 files under the same directory tree are definitely identical'.
mustache_kimono•8mo ago
I actually once suggested this to ZFS, and saw some push back! See: https://github.com/issues/created?issue=openzfs%7Czfs%7C1453...

Maybe someone else will have better luck than me.

hello_computer•8mo ago
If they expose it, that ties them to a particular hash algo. Hash algos are as much art as science, and the opportunities for hardware acceleration vary from chip to chip, so maintaining the leeway to move from one algo to another is kind of important.
ThatPlayer•8mo ago
Something similar applies to a filesystem's transparent compression. The checksum should be of the compressed blocks, so everything can be checked without decompressing during a scrub.

You'd need to match compression settings on both ends. A different number of threads used will change the result too, and it would probably change depending on the hardware.

Would also apply to encryption. Probably shouldn't be using the same encryption key on different filesystems.

Or if you're using bcachefs with background compression, compression might not even happen till later.

rob_c•8mo ago
Sorry to break it to you. That's not about luck. You've asked for something which is nonsense if you want to "recycle" compute used to checksum records.

If you want them to store the checksum of the POSIX object as an attribute (we can argue about performance later) great, but using the checksums intrinsic to the zfs technology to avoid bitflips directly is a bad call.

mustache_kimono•8mo ago
> You've asked for something which is nonsense if you want to "recycle" compute used to checksum records.

As you will note from my request and discussion, I'm perfectly willing to accept I might want something silly.

Would you care to explain why you think this feature is wrongheaded?

> using the checksums intrinsic to the zfs technology to avoid bitflips directly is a bad call.

You should read the discussion. I was requesting for a different purpose, although this "rsync" issue is an alternative purpose. I wanted to compare file versions on snapshots and also the live version to find all unique file versions.

rob_c•8mo ago
> You should read the discussion. I was requesting for a different purpose, although this "rsync" issue is an alternative purpose. I wanted to compare file versions on snapshots and also the live version to find all unique file versions.

I have. I didn't need to. But I have.

And agree with the experts there and here... If you're struggling to follow I'm happy to explain in _great_ detail how you're off the mark. You have a nice idea, but it's unfortunately too naïve and is probably built on hearing "the filesystem stores checksums". Everything that is said as to why this is a bad idea is the same for btrfs too.

As I said, clear as day:

> If you want them to store the checksum of the POSIX object as an attribute...

This is what you _should_ be asking for. There are ways of building this even which _do_ recycle cpu cycles. But it's messy, oh god is it awkward, and by god it makes things so difficult to follow that the filesystem would suffer for want of this small feature.

If you're looking to store the checksum of the complete POSIX object _at write_, _as it's stored on disk_ for _that internal revision of the filesystem_ then it kinda by definition is turning into an extended POSIX attribute associated with that POSIX object.

Even if implemented, this is messy as it needs to be revised and amended and checked and updated, and there will be multiple algorithms with different advantages/drawbacks.

I know because I work in a job where we replicate and distribute multiple 100s of PB globally. The only way this has been found to work and scale the way you want is to store the checksums alongside the data, either as additional POSIX objects on the filesystem, or in a db which is integrated and kept in sync with the filesystem itself.

People will and do burn a few extra cycles to avoid having unmaintainable extensions and pieces of code.
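
As a hedged illustration of the "checksum stored alongside the data" approach, using user extended attributes on Linux (the attribute name is arbitrary):

    # store a content hash next to the data, in a user xattr
    setfattr -n user.sha256 -v "$(sha256sum < file.dat | cut -d' ' -f1)" file.dat

    # later: recompute and compare against the stored value
    [ "$(getfattr --only-values -n user.sha256 file.dat)" = \
      "$(sha256sum < file.dat | cut -d' ' -f1)" ] || echo "file.dat: checksum mismatch"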

If you are worrying about data within individual records changing and about replicating/transmitting/storing record-level changes (which would be the article's main complaint about rsync), ZFS has this in send/recv.

Again, as is being stated elsewhere here:

If you're concerned about data integrity handle it in the FS. If you're concerned about transfer integrity, handle it over the wire.

> Don't mix these up, it just leads to a painful view of the world.

heinrich5991•8mo ago
Working link: https://github.com/openzfs/zfs/issues/14536.
jauntywundrkind•8mo ago
That could also be useful for webserver ETags. Neat that the information is available, though only to admins.

TIL that btrfs's checksums are per block, not per file. There's a dump-csum command, but doesn't seem likely to be very useful. https://unix.stackexchange.com/questions/191754/how-do-i-vie...

KWxIUElW8Xt0tD9•8mo ago
I moved to ZFS to get some guarantees regarding long-term storage of important data -- e.g. family pictures.
nottorp•8mo ago
> That's totally a problem that you have to worry about, especially if you're using SSDs to hold your bits and those SSDs aren't always being powered.

This deserves being in all caps.

1oooqooq•8mo ago
rsync is perfect. just have decent fs on both sides.
rob_c•8mo ago
It's 95-99% of the way there, but most of the corner cases are users wanting it to be sentient on their behalf, I'll agree.
dspillett•8mo ago
No tool's defaults are always enough. Otherwise they'd be fixed settings not changeable defaults.

> "ooh, bit rot" and other things where one of the files has actually become corrupted while "at rest" for whatever reason. Those observers are right!

Yep. This is why you verify your backups occasionally. And perhaps your local “static” resources too depending on your accident/attack threat models and “time to recovery” requirements in the event of something going wrong.

> the first time you do a forced-checksum run, --dry-run will let you see the changes before it blows anything away, so you can make the call as to which version is the right one!

That reads like someone is not doing snapshot backups, or at least not doing backups in a way that keeps past snapshots protected from being molested by the systems being backed up. This is a mistake, and not one rsync or any other tool can reliably save you from.

But yes, --dry-run is a good idea before any config change. Or just generally in combination with a checksum based run as part of a sanity check procedure (though as rsync is my backup tool, I prefer to verify my backups with a different method, checksums generated by another tool, or direct comparison between snapshots/current by another tool, just-in-case a bug in rsync is causing an issue that verification using rsync cannot detect because the bug affects said verification in the same way).
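
One hedged sketch of such an independent check, using a plain manifest of hashes rather than rsync itself (paths are illustrative):

    # on the source: build a manifest of content hashes
    cd /srv/data && find . -type f -print0 | xargs -0 sha256sum > /tmp/manifest.src

    # on the backup host, after copying the manifest over: verify the snapshot against it
    cd /backups/data && sha256sum --check --quiet /tmp/manifest.src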

bo0tzz•8mo ago
This is one of many reasons to avoid rsync for "backups" and instead use an actual backup tool.