frontpage.

OpenCiv3: Open-source, cross-platform reimagining of Civilization III

https://openciv3.org/
631•klaussilveira•12h ago•187 comments

Start all of your commands with a comma

https://rhodesmill.org/brandon/2009/commands-with-comma/
19•theblazehen•2d ago•2 comments

The Waymo World Model

https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simula...
930•xnx•18h ago•547 comments

What Is Ruliology?

https://writings.stephenwolfram.com/2026/01/what-is-ruliology/
34•helloplanets•4d ago•26 comments

How we made geo joins 400× faster with H3 indexes

https://floedb.ai/blog/how-we-made-geo-joins-400-faster-with-h3-indexes
110•matheusalmeida•1d ago•28 comments

Unseen Footage of Atari Battlezone Arcade Cabinet Production

https://arcadeblogger.com/2026/02/02/unseen-footage-of-atari-battlezone-cabinet-production/
43•videotopia•4d ago•1 comment

Jeffrey Snover: "Welcome to the Room"

https://www.jsnover.com/blog/2026/02/01/welcome-to-the-room/
10•kaonwarb•3d ago•10 comments

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
222•isitcontent•13h ago•25 comments

Monty: A minimal, secure Python interpreter written in Rust for use by AI

https://github.com/pydantic/monty
213•dmpetrov•13h ago•103 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
323•vecti•15h ago•142 comments

Sheldon Brown's Bicycle Technical Info

https://www.sheldonbrown.com/
372•ostacke•19h ago•94 comments

Microsoft open-sources LiteBox, a security-focused library OS

https://github.com/microsoft/litebox
359•aktau•19h ago•181 comments

Hackers (1995) Animated Experience

https://hackers-1995.vercel.app/
478•todsacerdoti•21h ago•234 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
275•eljojo•15h ago•164 comments

An Update on Heroku

https://www.heroku.com/blog/an-update-on-heroku/
404•lstoll•19h ago•273 comments

Dark Alley Mathematics

https://blog.szczepan.org/blog/three-points/
85•quibono•4d ago•21 comments

Delimited Continuations vs. Lwt for Threads

https://mirageos.org/blog/delimcc-vs-lwt
25•romes•4d ago•3 comments

PC Floppy Copy Protection: Vault Prolok

https://martypc.blogspot.com/2024/09/pc-floppy-copy-protection-vault-prolok.html
56•kmm•5d ago•3 comments

Vocal Guide – belt sing without killing yourself

https://jesperordrup.github.io/vocal-guide/
16•jesperordrup•3h ago•9 comments

How to effectively write quality code with AI

https://heidenstedt.org/posts/2026/how-to-effectively-write-quality-code-with-ai/
245•i5heu•16h ago•189 comments

Was Benoit Mandelbrot a hedgehog or a fox?

https://arxiv.org/abs/2602.01122
13•bikenaga•3d ago•2 comments

Introducing the Developer Knowledge API and MCP Server

https://developers.googleblog.com/introducing-the-developer-knowledge-api-and-mcp-server/
53•gfortaine•10h ago•22 comments

I spent 5 years in DevOps – Solutions engineering gave me what I was missing

https://infisical.com/blog/devops-to-solutions-engineering
141•vmatsiiako•18h ago•64 comments

Understanding Neural Network, Visually

https://visualrambling.space/neural-network/
281•surprisetalk•3d ago•37 comments

I now assume that all ads on Apple news are scams

https://kirkville.com/i-now-assume-that-all-ads-on-apple-news-are-scams/
1060•cdrnsf•22h ago•435 comments

Why I Joined OpenAI

https://www.brendangregg.com/blog/2026-02-07/why-i-joined-openai.html
133•SerCe•9h ago•118 comments

Learning from context is harder than we thought

https://hy.tencent.com/research/100025?langVersion=en
177•limoce•3d ago•96 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
70•phreda4•12h ago•14 comments

Female Asian Elephant Calf Born at the Smithsonian National Zoo

https://www.si.edu/newsdesk/releases/female-asian-elephant-calf-born-smithsonians-national-zoo-an...
28•gmays•8h ago•11 comments

FORTH? Really!?

https://rescrv.net/w/2026/02/06/associative
63•rescrv•20h ago•23 comments

Git-Annex

https://git-annex.branchable.com/
225•keepamovin•5mo ago

Comments

EmilStenstrom•5mo ago
Happy to see use cases front and center in command line documentation. They seem to always start with "obscure command flag that you'll probably never use".
ygritte•5mo ago
Could this be abused to simulate something like SVN externals? I always found git submodules to be a very bad replacement for that.
fragmede•5mo ago
GitHub really embraced the Microsoft-esque NIH with LFS, instead of adopting git-annex.
keepamovin•5mo ago
To its absolute detriment

Here is a talk by a person who adores it: Yann Büchau, "Staying in Control of your Scientific Data with Git Annex": https://www.youtube.com/watch?v=IdRUsn-zB2s

codemac•5mo ago
While Yann has built many things with git-annex, we should be clear that git-annex has essentially one creator: Joey Hess.
keepamovin•5mo ago
Here is a comment about Joey: https://news.ycombinator.com/item?id=14908529

And an interview ("When power is low, I often hack in the evenings by lantern light."): https://usesthis.com/interviews/joey.hess/

mathstuf•5mo ago
While I also find git-annex more elegant, its cross-platform story is weaker. Note that LFS was originally a collaboration between GitHub and Bitbucket (maybe? Some forge vendor I think). One had the implementation and the other had the name. They met at a Git conference and we have what we have today. My main gripes these days are the woefully inadequate limits GitHub has in place for larger projects. Coupled with the "must have all objects locally to satisfy an arbitrary push", any decently sized developer community will blow the limit fairly quickly.

FD: I have contributed to git-lfs.

andrewmcwatters•5mo ago
git-annex has some really awkward documentation.

You can apparently do, sort of, but not really, the same thing git-fetch-file[1] does, with git-annex:

    git fetch-file add https://github.com/icculus/physfs.git "**" lib/physfs-main
    git fetch-file pull
`add` creates this at `.git-remote-files`:

    [file "**"]
    commit = 9d18d36b5a5207b72f473f05e1b2834e347d8144
    target = lib/physfs-main
    repository = https://github.com/icculus/physfs
    branch = main
But git-annex's documentation goes on and on about a bunch of commands I don't really want to read about, whereas those two lines and that .git-remote-files manifest just told you what git-fetch-file does.

[1]: https://github.com/andrewmcwattersandco/git-fetch-file

nolist_policy•5mo ago
Not at all. git-annex is for managing large files in git and unlike git-lfs it preserves the distributed nature of git.
keepamovin•5mo ago
Here is a guide you might like: https://www.youtube.com/watch?v=p0eVyhv2rbk
nolist_policy•5mo ago
I use git-annex to manage all my data on all my drives. It automatically keeps track of which files are on which drives, ensures that there are enough copies, and checksums everything. It works perfectly with offline drives.

git-annex can be a bit hard to grasp, so I suggest creating a throw-away repository, following the walkthrough[1], and trying things out. See also workflows[2].

[1] https://git-annex.branchable.com/walkthrough/

[2] https://git-annex.branchable.com/workflow/
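
A rough sketch of what such a throw-away experiment might look like (these are real git-annex commands, but the file names and USB path here are made up):

    mkdir annex-test && cd annex-test
    git init && git annex init "laptop"
    git annex numcopies 2            # refuse to drop content unless 2 copies remain
    cp ~/big-file.iso . && git annex add big-file.iso
    git commit -m "add big file"
    # assumes /mnt/usb/annex-test already holds a clone of this repo:
    git remote add usb /mnt/usb/annex-test
    git annex sync --content         # sync metadata and copy content to the remote
    git annex whereis big-file.iso   # list which repositories hold the content
    git annex drop big-file.iso      # only succeeds if enough other copies exist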

albertzeyer•5mo ago
How much data do you have? I'm using git-annex on my photos: around 100k-1M files, several TB of data, on ZFS. In the beginning everything was fine, but it has become increasingly slow, to the point that every operation takes several minutes (5-30 mins or so).

I wonder whether that is ZFS, or git-annex, or maybe my disk, or something else.

riedel•5mo ago
It would be great to have comprehensive benchmarks for git-lfs, git-annex, DVC and the like. I also keep getting annoyed with one or the other, e.g. due to the hashing overhead. However, in many cases the annoyances come from bad filesystem integration, on Windows in my case.
rurban•5mo ago
My guess is the Windows virus scanner
warp•5mo ago
My experience is the same: git-annex just doesn't work well with lots of small files. With annexes on slow USB disks connected to a Raspberry Pi 3 or 4, I'm already annoyed when working with my largest annex (by file count), at 25000 files.

However, I mostly use annex as a way to archive stuff and make sure I have enough copies in distinct physical locations. So for photos I now just tar them up, with one .tar file per family member per year. This works fine for me for any data I want to keep safe but don't need to access directly very often.

matrss•5mo ago
I had tested a git-annex repository with about 1.5M files and it got pretty slow as well. The plain git repo size grew to multiple GiB and plain git operations were super slow, so I think this was mostly a git limitation. DataLad's approach of nested subdatasets (in practice git submodules where each submodule is a git-annex repository) can help, if it fits the data and workflows.
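
A sketch of what that nesting looks like with DataLad (assuming the datalad CLI; the exact command shapes are worth double-checking against its docs):

    datalad create photos                    # top-level dataset (a git-annex repo)
    datalad create -d photos photos/2019     # nested subdataset, registered as a submodule
    datalad create -d photos photos/2020
    # each subdataset keeps its own git history and annex, so no single
    # repository has to index millions of files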
egwor•5mo ago
One thing to check is whether any security/monitoring software might be causing issues. Since there are so many files in git repos, it can put a lot of load on that type of software.
Borg3•5mo ago
Why? WHY?! Why the heck are you using a (D)VFS on your immutable data? What is the reasoning? That stuff is immutable and usually incremental.. Just throw a proper syncing algorithm at it and sync with backups.. that's all. I wonder about the logic behind this...

Docs and other files you often change are a completely different story. This is where a DVFS shines. I wrote my own very simple DVFS exactly for that case. You just create a directory, init the repo manager.. and voilà. Disk-wide VFS is kinda useless, as most of your data there just sits..
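
For what it's worth, the "just sync" approach for immutable data can be as small as a single rsync invocation (illustrative paths):

    # one-way mirror of an archive onto a backup disk; --checksum compares
    # file contents instead of timestamps (slower, but catches bit differences)
    rsync -a --itemize-changes --checksum /data/photos/ /mnt/backup/photos/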

ncann•5mo ago
I am looking into using Git for my photo/video backup external HDDs, and the reasoning is simple. It's not about keeping track of changes within the files themselves, since like you said, they (almost) never change. Rather, it's about keeping track of changes in _folders_. That is, I want to keep track of when I last copied images from my phones, cameras, etc. to my HDDs; which folders I touched; and if I reorganized existing files into a different folder structure, what the changes were. It also acts as a rollback mechanism if I ever fat-finger and delete something accidentally. I wonder if there's a better tool for this, though.
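
A minimal sketch of that plain-git workflow (hypothetical paths; note that committing the photos themselves will bloat .git, which is exactly what git-annex avoids):

    cd /mnt/photo-hdd && git init
    git add -A && git commit -m "import from phone, August"
    # later, after reorganizing folders:
    git add -A && git commit -m "reorganize by year"
    git log --stat      # shows which folders changed, and when
    # roll back a fat-fingered delete:
    git checkout HEAD~1 -- 2019/summer/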
Borg3•5mo ago
Then I think some syncing software like rsync will probably be better. Not sure how often you keep changing archived folders. I split that work between TRASH-like dirs and archives. When I'm done with files, I move them out of TRASH to their proper place and that's it. I prefer the KISS approach, but whatever works for you :)
albertzeyer•5mo ago
I don't really need the versioning aspect much, but sometimes I modify the photos a bit (e.g. rotating them). All the other things are relevant for me though, like having it distributed, syncing, only partially having the data on a particular node, etc.

So, what solution would be better for that? In the end it seems that other solutions provide a similar set of features. E.g. Syncthing.

But what's the downside with Git-annex over Syncthing or other solutions?

Borg3•5mo ago
If you want two-way distributed syncing, that is a bit more complicated and error prone, but most tools support it, even rsync. A simpler approach is to have a central primary node (whether desktop or storage) where you keep the primary copy of the data and sync it out to backups.

As I said, handling immutable (incremental) data is easy. You just copy and sync. Kinda trivial. The problem I personally had was with all the important docs (and similar) files I work on. First, I wanted snapshots and history, in case of some mistake or failure. Data checksumming, because they are important. Also, full peer-to-peer syncing, because I have a desktop, servers, VMs, and a laptop, so I want to sync data around. And because I really like Git, a great tool for VCS, I wanted something similar but for generic binary data. Hence my interest in a DVFS system. At first I wanted a full-blown mountable DVFS, but that is complicated and much harder to make portable.. The repository approach is easy to implement and is portable (Cygwin, Linux, UNIX, POSIX). Works like a charm.

As for downsides: if you think git-annex will work for you, just use it :) For me, it was far too complicated (too many moving parts) even for my DVFS use case. For immutable data it's absolute overkill to keep 100s of GBs there. I just sync :)

alexdme•5mo ago
I also used to use git-annex on my photos, ended up getting frustrated with how slow it was and wrote aegis[1] to solve my use case.

I wrote a bit about why in the readme (see archiving vs backup). In my opinion, syncing, snapshots, and backup tools like restic are great but fundamentally solve a different problem from what I want out of an archive tool like aegis, git-annex, or boar[2].

I want my backups to be automatic and transparent, and for that restic is a great tool. But for my photos, my important documents and other immutable data, I want to manually accept or reject any change that happens to them, since I might not always notice when something changes. For example, if I fat-finger an rm, or a bug in a program overwrites something and I don't notice.

[1]: https://git.sr.ht/~alexdavid/aegis

[2]: https://github.com/mekberg/boar

fer•5mo ago
While I understand why git-annex wouldn't work for you, what gaps did you find in boar?
seanw444•5mo ago
I might have to give aegis a try.
mananaysiempre•5mo ago
> Why the heck are you using (D)VFS on your immutable data?

Git-annex does not put your data in Git. What it tracks using Git is what’s available where, updating that data on an eventually consistent basis whenever two storage sites come into contact. It also borrows Git functionality for tracking moves, renames, etc. The object-storage parts, on the other hand, are essentially a separate content-addressable store from the normal one Git uses for its objects.

(The concrete form of a git-annex worktree is a Git-tracked tree of symlinks pointing to .git/annex/objects under the repo root, where the actual data is stored as read-only files, plus location-tracking data indexed by object hash in a separate branch called “git-annex”, which the git-annex commands manipulate using special merge strategies.)
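
Concretely, a freshly added file looks something like this (illustrative output; the hash and key here are made up, and the exact key format depends on the configured backend):

    $ git annex add big.iso
    $ ls -l big.iso
    lrwxrwxrwx 1 me me 198 Jan 1 12:00 big.iso -> .git/annex/objects/pX/ZJ/SHA256E-s1048576--1f2e3d...iso/SHA256E-s1048576--1f2e3d...iso
    $ git annex whereis big.iso    # reads location tracking from the "git-annex" branch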

integralid•5mo ago
Why... not? Git just works for syncing data and version control, and we're all familiar with it. It is also secure, reliable, available everywhere, decentralized, with built-in access control, deduplication, e2ee with git-crypt... In short, it is great.

The problem is performance in some use cases, but I don't see anything fundamentally wrong with using git for sync.

Borg3•5mo ago
Git wasn't designed for generic binary blob handling. Sure, if your repo is small and you set proper .gitattributes, it will work fine. But I would advise using a generic DVFS for such a task.
_Algernon_•5mo ago
I have thought about doing this in the past but ran into issues (one of them being the friction in permanently deleting files once added). I'd be curious how you use it if you have time to share.
Munksgaard•5mo ago
Git-Annex is a cool piece of technology, but my impression is that it works best for single-user repositories. So for instance, as @nolist_policy described in a sibling comment, managing all your personal files, documents, music, etc. across many different devices.

I tried using it for syncing large files in a collaborative repository, and the use of "magic" branches didn't seem to scale well.

geephroh•5mo ago
YMMV, but it works for my org. We're an archival institution and have been using git-annex for more than a decade as the storage backend for a digital repository system designed for long-term, robust preservation. Admittedly we only have 15-20 staff, but 30+ TB of data and ~750K files (binaries + metadata), across hundreds of collection repos.
eigengrau•5mo ago
How (if you saw that need) did you address permissions concerns, e.g., around any Git users being able to force drop all files from a backend?

Back when I was looking into this (a long time ago), there was no KISS, out-of-the-box way to manage the Git Annex operations a Git user would be allowed to perform. Gitolite (or whatever Git platform of choice) can address access control concerns for regular Git pushes, but there is no way to define policies on Git Annex operations (configuration, storage management).

Might not be super hard to create a Gitolite plugin to address these, but ultimately for my use-case it wasn’t worth the effort (I didn’t really need shared Git Annex repos). Do you tackle these concerns somehow? I guess if people don’t interact with your repositories via Git/SSH but only through some custom UI, you might deal with it there.

ttiurani•5mo ago
Relevant discussion 9 days ago about the new native git large object promisers in "The future of large files in Git is Git":

https://news.ycombinator.com/item?id=44916783

avar•5mo ago
Thanks, though it's also not so relevant, for the reasons I noted in a comment in that thread: https://news.ycombinator.com/item?id=44922405

I.e. annex is really in a different problem space than "big files in git", despite the obvious overlap.

A good way to think about it is that git-annex is sort of a git-native and distributed solution to the storage problem at the "other side" ("server side") of something like LFS, and to reason about it from there.

internet_points•5mo ago
The page doesn't say it, but git-annex was created by https://www.patreon.com/joeyh who also made the wonderful https://joeyh.name/code/moreutils/ and https://etckeeper.branchable.com/
kstrauser•5mo ago
More famously (to me), he was a core contributor to Debian for a couple of decades, starting in 1996. A pretty big chunk of what we think of as Linux came from his keyboard.
kajika91•5mo ago
I'm using self-hosted Forgejo. I don't see any benefit of git-annex over LFS so far, and I'm not even sure I could set up annex as easily.

Digging a little, I found that git-annex is coded in Haskell (not a fan) and seems to be 50% slower (expected from Haskell, but only one source so far, so not really reliable).

I don't see the appeal of the complexity of the commands; they probably serve a purpose. Once you've opened a .gitattributes from git-lfs you pretty much know all you need, and you barely need any commands anymore.

Also, I like how setting up a .gitattributes makes everything transparent, the same way .gitignore works. I don't see any equivalent with git-annex.

Lastly, any "tutorial" or guide about git-annex that won't show me an equivalent of 'git lfs ls-files' will definitely not appeal to me. I'm a big user of 'git status' and 'git lfs ls-files' to check and re-check everything.

stv0g•5mo ago
There is a soft-fork of Forgejo which adds support for git-annex:

https://codeberg.org/forgejo-aneksajo/forgejo-aneksajo

avar•5mo ago
Annex isn't slow because it's written in Haskell; it tends to be slow because of I/O, and because of paranoia that's warranted as the default behavior in a distributed backup tool.

E.g. if you drop something, it'll by default check the remotes it has access to for that content in real time. It can be many orders of magnitude faster to use --fast etc., to (somewhat unsafely) skip all that and trust whatever metadata you have a local copy of.
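
For example (following this comment's suggestion; check the git-annex man page for which commands honor --fast in your version):

    git annex drop big.iso          # verifies with reachable remotes, live, that enough copies exist
    git annex drop --fast big.iso   # trusts the local location log instead (faster, less safe)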

seanparsons•5mo ago
LFS and git-annex have subtly different use cases in my experience. LFS is for users developing something with git that has large files in the repo like the classic game development example. git-annex is something you'd use to keep some important stuff backed up which happens to involve large files, like a home folder with music or whatever in it. In my case I do the latter.
aragilar•5mo ago
Where it works really well is storing research data. LFS can't upload to an arbitrary WebDAV/S3/SharePoint/other random cloud service.
aragilar•5mo ago
How big are the repos you have? The largest git-annex repo I have is multiple TB (spread across multiple systems), with some files 10s of GB.

I'm not sure what you are doing, but from looking at the git-lfs-ls-files manpage `git annex list --in here` is likely what you want?

goku12•5mo ago
My only problem with git-annex is Haskell. I don't hate the language itself, but the sheer number of dependencies it has to install is staggering. Many of those dependencies are not used by anything else, or may need incompatible versions when more than one application uses them. The pain comes when you install them using the system package manager. Just two Haskell applications, annex and pandoc, are enough to fill your daily updates with maybe a dozen little Haskell packages. God forbid you're on a distro that installs from source!

It's quite safe to just statically link most, if not all, of them directly into the application, even when some of them are shared by other applications. I have seen this complaint repeated a few times. The reply from the Haskellers seems to be that this is for the fine-grained modularity of the library ecosystem. But why do they treat it like everything starts and ends with Haskell? Sometimes there are other priorities, like system administration. None of the other compiled languages have this problem: Rust, Go, Zig, ... Even plain old C and C++ aren't this frustrating with dependencies.

I need to clarify that I'm not hostile towards the Haskell language, its ecosystem, or its users. It's something I plan to learn myself. But why does this problem exist? And is there a solution?

aragilar•5mo ago
Which package manager are you using? I've not seen any issues with apt-based systems with Haskell?
goku12•5mo ago
I used to have issues on Arch/pacman. Now on ebuilds/Gentoo.
IsTom•5mo ago
> It's quite safe to just statically link most, if not all of them directly into the application

If you're talking about distro's repos, isn't this a matter of distro and package manager policy?

goku12•5mo ago
Very likely. But then, why is this an issue with Haskell alone?
zeendo•5mo ago
As a full time Haskell developer, I have a similar aversion to Haskell-based distro packages which aren't statically linked.

There ARE statically linked Haskell packages in the AUR so it's at least feasible. I haven't even dug into the conversations around why packagers are insisting on dynamic linking of distro packages - I just avoid them for the same reasons you mention.

I can't really speak confidently to why it is exactly - I can only guess. Clearly dynamic linking makes sense in a lot of cases for internal application distribution - which is where Haskell is often used - so maybe people are incorrectly projecting that onto distro packages?

wonger_•5mo ago
Yes! Every time I `pacman -Syu`, half of the updates are Haskell packages. I think from pandoc and shellcheck iirc?
goku12•5mo ago
Pandoc and git-annex in my case. Basically any substantial application written in Haskell.
GZGavinZhao•5mo ago
The Haskell tooling already supports statically linking the dependencies. I maintain the Haskell stack for Solus, and for pandoc we just have a single binary that only depends on libc; all other Haskell dependencies are statically linked inside the binary, just like how Rust dependencies work. So it's definitely doable.

I think it's more a matter of the distro maintainers' choice. For Solus the number of dependencies was just too much for me to handle, so we resorted to static linking.
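
For reference, a sketch of the knob involved (assuming cabal-install; Solus' actual packaging recipe may differ):

    # GHC links Haskell libraries into executables statically by default;
    # distros typically patch builds to use shared Haskell libraries instead.
    cabal build pandoc --disable-executable-dynamic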

goku12•5mo ago
Why do some distro maintainers choose otherwise? I have seen some users complain about this before, without any correction being made. So my guess is that something forces those distro maintainers to make that choice.
andunie•5mo ago
I've used this for years, but to me the big selling point was integration with cloud storage providers as a means of backup. That, however, was always flaky and dependent on unmaintained third-party plugins. I think there was also a bug at some point that caused some data inconsistencies, so eventually I stopped.

Does anyone know if the situation has improved on that front in the past 5 years?

matrss•5mo ago
Depends on the cloud storage provider, I think. The best chances are with those that support the more standard protocols like S3, WebDAV, SFTP, etc. A relatively new development is the special remote built into rclone, which should be better maintained than some of the other third-party special remotes and provides access to all rclone-supported remotes.
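
A sketch of how that built-in variant is wired up, from memory of rclone's gitannex docs (the parameter names are worth double-checking before relying on this):

    # rclone ships the special remote in its own binary; git-annex discovers it
    # via a helper named git-annex-remote-<externaltype> on $PATH:
    ln -s "$(command -v rclone)" ~/.local/bin/git-annex-remote-rclone-builtin
    git annex initremote mydrive type=external externaltype=rclone-builtin \
        rcloneremotename=mydropbox rcloneprefix=git-annex encryption=none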
andunie•5mo ago
Oh, really? rclone is great, but as a standalone thing it's really annoying to use. I didn't realize until now that a git-annex integration was what it was missing to be great. Thank you! I'll start using it again.
BrandiATMuhkuh•5mo ago
Does this also work if I have data on SharePoint, Dropbox, etc. and want to pull it (sync with the local machine)?

My use case is mostly ETL-related, where I want to pull all of our customers' data (enterprise customers) so I can process it. But I also want to keep the data updated, hence the pull?

matrss•5mo ago
In an ideal world the rclone special remote would support git-annex' importtree feature. Then you could periodically run `git annex import <branch>:<subdir> --from <sharepoint/dropbox-remote>` to "pull" from those remotes (it is not really a pull as you aren't fetching version-controlled data from those remotes, rather you are importing from a non-version-controlled source and record the result as a new revision).

Unfortunately this is not (yet?) supported I think. But you could also just do something like this: `rclone copy/sync <sharepoint/dropbox-remote>: ./<local-directory> && git annex add ./<local-directory> && git commit -m <message>`.

gradientsrneat•5mo ago
git-annex does support rclone as a special remote iirc
matrss•5mo ago
Yes it does, but I don't think the special remote supports the importtree feature, which would be necessary for this.
the_absurdist•5mo ago
Ironically, I just spent a day last weekend writing my own version control system for large files.

I dislike git-annex that much.

- It converts your files into blobs and bloats your file system

- As others have alluded to previously, my primary use case is to ensure sync between distributed files, not version them (why would anyone possibly need that??)

- You can use AI to build a Python-based solution that will hash your files and put them into a lookup table, then create some helper methods to sync sources using rclone

Far simpler and more efficient methods exist.
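
The comment describes a Python tool; the underlying manifest idea fits in a few lines of shell (an illustrative sketch, not the commenter's actual code):

    # build a content-hash manifest, then mirror the tree with rclone
    find /data -type f -print0 | xargs -0 sha256sum > manifest.sha256
    rclone sync /data remote:data            # one-way mirror
    rclone copy manifest.sha256 remote:      # keep the manifest alongside it
    # later, verify local files against the manifest:
    sha256sum -c manifest.sha256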