At least a cursory glance at the repo suggests it might: https://github.com/Homebrew/brew/blob/700d67a85e0129ab8a893f...
The only situation where the RCE here is a problem is if you clone github repos containing data you don't want to execute. That's fairly unusual.
How does this achieve “remote code execution” as the article states? How serious is it from a security perspective?
> I'm not sharing a PoC yet, but it is an almost trivial modification of an exploit for CVE-2024-32002. There is also a test in the commit fixing it that should give large hints.
EDIT: from the CVE-2024-32002
> Repositories with submodules can be crafted in a way that exploits a bug in Git whereby it can be fooled into writing files not into the submodule's worktree but into a .git/ directory. This allows writing a hook that will be executed while the clone operation is still running, giving the user no opportunity to inspect the code that is being executed.
So a repository can contain a malicious git hook. Normally git hooks aren’t installed by ‘git clone’, but this exploit allows one to, and a git hook can run during the clone operation.
> When reading a configuration value, Git will strip any trailing carriage return (CR) and line feed (LF) characters. When writing a configuration value, however, Git does not quote trailing CR characters, causing them to be lost when they are read later on. When initializing a submodule whose path contains a trailing CR character, the stripped path is used, causing the submodule to be checked out in the wrong place.
> If a symlink already exists between the stripped path and the submodule’s hooks directory, an attacker can execute arbitrary code through the submodule’s post-checkout hook.
Along with a bunch of other git CVEs that are worth a look.
Some previous bugs have resulted in validation added to git fsck, but because clone URLs can't change after the submodules are initialised that's not going to have any benefit here. (There were some defence-in-depth measures discussed, there's definitely a few things that can be improved here.)
There are places for clever hand code, even in C, even in the modern world. Data interchange is very not much not one of them. Just don't do this. If you want .ini, use toml. Use JSON if you don't. Even YAML is OK. Those with a penchant for pain like XML. And if you have convinced yourself your format must be binary (you're wrong, it doesn't), protobufs are there for you.
But absolutely, positively, never write a parser unless your job title is "programming language author". Use a library for this, even if you don't use libraries for anything else.
[1] Fine fine, lexer. We are nitpicking, after all.
If CR is used correctly in windows, then its behaviour is already covered by the LF case (as required for POSIX systems) and if CR is used incorrectly then you end up with all kinds of weird edge cases. So you’re much better off just jumping over that character entirely.
I can’t imagine anyone targeting macOS 9 (or earlier) systems these days but you’re right that it’s an edge case people should be aware of.
If the format is not sensitive to additional empty lines then converting them all CR to LF in-place is likely a safer approach, or a tokenizer that coalesces all sequential CR/LF characters into a single EOL token.
I write a lot of software that parses control protocols, the differences between the firmware from a single manufacturer on different devices is astonishing! I find it shocking the number that actually have no delimiters or packet length.
If you’re targeting iMacs or the Commodore 64, then sure, it’s something to be mindful of. But I’d wager you’d have bigger compatibility problems before you even get to line endings.
Is there some other edge cases regarding CR that I’ve missed? Or are you thinking ultra defensively (from a security standpoint)?
That said, I do like your suggestion of treating CR like LF where the schema isn’t sensitive to line numbering. Unfortunately for my use case, line numbering does matter somewhat. So would be good to understand if I have a ticking time bomb
So you’re better off accepting the edge cases problems that white space introduces considering the benefits outweighs the pain.
Or are you talking about SP preceding CR and/or LF?
See here for example: https://en.cppreference.com/w/c/string/byte/isspace
Or here for Unicode: https://en.wikipedia.org/wiki/Whitespace_character#Unicode
Thanks for the responses
[0] https://protobuf.dev/reference/protobuf/textformat-spec/
Back then there wasn’t a whole lot of options besides rolling their own.
And when you also consider that git had to be somewhat rushed (due to Linux kernel developers having their licenses revoked for BitKeeper) and the fact that git was originally written and maintained by Linus himself, it’s a little more understandable that he wrote his own config file parser.
Under the same circumstances, I definitely would have done that too. In fact back then, I literally did write my own config file parsers.
And consider that consequences of such bug would be much worse if it was in a standard system library. At least here it is limited mostly to developers where machines are updated.
Every file format is underspecified in some way when you use it on enough platforms and protocols, unless the format is actually a reference implementation, and we've had enough problems with those. There's a reason IETF usually demands two independent implementations.
Similar problems can affect case insensitive filesystems, or when moving data between different locales which affect UTF-8 normalization. It's not surprising that an almost identical CVE was just one year ago.
Be careful what you wish for. They could have used yaml instead of ini for the config format, and we would have had more security issues over the years, not less.
> I find this particularly interesting because this isn't fundamentally a problem of the software being written in C. These are logic errors that are possible in nearly all languages
For Christ's sake, Turing taught us that any error in one language is possible in any other language. You can even get double free in Rust if you take the time to build an entire machine emulator and then run something that uses Malloc in the ensuing VM. Rust and similar memory safe languages can emulate literally any problem C can make a mine field out of.. but logic errors being "possible" to perform are significantly different from logic errors being the first tool available to pull out of one's toolbox.
Other comments have cited that in non-C languages a person would be more likely to reach for a security-hardened library first, which I agree might be helpful.. but replies to those comments also correctly point out that this trades one problem for another with dependency hell, and I would add on top of that the issue that a widely relied upon library can also increase the surface area of attack when a novel exploit gets found in it. Libraries can be a very powerful tool but neither are they a panacea.
I would argue that the real value in a more data-safe language (be that Rust or Haskell or LISP et al) is in offering the built-in abstractions which lend themselves to more carefully modeling data than as a firehose of octets which a person then assumes they need to state-switch over like some kind of raw Turing machine.
"Parse, don't validate" is a lot easier to stick to when you're coding in a language designed with a precept like that in mind vs a language designed to be only slightly more abstract than machine code where one can merely be grateful that they aren't forced to use jump instructions for every control flow action.
One tool I'd have probably reached for (long before having heard of this particular corner case to avoid) would have been whitespace trimming, and CR counts as whitespace. Plus folk outside of C are also more likely to aim a regex at a line they want to parse, and anyone who's been writing regex for more than 5 minutes gets into the habit of adding `\s*` adjacent to beginning of line and end of line markers (and outside of capture groups) which in this case achieves the same end.
I can't think of any features in Rust that will lead someone away from this pattern of error, where this pattern of error is not realizing that round-tripping the serialized output back through the deserializer can change the boundaries of line endings. It's really easy to think "if I have a bunch of single-line strings and I join them with newlines I now have multiline text, and I can split that back up into individual lines and get back what I started with". This is doubly true if you start with a parser that splits on newline characters and then change it after the fact to use BufRead::lines() in response to someone telling you it doesn't work on Windows.
It sounds defeatist, but non-trivial parsers end up with a huge state space very quickly - and entirely strange error situations and problematic inputs. And "non-trivial" starts a lot sooner than one would assume. As the article shows, even "one element per line" ends up non-trivial once you support two platforms. "foo\r\n" could be tokenized/parsed in 3 or even 4 different ways or so.
It just becomes worse from there. And then Unicode happened.
That doesn't have any relevance to a discussion about memory safety in C vs rust. Invalid memory access in the emulated machine won't be able to access memory from other processes on the host system. Two languages being turing complete does not make them the same language. And it definitely does not make them bug for bug compatible. Rust _really_ does enable you to write memory safe programs.
And that's exactly what the Git developers did here: They made an in-house configuration file format. If implemented in Rust, it would bypass most of Rust's safety features, particularly, type-safety.
No, just no. I'm sorry, Ive implemented countless custom formats in Rust and have NEVER had to side step safe/unsafe or otherwise sacrifice type safety. Just what an absurd claim.
Maybe for some binary (de)serialization you get fancy (lol and are still likely to be far better off than with C) but goodness, I cannot imagine a single reason why a config file parser would need to be (type)-unsafe.
The git bug in question could be written in 100% safe rust using as much or as little of the type system[1] as you want. It's a logic error when parsing a string.
I dev rust full-time, and I've spent a lot of time writing protocol parsers. It's easy to forget to check this or that byte/string for every possible edge case as you're parsing it into some rust type, and happens all the time in rust, just like it did in C or python or go when I used those languages. This bug (if anything) is the type of thing that is solved with good tokenizer design and testing, and using more small, independently tested functions - again not at all related to the type system.
[1] Although in rust you can arrange your types so that this sort of bug is harder to implement or easier to catch than in most languages... but doing that requires an up-front understanding that logic bugs are just as possible in rust as in other languages, as well as some experience to avoid awkwardness when setting the types up.
No, this wouldn't be a double free in Rust, it'd be a double free in whatever language you used to write the emulated code.
The distinction is meaningful, because the logic error he's talking about is possible in actual rust (even without unsafe), not just theoretically in some virtual system that you can use Rust to write a simulation for.
If you write a virtual machine in a memory safe language you can simulate a double free inside the VM, but the VM won't have the same memory contents and connections to the outside world as the real machine. You won't get the same outcome.
"Parse, don't validate" is easily doable in plain C and almost always has been. See https://www.lelanthran.com/chap13/content.html
Why git does not use Landlock? I know it is Linux-only, but why? "git clone" should only have r/o access to config directory and r/w to clone directory. And no subprocesses. In every exploit demo: "Yep, <s>it goes to a square hole</s> it launches a calculator".
I guess you're okay with breaking all git hooks, including post-checkout, because those are subprocesses as a feature.
You can always run your git operations in a container with seccomp or such if you're not using any of the many features that it breaks
Drop a git-something executable in your path and you can execute it as git something.
To your point, I would say that it’s “easy” rather than strictly helpful. There is a plugin I maintain internally that can be invoked by calling “helm <thing>” if I go through the necessary steps to have it installable by the helm plugin command. Under the hood it’s just a small binary that you can put in your $PATH and it’ll work fine, but there are tons of developers and PMs and other people at the company who don’t know what a path variable is, or how to set it, or what a terminal is, or what shell they’re running, or who know that they can do “helm X” and “helm Y”, so why not “helm Z” for my plugin, etc … It would be a hell of a lot easier to just ship the raw executable, but to those people and execs and mangers and stuff, it looks good if I can show it off next to the native stuff.
Whenever I have to help users with it, I notice that nearly everyone uses it with helm and not calling by the binary directly. It just comes down to the fact that presentation and perceived ease of use counts for a lot when people evaluate whether they want to make a tool part of their workflow.
For example in /usr/lib/git-core/ with git 2.25.1 on Ubuntu, "git-rebase" is a symlink to "git". But on an older Centos VM I have access to, in /usr/libexec/git-core/ with git 2.16.5, "git-rebase" is a separate shell script.
It also helps make plugins easier to distribute. I don't want to have to type `git-x` sometimes and `git y` others, and if I want my plugin to get adoption, I really really don't want that. So things like git-lfs, git-annex, etc can easily be distributed, documented as a plugin, and generally be considered as "a part of git", rather than a separate command.
This pattern is also not unique to git. Other softwares have followed it, notably cargo.
have you never used git over ssh?
I don't know the extent to which Landlock or even unveil would have helped here; maybe they would have only prevented the hook from running during the clone, but not subsequently when it's expected trusted hooks to run. But I'd bet adding unveil support to Git would be an easier task, notwithstanding (or even because of) the way Git invokes subprocesses.
* https://jdebp.uk/FGA/qmail-myths-dispelled.html#MythAboutBar...
"that may not be the most sensible advice now", says M. Leadbeater today. We were saying that a lot more unequivocally, back in 2003. (-:
As Mark Crispin said then, the interpretations that people put on it are not what M. Postel would have agreed with.
Back in the late 1990s, Daniel J. Bernstein did the famous analysis that noted that parsing and quoting when converting between human-readable and machine-readable is a source of problems. And here we are, over a quarter of a century later, with a quoter that doesn't quote CRs (and even after the fix does not look for all whitespace characters).
Amusingly, git blame says that the offending code was written 19 years ago, around the time that Daniel J. Bernstein was doing the 10 year retrospective on the dicta about parsing and quoting.
* https://github.com/git/git/commit/cdd4fb15cf06ec1de588bee457...
* https://cr.yp.to/qmail/qmailsec-20071101.pdf
I suppose that we just have to keep repeating the lessons that were already hard learned in the 20th century, and still apply in the 21st.
Absolutely, in particular the "Be conservative in what you do" would have prevented this bug.
See my discussion here: https://dwheeler.com/essays/fixing-unix-linux-filenames.html
One piece of good news: POSIX recently added xargs -0 and find -print0, making it a little easier to portably handle such filenames. Still, it's a pain.
I plan to complete my "safename" Linux module I started years ago. When enabled, it prevents creating filenames in certain cases such as those with control characters. It won't prevent all problems, but it's a decent hardening mechanism that prevents problems in many cases.
Control characters in filenames have no obviously valuable use case, they appear to be allowed only because "it's always been allowed". That is not a strong argument for them. Some systems do not allow them, with no obvious ill effects.
It seems like it would be appropriate to make it clear "this is important now" vs "don't worry you probably already patched this" in the headline to save our time for those who aren't just reading these posts out of interest.
>git version 2.39.5 (Apple Git-154)
[1]: https://launchpad.net/ubuntu/+source/git/1:2.43.0-1ubuntu7.3
[2]: https://oss-security.openwall.org/wiki/mailing-lists/distros
I think even Microsoft have noticed this, which is why WSL exists to provide an island of Linux-flavored open source tooling inside a Windows environment.
Directory separators are another can of worms. A lot of functionality in Windows is driven by command-line invocations taking slash-prefixed options, where it’s crucial that they are syntactically distinct from file system paths. I don’t think a transition is possible without an unacceptable amount of compatibility breakage.
For example something like SMTP can be modified from "lines must end in \r\n" to "lines must end in \n, and trailing \r must be ignored". All existing SMTP will continue to work, although it may be a while before all servers reliably support \n of course.
But it seems like almost no distributions have patched it yet
https://security-tracker.debian.org/tracker/CVE-2025-48386 (debian as an example)
And the security advisory is from yesterday: https://github.com/git/git/security/advisories/GHSA-4v56-3xv...
Did git backdate the release?
therealmarv•7mo ago
leipert•7mo ago
Fishkins•7mo ago
therealmarv•7mo ago