The curious case of shell commands, or how "this bug is required by POSIX" (2021)

https://notes.volution.ro/v1/2021/01/notes/502e747f/

151•wonger_•2d ago

Comments

degamad•2d ago

(2021)

o11c•2d ago

This is woefully misguided. Half the time passing it to the shell is explicitly a feature, e.g. `popen("gzip > foo.gz")`. If you have user input you should always sanitize it regardless of API.

But `ssh` does deserve all the shame. It's a pity the real problems are hard to find in an article full of nonsense.

Note also that if you're using a deficient shell that supports neither `printf %q` nor `${var@Q}` it's still easy to do quoting in `sed`. GNU `./configure` scripts do this internally, including special-casing to only quote the right side of `--arg=value`.

hello_computer•2d ago

> article full of nonsense

Pls elaborate. Seems like a decent list of shell gotchas to me.

bee_rider•2d ago

It is kind of a journey in an annoying way. Like, do we really need to know all the stuff about: this man page says to sanitize input, this one doesn’t, blah blah blah.

Or, let’s just look at an excerpt, here’s the section “proper solution:”

I’ve emphasized the actionable advice.

> The proper solution would be dropping that broken tool immediately, securely erasing it from your hard-drive, then running and screaming that tool's name out-loud in shame... (Something akin to Game of Throne's walk of atonement...)

Joke

> I'm not kidding... This kind of broken tools are the cause of many stupid bugs, ranging from the funny ups-rm-with-spaces (i.e. rm -Rf / some folder with spaces /some-file), to serious security issues like the formerly mentioned shellshock...

Joke/contentless stakes raising.

> So, you say someone holds you at gun point, thus you must use that tool? Check if the broken tool doesn't have a flag that disables calling sh -c, and instead properly executes the given command and arguments directly via execve(2). (For example watch has the -x flag as mentioned.)

Here it is, the paragraph that has something!

> Alternatively, given that most likely the tool in question is an open-source project written by someone in his spare time, perhaps open a feature request describing the issue, and if possible contribute with a patch that solves it.

This doesn’t seem practically actionable, at least in the short term—most projects might ignore your patch, or maybe it will take multiple years to get pushed out to distros.

> Still no luck? Make some popcorn and prepare for the latest block-buster "convoluted solutions for simple problems in UNIX town"...

Dramatic buildup/joke.

steamrolled•2d ago

The original post asserted the article is nonsense; you're trying to justify that by saying you don't like the author's writing style. Two separate things...

The article is mostly correct, although it makes some weird claims (e.g., the Shellshock bug had nothing to do with the class of bugs the article is complaining about - it was a vulnerability in the shell itself). It definitely has a "newcomer hates things without understanding why they are the way they are" vibe, but you actually need that every now and then. The old-timers tend to say "it was originally done this way for a reason and if you're experienced enough, you know how to deal with it", but what made sense 30-40 years ago might not make much sense today.

theamk•2d ago

I dunno, "mostly" normally means some large fraction, maybe 50% or 90% depending on the person.. Given that executing commands by itself is neither a bug nor a security vulnerability (those only occur from bad/lack-of quoting), the majority of the article is wrong.

akdev1l•2d ago

> Note also that if you're using a deficient shell that supports neither `printf %q` nor `${var@Q}` it's still easy to do quoting in `sed`. GNU `./configure` scripts do this internally, including special-casing to only quote the right side of `--arg=value`.

With the assumption that:

1. The person knows to do this weird thing

2. They do it consistently every time

3. They never forget

Also not sure how to use those solutions for the popen() example you provided.

The correct way is:

    subprocess.run([
       "gzip",
       "-c",
       "--to-stdout",
       user_input
    ], stdout=open("foo.gz", "wb"))

And now I don’t have to worry about any of these weird things

panzi•2d ago

You need to ensure that user_input doesn't start with `-`. You can do that by forcing an absolute path. Some programs accept `--` as a marker that any arguments after that are non-options.

pwdisswordfishz•19h ago

No need for an absolute path, just a './' prefix.

account42•16h ago

Unless the path was already absolute.

panzi•3h ago

Yes, but running it through a function that turns it into an absolute path is a way to ensure that. A way. An easy way.
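
A rough sketch of the guards discussed in this subthread, assuming the argument is meant to be a file path: absolute paths are left alone and relative ones get a `./` prefix, so the result can never start with `-`. The helper name `as_path_arg` is made up for illustration.

```python
import os.path

def as_path_arg(user_input: str) -> str:
    """Return user_input in a form that cannot be mistaken for an option."""
    if os.path.isabs(user_input):
        return user_input                  # starts with "/", never with "-"
    return os.path.join(".", user_input)   # "-rf" becomes "./-rf"

# Hypothetical usage: combine it with "--" for programs that honor that marker.
argv = ["gzip", "-c", "--", as_path_arg("-rf")]
```

Unlike converting to an absolute path, this does not depend on the current working directory, which may or may not matter for the program being called.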

theamk•2d ago

The idea is you have some 3rd-party app, which might accept a parameter and pass it to popen as-is, something like "--data-handler=command"

In this case, current "popen" semantics - a single shell command - works pretty well. You can pass it a custom process:

    --data-handler="scripts/handle_data.py"

or a shell fragment:

    --data-handler="gzip -c > last_data.gz"

or even mini-shell script:

    --data-handler "jq .contents | mail -s 'New data incoming' data-notify"

this is where the "shell command" arguments really shine - and note you cannot simulate all of this functionality with command vector.

akdev1l•2d ago

Yes, in this specific use case you need a shell.

But that’s the same as saying you technically need SQL injection so that `psql -c 'command'` can work

> you cannot simulate all of this with command vector

Uhh, yes we can just call a shell:

    subprocess.run(["bash", "-c", data_handler])

As a bonus this way we get control of which shell is being used and I find it is more explicit so I prefer it

oguz-ismail•1d ago

> subprocess.run(["bash",

Not the same thing, this is vulnerable to $PATH interception. You can hardcode the path to bash to avoid that but there's no guarantee that it'll always be there. system() on the other hand is guaranteed to run the operating system's command interpreter.

akdev1l•15h ago

Yes the user controls the path and in the example provided they could just call whatever command they want anyway

This doesn’t give the attacker any access that they wouldn’t have.

Also you can just clobber the “PATH” variable if you are so inclined.

> system() on the other hand is guaranteed to run the operating system’s command interpreter

Yeah, that just means it is less predictable.

Please show me python programs which support the pattern shown by the parent post and which actually work when running under powershell. (Or Oil shell or any other non-POSIX shell)

Aside: `/bin/sh` is guaranteed to exist by POSIX

oguz-ismail•15h ago

> `/bin/sh` is guaranteed to exist by POSIX

Quite the contrary

> Applications should note that the standard PATH to the shell cannot be assumed to be either /bin/sh or /usr/bin/sh, and should be determined by interrogation of the PATH returned by getconf PATH, ensuring that the returned pathname is an absolute pathname and not a shell built-in.

https://pubs.opengroup.org/onlinepubs/9799919799/utilities/s...
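
One way to follow that advice from Python, as a sketch only: `os.confstr("CS_PATH")` returns the same default search path that `getconf PATH` prints on Unix systems, and `shutil.which` can search it. The function name `find_posix_sh` is an assumption made for this example.

```python
import os
import shutil

def find_posix_sh():
    """Locate sh by searching the system default PATH instead of guessing /bin/sh."""
    default_path = os.confstr("CS_PATH")   # Unix only; same value as `getconf PATH`
    if not default_path:
        return None                        # value not defined on this system
    return shutil.which("sh", path=default_path)

print(find_posix_sh())
```

On many Linux systems this simply resolves to a path under /bin or /usr/bin, but the lookup avoids hardcoding that assumption.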

pwdisswordfishz•19h ago

    subprocess.run(["bash", "-c", "--", data_handler])

The very thing TFA complains about.

akdev1l•15h ago

Do you think using `psql -c "SELECT 1"` is actually doing sql injection?

Because yeah if your program provides “invoking the shell as a feature” then it sure as fuck needs to invoke the shell. I was just replying to this far-fetched example.

By the way, I think it is still better to do this than calling system because if I read “run([bash” I know the developer meant to do this explicitly. If I read “system()” then I’m probably gonna assume they were just lazy and probably didn’t even know about the extra shell being invoked. (I also said this in my previous comment, please read before replying)

rcxdude•22h ago

If you're sanitizing, you're losing. You need to either have a) a watertight escaping process or b) a format that doesn't mix the code and data in the first place (notably, shell lacks either).

Wicher•2d ago

For SSH specifically (ssh user@host "command with args") I've written this workaround pseudoshell that makes it easy to pass your argument vector to execve unmolested.

https://crates.io/crates/arghsh

theamk•2d ago

Note that at least in python, you can use "shlex.quote" instead - it's in stdlib and does not need any extra tools.

    >>> import subprocess
    >>> import shlex
    >>> subprocess.run(['ssh', 'host', shlex.join(['ls', '-la', 'a filename with spaces'])])
    ls: cannot access 'a filename with spaces': No such file or directory

works nested, too

    >>> layer2 = ['ls', '-la', 'a filename with spaces']
    >>> layer1 = ['ssh', 'host1', shlex.join(layer2)]
    >>> layer0 = ['ssh', 'host0', shlex.join(layer1)]
    >>> subprocess.run(layer0)

(I am not sure if Rust has equivalent, but if it does not, it's probably easy to implement.. Python version is only a few lines long)

eternauta3k•2d ago

This just confirms my habit of switching to python as soon as a shell script reaches any level of complexity

steveklabnik•2d ago

> I am not sure if Rust has equivalent

Not in the standard library, but there are packages.

CGamesPlay•1d ago

Wrong! SSH is very much the worst: it uses the user's login shell, not sh -c. So if the user's login shell isn't POSIX compatible, it still fails!

   >>> subprocess.run(["fish", "-c", shlex.join(["echo", "this isn\\'t working"])])
   fish: Unexpected end of string, quotes are not balanced
   echo 'this isn\'"'"'t working'

theamk•8h ago

Well, you gotta draw the line somewhere, right? You can ssh into all sort of weird places, like native windows machines, or routers which expose their own shell, and you cannot expect them to be as usable as the regular ones.

The systems with non-POSIX non-interactive shell are firmly in the "special" category. If a user decided to set their _non_interactive_ shell to fish - they know they are heading for trouble and should not be surprised. I would not worry about such users in my scripts for example.

CGamesPlay•6h ago

Please correct me if I'm wrong, but POSIX doesn't define a non-interactive or interactive shell, it only defines a login shell. You can't "set your non-interactive shell", only "set your login shell". OpenSSH could easily have decided to "join all arguments with spaces and pass them to sh -c", which would also have been a bad decision for the reasons listed in this article, but instead chose the even more esoteric choice of using the login shell, even when running non-interactive commands.

hackernudes•2d ago

I think this is a topic that every Linux user eventually stumbles into. It is indeed quite frustrating.

I found the article hard to follow, but maybe because I was already familiar with the problem and was just skimming. Skip to "Some experiments..." for the actual useful examples.

I disagree with the conclusion, though. I think there should just be more obvious ways to escape the input so one can keep their sanity with nested 'sh -c' invocation. Maybe '${var@Q}' and printf '%q' are enough (can't believe I didn't know those existed!)

panzi•2d ago

I knew printf '%q' but not ${var@Q}, so that is good to know!

mrspuratic•2d ago

A Long Time Ago I used to admin Apache httpd (back when "Apache" meant "httpd") before it could self-chroot. One of the issues when you did a manual chroot was piped logs (|rotatelogs) was invoked via "/bin/sh -c". I wrote a stub "sh" that allowed only "sh -c command ..." which it passed to execv(). Just primitive [ \t] argument splitting, no funny business, and ideally statically linked. Also worked well with PHP (e.g. SquirrelMail invoking, er, sendmail).

hello_computer•2d ago

There are so many neo-shells that go crazy with colors, autocompletions, & SQL-like features while the most basic problems (like handling of newlines/spaces/international chars) are mostly swept under the rug with -null/-print0, which is more hack than solution. I think Tom Duff's rc shell was an excellent start in that direction, which sadly went nowhere.

chubot•2d ago

YSH addresses the "string safety" problem:

What is YSH? https://oils.pub/ysh.html

I am writing a quoting module now, but the key point is that it's a powerful enough language to do so. It is more like Python or JS; you don't have to resort to sed to parse and quote strings.

I posted the quote-argv solution above -- in YSH it will likely be:

    var argv = :| ls 'arg with space' |   # like bash argv=()
    ssh example.com $[quote.sh(argv)]

But you can write such a function NOW if you like

---

quote.sh follows the (subtle) idiom of replacing a single quote ' with

    '\''

which means it works on systems with remote POSIX sh, not just YSH !

e.g. "isn't" in POSIX shell is quoted as

    'isn'\''t'

which is these three word parts:

    'isn' \' 't'
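
For readers who want to try the idiom outside YSH, here is a minimal Python sketch of the same single-quote rule. The names `sh_quote` and `sh_join` are made up for this example; Python's own `shlex.quote` does the same job with a slightly different replacement (`'"'"'`).

```python
import shlex

def sh_quote(arg: str) -> str:
    # Close the quoted string, emit an escaped quote, reopen: ' -> '\''
    return "'" + arg.replace("'", "'\\''") + "'"

def sh_join(argv):
    # Quote every word, then join with spaces to get one shell string.
    return " ".join(sh_quote(a) for a in argv)

command = sh_join(["ls", "arg with space", "isn't"])
print(command)
# shlex.split parses with POSIX shell quoting rules, so it should undo sh_join.
print(shlex.split(command))
```

Quoting every word unconditionally is noisier than necessary, but it avoids having to decide which characters are safe.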

YSH also has:

- JSON, which can correctly round trip every Unicode string, without writing your own parsing functions

- JSON8, an optional extension that can round trip every byte string you get from the Unix kernel

https://oils.pub/release/latest/doc/j8-notation.html

hello_computer•2d ago

I like it. Hope it gets some traction.

chubot•2d ago

It also fixes the problem with eval that is shared with ssh:

    ysh-0.29$ eval ls $dir
      eval ls $dir
      ^~~~
    [ interactive ]:11: 'eval' requires exactly 1 argument

And it fixes word evaluation

YSH Doesn't Require Quoting Everywhere - https://www.oilshell.org/blog/2021/04/simple-word-eval.html

https://oils.pub/release/latest/doc/simple-word-eval.html

o11c•2d ago

To be fair, most of those "basic problems" have basic solutions as long as you're not trying to avoid GNU tools.

The nastiest case is probably `globasciiranges`.

panzi•2d ago

Yeah, system() should definitely be deprecated and you should never use it if you write any new program. At least there is exec*() and posix_spawn() under POSIX. Under Windows there is no such thing and every program might parse the command line string differently. You can't naively write a generic posix_spawn() like interface for Windows, see this related Rust CVE: https://blog.rust-lang.org/2024/04/09/cve-2024-24576/ Why is it a CVE in Rust, but not in any other programming language? Did other languages handle it better? Dunno, I just know that Rust has a big fat warning about this in their documentation (https://doc.rust-lang.org/std/process/struct.Command.html#me...), but e.g. Java doesn't (https://docs.oracle.com/javase/8/docs/api/java/lang/ProcessB...).

Galanwe•2d ago

I don't understand what you are complaining about. I don't understand what the article is complaining about either.

exec* are not "better replacements" of the shell, they are just used for different use cases.

The whole article could be summarized to 3 bullet points:

1) Sanitize your inputs

2) If you want to execute a specific program, exec it after 1), no need for the shell

3) Allow the shell if there is no injection risk

panzi•2d ago

I'd say: Don't use the shell if what you want to do is to execute another program.

You don't need to handle any quoting with exec*(). You still need to handle options, yes. But under Windows you always have to handle the quoting yourself and it is more difficult than for the POSIX shell and it is program dependent. Without knowing what program is executed you can't know what quoting syntax you have to use and as such a standard library cannot write a generic interface to pass arguments to another process in a safe way under Windows.

I just felt it sounded like POSIX is particularly bad in that context, while in fact it is better than Windows here. Still, the system() function is a mistake. Use posix_spawn(). (Note: Do not use _spawn*() under Windows. That just concatenates the arguments with a space between and no quoting whatsoever.)

oguz-ismail•2d ago

>Still, the system() function is a mistake. Use posix_spawn().

They are entirely different interfaces though. If you'd implemented system() using posix_spawn() it'd be just as bad as system()

panzi•2d ago

Why would you implement system() at all?

oguz-ismail•2d ago

Because I don't want to implement a shell???

panzi•2d ago

If you want to run a shell script, run a shell script. I.e. a text file with the executable bit set and a shebang. If you want to generate a shell script on the fly to then run it, take a step back and think about what you're doing.

oguz-ismail•1d ago

Or I can just use system() because it's there and not going anywhere

theamk•2d ago

parse commands from config file? command-line arguments for hooks?

https://news.ycombinator.com/item?id=44239036

panzi•2d ago

I understand that it is convenient for running small snippets like that, but I don't really think it's worth the risk. And putting it into a config file is different, IMO. You don't get tempted to do some bad string interpolation there, because you can't, unless the config file format has support for that, but then I criticize that. If you need to pass things to such a snippet, do it via environment variables or standard IO, not string interpolation.

If you say you don't make such mistakes: Yeah, but people do. People that write the code that runs on your system.
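
A small sketch of the environment-variable approach mentioned above, assuming a POSIX `sh` is on PATH. The snippet text stays fixed and the untrusted value never becomes part of it; `run_hook` and `HOOK_DATA` are names invented for this example.

```python
import os
import subprocess

def run_hook(snippet: str, data: str) -> str:
    """Run a fixed shell snippet, handing it untrusted data via the environment."""
    env = dict(os.environ, HOOK_DATA=data)   # HOOK_DATA is a made-up variable name
    result = subprocess.run(["sh", "-c", snippet], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout

hostile = "$(echo pwned); `id` 'x' \"y\""
# The shell expands "$HOOK_DATA" as data; it does not re-parse the value as code.
print(run_hook('printf "%s\\n" "$HOOK_DATA"', hostile))
```

The snippet author still has to write `"$HOOK_DATA"` with the double quotes, so this narrows the problem rather than removing it.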

theamk•1d ago

But if you want a command-line option for hook, what are the alternatives?

Force user to always create a wrapper script? that's just extra annoyance and if user is bad at quoting, they'll have the same problems with a script

Disable hooks at all? that's bad functionality regression

Ask for multiple arguments? this makes command-line parsing much more awkward.. I have not seen any good solutions for that.

(The only exception is writing a command wrapper that takes exactly 1 user command, like "timeout" or "xargs".. but those already use an argument vector instead of parsing)

frumplestlatz•1d ago

You define a config file format that supports only the minimal syntax required to specify a multi-argument command (e.g. spaces separate arguments, arguments with spaces in them may be quoted or use backslashes to escape them).

Then, you parse that out into a proper argument array and pass it to exec*/posix_spawn.
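
To make the idea concrete, here is one possible sketch of such a parser in Python. It is an illustration under my own assumptions about the syntax (single quotes, double quotes, and backslash escapes; no variables, globs, or operators), and `parse_command` is a hypothetical name.

```python
def parse_command(line: str) -> list:
    """Split a config line into an argument list without any shell expansion."""
    args, current = [], []
    in_word = False      # True once the current argument has started
    quote = None         # the active quote character, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == quote:
                quote = None                      # closing quote
            elif ch == "\\" and quote == '"' and i + 1 < len(line):
                i += 1
                current.append(line[i])           # escaped char inside "..."
            else:
                current.append(ch)
        elif ch in "'\"":
            quote, in_word = ch, True             # opening quote (may be empty)
        elif ch == "\\":
            if i + 1 >= len(line):
                raise ValueError("trailing backslash")
            i += 1
            current.append(line[i])               # escaped char outside quotes
            in_word = True
        elif ch in " \t":
            if in_word:                           # whitespace ends an argument
                args.append("".join(current))
                current, in_word = [], False
        else:
            current.append(ch)
            in_word = True
        i += 1
    if quote:
        raise ValueError("unterminated quote")
    if in_word:
        args.append("".join(current))
    return args

print(parse_command("gzip -c 'my file' a\\ b \"x y\""))
```

The resulting list can go straight to `subprocess.run(...)` or `os.execvp(...)`; characters such as `$`, `|`, and `;` have no special meaning here.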

account42•16h ago

So instead of a well known (i.e. POSIX) quoting semantics and existing tool support, you want to introduce your own ad-hoc format? No thanks.

frumplestlatz•11h ago

A correct parser for the syntax I described can be written in fewer than 100 lines of code, even in C. It’s a strict subset of the shell command language defined by POSIX, and it’s sufficiently expressive as to support specifying any argument array unambiguously.

To correctly escape arbitrary shell syntax, not only do you need to handle the full POSIX syntax (which is quite complex) …

https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V...

… but you must also cover any bugs and undocumented/underspecified extensions implemented by the actual shell providing /bin/sh on every platform and platform version to which your code will be deployed.

That’s not just difficult — it’s impossible, and everyone that has tried has failed, repeatedly. Leading to security bugs, repeatedly.

https://gist.github.com/Zenexer/40d02da5e07f151adeaeeaa11af9...

There’s a reason why we use parameterized queries instead of escaping to prevent SQL injection, and SQL syntax and parsing behavior is far more rigorously specified than the shell.

jcranmer•2d ago

The article spends a lot of time dancing around its central points rather than addressing them directly, but the basic problems with shell boil down to this:

There's two ways to think of "running a command:"

1. A list of strings containing an executable name (which may or may not be a complete path) and its arguments (think C's const char **argv).

2. A single string which is a space-separated list of arguments, with special characters in arguments (including spaces) requiring quoting to represent correctly.

Conversion between these two forms is non-trivial. And the basic problem is that there are a lot of tools which incorrectly convert the former to the latter by just concatenating all of the arguments into a single string and inserting spaces. Part of the problem is that shell script itself makes doing the conversion difficult, but the end effect is that if you have to work with commands whose inputs have special characters (including, but not limited to, spaces), you end up just going slowly insane trying to figure out how to get the quoting right to work around the broken tools.

In my experience, the world is so much easier if your own tools just break everything up into the list-of-strings model and you never try to use an API that requires the single-string model.

What GP is referring to is the fact that that solution doesn't work as well on Windows, because the OS's native idea of a command line isn't list-of-strings but rather a single-string, and how that single string is broken up into a list-of-strings is dependent on the application being invoked.
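
The difference between the two conversions can be seen in a few lines of Python. This is only a demonstration using the standard `shlex` module, with `shlex.split` standing in for how a POSIX shell would split the string back into words.

```python
import shlex

argv = ["ls", "-la", "a filename with spaces"]

naive = " ".join(argv)       # ls -la a filename with spaces
quoted = shlex.join(argv)    # ls -la 'a filename with spaces'

# Parse both strings back into words, the way a POSIX shell would.
print(shlex.split(naive))    # six words: the filename has been torn apart
print(shlex.split(quoted))   # three words: identical to the original argv
```

Only the quoted form round-trips, which is the whole point: list-to-string conversion needs real quoting, not just spaces.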

theamk•2d ago

I think the "non trivial" and "slowly going insane" parts only happen if you don't have the right tools, or are not using a POSIX-compatible system.

In python you have "shlex.quote" and "shlex.join". In bash, you have "${env@Q}". I've found those to work wonderfully for me - and I did crazy things like quote arguments, embed into shell script, quote script again for ssh, and quote 3rd time to produce executable .sh file.

In other languages.. yeah, you are going to have bad time. Especially on Windows, where I'd just give up and move to WSL.

jcranmer•2d ago

To be honest, I've never heard of Bash's @Q solution before today--I can't find it in https://tldp.org/LDP/abs/html/, which is my usual goto guide for "how do I do $ADVANCED_FEATURE in bash?"

o11c•2d ago

To be fair that's missing a lot. I'm not sure how much is just showing its age and how much it never had. The actual bash manual is quite informative.

In particular, failure to mention `printf -v` is horrible. Not only is it better performing than creating a whole process for command substitution, it also avoids the nasty newline problem.

LukeShu•19h ago

`printf -v` was added in Bash 3.1 (2005). I think some revisions of ABS predate that; but ABS has certainly been updated since then (last in 2014), and has no excuse for not including it.

LukeShu•19h ago

@Q was added in Bash 4.4 (2016), ABS was last updated in 2014.

mrheosuper•22h ago

if every developer can follow best practice, we won't need Rust.

tedunangst•2d ago

https://flatt.tech/research/posts/batbadbut-you-cant-securel...

panzi•2d ago

Yeah, especially the thing about variable substitution is insane. How can you mess this up so thoroughly!? Appendix B is a nice overview. I'm amazed that Java doesn't even mention this in their documentation!

steamrolled•2d ago

The main reason system() exists is that people want to execute shell commands; some confused novice developers might mix it up with execl(), but this is not a major source of vulnerabilities. The major source of vulnerabilities is "oh yeah, I actually meant to execute shell".

So if you just take away the libcall, people will make their own version by just doing execl() of /bin/sh. If you want this to change, I think you have to ask why do people want to do this in the first place.

And the answer here is basically that because of the unix design philosophy, the shell is immensely useful. There are all these cool, small utilities and tricks you can use in lieu of writing a lot of extra code. On Windows, command-line conventions, filesystem quirks, and escaping gotchas are actually more numerous. It's just that there's almost nothing to call, so you get fewer bugs.

The most practical way to make this class of bugs go away is to make the unix shell less useful.

rcxdude•22h ago

most calls to system() that I've seen could be replaced with exec without much difficulty. There's relatively few that actually need the shell functionality.

oguz-ismail•21h ago

system() involves fork()ing, setting up signal handlers, exec()ing and wait()ing. You won't be replacing it with exec, most of the time you'll be reimplementing it for absolutely no reason.
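
For a sense of how much is involved, here is a deliberately simplified Python sketch of a shell-less equivalent: fork, exec an argument vector, wait. It leaves out the signal handling that system() performs, needs Python 3.9+ for `os.waitstatus_to_exitcode`, and `run_argv` is a made-up name.

```python
import os

def run_argv(argv):
    """Fork, exec argv directly (no shell), and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image; argv is passed through untouched.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)    # conventional "command not found" status
    # Parent: wait for that specific child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

print(run_argv(["ls", "-d", "/"]))
```

In practice a library call that already does this (such as `subprocess.run` with a list) is the less error-prone choice; the sketch only shows what sits underneath.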

kragen•16h ago

Python has os.spawnl, os.spawnv, etc., which fork()s, wait4()s, etc., without involving a shell. This is much better; this is the library function you should be using instead of system() most of the time. Unfortunately I don't think glibc has an equivalent!

    strace -o tmp.spawnlp -ff python3 -c 'import os; os.spawnlp(os.P_WAIT, "true", "true")'

In parent:

    clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdc03233310) = 225954
    wait4(225954, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 225954
    --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=225954, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---

In child:

    set_robust_list(0x7fdc03233320, 24)     = 0
    gettid()                                = 225954
    clock_gettime(CLOCK_MONOTONIC, {tv_sec=2458614, tv_nsec=322829153}) = 0
    clock_gettime(CLOCK_MONOTONIC, {tv_sec=2458614, tv_nsec=323030718}) = 0
    execve("/usr/local/bin/true", ["true"], 0x7ffdc5008458 /* 44 vars */) = -1 ENOENT (No such file or directory)
    execve("/usr/bin/true", ["true"], 0x7ffdc5008458 /* 44 vars */) = 0

Here, I think strace shows clone() rather than fork() because glibc's fork() is a library function that invokes clone(), rather than a real system call.

oguz-ismail•16h ago

> Python has os.spawnl, os.spawnv, etc., which fork()s, wait4()s, etc., without involving a shell.

Good. How do you pipeline commands with these?

kragen•15h ago

These functions can't do it. In Python you have to use the subprocess module if you want to pipeline commands without the bugs introduced by the shell. From https://docs.python.org/3.7/library/subprocess.html#replacin...:

    p1 = Popen(["dmesg"], stdout=PIPE)
    p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
    p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
    output = p2.communicate()[0]

Of course, now, nobody has an hda, and dmesg is root-only. A more modern example is in http://canonical.org/~kragen/sw/dev3/whereroot.py:

    p1 = subprocess.Popen(["df"], stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["grep", "/$"], stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()
    return p2.communicate()[0]

Note that the result here is a byte string, so if you want to print it out safely without the shell-like bugginess induced by Python's default character handling (what happens if the device name isn't valid UTF-8?), you have to do backflips with sys.stdout.buffer or UTF-8B.

Python got a lot of things wrong, and it gets worse all the time, but for now spawning subprocesses is one of the things it got right. Although, unlike IIRC Tcl, it doesn't raise an exception by default if one of the commands fails.

Apart from the semantics of the operations, you could of course desire a better notation for them. In Python you could maybe achieve something like

    (cmd(["df"]) | ["grep", "/$"]).output()

but that is secondary to being able to safely handle arguments containing spaces and pipes and whatnot.
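
One way such a notation might be built on top of `subprocess.Popen`, as a rough sketch rather than a finished library. The `cmd` class and its `output()` method are hypothetical and simply follow the notation above; error handling and exit-status checks are omitted.

```python
import subprocess

class cmd:
    """A pipeline of argument vectors; `|` appends a stage, no shell involved."""

    def __init__(self, *stages):
        self.stages = [list(stage) for stage in stages]

    def __or__(self, argv):
        # cmd([...]) | [...] returns a new pipeline with one more stage.
        return cmd(*self.stages, argv)

    def output(self):
        procs, prev = [], None
        for argv in self.stages:
            p = subprocess.Popen(argv, stdin=prev, stdout=subprocess.PIPE)
            if prev is not None:
                prev.close()   # let the upstream process see SIGPIPE
            prev = p.stdout
            procs.append(p)
        out = procs[-1].communicate()[0]
        for p in procs[:-1]:
            p.wait()           # reap the earlier stages
        return out             # bytes, as with communicate()

print((cmd(["df"]) | ["grep", "/$"]).output())
```

Every stage is an argument list, so spaces, pipes, and quotes inside an argument are passed through as plain data.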

oguz-ismail•15h ago

Dunno, so much work to achieve so little. I'm even more inclined to stick with shell scripts now

kragen•14h ago

The Bourne shell is definitely less work unless you want your code to correctly or reliably handle user input. Then it's more work.

oguz-ismail•13h ago

Not in my experience. Any concrete examples off the top of your head, where it's more work than setting up pipes in python manually?

hollerith•9h ago

The Python code Kragen gave is more characters to type, but fewer footguns.

Shell scripts are much higher in footguns per character than most programming languages.

It is possible for a coder to understand bash so well that he never shoots his own foot off, but it requires more learning hours than the same feat in another language requires, and unless I've also put in the (many) learning hours, I have no way of knowing whether a shell script written by someone I don't know contains security vulnerabilities or fragility when dealing with unusual inputs that will surface in unpredictable circumstances.

The traditional Unix shell might be the most overrated tool on HN.

panzi•3h ago

There is posix_spawn(). Some operating systems even implement that as a system call (not Linux). Implementing that as a system call has the advantage that spawning a new process from a process that has huge memory mappings is fast, because the memory mappings don't need to be copied (yes, I know the memory is copy on write, but the mappings themselves have to be correctly copied with the information needed for copy on write).

twic•15h ago

Java has a bunch of code which looks like it's trying to do the right kind of escaping for msvcrt vs cmd.exe:

https://github.com/openjdk/jdk/blob/jdk-26%2B1/src/java.base...

But i would be lying if i said i understood what was going on there. Some googling suggests this was added around 1.7, ie in the early 2010s.

But then, that Rust CVE seems to originate in this work, and this guy claims Java said "won't fix", which suggests it is vulnerable:

https://flatt.tech/research/posts/batbadbut-you-cant-securel...

But there's no link, and i can't find any discussion about it, so i don't know what the actual situation is.

panzi•3h ago

Yeah, part of the problem is how Windows does variable substitution before the command line syntax is parsed, and at a glance I don't see any % in that file.

chubot•2d ago

The article mentioned printf '%q ', but it is a bit hard to find. Here is a handy way to remember it.

First, define this function:

    quote-argv() { printf '%q ' "$@"; }
    # (uses subtle vectorization of printf over args)

Now this works correctly:

    ssh example.com "$(quote-argv ls 'file with spaces')"
    ls: cannot access 'file with spaces': No such file or directory

In contrast to:

    $ ssh example.com ls 'file with spaces'
    ls: cannot access 'file': No such file or directory
    ls: cannot access 'with': No such file or directory
    ls: cannot access 'spaces': No such file or directory

And yes the "hidden argv join" of ssh is VERY bad, and it is repeated in shell's eval builtin.

They should both only take a SINGLE arg.

It is basically a self-own because spaces are an OPERATOR in shell! (the operator that separates words)

When you concatenate operators and variables, then you are mixing code and data, which is a security problem.

---

As for the exec workaround, I think this is also a deficiency of shell. Oils will probably grow an 'invoke' builtin which generalizes 'command' and 'builtin', which are non-orthogonal.

'command true' means "external or builtin" (disabling shell function lookup), but there should be something that means "external only".

o11c•2d ago

Use ' %q' and you also fix the problem of program names starting with a dash.

chubot•2d ago

Ah yes, that's clever:

    $ sh -c "$(quote-argv -echo 'file with spaces')"
    sh: 0: Illegal option -h

    $ sh -c "$(quote-argv-left -echo 'file with spaces')"
    sh: 1: -echo: not found

Over ssh:

    $ ssh example.com "$(quote-argv-left -dashtest 'file with spaces')"
    -dashtest
    file with spaces

blueflow•2d ago

> hidden argv join

It is not hidden. It is written down in plain sight:

  A complete command line may be specified as command, or it may have additional arguments. If supplied, the arguments will be appended to the command, separated by spaces, before it is sent to the server to be executed.

- third line in `man 1 ssh`

chubot•2d ago

It's hidden in the sense that it creates ambiguity at the usage site. Compare with sudo:

    $ sudo ls 'file with spaces'
    ls: cannot access 'file with spaces': No such file or directory

If ssh (and sh eval) did not accept multiple arguments, then this wouldn't even get to ls:

    $ ssh example.com ls 'file with spaces'
    ls: cannot access 'file': No such file or directory
    ls: cannot access 'with': No such file or directory
    ls: cannot access 'spaces': No such file or directory

Accepting argv is better. Or forcing this is better:

    $ ssh example.com "ls 'file with spaces'"

So it's clear it's a single shell string.

Accepting a shell string is sometimes OK, but silently joining multiple args is useless, and insecure.

"RTFM" is not a good answer when security is involved.

blueflow•2d ago

This stubborn attitude to refuse to consult the documentation at all and then expect the tool to work according to your preconceptions.

Tools do have rough edges, if you don't want to learn about them, you will get bitten.

chubot•2d ago

It’s a design mistake because it adds exactly zero functionality.

The only thing it adds is insecurity.

If the feature didn’t exist, then it wouldn’t need to be documented, and the world would be better.

nothrabannosir•2d ago

This statement can be true without contradicting anything anyone said upstream. Otherwise one could use it to justify just about any bad design decision.

Yes it’s in the docs. Yes people who carefully read the docs won’t get bitten. Also yes the design could be improved so people don’t make this mistake even without reading the docs.

Both things can be true. We’re currently only talking about the latter, though.

blueflow•1d ago

> We’re currently only talking about the latter, though.

I'm surprised, as I started this subthread explicitly to contest that the argv join is "hidden".

immibis•22h ago

This very stubborn attitude to defend a bad design because it's documented.

Bugs can be fixed.

blueflow•19h ago

It is bad design, but a tool not conforming to your idea of how it should work does not make that behavior a bug.

pwdisswordfishz•19h ago

> Tools do have rough edges, if you don't want to learn about them, you will get bitten.

I presume you consider INTERCAL to be a sanely designed programming language.

blueflow•18h ago

I'm not defending SSH's design, I'm criticizing people's unwillingness to learn about the design as it is so they can work around it.

Edit: The INTERCAL handbook is a great read, and despite being satirical, it is more detailed and qualified than the documentation of some other popular projects.

scbrg•17h ago

That's honestly not particularly clear. It doesn't say the command will be invoked by a shell on the remote host. Sure the whole "separated by spaces" thing sorta implies it will, as spaces don't mean much to anything but a shell, but it's still fairly vague.

In fact, later on the man page only mentions a shell in the part that talks about the behavior when no additional arguments are given:

  When the user's identity has been accepted by the server, the server either executes the given command in a non-interactive session or, if no command has been specified, logs into the machine and gives the user a normal shell as an interactive session.

The wording "executes the given command" would generally not imply "I'll just throw it at $SHELL and see what happens".

A few lines later it gets even more confusing:

  The session terminates when the command or shell on the remote machine exits and all X11 and TCP connections have been closed.

...which I definitely would say suggests that either a shell is executed or the command supplied as argument to ssh. That it means "command as interpreted by a shell on the remote host" is far from obvious.

blueflow•9h ago

> The wording "executes the given command" would generally not imply "I'll just throw it at $SHELL and see what happens".

"command" means exactly that. Evaluation by shell. With that in mind, the manual page should read less ambiguously to you.

I actually don't have a good source for that, but you can check the execve(2) manpage. If "command" referred to the execution of an argument vector, it would have been mentioned in there.

The other meaning of "command" refers to specific programs like those in /bin.

blueflow•2d ago

It's not the manual pages that are ambiguous on this, it's the author who used the word "command" but seemingly had a mental model of it as if it was an argument vector. A command and an argument vector are different things....

a_t48•2d ago

I've definitely gone down the rabbit hole of trying/being forced to fix issues like this. It starts off as just someone taking a shortcut of doing a little shell scripting in a python program or whatever. Generally the best tool I've found for fixing this is python's shlex.quote - https://docs.python.org/3/library/shlex.html but YMMV (multiple levels may be needed). The real best solution is not to shell out from your program when possible. :)
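
For scripts that stay in shell, a rough counterpart to shlex.quote can be sketched with sed. This is an illustration only: `sq_quote` is a hypothetical helper name, and it handles one argument at a time.

```shell
#!/bin/sh
# sq_quote is a hypothetical helper name, not a standard utility.
# It wraps its argument in single quotes and rewrites each embedded
# single quote as '\'' so a POSIX shell reads back the original string.
sq_quote() {
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

arg="it's a file; echo pwned"
quoted=$(sq_quote "$arg")

# The quoted form survives one round of shell parsing unchanged,
# so "echo pwned" is printed as text rather than executed.
sh -c "printf '%s\n' $quoted"
```

Each extra layer of shell (for example ssh into sudo into sh -c) needs its own round of quoting, which is the "multiple levels" caveat.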

endiangroup•2d ago

AD: Huh! I just wrote a utility cmd [1] this weekend to deal with restricting ssh keys to executing only commands that match a rule set via `ForceCommand` in `sshd_config` or `Command=""` in `authorized_keys`. I'm curious to see how susceptible it is to the aforementioned issues, it does delegate to `<shell> -c '<cmd>'` under the hood [2], but there are checks to ensure only a single command option argument `--` is passed (to mitigate metacharacter expansions) [3].

Note this tool is only intended to be another layer in security.

[1] https://github.com/endiangroup/cmdjail [2] https://github.com/endiangroup/cmdjail/blob/main/main.go#L30... [3] https://github.com/endiangroup/cmdjail/blob/main/config.go#L...

blueflow•2d ago

The docs say that exec.Command works with execv directly, so there should be no issue? You don't seem to call out to /bin/sh at all.

pabs3•1d ago

Note that OpenSSH always runs commands in a shell, and so far they have refused to add support for exec.

https://bugzilla.mindrot.org/show_bug.cgi?id=2283

kouteiheika•2d ago

> Wall of shame: Ruby's backtick feature -- provides easy access to system(3);

It also provides easy access to escape whatever arguments you want to pass:

    out = `bash -c #{arg.shellescape}`

...here "arg" will always be passed as a single argument.

1vuio0pswjnm7•2d ago

"bash, obviously for scripting;"

It's possible to use bash for both interactive use and scripting. For example, this author claims to use bash as his scripting shell.

But Debian and the popular Debian-derived distributions do not use bash for scripts beginning with "#!/bin/sh", i.e., "shell scripts".

The interactive shell may be bash, but the scripting shell, /bin/sh, is not bash.

https://www.man7.org/linux/man-pages/man1/dash.1.html

https://wiki.ubuntu.com/DashAsBinSh

https://wiki.archlinux.org/title/Dash

https://www.oreilly.com/library/view/shell-scripting-expert/... ^1

https://www.baeldung.com/linux/dash-vs-bash-performance

https://en.wikipedia.org/wiki/Almquist_shell

https://lwn.net/Articles/343924/

https://scriptingosx.com/2020/06/about-bash-zsh-sh-and-dash-... ^2

I use an Almquist shell, not bash, for both interactive use and scripting. I often write scripts interactively. I use the same scripts on Linux and BSD. I restored tabcomplete and the fc builtin to dash so it feels more like the shell from which it was derived: NetBSD sh.

1. "This makes it smaller, lighter and faster than bash."

2. "... this is strong indicator that Apple eventually wants to use dash as the interpreter for sh scripts."

o11c•2d ago

"/bin/sh does not point to bash" is not the same as "does not use bash for scripting"

On my system there are 42 scripts in /bin -> /usr/bin (merged) that start with some variant of `#! /bin/bash` and at least two that do `bash -c`, but that's excluding who-knows-how-many scripts that look slightly different or are in other directories.

And keep in mind that on Debian, almost all first-party software is implemented in Perl, with a small minority in Python.

zahlman•2d ago

> No, let's just try it out (I've put both the Python and the plain sh -c invocations):

  > python2 -c 'import os; os.system("-x")'
  > sh -c -x
  sh: -c: option requires an argument

I can't reproduce this in Python (including my local 2.7 build), only using sh directly. Going through Python, `sh` correctly tells me that the `-x` command isn't found.

But now I'm wondering: how is one supposed to use `which` (or `type`, or `file`) for a program named `-x`, supposing one had it?

o11c•2d ago

Use:

  command -v -- -x

The POSIX documentation for almost all commands (including `command`) says "The command utility shall conform to [...] Utility Syntax Guidelines" which specifies the behavior of `--`, even if it's not explicitly mentioned.
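
A small sketch to try this without touching the real system: it creates a throwaway executable literally named `-x` in a temporary directory (the directory and script are made up for the demonstration) and looks it up.

```shell
#!/bin/sh
# Sketch: create a throwaway executable literally named "-x".
tmpdir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from dash-x"\n' > "$tmpdir/-x"
chmod +x "$tmpdir/-x"
PATH="$tmpdir:$PATH"

# "--" ends option parsing, so -x is treated as the name to look up.
found=$(command -v -- -x)
printf '%s\n' "$found"

# Clean up the temporary directory.
rm -rf "$tmpdir"
```

This prints the full path of the temporary `-x` file instead of complaining about an unknown option.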

akovaski•1d ago

> I can't reproduce this in Python (including my local 2.7 build), only using sh directly.

Same for me. It looks like the POSIX folks accepted the author's suggestion in 2022 and system() in glibc was updated in 2023.

https://sourceware.org/git/?p=glibc.git;a=blobdiff;f=sysdeps...

  #include <stdlib.h>
  int main(void) {
      system("-x");
      return 0;
  }

...

> [pid 172293] execve("/bin/sh", ["sh", "-c", "--", "-x"], 0x7ffe221d2f58 /* 76 vars */) = 0
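
The effect of that extra `--` can be checked from a shell prompt as well. A minimal sketch, assuming no command named `-x` exists on your PATH and that your sh accepts `--` after `-c` (bash and dash do):

```shell
#!/bin/sh
# Without "--", sh parses -x as one of its own options and then finds
# no command string left for -c.
sh -c -x 2>/dev/null
status_plain=$?

# With "--", -x is taken as the command string; it then fails only
# because no command named -x exists (assumed here).
sh -c -- -x 2>/dev/null
status_dashdash=$?

printf 'without --: %s\nwith --: %s\n' "$status_plain" "$status_dashdash"
```

The second status should be 127, the usual "command not found" code, which shows the string reached command lookup rather than option parsing.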

pabs3•1d ago

Related problems: command-line options that allow code execution[1], and commands that execute arbitrary code from the current directory.

1. https://web.archive.org/web/20201111203646if_/https://www.de...

orbisvicis•19h ago

I don't understand the rsync example with '-e sh shell.c' as short-form options with space-separated values are expected to be two separate args and yet the glob expands it to a single arg. Right? Unless the tool does additional argument processing?

oguz-ismail•19h ago

'-e sh shell.c' as a single argument is the same as '-e' and ' sh shell.c' as separate arguments (see <https://pubs.opengroup.org/onlinepubs/9799919799/functions/g...>). rsync executes ' sh shell.c' as a shell script and shells usually trim leading and trailing spaces when executing commands.
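
The same convention can be seen with the shell's own getopts builtin. This is only an illustration of the parsing rule in bash, not of rsync's actual option parser, and `show_e` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# show_e is a hypothetical helper that parses a single -e option
# with the getopts builtin and prints what it received.
show_e() {
    local opt OPTIND=1 OPTARG
    while getopts 'e:' opt "$@"; do
        printf 'option=%s value=[%s]\n' "$opt" "$OPTARG"
    done
}

show_e '-e sh shell.c'      # one argument, value attached to the option
show_e -e ' sh shell.c'     # two separate arguments
```

Both calls print `option=e value=[ sh shell.c]`, leading space included.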

pabs3•1d ago

My 2014 blog post about this same issue:

https://bonedaddy.net/pabs3/log/2014/02/17/pid-preservation-...

ptx•15h ago

> BTW, this is not something Linux specific. Unfortunately it is a trait inherited from the UNIX ancestry by almost all operating systems, including all BSD variants [...] Hmm... Strange... There is nothing to quote from this manual about warnings, issues or sanitization...

This is not a problem on FreeBSD, if the problem is (as the article seems to say) that the documentation fails to warn about the requirement to properly encode arguments passed to the shell.

Here's the FreeBSD man page [1] for system(3):

  SECURITY CONSIDERATIONS
     The system() function is easily misused in a manner that enables a
     malicious user to run arbitrary command, because all meta-characters
     supported by sh(1) would be honored.  User supplied parameters should
     always be carefully santized before they appear in string.

[1] https://man.freebsd.org/cgi/man.cgi?query=system&sektion=3&m...

0xbadcafebee•15h ago

What the author is talking about here is the growing pains of learning how 60-year-old *NIX systems work, and shells/shell scripting. Once you learn how it works, it all works fine. But it's not easy to learn. If we had a standard education track for this stuff, it would be easier. (But there would still be people who avoid the education, rush into things, then bump their shins into the coffee table, and blame the coffee table.)

GuB-42•3h ago

system() is for running shell commands and the article complains that it runs shell commands...

I rarely use it, and almost never in production, but it has its place. Think of it as the eval() of the POSIX world. If you want to build pipelines, or anything a shell has to offer, and do it simply, then system() is for you.

Security-wise, if you are using system() with user input, you are essentially giving shell access to the user, which may or may not be a big deal. If the intended users are people who already have a shell, that's fine, maybe even desirable; otherwise, use something else, like exec*().

As for OpenSSH, what is the problem? The "SH" at the end means "shell", it runs shell commands, what did you expect?

teo_zero•28m ago

In POSIX world you aren't encouraged to use a specific command for each goal you might have. Instead you get a set of basic tools and combine them to achieve the desired result. Without system() you wouldn't have pipes, and you would need one command for every task you might want to run.

Case in point, one I encountered this week: I was editing a list and wanted to remove duplicates without changing the order of the lines. There's no ready-made program to do that, but this sequence of piped commands served the purpose:

  cat -n | sort -uk 2 | sort -n | cut -f 2-

Fortunately my text editor supports system().
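
An alternative sketch, not the pipeline above: awk can do the same job in one pass without numbering and re-sorting. `dedupe_keep_order` is a hypothetical helper name.

```shell
#!/bin/sh
# dedupe_keep_order is a hypothetical helper name.
# seen[$0]++ is 0 (false) the first time a line appears, so the
# negated pattern prints only first occurrences, in input order.
dedupe_keep_order() {
    awk '!seen[$0]++'
}

printf 'b\na\nb\nc\na\n' | dedupe_keep_order
```

This prints `b`, `a`, `c` on separate lines. Unlike the sort-based version, it holds every distinct line in memory.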

