Ratatoi is a C library that wraps stdlib's strtol (as atoi does), but it's evil.

28•rept0id-2•8h ago

Comments

rept0id-2•8h ago

If an overflow is detected, it calls abort(), you crash and get Aborted (core dumped).

This way, you prioritize memory safety over silently running in a wrong state, without needing to call strtol and check errors manually everywhere.
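For illustration, a minimal sketch of the abort-on-overflow idea. This is not Ratatoi's actual source: the name `atoi_or_abort` is hypothetical, and like atoi it still returns 0 for input with no digits.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical name; behaves like atoi() but aborts instead of
   returning an out-of-range result. */
int atoi_or_abort(const char *s)
{
    int saved_errno = errno;   /* do not leak our errno = 0 to the caller */
    errno = 0;
    long v = strtol(s, NULL, 10);
    /* ERANGE covers values outside long; the explicit comparison covers
       platforms where long is wider than int. */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        abort();
    errno = saved_errno;
    return (int)v;
}
```

A call such as `atoi_or_abort("99999999999999999999")` would terminate the process rather than hand back a clamped value.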

timewizard•7h ago

"This way, you open yourself up to DDoS attacks, instead of just handling your own errors correctly."

kstrauser•7h ago

I prefer this approach. It's kind of like using `.expect()` or `.unwrap()` in Rust when there's no plausible way the call should fail. Like if my program writes out a JSON file and then reads it back in, and the file isn't valid JSON, panicking is a reasonable way to deal with the situation. Somehow the world got itself into a strange state I can't reasonably recover from.

Well, same here. If you're using strtol on data that must be well-formed, and it's not, and you're at an earlier stage of startup like parsing the config file, go ahead and blow up. That's almost certainly better than plowing ahead with invalid data.

LukeShu•6h ago

> Like if my program writes out a JSON file and then reads it back in, and the file isn't valid JSON, panicking is a reasonable way to deal with the situation.

Like how if I hit ctrl-C at just the right moment during the build, the next time I build, cmake will segfault and I have to delete the build directory.

threeducks•7h ago

I think strtol is just a badly designed function. The return value should have been an error code and the actual long should have been "returned" via a pointer. Checking the return value is much easier than checking endptr and errno and remembering to set errno before calling strtol.
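Roughly what that interface could look like when layered on top of strtol. This is a sketch only: `parse_long` is a hypothetical name, and treating trailing characters as an error is an assumption about the desired behavior.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical interface: returns 0 on success or an errno-style code,
   and stores the parsed value through out only on success. */
int parse_long(const char *s, int base, long *out)
{
    char *end;
    int saved_errno = errno;
    int err = 0;

    errno = 0;
    long v = strtol(s, &end, base);
    if (errno != 0)
        err = errno;            /* ERANGE, or EINVAL for a bad base */
    else if (end == s)
        err = EINVAL;           /* no digits were consumed */
    else if (*end != '\0')
        err = EINVAL;           /* trailing characters after the number */
    errno = saved_errno;

    if (err == 0)
        *out = v;
    return err;
}
```

The caller then only has one thing to check: `if (parse_long(arg, 10, &n) != 0) { /* handle error */ }`.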

The fact that the strtol example in the manual is 50 lines long, of which most is error handling, speaks for itself. https://man7.org/linux/man-pages/man3/strtol.3.html#:~:text=...

That being said, I can't imagine many applications where crashing is a good solution.

wahern•6h ago

OpenBSD added strtonum to <stdlib.h>. It's quite opinionated, but fits the usage patterns OpenBSD developers prefer.

> 50 lines long, of which most is error handling

That's a little exaggerated considering the example is an entire C program, including main. It's more like 14 lines, ignoring the '\0' check (which isn't necessary if you don't want to parse additional items), and even that includes whitespace and stderr logging.

I agree the biggest headache is reliance on errno and the nuanced--albeit conventional, consistent[1], and documented--semantics of only setting errno on error. Some POSIX APIs, at least those defined from scratch (e.g. pthreads, as opposed to incorporated common extensions) prefer returning the error code directly as you recommend. But this would technically only save you a single line, though it might save a lot of confusion about semantics.

However, in this case I might keep returning the value directly and take an error pointer, similar to OpenBSD's strtonum. Otherwise you would need separate routines for char, short, int, long, and long long. Though there's still the issue of unsigned integers, and using _Generic it might be possible to at least hide all the type-specific variants behind a single interface.
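As a sketch of that shape (value returned directly, error reported through a pointer, caller-supplied bounds), here is a portable imitation of the strtonum-style interface. It is not OpenBSD's implementation; `parse_num` is a hypothetical name and only the overall signature is modeled on strtonum.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical imitation of a strtonum-style interface: the value is
   returned directly, and *errstr is NULL on success or a short
   description on failure (in which case 0 is returned). */
long long parse_num(const char *s, long long minval, long long maxval,
                    const char **errstr)
{
    char *end;
    int saved_errno = errno;
    const char *msg = NULL;

    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (minval > maxval || end == s || *end != '\0')
        msg = "invalid";
    else if ((errno == ERANGE && v < 0) || v < minval)
        msg = "too small";
    else if (errno == ERANGE || v > maxval)
        msg = "too large";
    errno = saved_errno;

    if (errstr != NULL)
        *errstr = msg;
    return msg == NULL ? v : 0;
}
```

Because the bounds are parameters, one routine can serve narrower types: `int port = (int)parse_num(arg, 1, 65535, &err);`.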

And there's still the issue of needing to check for trailing garbage, or dropping the ability to use the interface inline with other parsing code. There are a lot of dimensions to the problem. Parsing integers may be conceptually simple[2], but designing an interface that's easy to integrate into applications across a variety of contexts is much less simple.

[1] Consistent across libc interfaces, excepting those that may invoke syscalls internally, like printf, where errno may be incidentally modified even on success.

[2] I personally often prefer to just write a simple loop to manually parse integers. It's sometimes easier to integrate the desired error checking inline, or even elide it altogether (garbage-in, garbage-out).
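A sketch of the kind of hand-written loop footnote [2] refers to, here with the overflow check kept in. The helper name `scan_long` and the cursor-style interface are assumptions made for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: parses an optionally signed decimal integer at *pp,
   advances *pp past it, and reports failure for no digits or overflow. */
bool scan_long(const char **pp, long *out)
{
    const char *p = *pp;
    bool neg = false;
    long v = 0;

    if (*p == '+' || *p == '-')
        neg = (*p++ == '-');
    if (*p < '0' || *p > '9')
        return false;                       /* no digits */
    for (; *p >= '0' && *p <= '9'; p++) {
        int d = *p - '0';
        /* Accumulate as a negative number so LONG_MIN is representable. */
        if (v < (LONG_MIN + d) / 10)
            return false;                   /* would overflow */
        v = v * 10 - d;
    }
    if (!neg) {
        if (v < -LONG_MAX)
            return false;                   /* magnitude does not fit as positive */
        v = -v;
    }
    *pp = p;
    *out = v;
    return true;
}
```

Since the cursor is left on the first unconsumed character, the same loop slots into a larger parser, for example stepping over a comma between two fields.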

roelschroeven•3h ago

> conventional, consistent[1], and documented--semantics of only setting errno on error.

From that man page:

> The implementation may also set errno to EINVAL in case no conversion was performed (no digits seen, and 0 returned).

There are error cases where the implementation may set errno to EINVAL. There's not even a guarantee. I did a quick test. errno is only set if you pass an invalid base, or if the string does contain a number but it is out of range. If you pass a string which doesn't even remotely look like a number, errno is not set. You have to check endptr.

  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
    const char *s = "this is not a number";
    printf("strtol(\"%s\")\n", s);
    errno = 0;
    long a = strtol(s, NULL, 10);
    printf("errno: %d; strerror: %s\n", errno, strerror(errno));
    printf("result: %ld\n", a);
    return 0;
  }

Output:

  strtol("this is not a number")
  errno: 0; strerror: Success
  result: 0

https://godbolt.org/z/TbEKss8zM

That's just rubbish design. In what is in my opinion the most common error case, namely that the string doesn't represent a number while you expect that it does, errno is not set.

spyrja•1h ago

Personally I'd just go the wrapper approach and ignore errno entirely.

  #include <assert.h>
  #include <ctype.h>
  #include <limits.h>
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdlib.h>

  bool copy_str_to_long_long(const char* str, long long* ptr) {
    while (isspace((unsigned char)*str))
      ++str;
    if (*str == 0)
      return false;
    char* parsed = NULL;
    long long value = strtoll(str, &parsed, 0);
    while (isspace((unsigned char)*parsed))
      ++parsed;
    if (*parsed != 0)
      return false;
    *ptr = value;
    return true;
  }

  long long str_to_long_long(const char* str) {
    long long result;
    /* Call outside assert() so the parse still runs when NDEBUG is defined. */
    bool ok = copy_str_to_long_long(str, &result);
    assert(ok);
    return result;
  }

  bool copy_str_to_long(const char* str, long* ptr) {
    long long buffer;
    if (!copy_str_to_long_long(str, &buffer) || buffer < LONG_MIN || buffer > LONG_MAX)
      return false;
    *ptr = buffer;
    return true;
  }

  long str_to_long(const char* str) {
    long result;
    /* Call outside assert() so the parse still runs when NDEBUG is defined. */
    bool ok = copy_str_to_long(str, &result);
    assert(ok);
    return result;
  }

  /*
    ...define copy_str_to_short, copy_str_to_int32, etc...
  */

gpderetta•6h ago

C can actually return structs by value and small structs are actually handled quite efficiently in some ABIs, so a pair result/error would be quite convenient although I guess not idiomatic.
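A sketch of what such a result/error pair might look like. The struct and function names are hypothetical, and rejecting trailing characters is an assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result/error pair, small enough to be returned by value. */
struct long_result {
    long value;
    int  err;    /* 0 on success, otherwise an errno-style code */
};

struct long_result parse_long_pair(const char *s)
{
    struct long_result r = { 0, 0 };
    char *end;
    int saved_errno = errno;

    errno = 0;
    r.value = strtol(s, &end, 10);
    if (errno != 0)
        r.err = errno;                      /* out of range */
    else if (end == s || *end != '\0')
        r.err = EINVAL;                     /* no digits, or trailing characters */
    errno = saved_errno;
    return r;
}
```

Usage would be along the lines of `struct long_result r = parse_long_pair(arg); if (r.err) { /* handle */ }`.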

PaulDavisThe1st•5h ago

> The return value should have been an error code and the actual long should have been "returned" via a pointer.

Oh, you mean like:

    int ret = sscanf (str, "%d", &value);

?

sltkr•5h ago

Yes, that API is actually great, but the problem with (s)scanf() is that reading an invalid value is undefined behavior. So you can't use it if you don't already know the result fits in &value, which is exactly the situation where you'd use strtol() instead.

CamperBob2•4h ago

Presumably 'undefined behavior' means you'll get an undefined int value back -- which you will (of course) range-check -- not that it will wipe out the next 600KB of memory starting at &value or do something similarly hazardous.

roelschroeven•3h ago

No, that's very much not what undefined behavior means. Undefined Behavior (the man page on my system actually capitalizes both words) means that there are no guarantees at all about the behavior of the whole program. It can very much wipe out whole chunks of memory, or crash (not necessarily in or around the sscanf call), or get stuck in an infinite loop, or whatever.

PaulDavisThe1st•2h ago

From the man page on my system:

> Use of the numeric conversion specifiers produces Undefined Behavior for invalid input. See C11 7.21.6.2/10 ⟨https://port70.net/%7Ensz/c/c11/n1570.html#7.21.6.2p10⟩. This is a bug in the ISO C standard, and not an inherent design issue with the API. However, current implementations are not safe from that bug, so it is not recommended to use them. Instead, programs should use functions such as strtol(3) to parse numeric input. This manual page deprecates use of the numeric conversion specifiers until they are fixed by ISO C.

CamperBob2•2h ago

If the CRTL maintainers don't care, why should I? Such behavior is broken, not "undefined."

As the other poster points out, the bug is in the spec, and I'd be astonished if the library function itself actually misbehaves with any given input.

duneroadrunner•4h ago

So I don't write much C code these days, but I recently encountered strtol() again and am I mistaken or does the interface also violate const correctness? I mean it takes a const char* as the first parameter and then gives you back a (non-const) char* potentially pointing into the same string, right? Like, does strtol() get a pass because it's old, or is const correctness (still) not generally a concern of C programmers?

jefftk•4h ago

There are unfortunately a lot of old C library functions that violate const correctness. Consider dirname: https://www.jefftk.com/p/dirname-is-evil

spyrja•4h ago

More than a few C library functions do that kind of thing. Like `strstr`, which takes const strings as arguments but returns a readily modifiable pointer to char. Const-correctness just wasn't on the top of the list when they standardized this stuff, I guess. (Heck, back in those days, most PROGRAMS for that matter weren't written with much care for it.)

wahern•4h ago

It's a consequence of the peculiarity of C type semantics, which disallows implicit conversions of pointer-to-pointer to pointer-to-pointer-to-const. C23 6.5.16.1 EXAMPLE 3 explains why:

  const char **cpp; char *p;
  const char c = 'A';
  cpp = &p;   // constraint violation
  *cpp = &c;  // valid
  *p = 0;     // valid

  The first assignment is unsafe because it would allow the 
  following valid code to attempt to change the value of the 
  const object c.

There are proposals on the table for C2y to redefine various APIs, including strtol, strchr, memcpy, etc, to preserve const correctness. Implementations might make use of _Generic (there are some issues there, though), newly specified language features, or possibly use internal extensions not available in the language proper, to accomplish this.

tedunangst•4h ago

The idea is that if the input was not const, it's really inconvenient to get a const endptr back out. If your intention is to break your program, there are easier ways to do so than washing the pointer through strtol.
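For callers that only ever hold const strings, a thin wrapper can hand the end pointer back const-qualified. A sketch, with the hypothetical name `const_strtol`; it is not one of the C2y proposals mentioned above.

```c
#include <stdlib.h>

/* Hypothetical wrapper: same behavior as strtol(), but the end pointer
   is reported through a pointer-to-const. */
long const_strtol(const char *s, const char **end, int base)
{
    char *e;
    long v = strtol(s, &e, base);
    if (end != NULL)
        *end = e;   /* char * to const char * is a safe implicit conversion */
    return v;
}
```

The trade-off is the one described above: a caller whose input was a mutable buffer would now need a cast to write through the returned pointer.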

hedora•3h ago

Crashing on error is almost always the right default behavior. If you do anything else by default, then you get into the land of data corruption and security holes.

If you do this, and then find your thing crashes in production, then that means you didn't test it well enough.

Before someone mentions safety critical systems, consider the fact that the first Apollo landing's computer crashed when it detected it was violating realtime bounds ("alarm 1201" means the OS detected CPU exhaustion + rebooted without clearing process state). At that point, it went into a reboot loop, and got close enough to the surface for Neil Armstrong to nudge the lander to a safe landing.

https://apollo11space.com/apollo-11-computer-problem/

bmink•2h ago

I love the Apollo 11 computer stories but by today’s standards it was more of an MCU than a computer. And sure, in the embedded space it is true that in a lot of cases error recovery doesn’t make a whole lot of sense and it makes more sense to reset quickly.

But there are many systems today that take a long time to restart so you can’t just abort if you have a chance to recover.

ksherlock•7h ago

Your errno checks aren't correct.

errno is only set on an error. It's not cleared on success. If errno was previously set, the function will always abort(). So you need to do something like:

    int saved_errno = errno;
    errno = 0;
    ....
    errno = saved_errno;
    return aInt;

ben0x539•6h ago

Doesn't it set errno to 0 first thing?

ksherlock•5h ago

ahh, you're right. It's still polite to save and restore though :)

kazinator•3h ago

No ISO C standard library function sets errno to zero, but functions not in the standard library are not so obliged.

Functions that clobber errno to zero make it impossible to call several functions and use a nonzero value of errno to conclude that one or more of them went wrong.

If you don't think that such code is a good idea (for instance because you believe that every function that can fail should be checked for its specific error), then you probably have nothing against functions which clobber errno to zero.

snarfy•6h ago

pun driven development

cmovq•5h ago

Interestingly, a complete implementation of strtol [1] is shorter than this wrapper. If you don't like strtol's API or error handling, just implement your own.

[1]: https://github.com/gcc-mirror/gcc/blob/master/libiberty/strt...

> If an overflow is detected, it calls abort()

An aside, but this doesn't detect overflows on Windows due to both long and int being 32 bits (you'd want strtoll for that).

kazinator•3h ago

Where is it documented that atoi wraps strtol?

eqvinox•2h ago

"long" is not guaranteed to be larger than "int", and in fact it isn't on 64-bit Windows. (Let alone 32-bit platforms in general.)

https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_m...

You need to use "long long" or "intmax_t" (and matching strtoll/strtoimax)
