Unsigned Sizes: A Five Year Mistake

https://c3-lang.org/blog/unsigned-sizes-a-five-year-mistake/

31•lerno•1h ago

Comments

kevin_thibedeau•1h ago

Systems programmers love to hate on unsigned integers. Generations have been infected with the Java world model that integers have to be pretend number lines centered on zero. Guess what, you still have boundary conditions to deal with. There are times when you really really need to use the full word range without negative values. This happens more often with low level programming and machines with small word sizes, something fewer people are engaged in. It doesn't need to be the default. Ada has them sequestered as modular types, but they're available to use when needed.

uecker•1h ago

Having them available is not the issue, using them for sizes and indices is what causes a lot of tricky bugs.

throwaway894345•17m ago

Why does an unsigned type for sizes or indices fare worse than a signed type? When do I want the -247th element in an array? When do I have a block that is -10 bytes in size?

kevin_thibedeau•11m ago

There are (rare) times when you want negative array indices. C lets you index in both directions from a pointer to the middle of an array. That's why array indexing is signed in C. Some libc ctypes lookup tables do this. For sizing there is no strong case for negatives other than to shoehorn them into signed operations.
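A minimal C sketch of the mid-array pointer idiom mentioned here. The names `squares_storage`, `squares`, and `square_of` are made up for illustration and are not taken from any libc.

```c
/* Hypothetical lookup table covering the inputs -4..4. */
static const int squares_storage[9] = {16, 9, 4, 1, 0, 1, 4, 9, 16};

/* Points at the entry for input 0, so valid indices run from -4 to 4. */
static const int *const squares = &squares_storage[4];

/* Returns i*i for -4 <= i <= 4. The caller must stay inside that range;
   nothing here checks it. */
int square_of(int i)
{
    return squares[i];
}
```

Calling `square_of(-3)` reads `squares_storage[1]`. Going past either end is undefined behavior, which is the danger raised in the reply below.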

throwaway894345•5m ago

That’s interesting but seems pretty dangerous. How do you know you aren’t going to decrement off the front of the array? Keeping the pointer to the first element in the array and using offsets seems safer for humans and I don’t think the computer would care.

uecker•6m ago

The reason is not that you want a negative index or size, but that you want the computation of the index to be correct, and you want errors to be obvious. Both turn out to be easier with signed types.

einpoklum•43m ago

> There are times when you really really need to use the full word range without negative values.

There are a few of those, but that is the niche case. Certainly when we're talking about 64-bit size types. And if you want to cater to smaller size types, then just template over the size type. Or, OK, some other trick if it's C rather than C++.

pixelesque•6m ago

Sometimes (and very often in some scenarios/industries, e.g. HPC for graphics and simulation with indices for things like points, vertices, primvars, voxels, etc.) you want pretty good efficiency in the size of the datatype as well, for memory / cache performance reasons, because you're storing millions of them and need random addressing (so you can't really bit-pack to, say, 36 bits, at least not without overhead compared to native types, which are really needed for maximum speed without any branching).

Losing half the range to make them signed when you only care about positive values 95% of the time (and in the rare case when you do any modulo on top of them you can cast, or write wrappers for that), is just a bad trade-off.

Yes, you've still only doubled the range to 2^32, and you'll still hit it at some point, but that extra bit can make a lot of difference from a memory/cache efficiency standpoint without jumping to 64-bit.

So very often uint32_t is a very good sweet spot for size: int32_t is sometimes too small, and (u)int64_t is generally not needed and too wasteful.

pron•16m ago

In Java, unsigned arithmetic is available through an API and, as you said, it is pretty much only needed when marshalling to certain wire protocols or for FFI. Built-in unsigned types are useful primarily for bitfields or similar tiny types with up to 6 bits or so.

ks2048•58m ago

I know language designers have a lot of trade-offs to consider... But I would say if you know a value will logically always be >= 0, better to have a type that reflects that.

The potential bugs listed would be prevented by, e.g. "x--" won't compile without explicitly supplying a case for x==0 OR by using some more verbose methods like "decrement_with_wrap".
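As a rough illustration of the two options in C: `checked_decrement` is a hypothetical helper that forces the caller to handle x == 0, and `decrement_with_wrap` borrows the name from the comment above. Neither is an existing library function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decrements *x and returns true, or leaves *x
   untouched and returns false when it is already 0. */
bool checked_decrement(size_t *x)
{
    if (*x == 0) {
        return false;
    }
    *x -= 1;
    return true;
}

/* Hypothetical helper: the wraparound is spelled out in the name. */
size_t decrement_with_wrap(size_t x)
{
    return x - 1;   /* 0 wraps to the largest size_t value */
}
```

C cannot make a plain "x--" fail to compile, so this only approximates the idea by convention.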

The trade-off is lack of C-like concise code, but more safe and explicit.

mamcx•37m ago

I think it should be like in Pascal, where you have size ranges as types, and then you can declare that this collection falls in this range (and, very nicely, you can make it an enum):

https://www.freepascal.org/docs-html/ref/refsu4.html

pron•12m ago

> But I would say if you know a value will logically always be >= 0, better to have a type that reflects that.

Except that's not quite what unsigned types do. They are not (just) numbers that will always be >= 0, but numbers where the value of `1 - 2` is > 1 and depends on the type. This is not an accident but how these types are intended to behave because what they express is that you want modular arithmetic, not non-negative integers.
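A small C sketch of that point, using fixed-width types so the result does not depend on the platform. The function names are invented for the example.

```c
#include <stdint.h>

/* 1 - 2 in 8-bit modular arithmetic. */
uint8_t one_minus_two_u8(void)
{
    uint8_t one = 1, two = 2;
    return (uint8_t)(one - two);   /* 255, i.e. 2^8 - 1 */
}

/* The same expression in 32-bit modular arithmetic. */
uint32_t one_minus_two_u32(void)
{
    uint32_t one = 1, two = 2;
    return (uint32_t)(one - two);  /* 4294967295, i.e. 2^32 - 1 */
}
```

The same source expression yields a different value for each width, and both results are greater than 1.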

> e.g. "x--" won't compile without explicitly supplying a case for x==0

If you want non-negative types (which, again, is not what unsigned types are for) you also run into difficulties with `x - y`. It's not so simple.

There are many useful constraints that you might think it's "better to have a type that reflects that" - what about variables that can only ever be even? - but it's often easier said than done.

Groxx•7m ago

That's true for signed numbers too though? `int_min - 2 > int_min`

I agree they're a bit more error-prone in practice, but I suspect a huge part of that is because people are so used to signed numbers. And, legitimately, zero is a more commonly-encountered value... but that can push errors to occur sooner, which is generally a desirable thing.

LegionMammal978•58m ago

> But what about the range? While it’s true that you get twice the range, surprisingly often the code in the range above signed-int max is quite bug-ridden. Any code doing something like (2U * index) / 2U in this range will have quite the surprise coming.

Alas, (2S * signed_index) / 2S will similarly result in surprises the moment the signed_index hits half the signed-int max. There's no free lunch when trying to cheat the integer ranges.

lerno•6m ago

The difference is that in the unsigned case you get a seemingly plausible value, and in the signed case you get a negative value which you can be sure is wrong. This is the problem.
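A sketch of the unsigned case in C, with a 32-bit index assumed for concreteness and `double_then_halve` as an invented name. The signed counterpart is left out because signed overflow is undefined behavior in C, so there is no portable result to show for it.

```c
#include <stdint.h>

/* Computes (2 * index) / 2 in 32-bit unsigned arithmetic. */
uint32_t double_then_halve(uint32_t index)
{
    uint32_t doubled = (uint32_t)(index * 2u);  /* wraps modulo 2^32 */
    return doubled / 2u;
}
```

With `index = 0x80000005`, the doubling wraps to 10 and the function returns 5, which looks like a perfectly valid index.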

ximm•53m ago

Is the text on this page really #bbbdc3 on #ffffff? How is anyone supposed to be able to read that?

idbehold•37m ago

For me it's #353841 on #ffffff which meets WCAG AAA standards for accessible text.

sureglymop•24m ago

Weirdly, you have to turn on javascript for the text color to change...

Groxx•52m ago

>If sizes are unsigned, like in C, C++, Rust and Zig – then it follows that anything involving indexing into data will need to either be all unsigned or require casts. With C’s loose semantics, the problem is largely swept under the rug, but for Rust it meant that you’d regularly need to cast back and forth when dealing with sizes.

TBH I've had very little struggle with this at all. As long as you keep your values and types separate, the unsigned type that you got a number from originally feeds just fine into the unsigned type that you send it to next. Needing casting then becomes a very clear sign that you're mixing sources and there be dragons, back up and fix the types or stop using the wrong variable. It's a low-cost early bug detector.

Implicitly casting between integer types though... yeah, that's an absolute freaking nightmare.

jonstewart•47m ago

I hate using languages that only have signed integers. Using integers that can’t be negative fits many problems nicely and avoids the edge case of having to check for negative.

einpoklum•39m ago

It's not "can't be negative", it's just that the semantics for negativity is wrapping around.

And - yes, there are very important use cases for unsigned/modulo-2^n/wraparound values. But sizes of data structures are generally _not_ one of those use cases. The fact that the size is non-negative does not mean that the type should be unsigned. You should still be able to, say, subtract sizes and get a signed value which may be negative.
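For instance, in C, subtracting two `size_t` values never yields a negative number. The sketch below contrasts that with a signed difference; both function names are hypothetical, and the signed version assumes both sizes fit in `int64_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Difference in size_t arithmetic: wraps around when a < b. */
size_t size_diff_unsigned(size_t a, size_t b)
{
    return a - b;
}

/* Signed difference. Assumes a and b are both at most INT64_MAX. */
int64_t size_diff_signed(size_t a, size_t b)
{
    return (int64_t)a - (int64_t)b;
}
```

For sizes 3 and 5, the first function returns `SIZE_MAX - 1`, while the second returns -2.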

throwaway894345•9m ago

That’s definitely not true. Unsigned ints have no “negativity” semantic. Wrapping around is what happens when you decrement the minimum value of any integer type, including signed types. Regardless of the type you use to represent an integer value that cannot legally be negative, you will have to take care not to allow your program to return values lower than zero for things like indices or sizes.

EdSchouten•40m ago

> If sizes are unsigned, like in C, C++, Rust and Zig – then it follows that anything involving indexing into data will need to either be all unsigned or require casts.

I don’t really get this claim. Indexing should just look up the element corresponding to the value provided. It’s easy to come up with semantics that are intuitive and sound, even if signed integers or ones smaller than size_t are used.

adrian_b•21m ago

Indexing does that, but the indices must vary in a certain range, whose limits are frequently determined by using something like "sizeof(array)/sizeof(element)" which is an unsigned number.

This is especially inconvenient in C, where there exist extremely dangerous legacy implicit casts between signed integers and unsigned integers, which have a great probability of generating incorrect values.

Because the index is typically a signed integer, comparing it with an unsigned limit without using explicit casts is likely to cause bugs. Using explicit casts of smaller unsigned integers towards bigger signed integers results in correct code, but it is cumbersome.
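A minimal C sketch of both variants, assuming a platform where `int` is 32 bits wide (on a wider `int` the first function would behave differently). The function names are made up for the example.

```c
#include <stdint.h>

/* Mixed comparison: value is implicitly converted to unsigned,
   so -1 becomes 4294967295 before the comparison (32-bit int assumed). */
int less_than_implicit(int32_t value, uint32_t limit)
{
    return value < limit;
}

/* Both operands are explicitly widened to a signed type that can
   represent every int32_t and every uint32_t value. */
int less_than_widened(int32_t value, uint32_t limit)
{
    return (int64_t)value < (int64_t)limit;
}
```

`less_than_implicit(-1, 10)` returns 0 even though -1 is less than 10, while `less_than_widened(-1, 10)` returns 1. Compilers can flag the first form with a sign-comparison warning.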

These problems are avoided, as said in TFA, by making "sizeof" and the like have 64-bit signed integer values instead of unsigned values.

Well chosen implicit conversions are good for a programming language, by reducing unnecessary verbosity, but the implicit integer conversions of C are just wrong and they are by far the worst mistake of C, much worse than any other C feature.

Other C features are criticized because they may be misused by inexperienced or careless programmers, but most of the implicit integer conversions are just incorrect. There is no way of using them correctly. Only the conversions from a smaller signed integer to a bigger signed integer are correct.

Mixed signedness conversions have always been wrong and the conversions between unsigned integers have been made wrong by the change in the C standard that has decided that the unsigned integers are integer residues modulo 2^N and they are not non-negative integers.

For modular integers, the only correct conversions are from bigger numbers to smaller numbers, i.e. the opposite of the implicit conversions of C. The implicit conversions of C unsigned numbers would have been correct for non-negative integers, but in the current C standard there are no such numbers.

The current C standard is inconsistent, because the meaning of sizeof is of a non-negative integer and this is also true for the conversions between unsigned numbers, but all the arithmetic operations with unsigned numbers are defined to be operations with integer residues, not operations with non-negative numbers.

The hardware of most processors implements at least 3 kinds of arithmetic operations: operations with signed integers, operations with non-negative integers and operations with integer residues.

Any decent programming language should define distinct types for these kinds of numbers, otherwise the only way to use completely the processor hardware is to use assembly language. Because C does not do this, you have to use at least inline assembly, if not separate assembly source files, for implementing operations with big numbers.

cperciva•24m ago

I don't get it. Is this a parody of poor design decisions?

Sure, it's possible to write bugs in C. And if you really want to, you can disable the compiler warnings which flag tautologous comparisons and mixed-sign comparisons (a common reason for doing this is to avoid spurious warnings in generic-type code).

But, uhh, "people can deliberately write bugs" has got to be the weakest justification I've ever seen for changing a language feature -- especially one as fundamental as "sizes of objects can't be negative".

IshKebab•21m ago

It seems like they've identified common bug patterns in C that would have been ameliorated by using signed, but come to the wrong conclusion that signed is the correct answer rather than that C is poorly designed for making the broken code the easy option.

Fix the language. Don't hack around it by using the wrong type.
