I admire Kees Cook's patience.
You need to know what you support. If you are going to change something, it must be planned somehow.
I find Torvalds reckless for changing his development environment right before a release. If he really needs that computer to release the kernel, it must be a stable one. Even better: it should be a VM (hosted somewhere) or part of a CI/CD pipeline.
Not that I approve of the untested changes; I'd have used a different gcc temporarily (container or whatever), but, yeah, well...
This is common best practice in many environments...
Linus surely knows this, but here he's just being hard headed.
This is super nice in theory, but it gets murky if you veer off the "I'm building current mainline Firefox" path. For example, I'm a maintainer of a Firefox fork that often lags a few versions behind. It has substantial changes, and we are only two guys doing the major work, so keeping up with current changes is not feasible. However, this is a research/security testing-focused project, so this is generally okay.
However, coming back to the build issue: apparently, it's costly to host all those toolchain archives, so they frequently get deleted from the remote repository, which means the build only works on machines that downloaded the toolchain earlier (i.e., not a GitHub Actions runner, for example).
Given that there are many more downstream users of effectively a ton of kernel versions, this quickly gets fairly expensive and takes up a ton of effort unless you pin it to some old version and rarely change it.
So, as someone wanting to mess around with open source projects, their supporting more than one specific compiler version is actually quite nice.
The insanity is that the Kernel, Fedora and GCC are so badly coordinated that the beta of the compiler breaks the Kernel build (this is not a beta, this is a pre-alpha in a reasonable universe...is the Kernel a critical user of GCC? Apparently not), and a major distro packages that beta version of the compiler.
To borrow a phrase from Reddit: "everybody sucks here" (even Cook, who comes off the best of everyone here, seems either oblivious or defeated about how clownshoes it is that released versions of major Linux distros can't build the kernel. The solution of "don't update to release versions" is crap).
(Writing this from a Linux machine, which I will continue using, but also sort of despise).
Compilers will be updated, they will have new warnings; this has happened numerous times and will happen again in the future. The Linux kernel has always supported a wide range of compiler versions, from the very latest to 5+ years old.
I've ranted about "-Werror" in the past, but to keep it concise: it breaks builds that would and should otherwise work. It breaks older code with newer compilers and with compilers for different platforms. This is bad because then you can't, say, use the exact code specified/intended without modifications, or test and compare different versions or different toolchains, etc. A good developer will absolutely not tolerate a deluge of warnings all the time; they will decide to fix the warnings to get a clean build, over a reasonable time with well-considered changes, rather than be forced to fix them immediately with brash, disruptive code changes. And this is a perfect example why. New compiler, fine; new warnings, fine. Warnings are a useful feature, distinct from errors. "-Werror" is the real error.
Linus decided, on a whim, that a pre-release of GCC 15 ought to suddenly be a compiler that the Linux project officially supports, and threw in some last-minute commits straight to main, which is insane. But even without -Werror, when a project decides to upgrade compiler versions, warnings must be silenced, either by disabling new warnings or by changing the source code. Warnings have value, and they only have value if they're not routinely ignored.
For the record, I agree that -Werror sucks. It's nice in CI, but it's terrible to have it enabled by default, as it means that your contributors will have their build broken just because they used a different compiler version than the ones which the project has decided to officially adopt. But I don't think it's the problem here. The problem here is Linus's sudden decision to upgrade to a pre-release version of GCC which has new warnings and commit "fixes" straight to main.
But pushing breaking changes just to suppress some new warning should not be the alternative. Working to minimize warnings in a pragmatic way seems more tenable.
And reverted them as soon as the issue became apparent.
> then flames the maintainer who was working on cleanly updating the kernel for the not-yet-released compiler?
Talking about changes that he had not pushed by the time Linus published the release candidate.
Also, the "not yet released" angle seems to be a red herring: as the article notes, shipping beta versions of compilers in new releases is a tradition for some distros, so this should not be unexpected. It makes some sense, since distros tend to stick with one compiler for each release; shipping a compiler that will soon be out of maintenance from day one would only cause other issues down the road.
> you didn't coordinate with anyone. You didn't search lore for the warning strings, you didn't even check -next where you've now created merge conflicts. You put insufficiently tested patches into the tree at the last minute and cut an rc release that broke for everyone using GCC <15. You mercilessly flame maintainers for much much less.
Hypocrisy is an even worse trait than flaming people.
I remember Maddox on xmission having a page explaining that while he may make a grammatical error from time to time, he has published literally hundreds of thousands of words, and the average email he receives contains 10% errors.
However, Linus is well-known for being abrasive, abusive, call it what you want. If you can't take it, don't foist it, Linus. Even if you've earned the right, IMO.
> C "strings" work the way they do because C is a low level language, where you want to be able to do low-level things when necessary. It's a feature, not a deficiency.
Are NUL-terminated strings really considered preferable, even for low-level work? I always just considered them an unfortunate design choice C was stuck with.
Many O(1) operations/checks become O(n) because you have to linearly traverse the entire string (or keep a second pointer) to know where it ends/how long it is; you can't take a substring within another string without reallocating and copying that part over with a new NUL appended at the end; you can't store data that may contain a NUL (which text shouldn't, in theory, but then you need a separate approach for binary data); and plenty of security issues arise from missing or extra NULs.
"Of course the null-terminated strings of C are more low-level than the length-prefixed strings of Pascal, because the elders of C wisely designed them to be so." Alternatively, something is low-level because it works like C because C semantics have simply become the universal definition of what is thought of as low-level, regardless of machine mismatch.
Likewise, maybe it's not such a good idea that UNIXv6 or other educational unix-likes are used in operating system classes in universities. It's well-applicable, sure, but that's not the point of that education. Maybe we should use a Japanese or German clone of some IBM mainframe system instead, so that people actually get exposed to different ideas, instead of slightly simpler and less sophisticated versions of the ideas they are already familiar with. Too much unix-inbreeding in CS education isn't good.
> Although we entertained occasional thoughts about implementing one of the major languages of the time like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own.
-- https://www.nokia.com/bell-labs/about/dennis-m-ritchie/chist...
And using Pascal as a counterexample gets tiresome: not only was it not designed for systems programming, most of its dialects fixed those issues, including its revised report (ISO Extended Pascal). By 1978, Niklaus Wirth had created Modula-2, based on Mesa (Xerox PARC's replacement for their use of BCPL), and neither ever had a problem with string lengths.
Since C doesn't have a string type, "quoted strings" are actually char[] but with '\0' as an extra last character.
People have therefore made warnings happen when a char[] definition silently drops the '\0', because that's a common source of bugs.
They've then had to develop a way of easily disabling that warning from being generated, because it's also common enough to want to avoid the warning.
All of this seems insane coming from a modern language.
But look at the complete disaster that was the Python 2 -> 3 migration, a large motivator for which was "fixing" strings from a non-unicode to unicode compatible type. A decade or more of lost productivity as people struggled to migrate.
There's no way to actually fix C. Just keep recommending that people don't use it.
Unless either the older GCC or the beta GCC isn't "official"? In which case that's not necessarily expected to be picked up in an RC?
jey•5h ago
leni536•5h ago
edit: Unless what they actually mean is annotating struct members, that would actually make sense.
_nalply•5h ago
I imagine that it could work a little bit like unsigned: a modifier to integer types that tells that an integer's MSB is not to be used as a sign bit.
__nonstring tells you that the last byte of a byte array doesn't need to be NUL.
I would find it sensible to allow putting the attribute on a type, but whatever.
leni536•4h ago
_nalply•3h ago
This would be only useful in typedefs. An API could declare some byte arrays not strings. But again, whatever.
rurban•5h ago
C will never get proper string support, so you'll never be able to separate zero-terminated byte buffers from plain byte buffers in the type system.
So annotating vars is perfectly fine.
The problem was that the PM and release manager was completely unaware of the state of the -next branch, of its upcoming problems and fixes, and just hacked around in his usual cowboy manner. Entirely unprofessional. A release manager should have been aware of Kees' gcc15 fixes.
But they have no tooling support, no oversight, just endless blurbs on their main mailing list. No CI for a release candidate? Reminds us of typical cowboys in other places.
iforgotpassword•5h ago
OskarS•4h ago
Animux•4h ago
OskarS•3h ago
timewizard•4h ago
If the CI system didn't get the Fedora upgrade then it would not have caught it. Aside from that the kernel has a highly configurable build process so getting good coverage is equally complex.
Plus, this is a release candidate, which is noted as being explicitly targeted at developers and enthusiasts. I'm not sure the strength of Kees' objections is well matched to the size of the actual problem.
badmintonbaseba•4h ago
And Linus is usually much more critical in what gets into master when it comes to other people's contribution, let alone into an RC.
dataflow•2h ago
I don't think so. It doesn't make sense on the type. Otherwise, what should happen here?
By putting it in the type you're not just affecting the initialization, you're establishing an invariant throughout the lifetime of the object... which you cannot enforce in any desirable way here. That would be equivalent to laying a minefield throughout your code.
dwattttt•1h ago
dataflow•1h ago
dwattttt•1h ago
EDIT: > what do you think should happen if you store a NUL when you're claiming you're not
I don't believe nonstring implies it doesn't end with a NUL, just that it isn't required to.
dataflow•46m ago
Note that "works as intended" isn't the sole criterion for "does it make sense" or "should we do this." You can kill a fly with a cannon too, and it achieves the intended outcome, but that doesn't mean you should.
_nalply•1h ago
unsigned means: don't use an integer's MSB as a sign bit. __nonstring means: the byte array might not be terminated with a NUL byte.
So what happens if you use integers instead of byte arrays? I mean cast away unsigned or add unsigned. Of course these two areas are different, but one could try to design such features that they behave in similar ways where it makes sense.
I am unsure but it seems, if you cast to a different type you lose the conditions of the previous type. And "should this be legal", you can cast away a lot of things and it's legal. That's C.
But whatever because it's not implemented. This all is hypothetical. I understand GCC that they took the easier way. Type strictness is not C's forte.
dataflow•1h ago
No, they're very different situations.
> unsigned means: don't use an integer's MSB as a sign bit.
First: unsigned is a keyword. This fact is not insignificant.
But anyway, even assuming they were both keywords or both attributes: "don't use an MSB as a sign bit" makes sense, because the MSB otherwise is used as a sign bit.
> __nonstring means: the byte array might not be terminated with a NUL byte.
The byte array already doesn't have to contain a NUL character to begin with. It just so happens that you usually initialize it somewhere with an initializer that does, but it's already perfectly legal to strip that NUL away later, or to initialize it in a manner that doesn't include a NUL character (say, char a[1] = {'a'}). It doesn't really make sense to change the type to say "we now have a new type with the cool invariant that is... identical to the original type's."
> I understand GCC that they took the easier way. Type strictness is not C's forte.
People would want whatever they do to make sense in C++ too, FWIW. So if they introduce a type incompatibility, they would want it to avoid breaking the world in other languages that enforce them, even if C doesn't.