Some C habits I employ for the modern day

https://www.unix.dog/~yosh/blog/c-habits-for-me.html

221•signa11•2w ago

Comments

skywalqer•2w ago

Nice post, but the flashy thing on the side is pretty distracting. I liked the tuples and maybes.

smnplk•2w ago

Not distracting at all, it feels nostalgic to me. I'd rather have these flashy things than a million popups and registration forms following you around, which is basically the modern web. I hate it so much. This site is pure balsam for my soul.

Vedor•2w ago

Both nostalgic and distracting for me.

matheusmoreira•2w ago

> In the absence of proper language support, “sum types” are just structs with discipline.

With enough compiler support they could be more than that. For example, I submitted a tagged union analysis feature request to gcc and clang, and someone generalized it into a guard builtin.

https://github.com/llvm/llvm-project/issues/74205

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112840

GCC proved to be too complex for me to hack this in though. To this day I'm hoping someone better than me will implement it.

nine_k•2w ago

With proper discipline, one can even program a Turing machine directly. The problems are two: (1) Doing so is very slow and arduous, and (2) a chance of making a dangerous error is still quite high.

For instance, it appears that no amount of proper discipline, even in the best developers, lets a naked pointer to a memory area replace proper array support.

matheusmoreira•2w ago

The compiler's job is to program the Turing machine for us. It should help as much as possible. For example, I really like using enums because compilers have extensive support for checking that all values have been handled in switch statements.

I don't like it when compilers start getting in the way though. We use C because we want to do raw things like point a structure at some memory area in order to access the data stored there. The compiler's job is to generate the expected code without screwing it up by "optimizing" it beyond recognition because of strict aliasing or some other nonsense.
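A minimal sketch of the enum-plus-switch pattern described above, applied to a tagged union. The names `shape`, `shape_kind` and `shape_area` are hypothetical. With GCC or Clang, `-Wswitch` (enabled by `-Wall`) warns when a `switch` over an enum misses an enumerator, as long as there is no `default:` label.

```c
#include <assert.h>

/* Hypothetical tagged union: the enum tag says which union member is live. */
enum shape_kind { SHAPE_CIRCLE, SHAPE_RECT };

struct shape {
    enum shape_kind kind;
    union {
        struct { double radius; } circle;
        struct { double w, h; } rect;
    } as;
};

/* No default: label, so -Wswitch can flag a newly added enumerator. */
static double shape_area(const struct shape *s)
{
    switch (s->kind) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->as.circle.radius * s->as.circle.radius;
    case SHAPE_RECT:
        return s->as.rect.w * s->as.rect.h;
    }
    /* Only reached if kind holds a value outside the enum. */
    assert(!"invalid shape_kind");
    return 0.0;
}
```

Nothing here stops code from reading `as.rect` while `kind` is `SHAPE_CIRCLE`; that is the "discipline" part the feature requests aim to check.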

convolvatron•2w ago

you can certainly wrap the array with a structure which provides either bounds information to be checked with generic runtime functions, or specific function pointers (methods) to get and set.

you can paper over _a lot_ of C's faults. ultimately it's not really worth it, but it's not nearly as fragile and arduous as you make it out to be
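A rough sketch of the first option, a struct that carries bounds information checked by small accessor functions. `int_span`, `span_get` and `span_set` are made-up names for illustration, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper: a pointer that always travels with its length. */
struct int_span {
    int   *data;
    size_t len;
};

/* Checked read: returns false instead of reading out of bounds. */
static bool span_get(struct int_span s, size_t i, int *out)
{
    assert(s.data != NULL || s.len == 0);
    if (i >= s.len)
        return false;
    *out = s.data[i];
    return true;
}

/* Checked write with the same contract. */
static bool span_set(struct int_span s, size_t i, int value)
{
    assert(s.data != NULL || s.len == 0);
    if (i >= s.len)
        return false;
    s.data[i] = value;
    return true;
}
```

The check only helps as long as every access goes through the accessors; raw `s.data[i]` is still available to anyone who wants to bypass it.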

adrianN•1w ago

You can do such things until you have to interface with other code, eg the operating system.

convolvatron•1w ago

So that’s an interesting case. I’d really like to keep language neutrality, because I don’t think we’re finished evolving yet. So this is a place where we need an abi. The first things we try to do is be simple…except for a terrible mistake with select, we don’t send arrays across that interface, sadly, we send c structs sometimes and I think that’s pretty horrible, because we have to try to lay them out in a compatible way, which is pretty fragile. The other sad bit is that we need to verify the addresses before we can operate on them, and that’s hugely prone to error.

I'm curious if you have a suggestion about how to fix both of those. The structure thing can clearly be a more robust serialization. Addresses? Idk

nine_k•1w ago

As a matter of course, every structure that may have a variable size should start with a length designator. Lengths 1 to 32767 take two bytes of a designator, 32768 to 2147483647 take four bytes, larger takes 8 bytes. Realistically 62 bits should suffice for any practical case, but arbitrary-size integers are well-known, and are easy to unpack and operate on.

This may slightly increase the size of some structures, but most of the time it would not, because of the alignment padding inherent to most structures anyway. But an entire class of vulnerabilities would be gone. This doesn't even need a change in the language, even though direct syntactic support would be nice. It just takes discipline when designing APIs.

apaprocki•1w ago

FWIW, Coverity (maybe others) has a checker that creates an error if it detects tagged union access without first checking the tag. It’s not as strict as enforcing which fields belong to which tag values, but it can still be useful. I’d much rather have what was proposed in the GCC bug!

uecker•1w ago

It is on my list (also as a proposal to WG14). Sorry, I am a bit too overloaded currently. (If people want to help with such improvements - with either time or money, let me know.)

canpan•2w ago

Regarding memory, I recently changed to try to not use dynamic memory, or if I need to, to do it once at startup. Often static memory on startup is sufficient.

Instead use the stack much more and have a limit on how much data the program can handle fixed on startup. It adds the need to think what happens if your system runs out of memory.

Like OP said, it's not a solution for all types of programs. But it makes for very stable software with known and easily tested error states. Also adds a bit of fun in figuring out how to do it.

vbezhenar•2w ago

In recent years I had to write some firmware code with C and that was exactly the approach I took. So far I never had need for any dynamic memory and I was surprised how far I can get without it.

thisoneisreal•2w ago

I've been looking into Ada recently and it has cool safety mechanisms to encourage this same kind of thing. It even allows you to dynamically allocate on the stack for many cases.

apaprocki•1w ago

You can allocate dynamically on the stack in C as well. Every compiler will give you some form of alloca().

adrian_b•1w ago

True, but in many environments where C is used the stacks may be configured with small sizes and without the possibility of being grown dynamically.

In such environments, it may be needed to estimate the maximum stack usage and configure big enough stacks, if possible.

Having to estimate maximum memory usage is the same constraint when allocating a static array as a work area, then using a custom allocator to provide memory when needed.

apaprocki•1w ago

Sure, the parent was commenting more about the capability existing in Ada in contrast to C. Ada variable length local variables are basically C alloca(). The interesting part in Ada is returning variable length types from functions and having them automatically managed via the “secondary stack”, which is a fixed size buffer in embedded/constrained environments. The compiler takes care of most of the dirty work for you.

We mainly use C++, not C, and we do this with polymorphic allocators. This is our main allocator for local stack:

https://bloomberg.github.io/bde-resources/doxygen/bde_api_pr...

… or this for supplying a large external static buffer:

https://bloomberg.github.io/bde-resources/doxygen/bde_api_pr...

lelanthran•1w ago

> You can allocate dynamically on the stack in C as well. Every compiler will give you some form of alloca().

And if it doesn't, VLAs are still in there until C23, IIRC.

apaprocki•1w ago

`-Wvla` Friends don’t let friends VLA :)

uecker•1w ago

alloca is certainly worse. Worst-case fixed-size arrays on the stack are also worse. If you need a variable-sized array on the stack, VLAs are the best alternative. Also many other languages such as Ada have them.

agentultra•1w ago

This is the way. Allocate all memory upfront. Create an allocator if you need to divvy it up dynamically. Acquire all resources up front. Try to fit everything in stack. Much easier that way.

Only allocate on the heap if you absolutely have to.
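As a sketch of "allocate upfront, then divvy it up", here is a small bump allocator over a static buffer. The names (`arena`, `arena_alloc`, `arena_reset`, `g_backing`) and the 4096-byte size are assumptions for illustration; it needs C11 for `_Alignas` and `max_align_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump allocator over a buffer reserved once at startup. */
struct arena {
    unsigned char *base;
    size_t cap;
    size_t used;
};

/* Backing storage with static lifetime; max-aligned so that aligned
   offsets into it are aligned addresses for any standard type. */
static _Alignas(max_align_t) unsigned char g_backing[4096];

/* Returns NULL when the request does not fit; align must be a power of two. */
static void *arena_alloc(struct arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Frees everything at once, e.g. at the end of a request or event. */
static void arena_reset(struct arena *a)
{
    a->used = 0;
}
```

Running out of memory becomes an ordinary `NULL` return at a known limit, which is the easily tested error state mentioned upthread.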

lelanthran•1w ago

This.

As someone who spent most of their career as an embedded dev, yes, this is fine for (like parent said) some types of software.

Even for places where you'd think this is a bad idea, it can still be a good approach, for example allocating and mapping all memory up to the limit you are designing. Honestly this is how engineering is done - you have specified limits in the design, and you work explicitly to those limits.

So "allocate everything at startup" need not be "allocate everything at program startup", it can be "allocate everything at workflow startup", where "workflow" can be a thread, a long-running input-directed sequence of functions, etc.

For example, I am starting a tiny stripped down web-server for a project, and my approach is going to be a single 4Kb[1] block for each request, allocated via a pool (which can expand on pressure up to some maximum) and returned to the pool once the response is sent.

The 4Kb includes at most 14 headers (regardless of each header's size) with the remaining data for the JSON payload. The JSON payload is limited to at most 10 fields. This makes parsing everything "allocate-less" because the array holding pointers to the keys+values of the header is `const char *headers[14]` and to the payload JSON data `const char *fields[10]`.

A request that doesn't fit in any of that will be rejected. This means that everything is simple and the allocation for each request happens once at startup (pool creation) even while parsing the input.

I'm toying with the idea of doing the same for responses too, instead of writing it out as and when the output is determined during the servicing of the request.

-------------------------

[1] I might switch to 6Kb or 8Kb if requests need more; whatever number is chosen, it's going to be a static number.
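A simplified sketch of the pool idea, assuming a fixed number of blocks (the pool described above can also grow under pressure, which is left out here). `struct request`, `pool_init`, `pool_get`, `pool_put` and `POOL_BLOCKS` are hypothetical names, and the layout is only one way to arrange the buffer and pointer arrays.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_SIZE = 4096, POOL_BLOCKS = 8, MAX_HEADERS = 14, MAX_FIELDS = 10 };

/* Hypothetical per-request state: raw bytes plus pointers into them. */
struct request {
    char            buf[BLOCK_SIZE];
    const char     *headers[MAX_HEADERS];
    const char     *fields[MAX_FIELDS];
    struct request *next_free;   /* free-list link, used only while pooled */
};

static struct request  g_pool[POOL_BLOCKS];
static struct request *g_free;

/* Thread every block onto the free list once, at startup. */
static void pool_init(void)
{
    g_free = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        g_pool[i].next_free = g_free;
        g_free = &g_pool[i];
    }
}

/* Returns NULL when the pool is exhausted: the request gets rejected. */
static struct request *pool_get(void)
{
    struct request *r = g_free;
    if (r != NULL)
        g_free = r->next_free;
    return r;
}

/* Hand the block back once the response has been sent. */
static void pool_put(struct request *r)
{
    assert(r != NULL);
    r->next_free = g_free;
    g_free = r;
}
```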

Gibbon1•1w ago

I have some firmware that runs an event loop. There is no malloc anywhere. But I do have an area which gets reset after each event handler call. Useful for passing objects up the call stack.

One other thing I tend to do: anything that needs to live longer than the current call stack gets copied into a queue of some sort. I feel it's kinda doing manually what Rust's borrow checker tries to enforce.

glouwbug•1w ago

Dynamic memory allocation solves the problem of dynamic business requirements.

If you know your requirements up front, static memory initialisation is the way.

For instance, indexing a typed array with an enum is no different than an unordered map of string to int, IF you have all your business requirements up front
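A small sketch of that comparison, with made-up keys and values: when the set of keys is known at build time, each key becomes an enumerator and the "map" is a plain array.

```c
#include <assert.h>

/* Hypothetical example: keys known at build time become enumerators. */
enum fruit { FRUIT_APPLE, FRUIT_BANANA, FRUIT_CHERRY, FRUIT_COUNT };

/* Designated initializers keep each value next to its key. */
static const int price_cents[FRUIT_COUNT] = {
    [FRUIT_APPLE]  = 50,
    [FRUIT_BANANA] = 25,
    [FRUIT_CHERRY] = 300,
};

/* Lookup is a plain array index: no hashing, no allocation. */
static int fruit_price(enum fruit f)
{
    assert((unsigned)f < FRUIT_COUNT);
    return price_cents[f];
}
```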

sys_64738•2w ago

#define BEGIN {

#define END }

/* scream! */

unwind•1w ago

Uh that piece of horror was not in the post. Phew.

zedai00•1w ago

/* huh */

JamesTRexx•2w ago

Two things I thought while reading the post: Why not typedef BitInt types for stricter size and accidental promotion control when typedeffing for easier names anyway? I came across a post mentioning using regular arrays instead of strings to avoid the null terminator and off-by-one pitfalls.

I still have a lot of conversion to do before I can try this in my hobby project, but these are interesting ideas.

jcalvinowens•2w ago

  #if CHAR_BIT != 8
   #error "CHAR_BIT != 8"
  #endif

In modern C you can use static_assert to make this a bit nicer.

  static_assert(CHAR_BIT == 8, "CHAR_BIT is not 8");

...although it would be a bit of a shame IMHO to add that reflexively in code that doesn't necessarily require it.

https://en.cppreference.com/w/c/language/_Static_assert.html

gdjjg•2w ago

Gtav

procaryote•1w ago

Even if the code might not end up requiring it, if you write it with the assumption that bytes are 8 bits, it's good to document that with a static assert so someone porting things knows there will be dragons

It's a pretty neat way to drop some corner cases from your mental load without building subtle traps

jcalvinowens•1w ago

That's pretty silly IMHO, it should be incredibly obvious to anybody who is ever in a position to port code to a machine with non-8-bit-bytes that there will be dragons there. It also requires including limits.h which you might not otherwise need.

It's just not a realistic edge case, the machines like this are either antiquated or are tiny microcontrollers that can't practically run a POSIX OS. Very little code in the real world is generic enough to be useful in that environment (a good example might be a fixed point signal processing library).

There is no assertion in the entire Linux kernel that CHAR_BIT is eight, despite that assumption being hardcoded in many places.

WalterBright•2w ago

> I’ve long been employing the length+data string struct. If there was one thing I could go back in time to change about the C language, it would be removal of the null-terminated string.

It's not necessary to go back in time. I proposed a way to do it in modern C - no existing code would break:

https://www.digitalmars.com/articles/C-biggest-mistake.html

It's simple, and easy to implement.

publicdebates•2w ago

> the fatal error was not combining the array dimension with the array pointer; all it needs is a little new syntax a[...]; this won’t fix any existing code. Over time, the syntax a[] can be deprecated by convention and by compilers.

You're thinking in decades. C standard committee is slower than that. This could have worked in practice, but probably never will happen in practice. Maybe people should start considering a language like D[1] as an alternative, which seems to have the spirit of both C and Go, but with much more pragmatism than either.

[1] https://en.wikipedia.org/wiki/D_(programming_language)#Criti...

billforsternz•2w ago

There is some irony in someone replying to the author of the D language suggesting that maybe the D language is the real solution he's looking for.

I_am_uncreative•1w ago

A tale as old as time.

publicdebates•1w ago

It might be the language he is looking for, but it might not, and more likely than not is not. D is one of those odd languages which most likely ought to have gotten a lot more popular than it did, but for one reason or another, never quite caught on. Perhaps one reason is because it lacks a sense of eccentricity and novelty that other languages in its weight class have. Or perhaps it's just too unfamiliar in all the wrong ways. Whatever the case may be, popularity is in fact one of the most useful metrics when ruling out a potential language for a new project. And if D does not meet GP's requirements in terms of longevity or commercial support, I would certainly not suggest GP adopt it too eagerly, simply because it happens to check off most or all their technological requirements.

RickHull•1w ago

I think that D meets Walter Bright's requirements.

BeetleB•1w ago

I would hope so. He invented the damn language.

WalterBright•1w ago

There's always room for improvement!

WalterBright•1w ago

D is an elegant re-imagining of C and C++. For a trivial example,

    typedef struct S { int a; } S;

becomes simply:

    struct S { int a; }

and unlike C:

    extern int foo();
    int bar() { return foo(); }
    int foo() { return 6; }

you have:

    int bar() { return foo(); }
    int foo() { return 6; }

For more complex things:

    #include <foo.h>

becomes:

    import foo;

hardlianotion•1w ago

Everything except the import looks like standard c++ since at least 98.

WalterBright•1w ago

C++ does not allow forward references outside of structs. The point-of-instantiation and point-of-declaration rules for templates produce all kinds of subtle problems. D does not have that issue.

Yes, you absolutely can get the job done with C and C++. But neither is an elegant language, and that puts a cognitive drag on writing and understanding code.

publicdebates•1w ago

Some of these are definitely nice-to-haves*, but when you're evaluating a C++ alternative, there are higher priority features to research first.

How are the build times? What does its package system(s) look like, and how populated are they? What are all its memory management options? How does it do error handling and what does that look like in real world code? Does it have any memory safety features, and what are their devtime/comptime/runtime costs? Does it let me participate in compile time optimizations or computations?

Don't get me wrong, we're on the same page about wanting to find a language that fills the C++ niche, even if it will never be as ideal as C++ in some areas (since C++ is significantly worse in other areas, so it's a fair trade off). But just like dating, I'm imagining the fights I'll have with the compiler 3 months into a full time project, not the benefits I'll get in the first 3 days.

* (a) I've been using structs without typedef without issue lately, which has its own benefits such as clarifying whether the type is simple or aggregate in param lists, while auto removes the noise in function bodies. (b) Not needing forward declarations is convenient, but afaik it can't not increase compile times at least somewhat. (c) I like the consistency here, but that's merely a principle; I don't see any practical benefit.

WalterBright•1w ago

Build times are quite a bit faster.

The package system is called dub.

Memory management options include:

1. stack allocation

2. malloc allocation

3. write your own allocator

4. static allocation

5. garbage collection

You can use exceptions or returns for error handling.

The biggest memory safety feature it has is length-delimited arrays. No more array overflows! The cost of it is the same as in std::vector when you do the bounds checked option. D also uses refs, relegating pointers to unusual uses. I don't know what you mean by "participating in optimizations".

(a) C doesn't have the hack that C++ has regarding the tag names. D has auto.

(b) D has much faster compile times than C++.

1718627440•1w ago

Your first example doesn't make sense, because

    struct S { int a; };

is also fine and idiomatic in C. It is rather

    typedef struct S { int a; } S;

that doesn't make sense, because why would you make something opaque and expose it immediately again in the same line?

The others are ... different. I can't tell whether they are really better. The second maybe, although I like it that the compiler forces me to forward type stuff, it makes the code much more readable. But then again I don't really get the benefit of

    import foo;

    #include <foo>

include vs import is no difference. # vs nothing makes it clear that it is a separate feature instead of just a language keyword. < vs " makes it clear whether you use your own stuff or stuff from the system. What do you do when your file contains spaces? Does import foo bar; work for including a single file named "foo bar"?

WalterBright•1w ago

> is also fine and idiomatic in C

It's inelegant because without the typedef, you need to prefix it always with `struct`. This is inelegant because all other types do not need a prefix. It also makes it clumsier to refactor the code (adding or subtracting the leading `struct`). The typedef workaround is extremely commonplace.

> I like it that the compiler forces me to forward type stuff, it makes the code much more readable

That means when opening a file, you see the first part of the file first. In C, then you see a list of forward references. This isn't what you want to see - you want to see first the public interface, not the implementation details. (This is called "above the fold", coming from what you see in a folded stack of newspapers for sale. The headlines are not hidden below the fold or in the back pages.) In C, the effect of the forward reference problem is that people tend to organize the code backwards, with the private leaf functions first and the public functions last.

> include vs import is no difference

Oh, there is a looong list of kludgy problems stemming from a separate macro processor that is a completely distinct language from C. Even the expressions in a macro follow different rules than in C. If you've ever used a language with modules, you'll never want to go back to #include!

> What do you do when your file contains spaces?

A very good question! The module names must match the filename, and so D filenames must conform to D's idea of what an identifier is. It sounds like a limitation, but in practice, why would one want a module name different from its filename? I can't recall anyone having a problem with it. BTW, you can write:

    import core.stdc.stdio;

and it will look up `core/stdc/stdio.d` (Linux, etc.) or `core\stdc\stdio.d` on Windows.

1718627440•1w ago

> It's inelegant

We obviously disagree with the coding organization we prefer, so I find that rather elegant, but this doesn't sound like a substantial discussion. You as the language author are obviously quite content with the choices D made.

> This is inelegant because all other types do not need a prefix.

I don't find that. It makes it rather possible to clearly distinguish between transparent and opaque types. That these are a separate namespace makes it also possible to use the same identifier for the type and object, which is not always a good choice, but sometimes when there really is no point in inventing pointless names for one of the two, it really is. (So I can write struct message message; .) It also makes it really easy to create ad-hoc types, which honestly is my killer feature that convinced me to switch to C. I think this is the most elegant way to make creating new types for single use, short of getting rid of explicit types altogether.

> It also makes it clumsier to refactor the code (adding or subtracting the leading `struct`).

I never had that problem, and don't know when it occurs and why.

> The typedef workaround is extremely commonplace.

In my opinion that is not a workaround, but a feature. I also use typedefs when I want to declare an opaque type. This means that in the header file all function declarations refer to the opaque type, and in the implementation the type is only used with "struct". This also makes it obvious which types' internals you are supposed to touch and which not. (This is also what e.g. the Linux style guide recommends.)

> This isn't what you want to see - you want to see first the public interface, not the implementation details.

Maybe you, but I don't. As in C public interface and implementation are split into different files, this problem doesn't occur. When I want to see the interface, I'm going to read the interface definition. When I look into the implementation file, I definitely don't expect to read the interface. What I rather see is first the dependencies (includes) and then the internal types. This fits "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious." . Then I typically see default values and configuration. Afterwards yes, I see the lowest methods.

> people tend to organize the code backwards, with the private leaf functions first and the public functions last.

Which results in a consistent organization. It also fits how you would write it in math or an academic context, that you only use what is already defined. It makes the file readable from top to bottom. When you are just looking for a specific thing, instead of trying to read it in full, you are searching and jumping around anyway.

> Oh, there is a looong list of kludgy problems stemming from a separate macro processor that is a completely distinct language from C. Even the expressions in a macro follow different rules than in C. If you've ever used a language with modules, you'll never want to go back to #include!

A macro language is surprising for the newcomer, but you get used to it, and I don't think there is a problem with include. Textual inclusion is kind of the easiest mental model you can have and is easy to control and verify. Coming from a language with modules, before learning C, I never found that to be an issue, and rather find the emphasis on the bare filesystem rather refreshing.

> but in practice, why would one want a module name different from its filename?

True, I actually never wanted to include a file with spaces, but it is something where your concept breaks. Also you can write #include "foo/bar/../baz" just fine, and can even use absolute paths, if you feel like it.

publicdebates•1w ago

> macro language is surprising for the newcomer, but you get used to it

This was one of the biggest paradigm shifts for me in mastering C. Once I learned to stop treating the preprocessor as a hacky afterthought, and realized that it's actually a first-class citizen in C and has been since its conception, I realized how beautiful and useful it really is when used the way the designers intended. You can do anything with it, literally anything, from reflection to JSON and YAML de/serialization to ad hoc generics. It's so harmonious, if unsightly, like the fat lady with far too much makeup singing the final opus.
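One widely used preprocessor technique in this spirit is the X-macro, sketched below; it is an illustration and not necessarily what the comment had in mind. A single list is expanded twice, once into an enum and once into a table of names, giving a limited form of reflection. `COLOR_LIST`, `color_to_string` and `color_from_string` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical list: one entry per enumerator, expanded several ways. */
#define COLOR_LIST(X) \
    X(COLOR_RED)      \
    X(COLOR_GREEN)    \
    X(COLOR_BLUE)

/* Expansion 1: the enum itself. */
#define AS_ENUM(name) name,
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };

/* Expansion 2: a parallel table of names, via the stringizing operator. */
#define AS_STRING(name) #name,
static const char *const color_names[COLOR_COUNT] = { COLOR_LIST(AS_STRING) };

/* Enumerator to its source-level name. */
static const char *color_to_string(enum color c)
{
    assert((unsigned)c < COLOR_COUNT);
    return color_names[c];
}

/* Reverse lookup: name to enumerator; returns COLOR_COUNT if unknown. */
static enum color color_from_string(const char *s)
{
    for (int i = 0; i < COLOR_COUNT; i++)
        if (strcmp(s, color_names[i]) == 0)
            return (enum color)i;
    return COLOR_COUNT;
}
```

Adding a color means touching only `COLOR_LIST`; the enum and the name table cannot drift apart.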

1718627440•1w ago

> You can do anything with it, literally anything, from reflection to JSON and YAML de/serialization to ad hoc generics.

Wow. Do you have any pointers? I always thought random computation with it is hard, because it doesn't really want to do recursion by design. Or are you talking about using another program as the preprocessor?

WalterBright•1w ago

D accomplishes this by using Compile Time Function Execution to build D source code from strings, and then inline compiling the D code. Learning a macro language is unnecessary, as it's just more D code.

This kind of thing is very popular among D users.

https://dlang.org/spec/statement.html#mixin-statement

https://dlang.org/spec/expression.html#mixin_expressions

WalterBright•1w ago

Can you justify C rejecting the following code:

    int *p, *q;
    auto x = p * q;

?

1718627440•1w ago

Yes, because I also don't know what this is supposed to mean? The product of two addresses? Dereferencing one pointer, and then combining them without an operator? And what's the type going to be, "pointer squared"?

Also what has this to do with the current discussion?

WalterBright•1w ago

> Also what has this to do with the current discussion?

The point is C does not allow doing anything you want. The C type system, for example, places all kinds of restrictions on what code can be written. The underlying CPU does not have a type system - it will multiply two pointers just fine without complaint. The CPU does not even have a concept of a pointer. (The C preprocessor doesn't have a notion of types, either.)

The point of a type system is to make the code more readable and reduce user errors.

We have a difference of opinion on C. Mine is that C should have better rules to make code more readable and reduce user errors. Instead it remains stuck in a design from the 1970s, and has compromised semantics that result from the severe memory constraints of those days. You've defended a number of these shortcomings as being advantages.

Just for fun, I'll throw out another one. The C cast syntax is ambiguous:

    (T)(3)

Is that a function call or a cast of 3 to type T? The only way to disambiguate is to keep a symbol table of typedef's so one can determine if T is a type or not a type. This adds significant complexity to the parser, and is completely unnecessary.

The fix D has for this is:

    cast(T)(3)

where `cast` is a keyword. This has another advantage in that casts are a blunt tool and are associated with hiding buggy code. Having `cast` be easily searchable makes for better code reviews.
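A small C illustration of the ambiguity, using hypothetical function names: the same tokens `(T)(3)` are a call in one scope and a cast in another, depending only on what `T` currently names.

```c
#include <assert.h>

/* File scope: T names a function, so (T)(3) is a call through T. */
static int T(int x) { return x + 100; }

static int parsed_as_call(void)
{
    return (T)(3);      /* function call: T(3) */
}

static int parsed_as_cast(void)
{
    typedef long T;     /* block scope: T now names a type and hides the function */
    long v = (T)(3);    /* cast: 3 converted to long */
    return (int)v;
}

/* Same token sequence, two different meanings. */
static void cast_ambiguity_demo(void)
{
    assert(parsed_as_call() == 103);
    assert(parsed_as_cast() == 3);
}
```

A parser cannot tell the two apart without knowing, at that point in the file, whether `T` was declared by a `typedef`.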

1718627440•1w ago

> The point is C does not allow doing anything you want.

I thought we were discussing specific issues, I did not claim, that C doesn't have things that could be different. For example the interaction of integer promotion and fixed size types is completely broken (as in you can't write correct portable code) in my opinion.

> The C type system, for example, places all kinds of restrictions on what code can be written. The underlying CPU does not have a type system - it will multiply two pointers just fine without complaint. The CPU does not even have a concept of a pointer.

As you wrote a pointer is not an address. The CPU lets you multiply addresses, but C also lets you multiply addresses just fine. The type for that is uintptr_t. Pointers are not addresses, e.g. ptr++ does not in general increment the address by one.

> The C preprocessor doesn't have a notion of types, either.

It doesn't even have a concept of symbols and identifiers, which makes it possible for you to construct these.

> You've defended a number of these shortcomings as being advantages.

Because I think they are. It's not necessarily the reason why they are there, but they can be repurposed for useful stuff and often are. Also resource constraints often result in a better product.

I still only declare variables at the begin of a new block, not because I wouldn't write C99+, I do, but because it makes the code easier to read when you can reason about the participating variables up front. I can still introduce a variable when I feel like, just by starting a new block. This enables me to also decide when the variables go out of scope again, so my variables only exist for the time, I really want them to, even if that is only for 3 lines.

> Just for fun, I'll throw out another one.

That's just a minor problem in compiler implementation, and doesn't result in problems for the user. Using the same symbol for pointer dereference and multiplication is also similar.

    (a) *b

Is that a cast or a multiplication? These make for funny language quizzes, but are of rare practical relevance. Real world compilers don't completely split syntactic and semantic parsing anyway, so they can emit better diagnostics and keep parsing upon an error.

> You've defended a number of these shortcomings as being advantages.

My initial comment was about a shortcoming, which doesn't actually exist.

pests•1w ago

I'm sorry, is this an in-joke or satire or something? I can't tell really. Maybe a woosh moment, and as others have said, the GP/person you are speaking about, Walter Bright, is the creator of the D language. Maybe you didn't read your parent's post? Not saying it's intentional, but it almost seems rude to keep speaking in that way about someone present in the conversation.

girvo•1w ago

GP literally invented the D language.

WalterBright•1w ago

> maybe the D language is the real solution he's looking for

Yes, I realized that after not finding any 'droids.

WalterBright•1w ago

The C committee is not afraid to add new syntax. And this is an easy addition.

Not only does it deliver a massive safety improvement, it dramatically speeds up strlen, strcmp, strcpy, strcat, etc. And you can pick out a substring without needing to allocate/copy. It's easy money.
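To make the speed and substring claims concrete, here is a library-level sketch of a length+data string in today's C. It is not the syntax from the linked proposal; `struct str`, `STR_LIT`, `str_len`, `str_sub` and `str_eq` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length+data string view; the bytes are not NUL-terminated. */
struct str {
    const char *data;
    size_t      len;
};

/* Wrap a string literal; sizeof counts the trailing NUL, so subtract it. */
#define STR_LIT(s) ((struct str){ (s), sizeof(s) - 1 })

/* Length is a field read, not a scan for a terminator. */
static size_t str_len(struct str s)
{
    return s.len;
}

/* Substring without allocating or copying: just a narrower view. */
static struct str str_sub(struct str s, size_t start, size_t count)
{
    assert(start <= s.len);
    if (count > s.len - start)
        count = s.len - start;
    return (struct str){ s.data + start, count };
}

/* Equality can reject on length before touching the bytes. */
static bool str_eq(struct str a, struct str b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

The catch is interop: passing such a view to an API that expects a NUL-terminated `char *` still requires a copy with a terminator appended.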

pjmlp•1w ago

The C standard committee even refused Dennis Ritchie's proposal for fat pointers.

https://www.nokia.com/bell-labs/about/dennis-m-ritchie/varar...

Meanwhile after UNIX was done at AT&T, the C language authors hardly cared for the C standard committee in regards to the C compiler supported features used in Plan 9 and Inferno, being only "mostly" compatible, followed up by having an authoring role in Alef, Limbo and Go.

> The language accepted by the compilers is the core ANSI C language with some modest extensions, a greatly simplified preprocessor, a smaller library that includes system calls and related facilities, and a completely different structure for include files.

https://doc.cat-v.org/plan_9/4th_edition/papers/comp

I doubt most C advocates ever reflect on this.

lelanthran•1w ago

> Meanwhile after UNIX was done at AT&T, the C language authors hardly cared for the C standard committee in regards to the C compiler supported features used in Plan 9 and Inferno, being only "mostly" compatible, followed up by having an authoring role in Alef, Limbo and Go.

> I doubt most C advocates ever reflect on this.

What would be the conclusion of this reflection? Assuming you have reflected on this, what was your conclusion?

pjmlp•1w ago

That the language authors concluded C was done, there was no point collaborating with WG14, and there were better tools to do their operating systems research on.

AlexeyBrin•1w ago

> there were better tools to do their operating systems research on.

I think that's the key, Ritchie, Thompson, Pike were interested in OS research while people that love C today just want a simple and powerful language with manual memory management. It is not the first time in history when the creation has a separate life from the creator's wishes.

JamesTRexx•1w ago

As I see it, the problem with languages trying to replace C is that they not only try to fix fundamental flaws, but feel compelled to add unneeded features and break C's simplicity.

WalterBright•1w ago

C is a simple language, but that simplicity leads to non-portable code and lots of klunky, ugly things like using the preprocessor as a substitute for conditional compilation, imports, lambdas, metaprogramming, etc.

You don't have to use unneeded features that are in D. The core language is as simple as C, to the point where it is easy to translate C to D (in fact, the compiler will do it for you!).

JamesTRexx•1w ago

"lots of klunky, ugly things" I wonder how much of that is caused by too complex thinking. It seems that simplicity is very difficult for most people.

"You don't have to use unneeded features" True, but that doesn't work in practice, for example the intentions to use only limited C++ features in new projects, that end up bogged down with the other features anyway because of "new toy to play with" effect. What isn't there can't be used and keeps the language lean and clean (and not mean ;-) ).

WalterBright•1w ago

I agree that often programmers are tempted to use features "just because they are there".

We've introduced the notion of "editions" in D lately, and its purpose is to remove features that have not proved their value over time.

1718627440•1w ago

Also, they leave an important point of C behind: backward compatibility.

WalterBright•1w ago

D supports mixed C and D files in a project. The D code can call C functions and use C types, and the C code can call D functions using C types.

The D compiler will even translate C source code to D source code if you prefer! After using a mixed D/C program for a while, I bet you'll feel motivated to take the extra step and translate the C stuff to D, as the inelegance in C will become obvious :-)

cogwheel•1w ago

https://web.archive.org/web/20260116161616/https://www.digit... for anyone here while we're swamping Walter's site

WalterBright•1w ago

The site is built out of static pages, so it takes a lot to swamp it!

userbinator•1w ago

Even simpler, you can do something like this to have length-delimited AND null-terminated strings (written from memory, no guarantees of correctness etc.):

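    #include <stdlib.h>
    #include <string.h>

    /* Layout: [int length][chars][NUL]; the returned pointer addresses the chars. */
    char *lenstrdup(const char *s) {
       int n = (int)strlen(s);   /* assumes the length fits in an int */
       char *p = malloc(n + sizeof(int) + 1);
       if(p) {
          strcpy(p + sizeof(int), s);
          *(int*)p = n;          /* malloc'd memory is suitably aligned for int */
          p += sizeof(int);
       }
       return p;
    }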

    void lenstrfree(char *s) {
        free(s-sizeof(int));
    }

jkercher•1w ago

One of the advantages to the pointer + length approach is free substrings. This inline approach doesn't allow that.

WalterBright•1w ago

The ability to slice substrings results in a massive speed increase for string handling.
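To make the "free substrings" point concrete, here is a minimal sketch of a pointer + length view. The `str_slice` type and its helper names are hypothetical, made up for illustration, and are not taken from D or from any library mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer + length view; it does not own the bytes it points at. */
typedef struct {
    const char *ptr;
    size_t      len;
} str_slice;

/* Wrap a NUL-terminated string; this is the only place strlen() runs. */
static str_slice slice_from_cstr(const char *s) {
    assert(s != NULL);
    str_slice v = { s, strlen(s) };
    return v;
}

/* Take the substring [start, end) without allocating or copying.
   Out-of-range bounds are clamped rather than reported as errors. */
static str_slice slice_sub(str_slice v, size_t start, size_t end) {
    if (end > v.len) end = v.len;
    if (start > end) start = end;
    str_slice out = { v.ptr + start, end - start };
    return out;
}

/* Lengths are known up front, so a mismatch is rejected before any byte is read. */
static int slice_eq(str_slice a, str_slice b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

The trade-off is that a slice is not NUL-terminated, so it cannot be handed directly to APIs that expect a C string, and it is only valid for as long as the underlying buffer is.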

BigJono•2w ago

I really dislike parsing not validating as general advice. IMO this is the true differentiator of type systems that most people should be familiar with instead of "dynamic vs static" or "strong vs weak".

Adding complexity to your type system and to the representation of types within your code has a cost in terms of mental overhead. It's become trendy to have this mental model where the cost of "type safety" is paid in keystrokes but pays for itself in reducing mental overhead for the developers. But in reality you're trading one kind of mental overhead for another, the cost you pay to implement it is extra.

It's like "what are all the ways I could use this wrong" vs "what are all the possibilities that exist". There's no difference in mental overhead between having one tool you can use in 500 ways or 500 tools you can use in 1 way, either way you need to know 500 things, so the difference lies elsewhere. The effort and keystrokes that you use to add type safety can only ever increase the complexity of your project.

If you're going to pay for it, that complexity has to be worth it. Every single project should be making a conscious decision about this on day one. For the cost to be worth it, the rate of iteration has to be low enough and the cost of runtime bugs has to be high enough. Paying the cost is a no brainer on a banking system, spacecraft or low level library depended on by a million developers.

Where I think we've lost the plot is that NOT paying the cost should be a no brainer for stuff like front end web development and video games where there's basically zero cost in small bugs. Typescript is a huge fuck up on the front end, and C++ is a 30 year fuck up in the games industry. Javascript and C have problems and aren't the right languages for those respective jobs, but we completely missed the point of why they got popular and didn't learn anything from it, and we haven't created the right languages yet for either of those two fields.

Same concept and cost/benefit analysis applies to all forms of testing, and formal verification too.

lelanthran•1w ago

While I broadly agree with your general point, in that engineering is making a set of trade-offs, I don't necessarily agree that ditching type-safety in the example contexts you posted is the appropriate trade-off.[1]

I'll ditch type-safety in experimental/exploratory code; I'll use Lisp (or, more recently, Python) to test if something is a good idea. For anything that ships to production, I think a basic level of type enforcement is necessary, even if you don't want the whole type zoo.

For your Javascript f/end context, I like the proposed TC39 approach (https://github.com/tc39/proposal-type-annotations?tab=readme...). The typing is optional, does not break existing syntax and can still be used to enforce a basic level of type safety if the developer wants it.

----------------------------

[1] I upvoted you anyway. Your broader point is still valid.

BigJono•1w ago

I'm not talking about ditching type safety. I'm saying the whole concept of "safe" and "unsafe" as most people on HN understand it is flawed. The interesting part of a type system isn't whether the compiler checks types or if we just go lmao fuck it let's not even bother, it's whether or not you need to represent the types in your code in order for the compiler to check them. For the majority of what people want from type safety in a language like Javascript, the answer is that no, you don't need to, as long as you're willing to not have every single language feature under the sun.

With compiled languages you can statically infer a ton of type information without having to pepper your codebase with repeated references to what something is. Nominal typing essentially boils down to a double-check of your work, you specify the type separately and then purposely assign it to a variable, so that if you make a mistake with either part the compiler picks it up.

But those kinds of double-checks can be done for almost anything (outside of dynamic boundaries like io/dlls) without nominal type signatures in the code, as long as you jettison the ability to change types at runtime. No language as far as I can tell actually does this because we're all so obsessed with the false dichotomy of nominal and dynamic typing.

In JS everyone likes to use string unions in place of enums so let's use that as an example. If you have something that is only ever set as "foo" or "bar", that's effectively a boolean. If you receive that string in another function, make a typo and write if (str == "boo"), then in every single language I'm aware of that passes a compiler check. But it shouldn't, because the compiler has all the information it needs to statically catch that error and fail the build. The set of assignments to that variable and the set of equality checks on it provide the two parts of the double-check.

In a perfect world we'd have 10 of these "middle of the road" strongly typed static languages to choose from that all optimise for minimal type representation in their own unique way. But every time I see one of these projects pop up on HN it gets like 10 comments then disappears into the sunset because the programming community is so enraptured with the nominal type system of C and all the fucking bullshit Bjarne Stroustrup pasted on top of it 40 years ago. So we end up with this silly situation where the only things considered "safe" by the crowd are strict descendants of C/C++ with the array/pointer/string screw-ups that made those languages unsafe removed.

tom_•2w ago

If you really insist on not having a distinction between "u8"/"i8" and "unsigned char"/"signed char", and you've gone to the trouble of refusing to accept CHAR_BIT!=8, I'm pretty sure it'd be safer to typedef unsigned char u8 and typedef signed char i8. uint8_t/int8_t are not necessarily character types (see 6.2.5.20 and 7.22.1.1) and there are ramifications (see, e.g., 6.2.6.1, 6.3.2.3, 6.5.1).

anonnon•1w ago

> and you've gone to the trouble of refusing to accept CHAR_BIT!=8

This one was a head-scratcher for me. Yeah, there's no cost to check for it, but architectures where CHAR_BIT != 8 are rarer even than 24-bit architectures.

apaprocki•1w ago

I got the impression the author was implying because CHAR_BIT is enforced to be 8 that uint8_t and char are therefore equivalent, but they are different types with very different rules.

E.g. `char *p = (char *)&astruct` may violate strict aliasing but `uint8_t *p = (uint8_t *)&astruct` is guaranteed legal. Then modulo, traps, padding, overflow, promotion, etc.

sgsjchs•1w ago

It's the other way around.

publicdebates•1w ago

Could you clarify an example of the ramifications?

I tried looking through the C2Y standard draft to figure it out, but it's too complicated for me.

tom_•1w ago

With the disclaimer that I let my language lawyer qualification lapse a while ago, it's broadly to do with the character types being the only approved way to examine the bytes of an object. An object of a type can be accessed only as if it were an object of that type or some compatible type, but: it can also be accessed as a sequence of characters. (You'd do this if implementing memcpy, memset or memcmp, for example.)

6.2.6.1 - only character types can be used to inspect the sequence of bytes making up an object, and (interestingly) only an array of unsigned char is suitable for memcpy'ing an object into for inspection. It's possible for sequences of bytes to exist that don't represent a valid value of the original object; it's undefined behaviour to read those sequences of bytes other than via a character type (i.e., I think, via a pointer to something compatible with the object's actual type - there being no other valid ways to even attempt to read it)

6.3.2.3 - when casting a pointer to an object type to a pointer to a character type, the new character pointer points to the bytes of the object. If converting between object types, on the other hand, the original pointer will (with care) round trip, and that seems to be all you can do, and actual access is not permitted

6.5.1 - as well as all the expected ways of accessing an object, objects can be accessed via a character pointer
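As a rough illustration of those three points, here is a sketch that only ever touches an object's bytes through `unsigned char` or `memcpy`. The function names are hypothetical, and `float_bits` additionally assumes a 32-bit IEEE 754 `float`, which the standard does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum the bytes of any object by viewing it through unsigned char,
   the access path the character-type rules allow for every object. */
static unsigned long byte_sum(const void *obj, size_t size) {
    assert(obj != NULL);
    const unsigned char *bytes = obj;
    unsigned long sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += bytes[i];
    return sum;
}

/* Assumption: float and uint32_t have the same size on this platform. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits");

/* Copy the representation into an unsigned char array for inspection,
   then reassemble it as an integer, without casting float* to uint32_t*. */
static uint32_t float_bits(float f) {
    unsigned char raw[sizeof f];
    uint32_t bits;
    memcpy(raw, &f, sizeof raw);
    memcpy(&bits, raw, sizeof bits);
    return bits;
}
```

Reading the same `float` through a `uint32_t *` cast instead would be the kind of access the aliasing rules do not permit.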

keyle•2w ago

That made me smile

     If I find myself needing a bunch of dynamic memory allocations and lifetime management, I will simply start using another language–usually rust or C#.

Now that is some C habit for the modern day... But huh, not C.

pjmlp•1w ago

I started doing that back in 1993 on MS-DOS, thanks to C++ RAII; C already felt outdated in those days.

procaryote•1w ago

Arguably, 1993's C has survived better than 1993's C++.

pjmlp•1w ago

Well, in 33 years it has learnt nothing about memory-safe programming; at least C++ provides the tooling for those that care, and did so even before governments decided to act upon it.

krapp•1w ago

My go to language for that is lua. I'm starting to think of it as a C framework more so than its own language.

amiga386•2w ago

Fun fact: the background image is the "BallsMany" pattern included with MagicWB for the Amiga

(To confirm: download the LhA archive from https://aminet.net/package/util/wb/MagicWB21p then open the archive in 7-zip, extract Patterns/BallsMany then load into an ILBM viewer, e.g. https://www.retroreversing.com/ilbm )

themafia•1w ago

> and I end up having all these typedefs in my projects

I avoid doing this now. It's more trouble than it's worth and it changes your code from a standard dialect of C into a custom one. Plus my eyes are old and they don't enjoy separating short identifiers.

> typedef struct { ... } String

I avoid doing this. Just use `struct string { ... };`. It makes it clear what you're handling. C23 finally gave us "auto", you shouldn't fret over typedefing everything anymore. I also prefer a "strbuf" type with an index and capacity so I can safely read and write to it with a derived "strview" having pointer and length only which references into the buffer.

> returning results

The general method of returning structures larger than two machine words is fairly inefficient. Plus you're cutting yourself off from another C23 gem which was [[nodiscard]]. If you want the 'ok' value checked then you can _really_ specify that. Put everything else behind a pointer passed in an argument. The sum type logic works just as well there.
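A minimal sketch of that shape, assuming a hypothetical `parse_digit` function: the status comes back as the return value and the payload goes through a pointer argument. The `NODISCARD` macro is my own wrapper so the sketch also builds on pre-C23 compilers; on C23 it expands to the real `[[nodiscard]]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Use the C23 attribute when available, otherwise fall back to the
   GCC/Clang equivalent, otherwise to nothing. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#  define NODISCARD [[nodiscard]]
#elif defined(__GNUC__)
#  define NODISCARD __attribute__((warn_unused_result))
#else
#  define NODISCARD
#endif

/* Hypothetical parser: returns whether parsing succeeded and writes the
   value through 'out' only on success. */
NODISCARD static bool parse_digit(char c, int *out) {
    assert(out != NULL);
    if (c < '0' || c > '9')
        return false;
    *out = c - '0';
    return true;
}
```

Calling `parse_digit('7', &v);` as a bare statement should then draw a compiler diagnostic, while `if (parse_digit('7', &v)) { ... }` does not.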

> I tend to avoid the string.h functions most of the time, only employing the mem family when I want to, well, mess with memory.

So you use strlen() a lot and don't have to deal with multibyte characters anywhere in your code. It's not much of a strategy.

apaprocki•1w ago

> > typedef struct { ... } String

> I avoid doing this. Just use `struct string { ... };'. It makes it clear what you're handling.

Well then imagine if Gtk made you write `struct GtkLabel`, etc. and you saw hundreds of `struct` on the screen taking up space in heavy UI code. Sometimes abstractions are worthwhile.

wavemode•1w ago

The main thing I dislike about typedefs is that you can't forward declare them.

If I know for sure I'm never going to need to do that then OK.

procaryote•1w ago

How do you mean? You can at least do things like

typedef struct foo foo;

and somewhere else

struct foo { … }

flohofwoe•1w ago

The usual solution for this is:

    typedef struct bla_s { ... } bla_t;

Now you have a struct named 'bla_s' and a type alias 'bla_t'. For the forward declaration you'd use 'bla_s'.

Using the same name also works just fine, since structs and type aliases live in different namespaces:

    typedef struct bla_t { ... } bla_t;

...also before that topic comes up again: the _t postfix is not reserved in the C standard :)

apaprocki•1w ago

Yes, using the same Gtk example, the way you’d forward declare GtkLabel without including gtklabel.h in your header would be:

    struct _GtkLabel;
    typedef struct _GtkLabel GtkLabel;
    // Use GtkLabel* in declarations

1718627440•1w ago

Why are you complicating things? Struct and Unions are different namespaces for a reason.

    typedef struct GtkLabel GtkLabel;

works just fine.

apaprocki•1w ago

I’m simply stating how actual Gtk is written:

https://gitlab.gnome.org/GNOME/gtk/-/blob/main/gtk/gtklabel....

1718627440•1w ago

True, thanks then. As far as I see it they don't even use the struct in the implementation, so I guess it makes some sense.

warmwaffles•1w ago

People getting hung up on `_t` usage being reserved for posix need to lighten up. I doubt they'll clash with my definitions and if it does happen in the future, I'll change the typedef name.

lelanthran•1w ago

> Well then imagine if Gtk made you write `struct GtkLabel`, etc. and you saw hundreds of `struct` on the screen taking up space in heavy UI code. Sometimes abstractions are worthwhile.

TBH, in that case the GtkLabel (and, indeed, the entire widget hierarchy) should be opaque pointers anyway.

If you're not using a struct as an abstraction, then don't typedef it. If you are, then hide the damn fields.
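For reference, a sketch of what "hide the damn fields" can look like, squeezed into one listing. The `label` type and its functions are hypothetical and are not how Gtk is written; the comments mark which part would live in a public header and which in the implementation file.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what would go in the public header, e.g. label.h --- */
typedef struct label label;          /* forward declaration; fields stay hidden */
label *label_new(int width);
int    label_width(const label *l);
void   label_free(label *l);

/* --- what would go in the implementation file, e.g. label.c --- */
struct label {
    int width;                       /* only visible to this translation unit */
};

label *label_new(int width) {
    label *l = malloc(sizeof *l);
    if (l)
        l->width = width;
    return l;                        /* NULL if allocation failed */
}

int label_width(const label *l) {
    assert(l != NULL);
    return l->width;
}

void label_free(label *l) {
    free(l);                         /* free(NULL) is a no-op */
}
```

Callers that only see the header can hold and pass `label *` but cannot read or write `width` directly, at the cost of a heap allocation per object.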

f1shy•1w ago

Thank you! Because I wanted to point exactly that. When I was very junior programmer, and coded alone, I used to have “that elemental header” where lots of things were inside. Many of them to convert C in what I wished it was.

Now I think it is somewhere between not a good idea and absolutely awful.

Yes, sometimes you wish some things were different in a programming language: "if only these types had shorter names". But when you work in a team, first you should have consensus, and then modifying the language becomes a heavy load that every new person in the project will have to lift.

"Modifying C is porting the Lisp curse to C" is my motto. Use everything as standard and vanilla as possible.

lelanthran•1w ago

> So you use strlen() a lot and don't have to deal with multibyte characters anywhere in your code. It's not much of a strategy.

You don't need to support all multibyte encodings (i.e. DBCS, UCS-2, UCS-4, UTF-16 or UTF-32) if you're able to normalise all input to UTF-8.

I think, when you are building a system, restricting all (human language) input to be UTF-8 is a fair and reasonable design decision, and then you can use strlen to your heart's content.

lionkor•1w ago

Am I missing something here? UTF8 has multibyte characters, they're just spread across multiple bytes.

When you strlen() a UTF8 string, you don't get the length of the string, but instead the size in bytes.

Same with indices. If you index at [1] in a string with a flag emoji, you don't get a valid UTF8 code point, but instead some part of the flag emoji. This applies with any UTF8 code points larger than 1 byte, which there are a lot of.

UTF16 or UTF32 are just different encodings.

What am I missing?

That's why UTF8 libraries exist.

lelanthran•1w ago

> When you strlen() a UTF8 string, you don't get the length of the string, but instead the size in bytes.

Yes, and?

> What am I missing?

A use-case? Where, in your C code, is it reasonable to get the number of multibyte characters instead of the number of bytes in the string?

What are you going to use "number of unicode codepoints" for?

Any usage that amounts to "I need the number of unicode codepoints in this string" is coupled to handling the display of glyphs within your program, in which case you'd be using a library for that anyway because graphics is not part of C (or C++) anyway.

If you're simply printing it out, storing it, comparing it, searching it, etc, how would having the number of unicode codepoints help? What would it get used for?

tialaramex•1w ago

Indeed. If you have output considerations then the number of Unicode codepoints isn't what you wanted anyway, you care about how many output glyphs there will be, that codepoint might result in zero glyphs, it might modify an adjacent glyph, or it might be best rendered as multiple glyphs.

If you're doing some sort of searching you want a normalization and probably pre-processing step, but again you won't care about trying to count Unicode code points.

lionkor•1w ago

For example splitting, cutting and inserting strings into each other

lelanthran•1w ago

> For example splitting, cutting and inserting strings into each other

That's not going to work without a glyph-aware library anyway; even if you are working with actual codepoint arrays, you can't simply insert a codepoint into that array and have a correct unicode string as the result.

Same for splitting.

flohofwoe•1w ago

That works just fine on UTF-8 encoded strings with C stdlib functions if your delimiters are 7-bit ASCII characters (/,.:; etc...).
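A small sketch of why that is safe, using a hypothetical `count_fields` helper: every byte of a multibyte UTF-8 sequence has its high bit set, so `strchr` looking for a 7-bit ASCII delimiter can never match in the middle of a character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the fields separated by an ASCII delimiter in a UTF-8 string.
   An empty string counts as one (empty) field. */
static size_t count_fields(const char *s, char delim) {
    assert(s != NULL);
    /* The delimiter must be 7-bit ASCII and not the terminator. */
    assert(delim != '\0' && (unsigned char)delim < 0x80);
    size_t n = 1;
    for (const char *p = strchr(s, delim); p != NULL; p = strchr(p + 1, delim))
        n++;
    return n;
}
```

For example, `count_fields("caf\xC3\xA9,na\xC3\xAFve,\xE2\x82\xAC", ',')` sees three fields even though the string contains two-byte and three-byte characters.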

flohofwoe•1w ago

> When you strlen() a UTF8 string, you don't get the length of the string, but instead the size in bytes.

Exactly, and that's what you want/need anyway most of the time (most importantly when allocating space for the string or checking if it fits into a buffer).

If you want the number of "characters" (which can have two meanings: either a single UNICODE code point, or a grapheme cluster, e.g. a "visible character" that's composed from multiple UNICODE code points), you need a proper UNICODE/grapheme-aware string processing library. But this is needed only rarely in most application types which just pass strings around or occasionally need to split/parse/tokenize by 7-bit ASCII delimiters.
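To put numbers on the bytes vs. code points vs. visible characters distinction, here is a sketch with a hypothetical `utf8_codepoints` helper. It assumes the input is already valid UTF-8 and simply skips continuation bytes; it makes no attempt at grapheme clusters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strlen, for comparison with the byte count */

/* Count code points by counting every byte that is not a continuation
   byte (bit pattern 10xxxxxx). Assumes valid UTF-8 input. */
static size_t utf8_codepoints(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

With the German flag emoji, written as the bytes `"\xF0\x9F\x87\xA9\xF0\x9F\x87\xAA"`, `strlen` reports 8 bytes and `utf8_codepoints` reports 2 code points, while a reader sees one flag. Only the first number is what a buffer size check needs.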

GuB-42•1w ago

Turns out that I rarely need to know sizes or indices of a UTF8 string in anything other than bytes.

If I write a parser for instance, usually, what I want to know is "what is the sequence of bytes between this sequence of bytes and that sequence of bytes". Whether there are flag emojis or whatever in there doesn't matter, and the way UTF8 works ensures that a character representation doesn't partially overlap with another.

What the byte sequences mean only really matters if you are writing an editor, so that you know how many bytes to remove when you press backspace for instance.

Truncation to prevent buffer overflow seems to be a case where it would matter, but not really. An overflow is an error and should be treated as such. Truncation is a safety mechanism, for when having your string truncated is a lesser evil. At that point, having half a flag emoji doesn't really matter.

raincole•1w ago

> I think, when you are building a system, restricting all (human language) input to be UTF-8 is a fair and reasonable design decision, and then you can use strlen to your hearts content.

It makes no sense. If you only need the byte count then you can use strlen no matter what the encoding is. If you need any other kind of counting then you don't use strlen no matter what the encoding is (except in ASCII only environment).

"Whether I should use strlen or not" is a completely independent question to "whether my input is all UTF-8."

lelanthran•1w ago

> If you only need the byte count then even you can use strlen no matter what the encoding is.

No, strlen won't give you the byte count on UTF16 encodings.
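A tiny sketch of the failure mode, with a hypothetical `utf16_bytes` helper for contrast: in UTF-16, every ASCII character carries a zero byte, so `strlen` stops almost immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "hi" encoded as UTF-16LE, including the two-byte terminator. */
static const char utf16le_hi[] = { 'h', 0x00, 'i', 0x00, 0x00, 0x00 };

/* Byte length of a UTF-16 string, found by scanning for a zero
   16-bit unit rather than a single zero byte. */
static size_t utf16_bytes(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0 || s[n + 1] != 0)
        n += 2;
    return n;
}
```

Here `strlen(utf16le_hi)` yields 1, because the second byte is already zero, while `utf16_bytes(utf16le_hi)` yields 4.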

> If you need character count then you don't use strlen no matter what the encoding is (except in ASCII only environment).

What use-case requires the character count without also requiring a unicode glyph library?

raincole•1w ago

> strlen won't give you the byte count on UTF16 encodings.

You're right. I stand corrected.

zzo38computer•1w ago

I do not agree that restricting it to UTF-8 (or to Unicode in general) is a fair and reasonable design decision (although UTF-8 may be reasonable if Unicode is somehow required anyways (you should avoid requiring Unicode if you can though), especially if the program is also expected to deal with ASCII in addition to requiring Unicode), but regardless of that, the number of code points is not usually relevant (and substring operations indexed by code points are not usually necessary either), and the number of bytes will be more important, and some programs should not need to know about the character encoding at all (or only have a limited consideration of what they do with them).

(One reason you might care about the number of code points is because you are converting UTF-8 to UTF-32 (or Shift-JIS to TRON-32 or whatever else) and you want to allocate the memory ahead of time. The number of characters (which is not the same as the number of code points in the case of Unicode, although for other character sets it might be) is probably not important; if you want to display it, you will care about the display width according to the font, and if you are doing editing then where one character starts and ends is going to be more significant than how many characters there are. If you are using and indexing by the number of code points a lot (even though as I say that should not usually be necessary), then you might use UTF-32 instead of UTF-8.)

(It is also my opinion that Unicode is not a good character set.)

JKCalhoun•1w ago

I was going to comment the same thing.

I had a coworker who had a very complicated set of "includes" that their code relied upon—not unlike the typedefs in the post. So his code was difficult to move around without also moving all his headers with it.

I try to minimize dependencies (custom headers, custom macros, etc.).

doanbactam•1w ago

Solid list. The bit about avoiding the preprocessor as much as possible really resonates—using `static inline` functions and `enum` instead of macros makes debugging so much less painful. What's your take on using C11's `_Generic` for type-generic macros? It adds some verbosity but can save you from a lot of runtime type errors.
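For anyone who hasn't seen `_Generic` in use, a minimal sketch follows. The `max_of` macro and the two helper functions are hypothetical names; the selection happens at compile time based on the type of `(a) + (b)`, so mixed `int`/`double` arguments pick the `double` version.

```c
#include <assert.h>

static int    max_int(int a, int b)          { return a > b ? a : b; }
static double max_double(double a, double b) { return a > b ? a : b; }

/* Dispatch on the type the operands would have after the usual arithmetic
   conversions. A type with no matching entry (e.g. long) fails to compile. */
#define max_of(a, b) _Generic((a) + (b), \
    int:    max_int,                     \
    double: max_double)(a, b)

/* Small usage demonstration. */
static void max_of_demo(void) {
    assert(max_of(3, 7) == 7);        /* selects max_int */
    assert(max_of(1, 2.5) == 2.5);    /* int + double is double: max_double */
}
```

Because each argument is evaluated exactly once by a real function, this avoids the double-evaluation problem of the classic `((a) > (b) ? (a) : (b))` macro, at the cost of listing every supported type by hand.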

0xbadcafebee•1w ago

> I think one of the most eye-opening blog posts I read when getting into programming initially was the evergreen parse, don’t validate post

Bro, that was written in 2019. If it's not old enough to drink it's not yet evergreen. But it's also long-winded. A 25-minute read, and y'know what the conclusion is? "Parsing leaves you with a new data structure matching a type, validation checks if some data technically complies with a type (but might not later be parsed correctly)".

I need all the baby programmers in the back to hear me: type systems are bikeshedding. The point of a type is only to restrict computation to a fixed set. This concept can be applied anywhere you need to ensure reliability and simplicity. You don't need a programming language to natively support types in order to implement the concept yourself in that language.

lelanthran•1w ago

> You don't need a programming language to natively support types in order to implement the concept yourself in that language.

In a programming language that doesn't enforce types, how do you implement

> "Parsing leaves you with a new data structure matching a type, validation checks if some data technically complies with a type (but might not later be parsed correctly)".

apaprocki•1w ago

Please don’t buy into “no const”. If you’ve ever worked with a lot of C/C++ code, you really appreciate proper const usage and it’s very obvious if a prototype is written incorrectly because now any callers will have errors. No serious reusable library would expose functions taking char* without proper const usage. You would never be able to pass a C++ string c_str() to such a C function without a const_cast if that were the case. Casting away const is and should be an immediate smell.

anonnon•1w ago

Where is the author advocating not using const or casting it away?

apaprocki•1w ago

“modified 2026-01-17T23:20:00Z”

Seems it was cast away

Panzerschrek•1w ago

Yet another C person reinventing things which C++ already has.

lelanthran•1w ago

> Yet another C person reinventing things which C++ already has.

And yet another C++ person salty that people prefer simpler things.

pjmlp•1w ago

C23 + <compiler C extensions> is hardly simpler as people advocate.

oguz-ismail2•1w ago

I can't think of a language that isn't simpler compared to C++

pjmlp•1w ago

Might be, then again C23 isn't K&R C that many still learn from.

lelanthran•1w ago

> Might be, then again C23 isn't K&R C that many still learn from.

I agree with this, but then again, not many people are learning C now anyway. It will die away from natural attrition anyway, is my point.

K&R C does have a few advantages, because the compilers at the time were not so aggressive in optimisation, and would consistently emit code that (for example) performed a NULL dereference (or other UB), ensuring things like consistently crashing instead of silently losing data/doing the wrong thing.

lelanthran•1w ago

> C23 + <compiler C extensions> is hardly simpler as people advocate.

Well, certainly simpler than C++, at any rate.

I mean, just knowing the assignment rules in C++ is worthy of an entire book on its own. Understandably, the single rule of "assignment is a bitwise copy of the source variable into the destination variable" is inflexible, but at least the person reading the local code can, just from the current scope, determine whether some assignment is a bug or not!

In many ways, C++ requires global context when reading any local scope: will the correct destructor get called? Can this variable be used as an argument to a function (a lack of a copy constructor results in a bitwise copy onto the stack, with the destructor for that instance running twice - once in the stack and again when the scope ends)? Is this being passed by reference (i.e. it might be modified by the function we are calling) or by value (i.e. we don't need to worry about whether `bar` has been changed after a call to `foo(bar)`)?

Many programmers don't like holding lots of global scope in their head when working in some local scope. In C, all those examples above are clear in the local scope.

All programmers who prefer C over C++ have already tried C++ in large and non-trivial projects before walking away. I doubt that the reverse is true.

pjmlp•1w ago

Where do you think the first generations from C++ programmers come from?

There is this urban myth C is simple, from folks that never read either ISO C manual, can't read legalese, never spent much time browsing the compiler reference manual.

Mostly learnt K&R C, assume the world is simple, until the code gets ported into another platform or compiler.

Yet in such a simple language, I keep waiting to meet the magical developer that never wrote memory corruption errors with pointer arithmetic, string and memory library functions.

lelanthran•1w ago

> There is this urban myth C is simple, from folks that never read either ISO C manual, can't read legalese, never spent much time browsing the compiler reference manual.

And yet you know from previous discussions that folks like Uecker and myself have done all those things, and still walked away from C++.

In my case, I stepped back even after having a decade of work experience in it. Anything needing more abstraction than C, C++ is not going to be a good fit anyway (there's better languages).

> Yet in such a simple language, I keep waiting to meet the magical developer that never wrote memory corruption errors with pointer arithmetic, string and memory library functions.

Who made that claim? This sounds like a strawman - "If you use C you'll never make this class of errors", which no one said in this conversation.

In any case, the point is even more true of C++ - I have yet to meet this magical C++ programmer that never hits the few dozens of footguns it has that C doesn't.

pjmlp•1w ago

People that contribute to WG14 are naturally biased against C++, especially with gimmicks like _Generic.

Internet is full of people asserting CVEs in C are only caused by not skilled enough devs.

lelanthran•1w ago

> Internet is full of people asserting CVEs in C are only caused by not skilled enough devs.

Sure, but those people are not here, and usually aren't on HN anyway.

The internet is also full of people asserting that CVEs in C++ are only caused by not skilled enough devs, but I consider those people irrelevant too.

The reasons for rejecting C++ in this forum have been repeated often enough that you should have seen them by now: C++ has major systemic problems that don't exist in many other languages, including C.

It should be no surprise to you, at this point, that people choose almost anything over C++. The fact that "anything" also includes "C" is mostly incidental.

No one is asserting that they reject C++ because C is better, they typically reject it for concrete reasons, like the ones I pointed out upthread.

pjmlp•1w ago

Yeah the same reasons as the flamewars on comp.lang.c and comp.lang.c++.

WG21 could do a better job, but at least they acknowledge security has to be tackled somehow.

lelanthran•1w ago

>> people choose almost anything over C++. The fact that "anything" also includes "C" is mostly incidental.

> Yeah the same reasons as the flamewars on comp.lang.c and comp.lang.c++.

I've never seen that reason in comp.lang.c and comp.lang.c++; I am skeptical that you have seen that reason.

uecker•1w ago

Your implied claim that WG14 doesn't is incorrect, as you have been told before.

pjmlp•1w ago

As you have equally been told before, I have still not been proven wrong by WG14 work output during the last decades since 1989.

indy•1w ago

C++ has many things, and that is why many programmers want to stick with C

zabzonk•1w ago

if you don't like those things, then don't use them

indy•1w ago

Some people would rather have a pen knife than a Swiss army knife.

zabzonk•1w ago

or perhaps a pointed stick?

zzo38computer•1w ago

There are also some things in C that do not work or work differently in C++, such as (void*), empty structures (which in C++ are not really empty), etc; and there is also C++ stuff such as name mangling, the C++ standard library, etc, that comes along even if those things are not a part of your program, which is another reason why you might prefer C.

pjmlp•1w ago

It is like those folks that would rather write JSDoc comments than use a linter like TypeScript, because reasons.

Given the C++ adoption in 1990s commercial software and major consumer operating systems (Apple, IBM, Microsoft, Be), I bet that if the FSF with their coding guidelines had not advocated for C, the adoption would not have taken off beyond those days.

"Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C."

The GNU Coding Standard in 1994, http://web.mit.edu/gnu/doc/html/standards_7.html#SEC12

taminka•1w ago

really cool website, what's your colour palette?

moth-fuzz•1w ago

I'm a huge fan of the 'parse, don't validate' idiom, but it feels like a bit of a hurdle to use it in C - in order to really encapsulate and avoid errors, you'd need to use opaque pointers to hidden types, which requires the use of malloc (or an object pool per-type or some other scaffolding, that would get quite repetitive after a while, but I digress).

You basically have to trade performance for correctness, whereas in a language like C++, that's the whole purpose of the constructor, which works for all kinds of memory: auto, static, dynamic, whatever.

In C, to initialize a struct without dynamic memory, you could always do the following:

    struct Name {
        const char *name;
    };

    int parse_name(const char *name, struct Name *ret) {
        if(name) {
            ret->name = name;
            return 1;
        } else {
            return 0;
        }
    }

    //in user code, *hopefully*...
    struct Name myname;
    parse_name("mothfuzz", &myname);

But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever. This is very close to 'validation' type behaviour. So to get real 'parsing' behaviour, dynamic memory is required, which is off-limits for many of the kinds of projects one would use C for in the first place.

I'm very curious as to how the author resolves this, given that they say they don't use dynamic memory often. Maybe there's something I missed while reading.
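
For reference, the "object pool per-type" scaffolding mentioned above could be sketched roughly like this. It is only an illustration under assumptions: the names `name_parse`, `name_get`, `name_release` and the pool size are made up here, and the header/implementation split is shown as comments in a single file. Callers only ever see the incomplete type, so they cannot build a `Name` without going through the parser, and no `malloc` is involved.

```c
#include <assert.h>
#include <stddef.h>

/* --- name.h (what callers see): only an incomplete type --- */
typedef struct Name Name;
Name *name_parse(const char *raw);   /* NULL on invalid input or full pool */
const char *name_get(const Name *n);
void name_release(Name *n);

/* --- name.c (hidden from callers) --- */
struct Name {
    const char *name;
    int in_use;
};

enum { NAME_POOL_SIZE = 8 };         /* assumed capacity, tune per project */
static struct Name name_pool[NAME_POOL_SIZE];

Name *name_parse(const char *raw) {
    if (raw == NULL) {
        return NULL;                 /* same validity rule as parse_name above */
    }
    for (size_t i = 0; i < NAME_POOL_SIZE; i++) {
        if (!name_pool[i].in_use) {
            name_pool[i].name = raw;
            name_pool[i].in_use = 1;
            return &name_pool[i];
        }
    }
    return NULL;                     /* pool exhausted */
}

const char *name_get(const Name *n) {
    assert(n != NULL && n->in_use);
    return n->name;
}

void name_release(Name *n) {
    if (n != NULL) {
        n->in_use = 0;               /* slot becomes reusable */
    }
}
```

The trade-off is a fixed upper bound on live objects and one such pool per type, which is the repetitive part.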

danhau•1w ago

> But then anyone could just instantiate an invalid Name without calling the parse_name function and pass it around wherever

This is nothing new in C. This problem has always existed by virtue of all struct members being public. Generally, programmers know to search the header file / documentation for constructor functions, instead of doing raw struct instantiation. Don't underestimate how good documentation can drive correct programming choices.

C++ is worse in this regard, as constructors don't really allow this pattern, since they can't return a None / false. The alternative is to throw an exception, which requires a runtime similar to malloc.

apaprocki•1w ago

In C++ you would have a protected constructor and related friend utility class to do the parsing, returning any error code, and constructing the thing, populating an optional, shared_ptr, whatever… don’t make constructors fallible.

maccard•1w ago

In C++ you can do:

    #include <iostream>
    #include <optional>
    using namespace std;

    // SENTINEL_VALUE is assumed to be defined elsewhere.
    struct Foo {
    private:
        int val = 0;
        Foo(int newVal) : val(newVal) {}
    public:
        int value() const { return val; }
        static optional<Foo> CreateFoo(int newVal) {
            if (newVal != SENTINEL_VALUE) {
                return Foo(newVal);
            }
            return {};
        }
    };

    int main(int argc, char* argv[]) {
      if (auto f = Foo::CreateFoo(argc)) {
        cout << "Foo made with value " << f->value();
      } else {
        cout << "Foo not made";
      }
    }

sparkie•1w ago

Sometimes you want the struct to be defined in a header so it can be passed and returned by value rather than pointer.

A technique I use is to leverage GCC's `poison` pragma to cause an error if attempting to access the struct's fields directly. I give the fields names that won't collide with anything, use macros to access them within the header and then `#undef` the macros at the end of the header.

Example - an immutable, pass-by-value string which couples the `char*` with the length of the string:

    #ifndef FOO_STRING_H
    #define FOO_STRING_H
    
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include "config.h"
    
    typedef size_t string_length_t;
    #define STRING_LENGTH_MAX CONFIG_STRING_LENGTH_MAX
    
    typedef struct {
        string_length_t _internal_string_length;
        char *_internal_string_chars;
    } string_t;
    
    #define STRING_LENGTH(s) (s._internal_string_length)
    #define STRING_CHARS(s) (s._internal_string_chars)
    
    #pragma GCC poison _internal_string_length _internal_string_chars
    
    constexpr string_t error_string = { 0, nullptr };
    constexpr string_t empty_string = { 0, "" };
    
    inline static string_t string_alloc_from_chars(const char *chars) {
        if (chars == nullptr) return error_string;
        size_t len = strnlen(chars, STRING_LENGTH_MAX);
        if (len == 0) return empty_string;
        if (len < STRING_LENGTH_MAX) {
            char *mem = malloc(len + 1);
            if (mem == nullptr) return error_string;
            strncpy(mem, chars, len);
            mem[len] = '\0';
            return (string_t){ len, mem };
        } else return error_string;
    }
    
    inline static char * string_to_chars(string_t string) {
        return STRING_CHARS(string);
    }

    inline static string_length_t string_length(string_t string) {
        return STRING_LENGTH(string);
    }

    inline static void string_free(string_t s) {
        free(STRING_CHARS(s));
    }
    
    inline static bool string_is_valid(string_t string) {
        return STRING_CHARS(string) != nullptr
            && strnlen(STRING_CHARS(string), STRING_LENGTH_MAX) == STRING_LENGTH(string);
    }
    

    ...

    
    #undef STRING_LENGTH
    #undef STRING_CHARS
    
    #endif /* FOO_STRING_H */

It just wraps `<string.h>` functions in a way that is slightly less error prone to use, and adds zero cost. We can pass the string everywhere by value rather than needing an opaque pointer. It's equivalent on SYSV (64-bit) to passing them as two separate arguments:

    void foo(string_t str);
    //vs
    void foo(size_t length, char *chars);

These have the exact same calling convention: length passed in `rdi` and `chars` passed in `rsi`. (Or equivalently, `r0:r1` on other architectures).

The main advantage is that we can also return by value without an "out parameter".

    string_t bar();
    //vs
    size_t bar(char **out_chars);

These DO NOT have the same calling convention. The latter is less efficient because it needs to dereference a pointer to return the out parameter. The former just returns length in `rax` and chars in `rdx` (`r0:r1`).

So returning a fat pointer is actually more efficient than returning a size and passing an out parameter on SYSV! (Though only marginally because in the latter case the pointer will be in cache).

Perhaps it's unfair to say "zero-cost" - it's slightly less than zero - cheaper than the conventional idiom of using an out parameter.

But it only works if the struct is <= 16-bytes and contains only INTEGER types. Any larger and the whole struct gets put on the stack for both arguments and returns. In that case it's probably better to use an opaque pointer.
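
A small sketch for checking this yourself, with assumptions flagged: `fat_string`, `first_word` and `first_word_out` are hypothetical stand-ins, not part of the header above. The `static_assert` turns the 16-byte limit into a compile-time check, and the two functions have the same behavior in the two return styles so their generated code can be compared.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t length;
    const char *chars;
} fat_string;   /* hypothetical stand-in for string_t */

/* Fails to compile if the struct outgrows the two-register limit. */
static_assert(sizeof(fat_string) <= 16, "fat_string no longer fits in two registers");

/* Style 1: return the length/pointer pair by value. */
fat_string first_word(const char *text) {
    size_t n = 0;
    while (text[n] != '\0' && text[n] != ' ') {
        n++;
    }
    return (fat_string){ n, text };
}

/* Style 2: return the length, write the pointer through an out parameter. */
size_t first_word_out(const char *text, const char **out_chars) {
    size_t n = 0;
    while (text[n] != '\0' && text[n] != ' ') {
        n++;
    }
    *out_chars = text;
    return n;
}
```

Compiling this with something like `gcc -O2 -S` and reading the two function bodies should show whether the by-value version avoids the store through `out_chars` on your target.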

That aside, when we define the struct in the header we can also `inline` most functions, so that avoids unnecessary branching overhead that we might have when using opaque pointers.

`#pragma GCC poison` is not portable, but it will be ignored wherever it isn't supported, so this won't prevent the code being compiled for other platforms - it just won't get the benefits we get from GCC & SYSV.

The biggest downside to this approach is we can't prevent the library user from using a struct initializer and creating an invalid structure (eg, length and actual string length not matching). It would be nice if there were some similar trick to prevent using compound initializers with the type, then we could have full encapsulation without resorting to opaque pointers.

sparkie•1w ago

> The biggest downside to this approach is we can't prevent the library user from using a struct initializer and creating an invalid structure (eg, length and actual string length not matching). It would be nice if there were some similar trick to prevent using compound initializers with the type, then we could have full encapsulation without resorting to opaque pointers.

Hmm, I found a solution and it was easier than expected. GCC has `__attribute__((designated_init))` we can stick on the struct which prevents positional initializers and requires the field names to be used (assuming -Werror). Since those names are poisoned, we won't be able to initialize except through functions defined in our library. We can similarly use a macro and #undef it.

Full encapsulation of a struct defined in a header:

    #ifndef FOO_STRING_H
    #define FOO_STRING_H

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #if defined __has_include
    # if __has_include("config.h")
    #  include "config.h"
    # endif
    #endif

    typedef size_t string_length_t;
    #ifdef CONFIG_STRING_LENGTH_MAX
    #define STRING_LENGTH_MAX CONFIG_STRING_LENGTH_MAX
    #else
    #define STRING_LENGTH_MAX (1 << 24)
    #endif

    typedef struct __attribute__((designated_init)) {
        const string_length_t _internal_string_length;
        const char *const _internal_string_chars;
    } string_t;

    #define STRING_CREATE(len, ptr) (string_t){ ._internal_string_length = (len), ._internal_string_chars = (ptr) }
    #define STRING_LENGTH(s) (s._internal_string_length)
    #define STRING_CHARS(s) (s._internal_string_chars)
    #pragma GCC poison _internal_string_length _internal_string_chars


    constexpr string_t error_string = STRING_CREATE(0, nullptr);
    constexpr string_t empty_string = STRING_CREATE(0, "");

    inline static string_t string_alloc_from_chars(const char *chars) {
        if (__builtin_expect(chars == nullptr, false)) return error_string;
        size_t len = strnlen(chars, STRING_LENGTH_MAX);
        if (__builtin_expect(len == 0, false)) return empty_string;
        if (__builtin_expect(len < STRING_LENGTH_MAX, true)) {
            char *mem = malloc(len + 1);
            if (__builtin_expect(mem == nullptr, false)) return error_string;
            strncpy(mem, chars, len);
            mem[len] = '\0';
            return STRING_CREATE(len, mem);
        } else return error_string;
    }

    inline static const char *string_to_chars(string_t string) {
        return STRING_CHARS(string);
    }

    inline static string_length_t string_length(string_t string) {
        return STRING_LENGTH(string);
    }

    inline static void string_free(string_t s) {
        free((char*)STRING_CHARS(s));
    }

    inline static bool string_is_valid(string_t string) {
        return STRING_CHARS(string) != nullptr;
    }

    // ... other string function

    #undef STRING_LENGTH
    #undef STRING_CHARS
    #undef STRING_CREATE

    #endif /* FOO_STRING_H */

Aside from horrible pointer aliasing tricks, the only way to create a `string_t` is via `string_alloc_from_chars` or other functions defined in the library which return `string_t`.

    #include <stdio.h>
    int main() {
        string_t s = string_alloc_from_chars("Hello World!");
        if (string_is_valid(s)) 
            puts(string_to_chars(s));
        string_free(s);
        return 0;
    }

apaprocki•1w ago

You can play tricks if you’re willing to compromise on the ABI:

    typedef struct foo_ foo;
    enum { FOO_SIZE = 64 };
    foo *foo_init(void *p, size_t sz);
    void foo_destroy(foo *p);
    #define FOO_ALLOCA() \
      foo_init(alloca(FOO_SIZE), FOO_SIZE)

Implementation (size checks, etc. elided):

    struct foo_ {
        uint32_t magic;
        uint32_t val;
    };
    
    foo *foo_init(void *p, size_t sz) {
        foo *f = (foo *)p;
        f->magic = 1234;
        f->val = 0;
        return f;
    }

Caller:

    foo *f = FOO_ALLOCA();
    // Can’t see inside
    // APIs validate magic
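
A rough sketch of what the elided size check and "APIs validate magic" could look like. This is an assumption-laden illustration: `FOO_MAGIC`, `foo_get_val` and the return-code convention are invented here, and the `alloca` macro is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct foo_ foo;
enum { FOO_SIZE = 64, FOO_MAGIC = 1234 };

struct foo_ {
    uint32_t magic;
    uint32_t val;
};

/* The private struct must fit in the size promised to callers. */
static_assert(sizeof(struct foo_) <= FOO_SIZE, "FOO_SIZE too small");

foo *foo_init(void *p, size_t sz) {
    if (p == NULL || sz < sizeof(struct foo_)) {
        return NULL;
    }
    foo *f = (foo *)p;
    f->magic = FOO_MAGIC;
    f->val = 0;
    return f;
}

/* Hypothetical accessor: 0 on success, -1 if f was not set up by foo_init. */
int foo_get_val(const foo *f, uint32_t *out) {
    if (f == NULL || f->magic != FOO_MAGIC) {
        return -1;
    }
    *out = f->val;
    return 0;
}

void foo_destroy(foo *f) {
    if (f != NULL) {
        f->magic = 0;   /* later calls on this object now fail the check */
    }
}
```

The magic check catches uninitialized or destroyed objects only probabilistically, since stale memory could happen to hold the same value.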

1718627440•1w ago

If you don't want your types to be public, don't put them in the public interface, put them into the implementation.

SkiFire13•1w ago

> Additionally, the intent of whether the buffer is used as “raw” memory chunks versus a meaningful u8 is pretty clear from the code that it gets used in, so I’m not worried about confusing intent with it.

It's generally not clear to the compiler, and that can result in missed optimization opportunities.
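
One way this can show up, sketched under the assumption that `u8` is a typedef for `unsigned char` (function names are made up): character types are allowed to alias any object, so in the first loop the compiler generally has to assume a store through `dst` might change `*len` and reload it on every iteration. Copying the length into a local takes that question away.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

/* Stores through dst may alias *len, so *len is typically reloaded each pass. */
void fill_reload(u8 *dst, const size_t *len, u8 value) {
    for (size_t i = 0; i < *len; i++) {
        dst[i] = value;
    }
}

/* Reading the length once up front removes the aliasing question. */
void fill_hoisted(u8 *dst, const size_t *len, u8 value) {
    assert(dst != NULL && len != NULL);
    const size_t n = *len;
    for (size_t i = 0; i < n; i++) {
        dst[i] = value;
    }
}
```

The two functions behave the same as long as `dst` and `len` do not actually overlap; whether the generated code differs depends on the compiler and flags.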

bArray•1w ago

> I don’t personally do things that require dynamic memory management in C often, so I don’t have many practices for it. I know that wellons & co. have been really liking the arena, and I’d probably like it too if I actually used the heap often. But I don’t, so I have nothing to say.

> If I find myself needing a bunch of dynamic memory allocations and lifetime management, I will simply start using another language–usually rust or C#.

I'm not sure what the modern standards are, but if you are writing in C, pre-allocate as much as possible. Any kind of garbage collection is just extra processing time and ideally you don't want to run out of memory during an allocation mid-execution.
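
A minimal sketch of the pre-allocation idea, assuming a fixed byte budget chosen up front (`ARENA_CAPACITY`, `arena_alloc` and `arena_reset` are illustrative names, not from the article). All memory is reserved once, allocation is a bounds check plus a pointer bump, and running out is reported as `NULL` rather than discovered deep inside a loop.

```c
#include <assert.h>
#include <stddef.h>

enum { ARENA_CAPACITY = 4096 };   /* assumed budget, sized up front */

typedef struct {
    _Alignas(max_align_t) unsigned char bytes[ARENA_CAPACITY];
    size_t used;
} arena;

/* Hands out the next `size` bytes, or NULL when the budget is spent. */
void *arena_alloc(arena *a, size_t size) {
    assert(a != NULL);
    const size_t align = _Alignof(max_align_t);
    if (size > ARENA_CAPACITY) {
        return NULL;
    }
    /* Round up so every returned pointer stays suitably aligned. */
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > ARENA_CAPACITY - a->used) {
        return NULL;
    }
    void *p = &a->bytes[a->used];
    a->used += rounded;
    return p;
}

/* Releases everything at once; there is no per-object free. */
void arena_reset(arena *a) {
    a->used = 0;
}
```

Declaring the arena with static storage (for example `static arena scratch;`) keeps it out of the heap entirely and zero-initializes `used`.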

People may frown at C, but nothing beats getting your inner loops into CPU cache. If you can avoid extra fetches from RAM, you can really crank some processing power. Example projects have included computer vision, servers, and a custom neural network - all of which had no business being so fast.

boltzmann-brain•1w ago

oh god, i find the sidebar so intensely annoying. glad i have the Dom Delete addon, i would never manage to read anything otherwise

lelanthran•1w ago

> oh god, i find the sidebar so intensely annoying. glad i have the Dom Delete addon, i would never manage to read anything otherwise

A Dom Delete addon? Is it faster than hitting f12, clicking on the offending element and hitting delete?

boltzmann-brain•1w ago

yes

lelele•1w ago
