Trust me, I love C. Probably over 90% of my lifetime code has been written in C. But Python newbies don't get their web frameworks stack-smashed. That's kind of nice.
Hah! True :-)
The thing is, smashed stacks are difficult to exploit deterministically or automatically. Even Heartbleed, as widespread as it was, was not a guaranteed RCE.
OTOH, a vulnerability in a language like Python is almost certainly going to be easier to exploit deterministically. Log4j, for example, was a guaranteed exploit and the skill level required was basically "Create a Java object".
This is because of the ease with which even very junior programmers can create something that appears to run and work and not crash.
There are good reasons for this choice in C (and C++) due to broken integer promotion and casting rules.
See: "Subscripts and sizes should be signed" (Bjarne Stroustrup) https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0...
As a nice bonus, it means that ubsan traps on overflow (unsigned overflows just wrap).
The reason you should make lengths signed is that you can use the sanitizer to find or mitigate overflow, as you correctly observe, while unsigned wraparound leads to bugs that are basically impossible to find. But this has nothing to do with integer promotion, and wraparound can also create bugs in, say, Rust.
I don’t mean to be disrespectful, but this cavalier attitude towards it reads like vaccine skepticism to me. It is not serious.
Programming can be inconsequential, but it can also be national security. I know which engineers I would trust with the latter, and they aren’t the kind who believe that discipline is “enough”.
Of course, if you consistently treat unsigned wraparound as a bug in your code, you can also use a sanitizer to screen for it. But in general I find it more practical to use signed integers for everything except modular arithmetic, where I use unsigned (and where wraparound is then expected and not a bug).
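To make the distinction concrete, here is a minimal sketch, not anyone's production code. The helper names `checked_add_len` and `wrapping_add_size` are made up for illustration. The signed version checks before adding, so the overflowing addition (undefined behavior, and what `-fsanitize=signed-integer-overflow` would trap on) is never executed; the unsigned version wraps, which is well defined and therefore silent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two signed lengths and reports overflow
 * instead of executing the overflowing addition, which would be UB. */
static bool checked_add_len(ptrdiff_t a, ptrdiff_t b, ptrdiff_t *out)
{
    assert(out != NULL);
    if ((b > 0 && a > PTRDIFF_MAX - b) || (b < 0 && a < PTRDIFF_MIN - b))
        return false;            /* the sum would not fit in ptrdiff_t */
    *out = a + b;
    return true;
}

/* Unsigned addition is defined to wrap modulo SIZE_MAX + 1, so a
 * too-large sum quietly becomes a small number. */
static size_t wrapping_add_size(size_t a, size_t b)
{
    return a + b;
}
```

With these definitions, `checked_add_len(PTRDIFF_MAX, 1, &r)` reports failure, while `wrapping_add_size(SIZE_MAX, 1)` returns 0 without complaint.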
The issues really arise when you mix signed/unsigned arithmetic and end up promoting everything to signed unexpectedly. That's usually "okay", as long as you're not doing arithmetic on anything smaller than an int.
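A small sketch of the two conversions usually meant here, assuming a platform where `int` is wider than 8 bits (true everywhere in practice). The function names are invented for the example. Operands narrower than `int` are promoted to `int` before arithmetic, and when an `int` meets an `unsigned` of the same rank the signed operand is converted to unsigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Both uint8_t operands are promoted to int before the addition,
 * so the sum is NOT reduced modulo 256. */
static int sum_promoted(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Converting the promoted result back to uint8_t reduces it modulo 256. */
static uint8_t sum_truncated(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* The cast spells out the conversion the compiler applies implicitly
 * when comparing int with unsigned: a negative value becomes huge. */
static bool less_mixed(int a, unsigned b)
{
    return (unsigned)a < b;
}
```

Here `sum_promoted(200, 100)` is 300, `sum_truncated(200, 100)` is 44, and `less_mixed(-1, 1)` is false, which is the result that tends to surprise people.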
As an aside, if you like C enough to have opinions on promotion rules then you might enjoy the programming language Zig. It's around the same level as C, but with much nicer ergonomics, and overflow traps by default in Debug/ReleaseSafe optimization modes. If you want explicit two's complement overflow it has +%, *% and -% variants of the usual arithmetic operations, as well as saturating +|, *|, -| variants that clamp to [minInt(T), maxInt(T)].
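For readers who have not used Zig, rough C equivalents of the wrapping and saturating additions might look like the sketch below. The names `add_wrap` and `add_sat` are hypothetical, and the wrapping version relies on the out-of-range conversion back to `int32_t` behaving as two's complement, which is implementation-defined before C23 but is what GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Roughly what Zig's `a +% b` does: do the addition in unsigned
 * arithmetic, where wraparound is defined, then convert back. */
static int32_t add_wrap(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Roughly what Zig's `a +| b` does: clamp to [INT32_MIN, INT32_MAX]
 * instead of overflowing. */
static int32_t add_sat(int32_t a, int32_t b)
{
    if (b > 0 && a > INT32_MAX - b)
        return INT32_MAX;
    if (b < 0 && a < INT32_MIN - b)
        return INT32_MIN;
    return a + b;
}
```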
EDIT to the aside: it's also true if you hate C enough to have opinions on promotion rules.
The "promoting unexpectedly" is something I do not think happens if you know C well. At least, I can't remember ever having a bug because of this. In most cases the promotion prevents you from having a bug, because you do not get unexpected overflow or wraparound because your type is too small.
Mixing signed and unsigned is problematic, but I see issues mostly in code from people who think they need to use unsigned when they shouldn't, because they heard signed integers are dangerous. Recently I saw somebody "upgrading" a C codebase to C++ and also changing all loop variables to size_t. This caused a bug, which he blamed on the "legacy C code" he was working on, although the original code was just fine. In general, there are compiler warnings that should catch issues with sign conversions.
We can argue till we're blue in the face that people should just not make any mistakes, but history is against us: people will always make mistakes.
That's why surgeons are supposed to follow checklists and count their sponges in and out.
What?
unsigned sizes are way easier to check, you just need one invariant:
if(x < capacity) // good to go
Always works, regardless of how x is calculated, and you never have to worry about undefined behavior when computing x. And the same invariant is used for forward and backward loops. Some people bring up i >= 0 as a problem with unsigned, but that's because you should use i < n for backward loops as well: The One True Invariant.
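A sketch of the backward loop described above, with a made-up function name (`find_last`). When `i` is decremented past zero it wraps to `SIZE_MAX`, which fails `i < n` and ends the loop; the same happens immediately when `n` is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the last element equal to value, or n if
 * there is none. The loop condition is i < n even though it counts
 * down: decrementing i past 0 wraps to SIZE_MAX, which is not < n. */
static size_t find_last(const int *a, size_t n, int value)
{
    for (size_t i = n - 1; i < n; i--) {
        if (a[i] == value)
            return i;
    }
    return n;
}
```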
Actually, unchecked math on an integer is going to be bad regardless of whether it's signed or unsigned. The difference is that with signed integers, your sanity check is simple and always the same and requires no thought for edge cases: `if(index < 0 || index > max)`. Plus ubsan, as mentioned above.
My policy is: Always use signed, unless you have a specific reason to use unsigned (such as memory addresses).
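For comparison, a sketch of the signed variant of the check. `get_or` is a hypothetical helper, and it takes an exclusive length rather than the inclusive `max` used above, so the upper test is `index >= len`.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a[index], or fallback when index is outside [0, len).
 * A negative index stays visibly negative instead of turning into
 * a huge positive number. */
static int get_or(const int *a, ptrdiff_t len, ptrdiff_t index, int fallback)
{
    assert(len >= 0);
    if (index < 0 || index >= len)
        return fallback;
    return a[index];
}
```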
That kills any non-allocation dreams. The moment you have "Hi \uxxxx isn't the UTF nice?" you will probably have to allocate. If the source is read-only, you have to allocate. If the source is mutable, you have to waste CPU to rewrite the string.
Depends on what you are doing with it. If you aren't displaying it (and typically you are not in a server application), you don't need to unescape it.
If the source JSON/XML is in a writeable buffer, with some helper functions you can do it. I've done it for a few small-memory systems.
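A minimal sketch of what such a helper could look like for JSON, assuming the buffer is writable. The name `json_unescape_in_place` is made up, and as a simplification it rejects surrogate pairs rather than decoding them. It works without allocating because every escape is at least as long as the bytes it decodes to, so the write position never overtakes the read position.

```c
#include <assert.h>
#include <stddef.h>

static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Unescapes the body of a JSON string (without the surrounding
 * quotes) in place. Returns the new length, or -1 on a malformed
 * or unsupported escape. The result is not NUL-terminated. */
static long json_unescape_in_place(char *buf, size_t len)
{
    size_t r = 0, w = 0;              /* read and write positions, w <= r */
    assert(buf != NULL || len == 0);
    while (r < len) {
        char c = buf[r++];
        if (c != '\\') { buf[w++] = c; continue; }
        if (r >= len) return -1;      /* lone backslash at the end */
        switch (buf[r++]) {
        case '"':  buf[w++] = '"';  break;
        case '\\': buf[w++] = '\\'; break;
        case '/':  buf[w++] = '/';  break;
        case 'b':  buf[w++] = '\b'; break;
        case 'f':  buf[w++] = '\f'; break;
        case 'n':  buf[w++] = '\n'; break;
        case 'r':  buf[w++] = '\r'; break;
        case 't':  buf[w++] = '\t'; break;
        case 'u': {
            unsigned cp = 0;
            if (len - r < 4) return -1;
            for (int i = 0; i < 4; i++) {
                int h = hex_val(buf[r++]);
                if (h < 0) return -1;
                cp = (cp << 4) | (unsigned)h;
            }
            if (cp >= 0xD800 && cp <= 0xDFFF) return -1; /* surrogates unsupported */
            if (cp < 0x80) {          /* 6 input bytes -> 1 byte of UTF-8 */
                buf[w++] = (char)cp;
            } else if (cp < 0x800) {  /* 6 input bytes -> 2 bytes */
                buf[w++] = (char)(0xC0 | (cp >> 6));
                buf[w++] = (char)(0x80 | (cp & 0x3F));
            } else {                  /* 6 input bytes -> 3 bytes */
                buf[w++] = (char)(0xE0 | (cp >> 12));
                buf[w++] = (char)(0x80 | ((cp >> 6) & 0x3F));
                buf[w++] = (char)(0x80 | (cp & 0x3F));
            }
            break;
        }
        default:
            return -1;                /* unknown escape */
        }
    }
    return (long)w;
}
```

The caller keeps using the same buffer with the returned, shorter length. A read-only source still forces a copy, as the parent comment says.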
Not sure why many people seem fixated on the idea that using a programming language must follow a particular approach. You can write minimal-alloc Java, you can simulate OOP in C, etc.
Unconventional, but why do we need to restrict certain optimizations (space/time perf, "readability", conciseness, etc) to only a particular language?
In Java, you don't care because the GC cleans after you and you don't usually care about millisecond-grade performance.
GP didn't say "zero-alloc", but "minimal alloc"
> Why should "nice" javaesque make little sense in C?
There's little to no indirection in idiomatic C compared with idiomatic Java.
Of course, in both languages you can write unidiomatically, but that is a great way to ensure that bugs get in and never get out.
I've upvoted you, but I'm not so sure I agree though.
Sure, each allocation imposes a new obligation to track that allocation, but on the downside, passing around already-allocated blocks imposes a new burden for each call to ensure that the callees have the correct permissions (modify it, reallocate it, free it, etc).
If you're doing any sort of concurrency this can be hard to track - sometimes it's easier to simply allocate a new block and give it to the callee, and then the caller can forget all about it (callee then has the obligation to free it).
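A sketch of that hand-off pattern, with hypothetical names (`consume_message`, `send_copy`). The caller allocates a private copy, passes it on, and never touches it again; freeing it is the callee's job.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callee: takes ownership of msg and must free it.
 * Returns the message length as a stand-in for real work. */
static size_t consume_message(char *msg)
{
    assert(msg != NULL);
    size_t n = strlen(msg);
    free(msg);
    return n;
}

/* The caller hands over a fresh copy instead of sharing its own
 * buffer, so it has nothing to track after the call. Returns 0 if
 * the allocation fails. */
static size_t send_copy(const char *src)
{
    size_t size = strlen(src) + 1;
    char *copy = malloc(size);
    if (copy == NULL)
        return 0;
    memcpy(copy, src, size);
    return consume_message(copy);   /* ownership moves to the callee here */
}
```

The cost is one extra allocation and copy per call; what it buys is that no other thread or callee can modify, reallocate, or free the caller's original buffer.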
LLMs are fundamentally probabilistic --- not deterministic.
This basically means that anything produced this way is highly suspect. And this framework is an example.
jacquesm•4h ago
As a learning exercise it is useful, but it should never see production use. What is interesting is that the apparent cleanliness of the code (it reads very well) is obscuring the fact that the quality is actually quite low.
If anything, I think the conclusion should be that AI+novice does not create anything that is usable without expert review, and that this probably adds up to a net negative, other than that the novice will (hopefully) learn something. It would be great if someone could put in the time to do a full review of the code. I have just read through it casually and already picked up a couple of problems; I'm pretty sure a thorough review would find many more.
drnick1•3h ago
I think this is a general feature and one of the greatest advantages of C. It's simple, and it reads well. Modern C++ and Rust are just horrible to look at.
jacquesm•3h ago
I don't remember any other language's proponents actively attacking the users of other programming languages.
01HNNWZ0MV43FF•2h ago
I just saw someone on Hacker News saying that Rust was a bad language because of its users
jacquesm•1h ago
And this goes for almost all programming languages. Each and every one of them has warts and issues with syntax and expressiveness. That holds true even for the most advanced languages in the field (Haskell, Erlang, Lisp), and more so for languages that were originally designed for 'readability'. Programming is by its very nature more akin to solving a puzzle than to describing something. The puzzle is how to get the machine to do something, to do it correctly, to do it safely and to do it efficiently, and all of those while satisfying the constraint of how much time you are prepared (or allowed) to spend on it. Picking the 'right' language will always be a compromise on some of these. There is no programming language that is perfect (or even just 'the best' or 'suitable') for all tasks, and there are no programming languages that are better than any other for any subset of all tasks until 'tasks' is a very low number.
OneLessThing•3h ago
I suppose I was just surprised to find this code promoted in my feed when it's not up to snuff. And I'm not hating, I do in fact love the project idea.
lifthrasiir•3h ago
[1] https://github.com/lifthrasiir/wah/blob/main/wah.h
jacquesm•3h ago
One good defense is to reduce your scope continuously. The smaller you make your scope the smaller the chances of something escaping your attention. Stay away from globals and global data structures. Make it impossible to inspect the contents of a box without going through a well defined interface. Use assertions liberally. Avoid fault propagation, abort immediately when something is out of the expected range.
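As a rough illustration of that advice (not code from the project under discussion), here is a sketch of a small container whose contents can only be reached through its interface, with assertions that abort on out-of-range use. The `box_*` names are invented. In a real project only the `typedef` and the function prototypes would go in the header, with the struct definition private to one .c file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct box box;   /* callers would only ever see this opaque type */

struct box {              /* would live in the .c file, not the header */
    size_t len;
    size_t cap;
    int *items;
};

static box *box_new(size_t cap)
{
    assert(cap > 0);
    box *b = malloc(sizeof *b);
    if (b == NULL)
        abort();          /* fail immediately instead of propagating the fault */
    b->items = calloc(cap, sizeof *b->items);
    if (b->items == NULL)
        abort();
    b->len = 0;
    b->cap = cap;
    return b;
}

static void box_push(box *b, int value)
{
    assert(b != NULL);
    assert(b->len < b->cap);   /* out of expected range: stop here */
    b->items[b->len++] = value;
}

static int box_get(const box *b, size_t index)
{
    assert(b != NULL);
    assert(index < b->len);
    return b->items[index];
}

static size_t box_len(const box *b)
{
    assert(b != NULL);
    return b->len;
}

static void box_free(box *b)
{
    if (b != NULL) {
        free(b->items);
        free(b);
    }
}
```

Note that `assert` compiles away under `NDEBUG`, so checks that must hold in release builds would need an explicit `if (...) abort();` instead.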
jacquesm•36m ago
But the lack of a good string library is by itself responsible for a very large number of production issues, as is the lack of foresight regarding de-referencing pointers that are no longer valid. Lack of guardrails seems to translate into 'do what you want', not necessarily 'build guardrails at the right level for you'; most projects simply don't bother with guardrails at all.
Rust tries to address a lot of these issues, but it does so by tossing out a lot of the good stuff as well and introducing a whole pile of new issues and concepts that I'm not sure are an improvement over what was there before. This creates a take-it-or-leave-it situation, and a barrier to entry. I would have loved to see that guardrails concept extended to the tooling, in the form of compile-time flags that result in either compile-time flagging of risky practices (there is some of this now, but I still think it is too little) or runtime errors for clear violations.
The temptation to 'start over' is always there. I think C, with all of its warts and shortcomings, is not the best language for a new programmer to start with if they want to do low-level work. At the same time, I would (still, maybe that will change) hesitate to advocate for Rust: it is a massive learning curve compared to the kind of appeal that C has for a novice. I'd probably recommend Go or Java over both C and Rust if you're into imperative code and want to do low-level work. For functional programming I'd recommend Erlang (if only because of the very long-term view of the people that build it) or Clojure, though the latter seems to be in decline.
OneLessThing•2h ago
I do think that LLM-written C code has great promise if it is produced in concert with great testing tooling.
nurettin•2h ago
I have an issue with high-strung opinions like this. I wrote plenty of crappy Delphi code while learning the language that saw production use, and made a living from it.
Sure, it wasn't the best experience for users, it took years to iron out all the bugs, and there was plenty of frustration during the support phase (mostly null pointer exceptions and DB locks in the GUI).
But nobody would be better off now if that code never saw production use. A lot of business was built around it.
zdragnar•2h ago
Once upon a time, you could put up a relatively vulnerable server, and unless you got a ton of traffic, there weren't too many things that would attack it. Nowadays, pretty much anything Internet facing will get a constant stream of probes. Putting up a server requires a stricter mindset than it used to.
nurettin•21m ago
I guess the question in the spotlight is: at what point would your custom server's buffer overflow when reading a header matter, and would that bug even exist at that point?
Could a determined hacker get to your server without even knowing what weird software you cooked up and how to exploit your binary?
We have a lot of success stories born from bad code. I mean, look at Micro$oft.
Look at all the big players like Discord leaking user credentials. Why would you still call out the little fish?
Maybe I should create a form for all these ahah.