Also, license compliance is very easy (no notice required).
- log.c - A simple logging library implemented in C99
- microui - A tiny immediate-mode UI library
- fe - A tiny, embeddable language implemented in ANSI C
- microtar - A lightweight tar library written in ANSI C
- cembed - A small utility for embedding files in a C header
- ini - A tiny ANSI C library for loading .ini config files
- json.lua - A lightweight JSON library for Lua
- lite - A lightweight text editor written in Lua
- cmixer - Portable ANSI C audio mixer for games
- uuid4 - A tiny C library for generating uuid4 strings
Edit: I was not aware of the FSF's definition. I was using a definition of free software being software that you can use without having to pay for it.
Depends on which "free software" definition you're referring to.
The FSF definition of "free software" requires it to be open source.
That’s called freeware. Also, open-source software can be paid (with the caveat that if someone buys it, you must allow them to redistribute it for free).
To add an additional suggestion, "gratis" can also be used to refer to free as in free beer. It comes from a Latin root and is common in Spanish-speaking countries to refer only to free of charge, not free as in freedom.
People != the legal departments of corporations.
lol
So I have much more trust in (A)GPL licensed projects, and I see them as more for the people than MIT licensed projects.
They care more about the package being maintained, bug-free, and their preferred vulnerability database showing no active exploits.
At least in my experience, anyway. Other companies may have stricter requirements.
SQLite on the other hand just says
The author disclaims copyright to this source code. In place of a legal
notice, here is a blessing:
May you do good and not evil.
May you find forgiveness for yourself and forgive others.
May you share freely, never taking more than you give.
which seems less useful once you strike sentence 1.

The MIT license upholds the four essential freedoms of free software: the right to run, copy, distribute, study, change and improve the software.
It is listed under "Expat License" in the list of GPL-compatible Free Software licenses.
[1] https://www.gnu.org/philosophy/free-sw.html [2] https://opensource.org/osd
> not free software
which it is. As F3nd0 said, it's both.
I used "lite" (text editor in Lua) which has been mentioned under this submission. It is cool, too.
They're either written with a different use case in mind, or a complex mess of abstractions; often both.
It's not a very difficult problem to solve if you only write exactly what you need for your specific use case.
Anyhow, IMO a proper JSON library should offer both, in a layered approach. That is, a lower level SAX-style parser, on top of which a DOM-style API is provided as a convenience.
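To make that concrete, here is a rough sketch of what such a layered API could look like in C. Every name here is hypothetical (not taken from any existing library), and only the interface shape is shown:

    #include <stddef.h>

    /* Low level: SAX-style event interface. Callbacks see slices of the
       input buffer; returning nonzero aborts the parse. */
    typedef enum {
      EV_OBJ_BEGIN, EV_OBJ_END, EV_ARR_BEGIN, EV_ARR_END,
      EV_KEY, EV_STRING, EV_NUMBER, EV_BOOL, EV_NULL
    } json_event;

    typedef struct {
      int (*on_event)(void *user, json_event ev, const char *text, size_t len);
      void *user;
    } json_sax_handler;

    int json_sax_parse(const char *buf, size_t len, const json_sax_handler *h);

    /* High level: a DOM built on top, implemented as just another SAX
       handler that allocates nodes as events arrive. */
    typedef struct json_node json_node;
    json_node *json_dom_parse(const char *buf, size_t len);
    void json_dom_free(json_node *root);

The point is that the DOM layer is a pure convenience over the event layer; callers who need streaming or low memory use just take the lower layer.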
Not really because the JSON library itself can stream the input. For example if you use `serde_json::from_reader()` it won't load the whole file into memory before parsing it into your objects:
https://docs.rs/serde_json/latest/serde_json/fn.from_reader....
But that's kind of academic; half of all memory and all memory are in the same league.
In some minority of cases you might not want to do that (e.g. because you need to support multiple versions of a format), but that is rare and can also be handled in various ways directly in Serde.
The once "very simple" C++ single-header JSON library by nlohmann is now
* 13 years old
* is still actively merging PRs (last one 5 hours ago)
* has 122 __million__ unit tests
Despite all this, it's self-admittedly still not the fastest possible way to parse JSON in C++. For that you might want to look into simdjson.
Don't start your own JSON parser library. Just don't. Yes you can whiteboard one that's 90% good enough in 45 minutes but that last 10% takes ten thousand man hours.
That's the thing with reinventing wheels, a wheel that fits every possible vehicle and runs well in any possible terrain is very difficult to build. But when you know exactly what you need it's a different story.
https://github.com/kstenerud/KSCrash/blob/master/Sources/KSC...
And yeah, writing a JSON codec sucks.
So I'm in the process of replacing it with a BONJSON codec, which has the same capabilities, is still async-safe and crash resilient, and is 35x faster with less code.
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
So in this case you're wrong.
General purpose is a different can of worms compared to solving a specific case.
Sexprs sitting over here, hoping for some love.
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
Certain inputs can therefore trigger UB.
Sometimes, it's just not the responsibility of the library. Trying to handle every possible error is a quick way to complexity.
[0]: https://43081j.com/2025/09/bloat-of-edge-case-libraries
Code is the ultimate specification. I don't trust the docs if the behavior differs from what they say (or, more often, fail to mention). And anything that deals with recursive structures (or looping without a clear counter and checks) is one of my first candidates for checks.
> has no way to handle the overflow case after the fact.
Fork/Vendor the code and add your assertions.
In the spirit of the article you linked, I’d rather write my own version.
Here's an example - I once coded a limited JSON parser in assembly language. I did not attempt to make it secure in any way. The purpose was to parse control messages sent over a serial port connection to an embedded CPU that controlled a small motor to rotate a camera and snap a photo. There was simply no way for any "untrusted" JSON to enter the system. It worked perfectly and nothing could ever be compromised by having a very simple JSON parser in the embedded device controlling the motor.
For this specific project I chose JSON and it worked perfectly. Sending JSON from the embedded CPU was also really simple. Yes, there was a little overhead on a slow connection, but I wasn't getting anywhere near saturation. I think it was 9600 bps max on a noisy connection with checksums. If even 10% of the JSON "packets" got through it was still plenty for the system to run.
Isn't that a bit like saying "you don't have to worry about home security as long as you are the only person who has the ability to enter your house"?
If you need it, then you need it. But if you don't need it, then you don't need it. There is a non-trivial value in the smallness and simplicity, and a non-trivial cost in trying to handle infinity problems when you don't have infinity use-case.
If you are reading data from a file or stream that only you yourself wrote some other time, then it's true that the data could possibly have been corrupted or something, but it's not true that it's automatically worth worrying about enough to justify making the code, and thus its bug surface, larger.
How likely is the problem, how bad are the consequences if the problem happens, how many edge cases could possibly exist, how much code does it take to handle them all? None of these are questions you or anyone else can say about anyone else's project ahead of time.
If the full featured parser is too big, then the line drawing the scope of the lightweight parser has to go somewhere, and so of course there will be things on the other side of that line no matter where it is except all the way back at full-featured-parser.
"just this one little check" is not automatially reasonable, because that check isn't automatically more impoprtant than any other, and they are all "just one little checks"s. The one little check would perevent what? Maybe a problem that never happens or doesn't hurt when it does happen. A value might be misinerpreted? So what? Let it. Maybe it makes more sense to handle that in the application code the one place it might matter. If it will matter so much, then maybe the application needs the full fat library.
Using a "tiny library" for parsing untrusted data is where the mistake is. Not in OP code.
(TIP: choose the latter)
Writing a function to do a checked addition like in other languages isn't exactly difficult, either.
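For reference, a sketch of such a checked add: the builtin is GCC/Clang-specific, and the fallback is portable C.

    #include <limits.h>
    #include <stdbool.h>

    /* Returns false on overflow; otherwise stores a + b in *out. */
    static bool checked_add_int(int a, int b, int *out) {
    #if defined(__GNUC__) || defined(__clang__)
      return !__builtin_add_overflow(a, b, out);
    #else
      if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) return false;
      *out = a + b;
      return true;
    #endif
    }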
Detecting these mistakes in Rust is not too difficult. In debug builds, integer overflow triggers a panic[1]. Additionally, clippy (the official linter of Rust), has a rule[2] to detect this mistake.
[1] https://doc.rust-lang.org/book/ch03-02-data-types.html#integ...
[2] https://rust-lang.github.io/rust-clippy/master/index.html#ar...
It's the wrong attitude for a JSON parser written in C, unless you like to get owned.
UB is bad.
Sometimes. In this case, where the library is a parser written in C, I think it is reasonable to expect the library to handle all possible inputs, even corner cases like this which are unlikely to be encountered in common practice. This is not "bloat", it is correctness.
In C, this kind of bug is capable of being exploited. Sure, many users of this lib won't be using it in exposed cases, but sooner or later the lib will end up in some widely-used internet-facing codebase.
As others have said, the fix could be as simple as bailing once the input size exceeds 1GB. Or it could be fine-grained. Either-way the fix would not "bloat" the codebase.
And yes, I'm well aware of the single-file C library movement. I am a fan.
- a JSON file with nested values exceeding 2 billion depth
- a file with more than 2 billion lines
- a line with more than 2 billion characters
Maybe more importantly, I won’t trust the rest of the code if the author doesn’t seem to have the finite range of integer types in mind.
Restricting the input to a reasonable size is an easy workaround for sure, but this limitation isn't indicated anywhere, so anyone deciding to consume this random project into their important code wouldn't know to defend against such a situation.
In a web server scenario, 2GiB of { (which would trigger two overflows) in a compressed request would require a couple hundred kilobytes to two megabytes, depending on how old your server software is.
And in the spirit of your profile text I'm quite glad for such landmines being out there to trip up those that do blindly ingest all code they can find.
If you are nesting 2 billion times in a row (at minimum this means repeating { 2 billion times, followed by a value, then } another 2 billion times), you have messed up.
You have 4GB of "padding"...at minimum.
Your file is going to be petabytes in size for this to make any sense.
You are using a terrible format for whatever you are doing.
You are going to need a completely custom parser because nothing will fit in memory. I don't care how much RAM you have.
Simply accessing an element means traversing a nested object 2 billion times, which in probably any parser in the world is going to take somewhere between minutes and weeks per access.
All that is going to happen in this program is a crash.
I appreciate that people want to have some pointless if(depth > 0) check everywhere, but if your depth is anywhere north of million in any real world program, something messed up a long long time ago, never mind waiting until it hits 2 billion.
An after the fact check would be the wrong way to deal with UB, you'd need to check for < INT_MAX before the increment in order to avoid it.
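Something like this, where the field names follow the sj.h snippets quoted elsewhere in this thread (so treat it as a sketch rather than a drop-in patch):

    /* needs <limits.h> for INT_MAX */
    if (r->depth == INT_MAX) {      /* incrementing would overflow (UB) */
      r->error = "too much nesting";
      goto top;
    }
    res.depth = ++r->depth;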
The license also makes it clear that the authors aren't liable for any damages.
You find a vulnerability? patch it, push change to repo maintainer.
The license disclaims liability but that doesn't mean the author cannot ever be held liable. Ultimately, who is liable is up to a court to decide.
Why are you using random, unvetted and unaudited code where safety is important?
They are sharing their knowledge about how to create a tiny JSON parser. Where is the problem again?
If there is a conscious intent of disregarding safety as you say, the Readme should have a prominent warning about that.
Even if that is true, how is that the authors problem? The license clearly states that they're not responsible for damages. If you were developing such a serious project then you need the appropriate vetting process and/or support contracts for your dependencies.
In the present case, either the missing overflow check in the code is by mistake, and then it's warranted to point out the error, or, as I understood GGGP to be arguing, the author deliberately decided to neglect safety or correctness, and then in my opinion you can't reject the criticism as unwarranted if the project's presentation isn't explicit about that.
I'm not making anything the author's problem here. Rather, I'm defending my criticism of the code, and am giving arguments as to why it is generally good form to make it explicit if a project doesn't care about the code being safe and correct.
What do you consider this clause in the LICENSE:
>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Every open source license has a very similar clause, including but not limited to BSD, GPL, CDDL, MPL and Apache.
You are responsible for the code you ship, doesn't matter whether it's written by you, an LLM, or whether it's a third-party dependency.
where? single header is just a way to package software, it has no relation to features, security or anything such...
- overestimating the gravity of a UB and its security implications
- underestimating the value of a 150-line JSON parser
- or overestimating the feasibility of having both a short and a high-quality parser.
It sometimes happens that fixing a bug is quicker than defending the low quality. Not everything is a tradeoff.
No one cares. Stop complaining or GTFO.
diff --git a/sj.h b/sj.h
index 60bea9e..25f6438 100644
--- a/sj.h
+++ b/sj.h
@@ -85,6 +85,7 @@ top:
return res;
case '{': case '[':
+ if (r->depth > 999) { r->error = "can't go deeper"; goto top; }
res.type = (*r->cur == '{') ? SJ_OBJECT : SJ_ARRAY;
res.depth = ++r->depth;
r->cur++;
There, fixed it

Limit your JSON input to 1 GB. I will have more problems in other portions of the stack if I start to receive a 2 GB JSON file over the web.
And if I still want to make it work for > 2GB, I would change all int in the source to 64 bits. Will still crash if input is > 2^64.
What I won't ever do in my code is check for int overflow.
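If you take the size-cap route, the guard can live entirely in the calling code. A minimal sketch, assuming the sj_reader(data, len) entry point quoted elsewhere in the thread (the cap value and wrapper name are made up):

    #include <stddef.h>
    #include "sj.h"

    #define JSON_MAX_INPUT ((size_t)1 << 30)  /* 1 GiB cap */

    int parse_json_capped(char *data, size_t len) {
      if (len > JSON_MAX_INPUT) return -1;    /* reject absurd inputs up front */
      sj_Reader r = sj_reader(data, len);
      /* ... walk the document with the sj.h API ... */
      return 0;
    }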
Amen. Just build with -fno-strict-overflow, my hot take is that should be the default on Linux anyway.
UB was a secondary observation, but it also can lead to logic errors in that vein, without involving memory safety.
I'm not sure I agree that UB usually leads to memory safety violations, but in any case, the fact that signed integer overflow is UB isn't what makes the code incorrect and unsafe in the first place.
for(int i=0; blah blah; i++)
is actually broken and dangerous on 64 bit machines.

Skimming the code, they also are loose in parsing incorrect JSON, it seems:
static bool sj__is_number_cont(char c) {
return (c >= '0' && c <= '9')
|| c == 'e' || c == 'E' || c == '.' || c == '-' || c == '+';
}
case '-': case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
res.type = SJ_NUMBER;
while (r->cur != r->end && sj__is_number_cont(*r->cur)) { r->cur++; }
break;
that seems to imply it treats “00.-E.e-8..7-E7E12” as a valid json number.

case '}': case ']':
res.type = SJ_END;
if (--r->depth < 0) {
r->error = (*r->cur == '}') ? "stray '}'" : "stray ']'";
goto top;
}
r->cur++;
break;
I think that means the code finds [1,2} a valid array and {"foo": 42] a valid struct (maybe, it even is happy with [1,2,"foo":42})

Those, to me, seem a more likely attack vector. The example code, for example, calls atoi on something parsed by the first piece of code.
⇒ I only would use this for parsing json config files.
Being tiny is one thing, but the json grammar isn’t that complex. They could easily do a better job at this without adding zillions of lines of code.
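For example, a strict scan of the JSON number grammar (-? (0 | [1-9][0-9]*) (. digits)? ([eE] [+-]? digits)?) fits in a couple dozen lines. A standalone sketch, not wired into sj.h:

    #include <stdbool.h>
    #include <stddef.h>

    /* Returns true only if s[0..len) is a well-formed JSON number. */
    static bool json_number_valid(const char *s, size_t len) {
      size_t i = 0;
      if (i < len && s[i] == '-') i++;
      if (i >= len) return false;
      if (s[i] == '0') i++;                          /* no leading zeros */
      else if (s[i] >= '1' && s[i] <= '9') {
        while (i < len && s[i] >= '0' && s[i] <= '9') i++;
      } else return false;
      if (i < len && s[i] == '.') {                  /* optional fraction */
        i++;
        if (i >= len || s[i] < '0' || s[i] > '9') return false;
        while (i < len && s[i] >= '0' && s[i] <= '9') i++;
      }
      if (i < len && (s[i] == 'e' || s[i] == 'E')) { /* optional exponent */
        i++;
        if (i < len && (s[i] == '+' || s[i] == '-')) i++;
        if (i >= len || s[i] < '0' || s[i] > '9') return false;
        while (i < len && s[i] >= '0' && s[i] <= '9') i++;
      }
      return i == len;
    }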
-sj_Reader sj_reader(char *data, size_t len) {
+sj_Reader sj_reader(char *data, int len) {
Not everyone needs to waste cycles on supporting JSON files larger than 2^31-1.

On the more code side, love this, been looking to implement a simple json parser for some projects but this is small enough i can study it and either learn what i need or even use it. lovely!
I'm still impressed and might use it, but just noting this.
I've been using cJSON[0] for years now and am pretty happy with it. I had used wjelement[1] before that but ran into a few issues and eventually moved away from it (can't recall why exactly, it's been so long).
[0] https://github.com/DaveGamble/cJSON [1] https://github.com/netmail-open/wjelement
{"x",10eee"y"22:5,{[:::,,}]"w"7"h"33
rect: { 10, 22, 7, 33 }

I don’t know what else you call a library that just extracts data.
So if you can, try and at least use LuaJIT, which when using json.lua seems to bring it back down into range with other performant languages, or jump down into LuaJIT and use Sj.h there, through the C FFI or just simdjson.
json.lua is great for when you're restricted in some ways to use a pure Lua implementation, though. It's the de facto solution.
https://github.com/lelanthran/libxcgi/blob/master/library/sr...
https://github.com/lelanthran/libxcgi/blob/master/library/sr...
Which does much more in 200 lines of C89 in a single header
https://github.com/nst/JSONTestSuite
The nesting is limited by using an int as the depth counter. The C standard guarantees that INT_MAX is at least 32767, so that's the limit on portable nesting depth. Nowadays int is typically 32 or 64 bits, so the limit is much higher in typical C implementations.
If I see correctly, the library doesn’t check for overflow, however. This might conceivably be an exploitable vulnerability (and such an overflow would constitute UB).
What I mean by this is a subset (superset?) that exactly matches the parsing behavior of a specific target parsing library. Why is this useful? To avoid the class of vulnerabilities that rely on the same JSON being handled differently by two different parsers (you can exploit this to get around an authorization layer, for example).