Also, license compliance is very easy (no notice required).
- log.c - A simple logging library implemented in C99
- microui - A tiny immediate-mode UI library
- fe - A tiny, embeddable language implemented in ANSI C
- microtar - A lightweight tar library written in ANSI C
- cembed - A small utility for embedding files in a C header
- ini - A tiny ANSI C library for loading .ini config files
- json.lua - A lightweight JSON library for Lua
- lite - A lightweight text editor written in Lua
- cmixer - Portable ANSI C audio mixer for games
- uuid4 - A tiny C library for generating uuid4 strings
Edit: I was not aware of the FSF's definition. I was using a definition of free software being software that you can use without having to pay for it.
Depends on which "free software" definition you're referring to.
The FSF definition of "free software" requires it to be open source.
That’s called freeware. Also, open-source software can be paid (with the caveat that if someone buys it, you must allow them to redistribute it for free).
To add an additional suggestion, "gratis" can also be used to refer to free as in free beer. It comes from a Latin root and is common in Spanish-speaking countries to refer only to free of charge, not free as in freedom.
People != the legal departments of corporations.
lol
So I have much more trust in (A)GPL licensed projects, and I see them as more for the people than MIT licensed projects.
They care more about the package being maintained, bug-free, and their preferred vulnerability database showing no active exploits.
At least in my experience, anyway. Other companies may have stricter requirements.
SQLite on the other hand just says
The author disclaims copyright to this source code. In place of a legal
notice, here is a blessing:
May you do good and not evil.
May you find forgiveness for yourself and forgive others.
May you share freely, never taking more than you give.
which seems less useful once you strike sentence 1.

The MIT license upholds the four essential freedoms of free software: the right to run, copy, distribute, study, change and improve the software.
It is listed under "Expat License" in the list of GPL-compatible Free Software licenses.
[1] https://www.gnu.org/philosophy/free-sw.html [2] https://opensource.org/osd
> not free software
which it is. As F3nd0 said, it's both.
I used "lite" (text editor in Lua) which has been mentioned under this submission. It is cool, too.
They're either written with a different use case in mind, or a complex mess of abstractions; often both.
It's not a very difficult problem to solve if you only write exactly what you need for your specific use case.
Anyhow, IMO a proper JSON library should offer both, in a layered approach. That is, a lower level SAX-style parser, on top of which a DOM-style API is provided as a convenience.
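To make that concrete, here is a rough sketch of what such a layered API could look like in C. Every name here is hypothetical (not taken from any existing library), and only the interface shape is shown:

    #include <stddef.h>

    /* Low level: SAX-style event interface. Callbacks see slices of the
       input buffer; returning nonzero aborts the parse. */
    typedef enum {
      EV_OBJ_BEGIN, EV_OBJ_END, EV_ARR_BEGIN, EV_ARR_END,
      EV_KEY, EV_STRING, EV_NUMBER, EV_BOOL, EV_NULL
    } json_event;

    typedef struct {
      int (*on_event)(void *user, json_event ev, const char *text, size_t len);
      void *user;
    } json_sax_handler;

    int json_sax_parse(const char *buf, size_t len, const json_sax_handler *h);

    /* High level: a DOM built on top, implemented as just another SAX
       handler that allocates nodes as events arrive. */
    typedef struct json_node json_node;
    json_node *json_dom_parse(const char *buf, size_t len);
    void json_dom_free(json_node *root);

The point is that the DOM layer is a pure convenience over the event layer; callers who need streaming or low memory use just take the lower layer.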
Not really because the JSON library itself can stream the input. For example if you use `serde_json::from_reader()` it won't load the whole file into memory before parsing it into your objects:
https://docs.rs/serde_json/latest/serde_json/fn.from_reader....
But that's kind of academic; half of all memory and all memory are in the same league.
In some minority of cases you might not want to do that (e.g. because you need to support multiple versions of a format), but that is rare and can also be handled in various ways directly in Serde.
The once "very simple" C++ single-header JSON library by nlohmann is now
* 13 years old
* is still actively merging PRs (last one 5 hours ago)
* has 122 __million__ unit tests
Despite all this, it's self-admittedly still not the fastest possible way to parse JSON in C++. For that you might want to look into simdjson.
Don't start your own JSON parser library. Just don't. Yes you can whiteboard one that's 90% good enough in 45 minutes but that last 10% takes ten thousand man hours.
That's the thing with reinventing wheels, a wheel that fits every possible vehicle and runs well in any possible terrain is very difficult to build. But when you know exactly what you need it's a different story.
https://github.com/kstenerud/KSCrash/blob/master/Sources/KSC...
And yeah, writing a JSON codec sucks.
So I'm in the process of replacing it with a BONJSON codec, which has the same capabilities, is still async-safe and crash resilient, and is 35x faster with less code.
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
So in this case you're wrong.
General purpose is a different can of worms compared to solving a specific case.
Sexprs sitting over here, hoping for some love.
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
https://github.com/rxi/sj.h/blob/eb725e0858877e86932128836c1...
Certain inputs can therefore trigger UB.
Sometimes, it's just not the responsibility of the library. Trying to handle every possible error is a quick way to complexity.
[0]: https://43081j.com/2025/09/bloat-of-edge-case-libraries
Code is the ultimate specification. I don't trust the docs if the behavior differs from what they say (or, more often, fail to mention). And anything that deals with recursive structures (or looping without a clear counter and checks) is one of my first candidates for checks.
> has no way to handle the overflow case after the fact.
Fork/Vendor the code and add your assertions.
In the spirit of the article you linked, I’d rather write my own version.
Here's an example - I once coded a limited JSON parser in assembly language. I did not attempt to make it secure in any way. The purpose was to parse control messages sent over a serial port connection to an embedded CPU that controlled a small motor to rotate a camera and snap a photo. There was simply no way for any "untrusted" JSON to enter the system. It worked perfectly and nothing could ever be compromised by having a very simple JSON parser in the embedded device controlling the motor.
For this specific project I chose JSON and it worked perfectly. Sending JSON from the embedded CPU was also really simple. Yes, there was a little overhead on a slow connection, but I wasn't getting anywhere near saturation. I think it was 9600 bps max on a noisy connection with checksums. If even 10% of the JSON "packets" got through it was still plenty for the system to run.
Isn't that a bit like saying "you don't have to worry about home security as long as you are the only person who has the ability to enter your house"?
If you need it, then you need it. But if you don't need it, then you don't need it. There is a non-trivial value in the smallness and simplicity, and a non-trivial cost in trying to handle infinity problems when you don't have infinity use-case.
If you are reading data from a file or stream that only you yourself wrote some other time, then it's true that the data could possibly have been corrupted or something, but it's not true that it's automatically worth worrying about enough to justify making the code, and thus its bug surface, larger.
How likely is the problem, how bad are the consequences if the problem happens, how many edge cases could possibly exist, how much code does it take to handle them all? None of these are questions you or anyone else can say about anyone else's project ahead of time.
If the full featured parser is too big, then the line drawing the scope of the lightweight parser has to go somewhere, and so of course there will be things on the other side of that line no matter where it is except all the way back at full-featured-parser.
"just this one little check" is not automatially reasonable, because that check isn't automatically more impoprtant than any other, and they are all "just one little checks"s. The one little check would perevent what? Maybe a problem that never happens or doesn't hurt when it does happen. A value might be misinerpreted? So what? Let it. Maybe it makes more sense to handle that in the application code the one place it might matter. If it will matter so much, then maybe the application needs the full fat library.
Using a "tiny library" for parsing untrusted data is where the mistake is. Not in OP code.
(TIP: choose the latter)
Writing a function to do a checked addition like in other languages isn't exactly difficult, either.
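For reference, a sketch of such a checked add: the builtin is GCC/Clang-specific, and the fallback is portable C.

    #include <limits.h>
    #include <stdbool.h>

    /* Returns false on overflow; otherwise stores a + b in *out. */
    static bool checked_add_int(int a, int b, int *out) {
    #if defined(__GNUC__) || defined(__clang__)
      return !__builtin_add_overflow(a, b, out);
    #else
      if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) return false;
      *out = a + b;
      return true;
    #endif
    }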
Detecting these mistakes in Rust is not too difficult. In debug builds, integer overflow triggers a panic[1]. Additionally, clippy (the official linter of Rust), has a rule[2] to detect this mistake.
[1] https://doc.rust-lang.org/book/ch03-02-data-types.html#integ...
[2] https://rust-lang.github.io/rust-clippy/master/index.html#ar...
It's the wrong attitude for a JSON parser written in C, unless you like to get owned.
UB is bad.
Sometimes. In this case, where the library is a parser written in C, I think it is reasonable to expect the library to handle all possible inputs, even corner cases like this which are unlikely to be encountered in common practice. This is not "bloat", it is correctness.
In C, this kind of bug is capable of being exploited. Sure, many users of this lib won't be using it in exposed cases, but sooner or later the lib will end up in some widely-used internet-facing codebase.
As others have said, the fix could be as simple as bailing once the input size exceeds 1GB. Or it could be fine-grained. Either-way the fix would not "bloat" the codebase.
And yes, I'm well aware of the single-file C library movement. I am a fan.
- a JSON file with nested values exceeding 2 billion depth
- a file with more than 2 billion lines
- a line with more than 2 billion characters
Maybe more importantly, I won’t trust the rest of the code if the author doesn’t seem to have the finite range of integer types in mind.
Restricting the input to a reasonable size is an easy workaround for sure, but this limitation isn't indicated anywhere, so anyone deciding to consume this random project into their important code wouldn't know to defend against such a situation.
In a web server scenario, 2GiB of { (which would trigger two overflows) in a compressed request would require a couple hundred kilobytes to two megabytes, depending on how old your server software is.
And in the spirit of your profile text I'm quite glad for such landmines being out there to trip up those that do blindly ingest all code they can find.
If you are nesting 2 billion times in a row (at minimum this means repeating { 2 billion times, followed by a value, then } another 2 billion times), you have messed up.
You have 4GB of "padding"...at minimum.
Your file is going to be petabytes in size for this to make any sense.
You are using a terrible format for whatever you are doing.
You are going to need a completely custom parser because nothing will fit in memory. I don't care how much RAM you have.
Simply accessing an element means traversing a nested object 2 billion times, which in probably any parser in the world is going to take somewhere between minutes and weeks per access.
All that is going to happen in this program is a crash.
I appreciate that people want to have some pointless if(depth > 0) check everywhere, but if your depth is anywhere north of million in any real world program, something messed up a long long time ago, never mind waiting until it hits 2 billion.
An after the fact check would be the wrong way to deal with UB, you'd need to check for < INT_MAX before the increment in order to avoid it.
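Something like this, where the field names follow the sj.h snippets quoted elsewhere in this thread (so treat it as a sketch rather than a drop-in patch):

    /* needs <limits.h> for INT_MAX */
    if (r->depth == INT_MAX) {      /* incrementing would overflow (UB) */
      r->error = "too much nesting";
      goto top;
    }
    res.depth = ++r->depth;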
The license also makes it clear that the authors aren't liable for any damages.
You find a vulnerability? patch it, push change to repo maintainer.
The license disclaims liability but that doesn't mean the author cannot ever be held liable. Ultimately, who is liable is up to a court to decide.
Why are you using random, unvetted and unaudited code where safety is important?
They are sharing their knowledge about how to create a tiny JSON parser. Where is the problem again?
If there is a conscious intent of disregarding safety as you say, the Readme should have a prominent warning about that.
Even if that is true, how is that the authors problem? The license clearly states that they're not responsible for damages. If you were developing such a serious project then you need the appropriate vetting process and/or support contracts for your dependencies.
In the present case, either the missing overflow check in the code is by mistake, and then it's warranted to point out the error, or, as I understood GGGP to be arguing, the author deliberately decided to neglect safety or correctness, and then in my opinion you can't reject the criticism as unwarranted if the project's presentation isn't explicit about that.
I'm not making anything the author's problem here. Rather, I'm defending my criticism of the code, and am giving arguments as to why it is generally good form to make it explicit if a project doesn't care about the code being safe and correct.
What do you consider this clause in the LICENSE:
>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Every open source license has a very similar clause, including but not limited to BSD, GPL, CDDL, MPL and Apache.
You are responsible for the code you ship, doesn't matter whether it's written by you, an LLM, or whether it's a third-party dependency.
where? single header is just a way to package software, it has no relation to features, security or anything such...
- overestimating the gravity of a UB and its security implications
- underestimating the value of a 150-line JSON parser
- or overestimating the feasibility of having both a short and a high-quality parser.
It sometimes happens that fixing a bug is quicker than defending the low quality. Not everything is a tradeoff.
No one cares. Stop complaining or GTFO.
diff --git a/sj.h b/sj.h
index 60bea9e..25f6438 100644
--- a/sj.h
+++ b/sj.h
@@ -85,6 +85,7 @@ top:
return res;
case '{': case '[':
+ if (r->depth > 999) { r->error = "can't go deeper"; goto top; }
res.type = (*r->cur == '{') ? SJ_OBJECT : SJ_ARRAY;
res.depth = ++r->depth;
r->cur++;
There, fixed it

Limit your JSON input to 1 GB. I will have more problems in other portions of the stack if I start to receive a 2 GB JSON file over the web.
And if I still want to make it work for > 2GB, I would change all int in the source to 64 bits. Will still crash if input is > 2^64.
What I won't ever do in my code is check for int overflow.
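If you take the size-cap route, the guard can live entirely in the calling code. A minimal sketch, assuming the sj_reader(data, len) entry point quoted elsewhere in the thread (the cap value and wrapper name are made up):

    #include <stddef.h>
    #include "sj.h"

    #define JSON_MAX_INPUT ((size_t)1 << 30)  /* 1 GiB cap */

    int parse_json_capped(char *data, size_t len) {
      if (len > JSON_MAX_INPUT) return -1;    /* reject absurd inputs up front */
      sj_Reader r = sj_reader(data, len);
      /* ... walk the document with the sj.h API ... */
      return 0;
    }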
Amen. Just build with -fno-strict-overflow, my hot take is that should be the default on Linux anyway.
UB was a secondary observation, but it also can lead to logic errors in that vein, without involving memory safety.
I'm not sure I agree that UB usually leads to memory safety violations, but in any case, the fact that signed integer overflow is UB isn't what makes the code incorrect and unsafe in the first place.
for(int i=0; blah blah; i++)
is actually broken and dangerous on 64 bit machines.

Skimming the code, they also are loose in parsing incorrect JSON, it seems:
static bool sj__is_number_cont(char c) {
return (c >= '0' && c <= '9')
|| c == 'e' || c == 'E' || c == '.' || c == '-' || c == '+';
}
case '-': case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
res.type = SJ_NUMBER;
while (r->cur != r->end && sj__is_number_cont(*r->cur)) { r->cur++; }
break;
that seems to imply it treats “00.-E.e-8..7-E7E12” as a valid json number.

case '}': case ']':
res.type = SJ_END;
if (--r->depth < 0) {
r->error = (*r->cur == '}') ? "stray '}'" : "stray ']'";
goto top;
}
r->cur++;
break;
I think that means the code finds [1,2} a valid array and {"foo": 42] a valid struct (maybe, it even is happy with [1,2,"foo":42})

Those, to me, seem a more likely attack vector. The example code, for example, calls atoi on something parsed by the first piece of code.
⇒ I only would use this for parsing json config files.
Being tiny is one thing, but the json grammar isn’t that complex. They could easily do a better job at this without adding zillions of lines of code.
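For example, a strict scan of the JSON number grammar (-? (0 | [1-9][0-9]*) (. digits)? ([eE] [+-]? digits)?) fits in a couple dozen lines. A standalone sketch, not wired into sj.h:

    #include <stdbool.h>
    #include <stddef.h>

    /* Returns true only if s[0..len) is a well-formed JSON number. */
    static bool json_number_valid(const char *s, size_t len) {
      size_t i = 0;
      if (i < len && s[i] == '-') i++;
      if (i >= len) return false;
      if (s[i] == '0') i++;                          /* no leading zeros */
      else if (s[i] >= '1' && s[i] <= '9') {
        while (i < len && s[i] >= '0' && s[i] <= '9') i++;
      } else return false;
      if (i < len && s[i] == '.') {                  /* optional fraction */
        i++;
        if (i >= len || s[i] < '0' || s[i] > '9') return false;
        while (i < len && s[i] >= '0' && s[i] <= '9') i++;
      }
      if (i < len && (s[i] == 'e' || s[i] == 'E')) { /* optional exponent */
        i++;
        if (i < len && (s[i] == '+' || s[i] == '-')) i++;
        if (i >= len || s[i] < '0' || s[i] > '9') return false;
        while (i < len && s[i] >= '0' && s[i] <= '9') i++;
      }
      return i == len;
    }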
-sj_Reader sj_reader(char *data, size_t len) {
+sj_Reader sj_reader(char *data, int len) {
Not everyone needs to waste cycles on supporting JSON files larger than 2^31-1.

On the more code side, love this, been looking to implement a simple json parser for some projects but this is small enough i can study it and either learn what i need or even use it. lovely!
I'm still impressed and might use it, but just noting this.
I've been using cJSON[0] for years now and am pretty happy with it. I had used wjelement[1] before that but ran into a few issues and eventually moved away from it (can't recall why exactly, it's been so long).
[0] https://github.com/DaveGamble/cJSON [1] https://github.com/netmail-open/wjelement
{"x",10eee"y"22:5,{[:::,,}]"w"7"h"33
rect: { 10, 22, 7, 33 }

I don’t know what else you call a library that just extracts data.
So if you can, try and at least use LuaJIT, which when using json.lua seems to bring it back down into range with other performant languages, or jump down into LuaJIT and use Sj.h there, through the C FFI or just simdjson.
json.lua is great for when you're restricted in some ways to use a pure Lua implementation, though. It's the de facto solution.
https://github.com/lelanthran/libxcgi/blob/master/library/sr...
https://github.com/lelanthran/libxcgi/blob/master/library/sr...
Which does much more in 200 lines of C89 in a single header
https://github.com/nst/JSONTestSuite
The nesting is limited by using an int as the depth counter. The C standard guarantees that INT_MAX is at least 32767, so that's the limit on portable nesting depth. Nowadays int is typically 32 or 64 bits, so the limit is much higher in typical C implementations.
If I see correctly, the library doesn’t check for overflow, however. This might conceivably be an exploitable vulnerability (and such an overflow would constitute UB).
What I mean by this is a subset (superset?) that exactly matches the parsing behavior of a specific target parsing library. Why is this useful? To avoid the class of vulnerabilities that rely on the same JSON being handled differently by two different parsers (you can exploit this to get around an authorization layer, for example).