As the title states, I made a library to process and validate massive JSON files in Node.js, where using JSON.parse() means saying hello to OOM.
It came from an unfortunate situation at work where we had no control over the data and couldn't change the format to something that's easy to stream. Processing it therefore meant dealing with a high memory footprint in our pods, which I thought was a bit wasteful, since we only really needed some parts of the data and didn't need to load the whole thing into memory. There are JSON streaming libraries available, but they weren't really to my taste (e.g. SAX-style, callbacks) and they don't flow how I like my code to look.
I left that job 6 months ago but the idea lingered, so after loads of mistakes with parsing, SIMD, binary encodings, FFI performance hits and general API design, I've finally arrived at something I'm proud of. I'll write a blog post about the journey at some point, since I think it's interesting.
It's pre-1.0 at the moment and I'd really appreciate feedback. I've tried really hard to minimize the API surface to keep it simple. Please have a look.
Disclaimer:
I wrote this with the help of AI but made sure I was at the wheel before pushing code. Granted, there were times I just went LGTM like a Friday morning PR, but those usually come back to bite you so I try not to.
jankdc•1h ago
Long time lurker here.