Show HN: Searchable compression for JSON – ~99% page skip and sub-ms lookups

https://github.com/kodomonocch1/see_proto
13•kodomonocch1•2h ago
Problem
JSON/NDJSON is everywhere in data platforms, but compression usually breaks searchability. You either keep queryable raw stores (high I/O and egress) or compress into gz/zstd blobs (cheap to store, painful to probe). The “cloud tax” shows up as wasted reads.

What I built (SEE: Semantic Entropy Encoding)
A schema-aware, searchable compression codec for JSON that keeps exists/pos lookups fast while still compressing. Internals: structure-aware delta + dictionaries, a PageDir + mini-index to jump to relevant pages, and a tuned Bloom filter that skips ~99% of pages. AutoPage (131/262 KiB) balances seek vs throughput.
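To make the page-skipping idea concrete, here is a minimal sketch of a per-page Bloom filter in front of compressed pages. It is not SEE's actual layout or API: `ToyPagedStore`, `BloomFilter`, the 4096-bit filter size, the four hash functions, and the use of zlib (chosen only because it ships with Python) are all assumptions made for illustration.

```python
import hashlib
import json
import zlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bitset (illustrative only)."""

    def __init__(self, num_bits=4096, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with a per-hash prefix.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToyPagedStore:
    """Hypothetical store: records are grouped into compressed pages, one Bloom filter per page."""

    def __init__(self, records, page_size=100):
        self.pages = []  # list of (bloom, compressed NDJSON bytes)
        self.pages_read = 0  # counts pages that had to be decompressed
        for start in range(0, len(records), page_size):
            chunk = records[start:start + page_size]
            bloom = BloomFilter()
            for rec in chunk:
                for value in rec.values():
                    bloom.add(str(value))
            ndjson = "\n".join(json.dumps(rec) for rec in chunk)
            self.pages.append((bloom, zlib.compress(ndjson.encode("utf-8"))))

    def exists(self, value):
        for bloom, blob in self.pages:
            if not bloom.might_contain(value):
                continue  # page skipped without touching the compressed bytes
            self.pages_read += 1
            for line in zlib.decompress(blob).decode("utf-8").splitlines():
                if value in map(str, json.loads(line).values()):
                    return True
        return False


records = [{"id": i, "user": f"user{i}"} for i in range(1000)]
store = ToyPagedStore(records, page_size=100)
print(store.exists("user537"), store.exists("nobody"))
print("pages decompressed:", store.pages_read, "of", len(store.pages), "per probe at most")
```

The point of the sketch is only that a probe decompresses a page when its filter says "possibly present"; every other page is skipped from metadata alone, which is where the I/O saving would come from.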

Benchmarks (apples-to-apples, FULL)
- size ratio: str ≈ 0.168–0.170, combined ≈ 0.194–0.196
- Bloom density ≈ 0.30; skip: present ≈ 0.99, absent ≈ 0.992
- lookup (ms): present p50/p95/p99 ≈ 0.18/0.28/0.37; absent ≈ 1.16–1.88/1.36–2.11/1.58–2.41
Numbers are stable on a commodity desktop (i7-13700K/96GB/Windows).
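As a rough sanity check on the Bloom density ≈ 0.30 and absent skip ≈ 0.992 figures: for a standard Bloom filter, the chance that an absent key is wrongly reported as "possibly present" is approximately the density raised to the number of hash functions. The post does not say how many hash functions SEE uses, so the values of `k` below are assumptions, and the function name is made up for this sketch.

```python
def false_positive_rate(density, num_hashes):
    """Approximate chance that all probed bits are already set for an absent key."""
    return density ** num_hashes


density = 0.30  # Bloom density reported in the post
for k in range(1, 7):  # k is a guess; the post does not state it
    fpr = false_positive_rate(density, k)
    print(f"k={k}: false-positive rate ~ {fpr:.4f}, expected absent skip ~ {1 - fpr:.4f}")
```

Under this approximation, four hash functions would give a false-positive rate of about 0.0081, so an absent-key skip rate near 0.992 is at least plausible for the stated density.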

Try it in 10 minutes (no build)
1) pip install see_proto
2) python samples/quick_demo.py
It prints size ratios, Bloom density, skip %, and lookup p50/p95/p99 on a packaged sample.

Why not “just zstd”?
We sometimes lose pure size vs zstd alone. The win is searchable compression: Bloom + PageDir avoids touching most pages, so selective probes pay less I/O/egress and finish faster. On large log scans this often wins on TCO even with similar raw ratios.

Link (README + quick demo + one-pager)
https://github.com/kodomonocch1/see_proto

Comments

kodomonocch1•2h ago
Happy to answer design details (page layout, Bloom tuning, codec selection, failure modes). Minimal Python examples for exists(key) and positions(key) are in the repo. If anyone needs deeper materials (reproducible FULL benches, wheel artifacts, and design notes) we have an NDA-gated VDR; I can share the form on request.
duanhjlt•2h ago
Congrats on the release. The SEE approach—schema-aware delta, dictionaries, PageDir, and tuned Bloom filters—seems thoughtfully engineered. The tradeoff versus pure zstd makes sense if selective probes dominate TCO. I’ll try the quick demo; curious about failure modes and Bloom tuning across varied schemas.
esafak•1h ago
It looks like you want to make money off this file format? That seems difficult. You would need to build a product around it first. I suppose some kind of a search or observability company could get funded if you have a demo. But be warned that running a company involves a lot more than developing a secret sauce.

The easiest thing is to popularize it and get a well-paying job from your fame. Make some friends and start your company together.

zahlman•1h ago
It doesn't exactly inspire confidence observing that the .see "archive" included in the zip distribution apparently gets further compressed by more than 2:1 within the zip archive....
throwuxiytayq•1h ago
“Millisecond lookups” sounds funny when you work in game dev. Anyway, interesting idea, thanks for sharing. Where the code at, though?
stuartjohnson12•1h ago
From OP's Github: "I am a 20-year-old university student living in Japan. Although I'm a liberal arts major, I aspire to become an engineer."

Just FYI - this is most likely vibe coding that a sycophantic AI has persuaded OP is cutting edge research.

Andrej Karpathy – AGI is still a decade away

https://www.dwarkesh.com/p/andrej-karpathy
48•ctoth•32m ago•18 comments

Live Stream from the Namib Desert

https://bookofjoe2.blogspot.com/2025/10/live-stream-from-namib-desert.html
266•surprisetalk•5h ago•56 comments

Scientists discover intercellular nanotubular communication system in brain

https://www.science.org/doi/10.1126/science.adr7403
69•marshfram•2h ago•13 comments

Ruby core team takes ownership of RubyGems and Bundler

https://www.ruby-lang.org/en/news/2025/10/17/rubygems-repository-transition/
428•sebiw•5h ago•206 comments

EVs are depreciating faster than gas-powered cars

https://restofworld.org/2025/ev-depreciation-blusmart-collapse/
139•belter•6h ago•327 comments

Meow.camera

https://meow.camera/
513•southwindcg•14h ago•179 comments

I built an F5 QKview scanner for CISA ED 26-01

https://www.usenabla.com/blog/emergency-scanning-cisa-endpoint
4•jdbohrman•5h ago•0 comments

Migrating from AWS to Hetzner

https://digitalsociety.coop/posts/migrating-to-hetzner-cloud/
898•pingoo101010•7h ago•496 comments

The Rapper 50 Cent, Adjusted for Inflation

https://50centadjustedforinflation.com/
216•gaws•1h ago•57 comments

AI has a cargo cult problem

https://www.ft.com/content/f2025ac7-a71f-464f-a3a6-1e39c98612c7
53•cs702•1h ago•37 comments

Resizeable Bar Support on the Raspberry Pi

https://www.jeffgeerling.com/blog/2025/resizeable-bar-support-on-raspberry-pi
77•speckx•1w ago•23 comments

4Chan Lawyer publishes Ofcom correspondence

https://alecmuffett.com/article/117792
150•alecmuffett•10h ago•208 comments

You did no fact checking, and I must scream

https://shkspr.mobi/blog/2025/10/i-have-no-facts-and-i-must-scream/
231•blenderob•3h ago•124 comments

Dead or Alive creator Tomonobu Itagaki, 58 passes away

https://www.gamedeveloper.com/design/dead-or-alive-creator-tomonobu-itagaki-has-passed-away-at-58
45•corvad•2h ago•9 comments

OpenAI Needs $400B In The Next 12 Months

https://www.wheresyoured.at/openai400bn/
12•chilipepperhott•15m ago•1 comment

Cartridge Chaos: The Official Nintendo Region Converter and More

https://nicole.express/2025/not-just-for-robert.html
10•zdw•5d ago•0 comments

Let's write a macro in Rust

https://hackeryarn.com/post/rust-macros-1/
77•hackeryarn•1w ago•31 comments

MIT physicists improve the precision of atomic clocks

https://news.mit.edu/2025/mit-physicists-improve-atomic-clocks-precision-1008
7•pykello•5d ago•1 comment

How I bypassed Amazon's Kindle web DRM

https://blog.pixelmelt.dev/kindle-web-drm/
1446•pixelmelt•21h ago•446 comments

Ask HN: How to stop an AWS bot sending 2B requests/month?

142•lgats•12h ago•81 comments

Trap the Critters with Paint

https://deepanwadhwa.github.io/freeze_trap/
25•deepanwadhwa•6d ago•13 comments

Read your way through Hà Nội

https://vietnamesetypography.com/samples/read-your-way-through-ha-noi/
62•jxmorris12•6d ago•55 comments

Show HN: OnlyJPG – Client-Side PNG/HEIC/AVIF/PDF/etc to JPG

https://onlyjpg.com
43•johnnyApplePRNG•6h ago•22 comments

Email bombs exploit lax authentication in Zendesk

https://krebsonsecurity.com/2025/10/email-bombs-exploit-lax-authentication-in-zendesk/
38•todsacerdoti•6h ago•11 comments

Stinkbug Leg Organ Hosts Symbiotic Fungi That Protect Eggs from Parasitic Wasps

https://bioengineer.org/stinkbug-leg-organ-hosts-symbiotic-fungi-that-protect-eggs-from-parasitic...
8•gmays•3h ago•1 comment

Next steps for BPF support in the GNU toolchain

https://lwn.net/Articles/1039827/
95•signa11•14h ago•18 comments

Amazon-backed, nuclear facility for Washington state

https://www.geekwire.com/2025/a-first-look-at-the-amazon-backed-next-generation-nuclear-facility-...
15•stikit•2h ago•1 comment

Metropolis 1998 lets you design every building in an isometric, pixel-art city (2024)

https://arstechnica.com/gaming/2024/08/metropolis-1998-lets-you-design-every-building-in-an-isome...
78•YesBox•3h ago•30 comments

New computer model helps reveal how the brain both adapts and misfires

https://now.tufts.edu/2025/10/16/flight-simulator-brain-reveals-how-we-learn-and-why-minds-someti...
50•XzetaU8•12h ago•18 comments

Your data model is your destiny

https://notes.mtb.xyz/p/your-data-model-is-your-destiny
354•hunglee2•2d ago•90 comments