Show HN: Searchable compression for JSON – ~99% page skip and sub-ms lookups

https://github.com/kodomonocch1/see_proto
13•kodomonocch1•1h ago
Problem

JSON/NDJSON is everywhere in data platforms, but compression usually breaks searchability. You either keep queryable raw stores (high I/O/egress) or compress into gz/zstd blobs (cheap to store, painful to probe). The “cloud tax” shows up as wasted reads.

What I built (SEE: Semantic Entropy Encoding)

A schema-aware, searchable compression codec for JSON that keeps exists/pos lookups fast while still compressing. Internals: structure-aware delta + dictionaries, a PageDir + mini-index to jump to relevant pages, and a tuned Bloom filter that skips ~99% of pages. AutoPage (131/262 KiB) balances seek vs throughput.
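For readers unfamiliar with the page-skip idea, here is a minimal sketch of how a per-page Bloom filter can let a lookup avoid decoding most pages. It is not SEE's code or API: `PageBloom`, `build_page_dir`, `probe_exists`, the filter size, and the hash count are all hypothetical names and values chosen for illustration, and only top-level keys are indexed.

```python
import hashlib


class PageBloom:
    """Tiny Bloom filter over the keys of one page (illustrative, not SEE's layout)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits      # assumed size, not taken from the post
        self.num_hashes = num_hashes  # assumed hash count, not taken from the post
        self.bits = 0                 # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions from salted BLAKE2b digests.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key.encode("utf-8"), digest_size=8, salt=bytes([i])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_page_dir(pages):
    """Build one filter per page; a stand-in for a page directory."""
    filters = []
    for page in pages:
        bloom = PageBloom()
        for record in page:
            for key in record:  # top-level keys only
                bloom.add(key)
        filters.append(bloom)
    return filters


def probe_exists(key, pages, filters):
    """Return (found, pages_read), reading only pages whose filter may match."""
    pages_read = 0
    for page, bloom in zip(pages, filters):
        if not bloom.might_contain(key):
            continue  # page skipped without being read or decoded
        pages_read += 1
        if any(key in record for record in page):
            return True, pages_read
    return False, pages_read


if __name__ == "__main__":
    pages = [
        [{"user": 1, "ts": 2}],
        [{"trace_id": "abc"}],
        [{"status": 200}],
    ]
    filters = build_page_dir(pages)
    print(probe_exists("trace_id", pages, filters))
    print(probe_exists("missing_key", pages, filters))
```

A Bloom filter never reports a stored key as absent, so skipping is safe; the cost of a false positive is one unnecessary page read.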

Benchmarks (apples-to-apples, FULL)

- size ratio: str ≈ 0.168–0.170, combined ≈ 0.194–0.196
- Bloom density ≈ 0.30; skip: present ≈ 0.99, absent ≈ 0.992
- lookup (ms): present p50/p95/p99 ≈ 0.18/0.28/0.37; absent ≈ 1.16–1.88/1.36–2.11/1.58–2.41

Numbers are stable on a commodity desktop (i7-13700K/96GB/Windows).
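As a rough sanity check on how the density and skip figures relate, the sketch below uses the textbook Bloom filter approximation: a lookup for an absent key is a false positive with probability roughly density^k, where k is the number of hash functions. The post does not say that SEE uses a standard Bloom filter, that "density" means the fraction of set bits, or that absent-key skip equals one minus the false-positive rate; all three are assumptions here, and `implied_hash_count` is a hypothetical helper.

```python
import math


def implied_hash_count(bit_density, absent_skip_rate):
    """Solve density**k = false_positive_rate for k (textbook approximation)."""
    false_positive_rate = 1.0 - absent_skip_rate
    return math.log(false_positive_rate) / math.log(bit_density)


if __name__ == "__main__":
    # Figures quoted in the post: density ~0.30, absent skip ~0.992.
    k = implied_hash_count(0.30, 0.992)
    print(f"implied hash count: {k:.2f}")
```

Under those assumptions the quoted numbers are consistent with about four hash functions per filter, since 0.30^4 = 0.0081.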

Try it in 10 minutes (no build)

1) pip install see_proto
2) python samples/quick_demo.py

It prints size ratios, Bloom density, skip %, and lookup p50/p95/p99 on a packaged sample.

Why not “just zstd”?

We sometimes lose pure size vs zstd alone. The win is searchable compression: Bloom + PageDir avoids touching most pages, so selective probes pay less I/O/egress and finish faster. On large log scans this often wins on TCO even with similar raw ratios.
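To make the I/O argument concrete, here is a back-of-the-envelope sketch comparing bytes read by one selective probe with and without page skipping. The page count is a made-up figure, the page size follows the 131 KiB AutoPage value from the post, and the model ignores the cost of reading the filters and directory themselves; `probe_io` is a hypothetical helper, not part of see_proto.

```python
def probe_io(total_pages, page_size_bytes, skip_rate):
    """Return (pages_touched, bytes_read) for one probe at a given skip rate."""
    pages_touched = round(total_pages * (1.0 - skip_rate))
    return pages_touched, pages_touched * page_size_bytes


if __name__ == "__main__":
    total_pages = 10_000   # hypothetical dataset size
    page_size = 131 * 1024  # 131 KiB, per the AutoPage figure in the post

    _, full_bytes = probe_io(total_pages, page_size, skip_rate=0.0)
    touched, skip_bytes = probe_io(total_pages, page_size, skip_rate=0.99)

    print(f"no skipping:  {full_bytes / 2**20:.1f} MiB read")
    print(f"99% skipping: {skip_bytes / 2**20:.1f} MiB read ({touched} pages)")
```

With a 99% skip rate the probe touches about 1% of the pages, which is why a codec that compresses somewhat worse than zstd can still read far fewer bytes per lookup.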

Link (README + quick demo + one-pager)

https://github.com/kodomonocch1/see_proto

Comments

kodomonocch1•1h ago
Happy to answer design details (page layout, Bloom tuning, codec selection, failure modes). Minimal Python examples for exists(key) and positions(key) are in the repo. If anyone needs deeper materials (reproducible FULL benches, wheel artifacts, and design notes) we have an NDA-gated VDR; I can share the form on request.
duanhjlt•1h ago
Congrats on the release. The SEE approach—schema-aware delta, dictionaries, PageDir, and tuned Bloom filters—seems thoughtfully engineered. The tradeoff versus pure zstd makes sense if selective probes dominate TCO. I’ll try the quick demo; curious about failure modes and Bloom tuning across varied schemas.
esafak•1h ago
It looks like you want to make money off this file format? That seems difficult. You would need to build a product around it first. I suppose some kind of a search or observability company could get funded if you have a demo. But be warned that running a company involves a lot more than developing a secret sauce.

The easiest thing is to popularize it and get a well-paying job from your fame.

zahlman•1h ago
It doesn't exactly inspire confidence observing that the .see "archive" included in the zip distribution apparently gets further compressed by more than 2:1 within the zip archive....
throwuxiytayq•1h ago
“Millisecond lookups” sounds funny when you work in game dev. Anyway, interesting idea, thanks for sharing. Where the code at, though?
stuartjohnson12•1h ago
From OP's Github: "I am a 20-year-old university student living in Japan. Although I'm a liberal arts major, I aspire to become an engineer."

Just FYI - this is most likely vibe coding that a sycophantic AI has persuaded OP is cutting edge research.

Semi-artificial leaf interfacing organic semiconductors and enzymes

https://www.cell.com/joule/fulltext/S2542-4351(25)00346-0
1•PaulHoule•41s ago•0 comments

Andrej Karpathy – AGI is still a decade away

https://www.dwarkesh.com/p/andrej-karpathy
1•ctoth•49s ago•0 comments

Show HN: SteganoPDF – Embed any file in a PDF

https://www.signmypdf.com/tools/steganopdf-embed-any-file-in-pdf/
1•aqrashik•2m ago•0 comments

A Brief History of Rubygems.org

https://lwn.net/SubscriberLink/1042131/319050141553ec37/
1•jmarchello•3m ago•0 comments

Why AI Will Widen the Gap Between Superstars and Everybody Else

https://www.wsj.com/lifestyle/workplace/ai-workplace-tensions-what-to-do-c45f6b51
1•born_a_skeptic•3m ago•0 comments

The new science of strong materials, JE Gordon

https://archive.org/details/newscienceofstro00jame
1•akshatjiwan•3m ago•0 comments

Thou shalt not let AI run amok: Vatican wants global rules

https://www.theregister.com/2025/10/17/vatican_seminar_calls_for_global/
1•rntn•3m ago•0 comments

The Can That Outlived Its Creator

https://kyytpress.substack.com/p/the-can-that-outlived-its-creator
1•shadowvoxing•5m ago•0 comments

Everything is Amazing and Nobody is Happy (about coding with LLMs)

https://coding-with-ai.dev/posts/ai-coding-tools-amazing/
1•codeclimber•7m ago•0 comments

Picasso painting goes missing en route to exhibition

https://www.cnn.com/2025/10/17/style/picasso-still-life-guitar-missing-spain-intl-scli
1•mooreds•11m ago•0 comments

Scammers are pretending to be Elon Musk. They're stealing millions

https://gizmodo.com/elon-musk-spacex-neuralink-grok-xai-investment-scam-2000669248
1•gnabgib•11m ago•0 comments

Promptlet: AI Prompt Enhancement Manager for macOS

https://www.josh.ing/promptlet
2•coloneltcb•11m ago•0 comments

AE86 Electric conversion with manual transmission [video]

https://www.youtube.com/watch?v=DE2oDKguy3Q
1•Rumudiez•14m ago•0 comments

Exception Handling considered harmful (2005)

https://www.lighterra.com/papers/exceptionsharmful/
1•lr0•18m ago•0 comments

Is Postgres Read Heavy or Write Heavy?

https://www.crunchydata.com/blog/is-postgres-read-heavy-or-write-heavy-and-why-should-you-care
1•soheilpro•18m ago•1 comments

RFC: Reinforcement for Creativity

https://github.com/POlLLOGAMER/RfC-Reinforcement-for-Creativity
1•KaoruAK•18m ago•0 comments

Ed 26-01: Mitigate Vulnerabilities in F5 Devices

https://www.cisa.gov/news-events/directives/ed-26-01-mitigate-vulnerabilities-f5-devices
1•TurkishPoptart•19m ago•0 comments

Ask HN: If AI makes no progress, are its abilities enough to justify valuations?

1•sendos•20m ago•1 comments

Drought's Solution: Could Seawater Be the Answer? [video]

https://www.youtube.com/watch?v=ZGUKJz8o34k
1•ahmetcadirci25•22m ago•0 comments

We built PintPoints, a game that turns bar crawls into live competitions

https://play.google.com/store/apps/details?id=app.lunafox.pintpoints&hl=en_GB
1•barathonapp•22m ago•1 comments

The Micro Shift

https://steward.ventures/micro-shift
1•Aloke•22m ago•0 comments

Free Graphic Cards for Everyone

https://idiallo.com/blog/free-graphic-cards-for-everyone
1•ambigious7777•22m ago•0 comments

What the 3.0 Release Tells Us About WebAssembly's Uncertain Future

https://redmonk.com/kholterhoff/2025/10/17/wasms-identity-crisis/
3•mooreds•25m ago•0 comments

Show HN: Kortx – MCP server connecting Claude Code to GPT-5

https://github.com/effatico/kortx-mcp
1•sleepy_ghost•26m ago•0 comments

Samsung Gives Up on Super Thin Smartphones

https://www.macrumors.com/2025/10/17/samsung-reportedly-gives-up-on-thin-smartphones/
2•bartekrutkowski•29m ago•0 comments

Hyperscalers try to beat the heat with larger racks, more air flow

https://www.theregister.com/2025/10/17/hyperscaler_datacenter_ocp/
1•rntn•30m ago•0 comments

Resolving an Amalgam of Issues During the Elite Specialization Beta

https://www.guildwars2.com/en/news/resolving-issues-during-the-elite-specialization-beta/
1•debesyla•30m ago•0 comments

The US is paying the price for complacency with China. Time is running out

https://www.semafor.com/article/10/16/2025/the-us-is-paying-the-price-for-complacency-with-china-...
4•zerosizedweasle•32m ago•2 comments

Troubled Real Estate Firm at Center of Banks' Latest Loan Pinch

https://www.bloomberg.com/news/articles/2025-10-17/troubled-real-estate-firm-at-center-of-banks-l...
1•zerosizedweasle•34m ago•0 comments

A Simple Guide to Five Normal Forms in Relational Database Theory (1983)

https://dl.acm.org/doi/pdf/10.1145/358024.358054
1•Rendello•35m ago•1 comments