Show HN: ISON – Data format that uses 30-70% fewer tokens than JSON for LLMs

https://github.com/maheshvaikri-code/ison
4•maheshvaikri99•12h ago
ISON (Interchange Simple Object Notation) - a data format optimized for LLMs and Agentic AI.

The problem: JSON wastes tokens. Curly braces, quotes, colons, commas - all eat into your context window.

ISON uses tabular patterns that LLMs already understand from training data:

JSON (87 tokens):

  {
    "users": [
      {"id": 1, "name": "Alice", "email": "alice@example.com"},
      {"id": 2, "name": "Bob", "email": "bob@example.com"}
    ]
  }

ISON (34 tokens):

  table.users
  id:int name:string email
  1 Alice alice@example.com
  2 Bob bob@example.com
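To make the comparison concrete, here is a minimal sketch that renders a list of flat records in the table layout shown above. It is plain Python and does not use the ison-py package; the function name `to_table_text` and its `types` parameter are hypothetical, and the sketch assumes values contain no spaces.

```python
def to_table_text(name, rows, types=None):
    """Render a list of flat dicts as a whitespace-separated table block.

    Illustrative only: this mimics the layout shown in the post and is
    not the ison-py API. Values containing spaces are not handled.
    """
    types = types or {}
    columns = list(rows[0].keys())
    # Append ":type" to a column name only when a type was supplied.
    header = " ".join(
        f"{col}:{types[col]}" if col in types else col for col in columns
    )
    lines = [f"table.{name}", header]
    for row in rows:
        lines.append(" ".join(str(row[col]) for col in columns))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
]
text = to_table_text("users", users, types={"id": "int", "name": "string"})
print(text)
```

The column names are written once in the header instead of once per record, which is where most of the saving over the JSON form comes from.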

Features:
- 30-70% token reduction
- Type annotations
- References between tables
- Schema validation (ISONantic)
- Streaming format (ISONL)

Implementations: Python, JavaScript, TypeScript, Rust, C++
9 packages, 171+ tests passing

  pip install ison-py      # Parser
  pip install isonantic    # Validation & schemas

  npm install ison-parser     # JavaScript
  npm install ison-ts         # TypeScript with full types
  npm install isonantic-ts    # Validation & schemas

  [dependencies]
  ison-rs = "1.0"
  isonantic-rs = "1.0"    # Validation & schemas

Looking for feedback on the format design.

Comments

dtagames•12h ago
Personally, I'm against anything that goes against the standard LLM data formats of JSON and MD. Any perceived economy is outweighed by confusion when none of these alternative formats exist in the training data in any real sense and every one of them has to be translated (by the LLM) to be used in your code or to apply to your real data.

Any tokens you save will be lost 3x over in that process, and you also introduce confusing new context information that's unrelated to your app.

maheshvaikri99•12h ago
Fair point, but I'd push back on "none of these alternative formats exist in training data."

ISON isn't inventing new syntax. It's CSV/TSV with a header - which LLMs have seen billions of times. The table format:

  table.users
  id name email
  1 Alice alice@example.com

...is structurally identical to markdown tables and CSVs that dominate training corpora.

On the "3x translation overhead" - ISON isn't meant for LLM-to-code interfaces where you need JSON for an API call. It's for context stuffing: RAG results, memory retrieval, multi-agent state passing.

If I'm injecting 50 user records into context for an LLM to reason over, I never convert back to JSON. The LLM reads ISON directly, reasons over it, and responds.

The benchmark: same data, same prompt, same task. ISON uses fewer tokens and gets equivalent accuracy. Happy to share the test cases if you want to verify.

dClauzel•11h ago
Just use CSV at this point :D
maheshvaikri99•8h ago
Ha, fair. CSV gets you 80% there.

The 20% ISON adds:
- Multiple named tables in one doc
- Cross-table references
- No escaping hell (quoted strings handled cleanly)
- Schema validation (ISONantic)

If you're stuffing one flat table into context, CSV works fine. When you have users + orders + products with relationships, ISON saves you from JSON's bracket tax.

throw03172019•10h ago
So CSV with a “typed” header?
maheshvaikri99•8h ago
Essentially yes, but with a few additions CSV lacks:

1. Multiple tables in one document (table.users, table.orders)
2. References between tables (:user:42 links to id 42)
3. Object blocks for config/metadata
4. Streaming format (ISONL) for large datasets

The type annotations are optional - they help LLMs understand the schema without inference.

You could think of it as "CSV that knows about relationships" - which is exactly what multi-agent systems need when passing state around.

throw03172019•7h ago
Got it. Thanks.

Any data on how LLMs like this format? Are they able to make the associations etc?

maheshvaikri99•6h ago
Yes - I ran a 300-question benchmark comparing ISON, JSON, JSON-COMPACT, and other formats on the same tasks.

ISON: 88.3% accuracy
JSON: lower (can share exact numbers if interested)

Tested across Claude, GPT-4, DeepSeek, and Llama 3.

The key finding: LLMs handle tabular formats natively because they've seen billions of markdown tables and CSVs in training. No special prompting needed.

For associations, I tested with multi-table ISON docs like:

  table.users
  id name
  1 Alice
  2 Bob

  table.orders
  id user_id product
  101 :1 Widget
  102 :2 Gadget

Prompt: "What did Alice order?"

All models correctly resolved :1 → Alice → Widget without explicit instructions about the reference syntax.
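For comparison, the same lookup done in code might look like the sketch below. It hand-parses the two-table example rather than using any ISON library, and it assumes whitespace-separated cells, blank-line-separated tables, and the `:id` reference form used in this example; the actual ISON grammar may differ. The helper names `parse_tables` and `orders_for` are hypothetical.

```python
import re


def parse_tables(text):
    """Parse blank-line-separated 'table.<name>' blocks into lists of dicts.

    Assumes whitespace-separated cells with no embedded spaces. All cell
    values are kept as strings.
    """
    tables = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        name = lines[0].split(".", 1)[1]   # "table.users" -> "users"
        columns = lines[1].split()
        tables[name] = [dict(zip(columns, line.split())) for line in lines[2:]]
    return tables


def orders_for(tables, user_name):
    """Follow ':<id>' references from orders.user_id back to users.id."""
    ids = {u["id"] for u in tables["users"] if u["name"] == user_name}
    return [
        o["product"]
        for o in tables["orders"]
        if o["user_id"].startswith(":") and o["user_id"][1:] in ids
    ]


doc = """
table.users
id name
1 Alice
2 Bob

table.orders
id user_id product
101 :1 Widget
102 :2 Gadget
"""

tables = parse_tables(doc)
print(orders_for(tables, "Alice"))  # ['Widget']
```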

The 30-70% token savings come from removing JSON's structural overhead (braces, quotes, colons, commas) while keeping the same semantic density.
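A rough way to see that overhead is to count the structural characters in a compact JSON encoding and compare total lengths against a table-style rendering. This sketch counts characters, not tokens, so it is only a proxy: real token counts depend on the model's tokenizer. The sample records are the ones from the post; the table text is written by hand in the layout shown earlier.

```python
import json

records = {
    "users": [
        {"id": 1, "name": "Alice", "email": "alice@example.com"},
        {"id": 2, "name": "Bob", "email": "bob@example.com"},
    ]
}

# Compact JSON: no spaces after separators, the most favourable JSON case.
compact_json = json.dumps(records, separators=(",", ":"))

# Characters that carry structure rather than data in this sample
# (none of them occur inside the values here).
structural = sum(compact_json.count(ch) for ch in '{}[]":,')

table_text = "\n".join([
    "table.users",
    "id name email",
    "1 Alice alice@example.com",
    "2 Bob bob@example.com",
])

print("compact JSON chars:", len(compact_json))
print("structural chars:  ", structural)
print("table-style chars: ", len(table_text))
```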

Haven't published formal benchmarks on this yet - that's good feedback. I should.

dmarwicke•8h ago
tried this with msgpack last year. accuracy tanked. models have seen a trillion json examples, like 12 of whatever format you invent
