Show HN: ISON – Data format that uses 30-70% fewer tokens than JSON for LLMs

https://github.com/maheshvaikri-code/ison

7•maheshvaikri99•1mo ago

ISON (Interchange Simple Object Notation) - a data format optimized for LLMs and Agentic AI.

The problem: JSON wastes tokens. Curly braces, quotes, colons, commas - all eat into your context window.

ISON uses tabular patterns that LLMs already understand from training data:

JSON (87 tokens):

  {
    "users": [
      {"id": 1, "name": "Alice", "email": "alice@example.com"},
      {"id": 2, "name": "Bob", "email": "bob@example.com"}
    ]
  }

ISON (34 tokens):

  table.users
  id:int name:string email
  1 Alice alice@example.com
  2 Bob bob@example.com

Features:
- 30-70% token reduction
- Type annotations
- References between tables
- Schema validation (ISONantic)
- Streaming format (ISONL)

Implementations: Python, JavaScript, TypeScript, Rust, C++ (9 packages, 171+ tests passing)

  pip install ison-py       # Parser
  pip install isonantic     # Validation & schemas

  npm install ison-parser   # JavaScript
  npm install ison-ts       # TypeScript with full types
  npm install isonantic-ts  # Validation & schemas

  [dependencies]
  ison-rs = "1.0"
  isonantic-rs = "1.0"      # Validation & schemas

Looking for feedback on the format design.

Comments

dtagames•1mo ago

Personally, I'm against anything that goes against the standard LLM data formats of JSON and MD. Any perceived economy is outweighed by confusion when none of these alternative formats exist in the training data in any real sense and every one of them has to be translated (by the LLM) to be used in your code or to apply to your real data.

Any tokens you saved will be lost 3x over in that process, as well as introducing confusing new context information that's unrelated to your app.

maheshvaikri99•1mo ago

Fair point, but I'd push back on "none of these alternative formats exist in training data."

ISON isn't inventing new syntax. It's CSV/TSV with a header - which LLMs have seen billions of times. The table format:

  table.users
  id name email
  1 Alice alice@example.com

...is structurally identical to markdown tables and CSVs that dominate training corpora.

On the "3x translation overhead" - ISON isn't meant for LLM-to-code interfaces where you need JSON for an API call. It's for context stuffing: RAG results, memory retrieval, multi-agent state passing.

If I'm injecting 50 user records into context for an LLM to reason over, I never convert back to JSON. The LLM reads ISON directly, reasons over it, and responds.

The benchmark: same data, same prompt, same task. ISON uses fewer tokens and gets equivalent accuracy. Happy to share the test cases if you want to verify.

dtagames•1mo ago

That's exactly the problem. Why convert anything, especially if it's as lossy as CSVs are? You lose nesting and the rest of your structure in favor of a single header row. That's not a benefit.

If your real data is in JSON (and in JS/TS apps, it always is at runtime as only JSON objects exist in that language) it makes no sense to ever convert it, period.

Besides, the corporate-report-style CSVs in training materials don't have data shapes anything like JSON or even most business software. You're crippling an established and useful data carrier in order to save pennies on tokens. Tokens are getting cheaper, so it's the wrong optimization.

maheshvaikri99•1mo ago

Fair enough. Let me clarify the use case:

ISON isn't meant to replace JSON in your application. Your JS/TS code still uses JSON objects internally. ISON is specifically for the LLM context window.

The flow: App (JSON) → serialize to ISON → inject into prompt → LLM reasons → response → your app

You're right that nesting is lost. But for LLM reasoning, flat structures often work better. LLMs struggle with deeply nested JSON - they lose track of parent-child relationships 4+ levels deep.

On "tokens are getting cheaper": True for API costs. But context windows are still limited. When you're stuffing RAG results, memory, agent state, and user history into 128K tokens, every byte matters. It's not about saving money - it's about fitting more context.

On "wrong optimization": I ran the benchmark. Same data, same task. ISON: 88.3% accuracy. JSON: 84.7%. The LLM actually performed better with the tabular format, not just "equivalent for fewer tokens."

## BENCHMARK STATS:

TOKEN EFFICIENCY:
  ISON: 3,550 tokens
  JSON: 12,668 tokens
  ISON vs JSON: 72.0% reduction

LLM ACCURACY (300 Questions):
  ISON: 265/300 (88.3%)
  JSON: 254/300 (84.7%)

EFFICIENCY (Acc/1K):
  ISON: 24.88
  JSON: 6.68
  ISON is 272.3% MORE EFFICIENT than JSON!
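The derived figures in that output follow directly from the raw counts. The short Python sketch below recomputes them, assuming "Acc/1K" means accuracy in percent divided by thousands of tokens (an interpretation that matches the printed values, not a definition given in the thread).

```python
def pct_reduction(tokens: int, baseline: int) -> float:
    """Percent fewer tokens than the baseline format."""
    return (1 - tokens / baseline) * 100


def acc_per_1k(correct: int, total: int, tokens: int) -> float:
    """Accuracy (in percent) per 1,000 tokens of context (assumed definition)."""
    return (correct / total * 100) / (tokens / 1000)


ison = acc_per_1k(265, 300, 3550)
json_ = acc_per_1k(254, 300, 12668)

print(f"reduction: {pct_reduction(3550, 12668):.1f}%")            # → reduction: 72.0%
print(f"ISON Acc/1K: {ison:.2f}, JSON Acc/1K: {json_:.2f}")       # → 24.88, 6.68
print(f"ISON is {(ison / json_ - 1) * 100:.1f}% more efficient")  # → 272.3%
```

The same formula reproduces the Acc/1K column of the 50-question table later in the thread, e.g. 92.0 / 0.639 ≈ 143.97 for ISONGraph. Note that the metric divides by token count, so it rewards compactness heavily: most of the 272% gap comes from the 3.6x token difference, not from the 3.6-point accuracy difference.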

But I hear you - if your data is deeply nested and that nesting carries semantic meaning the LLM needs, JSON might be the right choice. ISON works best for relational/tabular data going into context.

dClauzel•1mo ago

Just use CSV at this point :D

maheshvaikri99•1mo ago

Ha, fair. CSV gets you 80% there.

The 20% ISON adds:
- Multiple named tables in one doc
- Cross-table references
- No escaping hell (quoted strings handled cleanly)
- Schema validation (ISONantic)

If you're stuffing one flat table into context, CSV works fine. When you have users + orders + products with relationships, ISON saves you from JSON's bracket tax.

throw03172019•1mo ago

So CSV with a “typed” header?

maheshvaikri99•1mo ago

Essentially yes, but with a few additions CSV lacks:

1. Multiple tables in one document (table.users, table.orders)
2. References between tables (:user:42 links to id 42)
3. Object blocks for config/metadata
4. Streaming format (ISONL) for large datasets

The type annotations are optional - they help LLMs understand the schema without inference.

You could think of it as "CSV that knows about relationships" - which is exactly what multi-agent systems need when passing state around.

throw03172019•1mo ago

Got it. Thanks.

Any data on how LLMs like this format? Are they able to make the associations etc?

maheshvaikri99•1mo ago

Yes - I ran a 300-question benchmark comparing ISON, JSON, JSON Compact, etc. on the same tasks.

ISON: 88.3% accuracy
JSON: lower (can share exact numbers if interested)

Tested across Claude, GPT-4, DeepSeek, and Llama 3.

The key finding: LLMs handle tabular formats natively because they've seen billions of markdown tables and CSVs in training. No special prompting needed.

For associations, I tested with multi-table ISON docs like:

  table.users
  id name
  1 Alice
  2 Bob

  table.orders
  id user_id product
  101 :1 Widget
  102 :2 Gadget

Prompt: "What did Alice order?"

All models correctly resolved :1 → Alice → Widget without explicit instructions about the reference syntax.
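For comparison, this is roughly the lookup the models had to perform, written as a Python sketch: parse the two tables, treat a cell starting with `:` as a reference to the row with that `id`, and join. It is a simplified reading, not the `ison-py` parser. `parse_tables` and `orders_for` are hypothetical helpers; the sketch handles only unquoted, space-separated values, and it assumes both `:1` and the typed form `:user:1` resolve to the id after the last colon, with the target table (`users`) hard-coded.

```python
from typing import Dict, List

Row = Dict[str, str]
Tables = Dict[str, List[Row]]


def parse_tables(doc: str) -> Tables:
    """Parse blank-line-separated 'table.<name>' blocks (simple subset: no quoted values)."""
    tables: Tables = {}
    for block in doc.strip().split("\n\n"):
        lines = [line.split() for line in block.strip().splitlines()]
        name = lines[0][0].split(".", 1)[1]                   # "table.users" -> "users"
        header = [col.split(":", 1)[0] for col in lines[1]]   # drop optional ":type"
        tables[name] = [dict(zip(header, row)) for row in lines[2:]]
    return tables


def orders_for(tables: Tables, user_name: str) -> List[str]:
    """Follow each order's user_id reference back to the users table."""
    users_by_id = {user["id"]: user for user in tables["users"]}
    products = []
    for order in tables["orders"]:
        ref = order["user_id"]
        if not ref.startswith(":"):
            continue  # not a reference in this simplified reading
        target_id = ref.rsplit(":", 1)[-1]  # ":1" -> "1", ":user:1" -> "1" (assumption)
        user = users_by_id.get(target_id)
        if user is not None and user["name"] == user_name:
            products.append(order["product"])
    return products


DOC = """
table.users
id name
1 Alice
2 Bob

table.orders
id user_id product
101 :1 Widget
102 :2 Gadget
"""

tables = parse_tables(DOC)
print(orders_for(tables, "Alice"))  # → ['Widget']
```

The explicit version makes the design trade-off visible: the reference is just an id string, so nothing in the data says which table `:1` points to. The model (or this code) has to infer it from the column name `user_id`, which is presumably what the typed `:user:42` form in ISON is meant to remove.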

The 30-70% token savings come from removing JSON's structural overhead (braces, quotes, colons, commas) while keeping the same semantic density.

Haven't published formal benchmarks on this yet - that's good feedback. I should.

dmarwicke•1mo ago

tried this with msgpack last year. accuracy tanked. models have seen a trillion json examples, like 12 of whatever format you invent

maheshvaikri99•1mo ago

Token Efficiency

  | Format       | Tokens | vs JSON  |
  |--------------|--------|----------|
  | ISONGraph    | 639    | -69%     |
  | ISON         | 685    | -66%     |
  | TOON         | 856    | -58%     |
  | JSON Compact | 1,072  | -47%     |
  | JSON         | 2,039  | baseline |

  LLM Accuracy

  | Format       | Correct | Accuracy | Acc/1K Tokens |
  |--------------|---------|----------|---------------|
  | ISONGraph    | 46/50   | 92.0%    | 143.97        |
  | ISON         | 44/50   | 88.0%    | 128.47        |
  | JSON         | 42/50   | 84.0%    | 41.20         |
  | JSON Compact | 41/50   | 82.0%    | 76.49         |
  | TOON         | 40/50   | 80.0%    | 93.46         |

  Key Findings

  1. ISONGraph wins on both efficiency AND accuracy - 92% correct with fewest tokens
  2. ISON/ISONGraph excel at multi-hop queries - LLM can follow relationships easily
  3. Acc/1K metric shows ISONGraph provides 3.5x more value per token than JSON
  4. Graph-specific format helps LLM understand relationships better than flat JSON

maheshvaikri99•1mo ago

Published the benchmark with results:

https://ison.dev/benchmark.html

https://github.com/maheshvaikri-code/ison/tree/main/benchmar...

quinncom•1mo ago

When including ISON alongside normal text, which language should you use for the code fence info string? Is `ison` a known code type, e.g.:

```ison
object.config
timeout 30
debug true
api_key "sk-xxx-secret"
max_retries 3
```
