
TOON – Token Oriented Object Notation

https://github.com/johannschopplich/toon
56•royosherove•1d ago

Comments

anonymoushn•22h ago
Hello, it's probably better to add leading spaces before all of the words rather than none of them
meander_water•13h ago
I don't get it; can't you just use YAML instead of inventing another DSL?
mhosayny•12h ago
It's more compact than YAML. More like a combination of YAML and CSV.
jscheel•8h ago
For repeating objects of the same structure, yaml will still require each key on each object, whereas this is a hybrid with csv, so it defines the keys once.
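The keys-declared-once saving is easy to see in code. Below is a minimal sketch of an encoder for the uniform-array case, based only on the example syntax shown in this thread (the real library handles nesting, quoting, and much more):

```python
import json

def encode_uniform_array(name, rows):
    """Encode a list of same-keyed dicts in a TOON-like tabular form:
    keys are declared once in the header instead of repeated per row."""
    keys = list(rows[0])
    assert all(list(r) == keys for r in rows), "rows must share one key order"
    header = f"{name}[{len(rows)}]{{{','.join(keys)}}}:"
    lines = ["  " + ",".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join([header] + lines)

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
toonish = encode_uniform_array("users", users)
print(toonish)
# Tabular rows repeat no keys, so the result is shorter than the JSON:
print(len(toonish), len(json.dumps({"users": users})))
```

The character gap grows linearly with the number of rows, since each extra JSON object repeats every key name.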
3cats-in-a-coat•8m ago
No one forces us to use objects in JSON with repeated keys you know.
inopinatus•1h ago
Norway.
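The joke refers to YAML's "Norway problem": under YAML 1.1's resolution rules, the unquoted scalar `NO` (Norway's country code) parses as the boolean `false`, while YAML 1.2's core schema only resolves `true`/`false`. A self-contained sketch of the 1.1 rule, using the boolean pattern from the YAML 1.1 type specification:

```python
import re

# YAML 1.1 resolves all of these unquoted scalars to booleans
# (the "Norway problem"); YAML 1.2's core schema resolves only
# true/True/TRUE and false/False/FALSE.
YAML_1_1_BOOL = re.compile(
    r"^(y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

def is_yaml_1_1_bool(scalar: str) -> bool:
    return YAML_1_1_BOOL.match(scalar) is not None

print(is_yaml_1_1_bool("NO"))      # True: "country: NO" becomes country: false
print(is_yaml_1_1_bool("Norway"))  # False: the full name is safe
```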
dragonwriter•1h ago
YAML 1.2 has been out for 16 years now, so I would simply not assume that the suggestion to use YAML for a new purpose means “use YAML 1.1”.
inopinatus•1h ago
I could agree that you would not make poor assumptions.

Your LLM, however, may experience cross-format feature superposition and consequential spurious activation.

flyer23•35m ago
It is, but also no one uses it :)
vessenes•12h ago
I’ll be interested to see benchmarks. My expectation is that accuracy will take a hit on mid or longer context prompts: I’d bet that the heavy use of JSON in fine tuning will end up impacting quality of a more terse (less reasoning space) novel encoding.

That said: I like the idea!

brian-bk•12h ago
There are some very light benchmarks in the Readme, or are you looking for more?
Mumps•11h ago
Do you mean the [0] Token Benchmarks section? I only see token count numbers.

Which doesn't address the question: do LLMs understand TOON as well as they understand JSON? It's quite likely that most LLMs don't interpret this notation the same way they would JSON. So benchmarks on, say, data processing tasks would be warranted.

[0] https://github.com/johannschopplich/toon?tab=readme-ov-file#...
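One way such a benchmark could be structured: render the same records in both formats, ask the same questions over each rendering, and compare answer accuracy. In this hypothetical harness, `call_model` is a stand-in for a real LLM API call, stubbed here so the scoring logic is runnable:

```python
import json

records = [{"id": i, "score": i * 10} for i in range(1, 6)]
questions = [("What is the score of id 3?", "30"),
             ("How many records are there?", "5")]

def render_json(rows):
    return json.dumps({"records": rows})

def render_toonish(rows):
    head = f"records[{len(rows)}]{{id,score}}:"
    return "\n".join([head] + [f"  {r['id']},{r['score']}" for r in rows])

def call_model(prompt):
    # Stub: a real benchmark would send the prompt to an LLM here.
    return "30" if "id 3" in prompt else "5"

def accuracy(render):
    data = render(records)
    hits = sum(call_model(f"{data}\n{q}") == a for q, a in questions)
    return hits / len(questions)

print(accuracy(render_json), accuracy(render_toonish))
```

With a real model behind `call_model`, any accuracy gap between the two renderings would answer the question above directly.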

tujux•10h ago
I think they're talking about these sections:

1. Retrieval Accuracy - https://github.com/johannschopplich/toon?tab=readme-ov-file#...

2. Performance by dataset - https://github.com/johannschopplich/toon?tab=readme-ov-file#...

jayd16•1h ago
I'm not sure which one would win, but it's a bit telling that compression isn't mentioned at all.

I guess since it's about LLMs, the idea is it has to be plaintext? But if you can train a model on TOON, can't you train it on BSON?
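The compression point is easy to test in a few lines: deflate already removes most of the repeated-key redundancy that TOON removes by hand. A rough sketch (byte counts only; real LLM cost depends on the tokenizer, not bytes, and tokenizers don't see compressed bytes at all):

```python
import json
import zlib

users = [{"id": i, "name": f"user{i}", "role": "user"} for i in range(100)]
as_json = json.dumps({"users": users}).encode()
as_toonish = ("users[100]{id,name,role}:\n" +
              "\n".join(f"  {i},user{i},user" for i in range(100))).encode()

# Compare raw and deflated sizes of the two encodings.
for label, blob in [("json", as_json), ("toonish", as_toonish)]:
    print(label, len(blob), len(zlib.compress(blob)))
```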

inopinatus•2h ago
JSON unmarshalling often has to consider separately whether an attribute is absent, false, zero, null, or the empty string, but this was never quite semantically ambiguous enough for my tastes, so adding that void-ish values may also now be serialised as a tuple of length 0 seems to me an excellent additional obfuscation.
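The five "void-ish" cases listed above really are all distinct after JSON parsing, and robust unmarshalling code has to treat them so. A small illustration:

```python
import json

# absent, false, zero, null, and empty string are five different states:
doc = json.loads('{"a": false, "b": 0, "c": null, "d": ""}')

for key in "abcde":  # "e" is absent from the document entirely
    if key not in doc:
        print(key, "absent")
    else:
        v = doc[key]
        print(key, repr(v), type(v).__name__)
```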
joshribakoff•2h ago
The use case here is to reduce token usage with LLMs, such as an agent that outputs a list of commands, e.g. tuples with files to write and their new contents.

Supporting this use case doesn’t require perfectly marshaling every data structure ever.

But to your point the tool could have wider use cases without the limitations.

inopinatus•1h ago
If one trains a model to understand it then that model will inevitably emit it, which means in turn one shall have to parse it, and now the application supports TOON for anything, and good luck telling the users/customers any different.
Pxtl•2h ago
I'm sorry I don't see this adding value over various other formats. I don't really want a new object serialization format, I just want the existing ones to have the features I need. YAML but with static typing and schema. XML but without crazy internet features. TOML but with an object format that doesn't hurt my brain. JSON but with decent multiline strings and comments. NestedText but with a sub-standard that provides static-typing and schema and whatnot.
foxglacier•18m ago
The benchmarks show it performs better than them, so that's the value - cost savings and improved accuracy. I suppose you could convert JSON to TOON just for the LLM and not actually read it with your own brain.
hedgehog•58m ago
It would be interesting to compare this to BAML and TOML.
toobulkeh•16m ago
This is definitely a core feature of BAML. My main complaint with BAML is that it's all or nothing: it's very opinionated, and you can't get the benefits without the DX, and vice versa. Separating this feature out, without requiring a DSL for model definition, is a great addition.
3cats-in-a-coat•12m ago
I'll say the obvious. A lot of this you can just do in JSON.

Let's take the example:

    {
      "users": [
        { "id": 1, "name": "Alice", "role": "admin" },
        { "id": 2, "name": "Bob", "role": "user" }
      ]
    }

    users[2]{id,name,role}:
      1,Alice,admin
      2,Bob,user
We can keep it JSON, but use more compact list expressions, as tuples when pragmatic:

    ["users",
       [1, "Alice", "admin"],
       [2, "Bob", "user"]
    ]
The thing is, the game with LLMs is not what's shortest, but what's:

1. Mainstream, so they understand it.

2. What they're tuned for, and they're tuned for what's mainstream (JSON).

If you want to go extreme compression you can shove it all in JSON strings too and keep the larger structure JSON:

    ["users",
       "1:admin:Alice",
       "2:user:Bob"
    ]
You may say "how is this better?" Well, it's better because it's still JSON: there's less to explain to the LLM, and to your other devs. Even if we use a weird compact format like "id:role:name", that's still shorter to explain than a completely different syntax with its whole world of rules.
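For what it's worth, the packed-string variant stays trivially parseable on the application side, too. A sketch of a decoder for the hypothetical `id:role:name` packing above:

```python
import json

packed = json.dumps(["users", "1:admin:Alice", "2:user:Bob"])

def unpack(blob):
    """Parse the compact list-of-packed-strings form back into dicts.
    Note that ids come back as strings unless converted explicitly."""
    name, *rows = json.loads(blob)
    fields = ("id", "role", "name")
    return name, [dict(zip(fields, r.split(":"))) for r in rows]

name, rows = unpack(packed)
print(name, rows)
```

The outer structure remains ordinary JSON, so any existing tooling (validators, loggers, proxies) keeps working; only the inner string format is application-specific.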

Easy RISC-V

https://dramforever.github.io/easyriscv/
101•todsacerdoti•2h ago•11 comments

Claude for Excel

https://www.claude.com/claude-for-excel
395•meetpateltech•7h ago•302 comments

JetKVM – Control any computer remotely

https://jetkvm.com/
236•elashri•6h ago•130 comments

10M people watched a YouTuber shim a lock; the lock company sued him – bad idea

https://arstechnica.com/tech-policy/2025/10/suing-a-popular-youtuber-who-shimmed-a-130-lock-what-...
625•Brajeshwar•10h ago•252 comments

Simplify Your Code: Functional Core, Imperative Shell

https://testing.googleblog.com/2025/10/simplify-your-code-functional-core.html
116•reqo•2d ago•44 comments

Pyrex catalog from 1938 with hand-drawn lab glassware [pdf]

https://exhibitdb.cmog.org/opacimages/Images/Pyrex/Rakow_1000132877.pdf
242•speckx•8h ago•58 comments

Go beyond Goroutines: introducing the Reactive paradigm

https://samuelberthe.substack.com/p/go-beyond-goroutines-introducing
25•samber•1w ago•13 comments

The new calculus of AI-based coding

https://blog.joemag.dev/2025/10/the-new-calculus-of-ai-based-coding.html
58•todsacerdoti•6h ago•39 comments

Why Busy Beaver hunters fear the Antihydra

https://benbrubaker.com/why-busy-beaver-hunters-fear-the-antihydra/
118•Bogdanp•6h ago•33 comments

MCP-Scanner – Scan MCP Servers for vulnerabilities

https://github.com/cisco-ai-defense/mcp-scanner
89•hsanthan•6h ago•27 comments

Rust cross-platform GPUI components

https://github.com/longbridge/gpui-component
444•xvilka•13h ago•186 comments

Tags to make HTML work like you expect

https://blog.jim-nielsen.com/2025/dont-forget-these-html-tags/
377•FromTheArchives•13h ago•201 comments

Solving regex crosswords with Z3

https://blog.nelhage.com/post/regex-crosswords-z3/
40•atilimcetin•6d ago•0 comments

Avoid 2:00 and 3:00 am cron jobs (2013)

https://www.endpointdev.com/blog/2013/04/avoid-200-and-300-am-cron-jobs/
232•pera•6h ago•223 comments

Image Dithering: Eleven Algorithms and Source Code (2012)

https://tannerhelland.com/2012/12/28/dithering-eleven-algorithms-source-code.html
34•Bogdanp•3d ago•8 comments

When 'perfect' code fails

https://marma.dev/articles/2025/when-perfect-code-fails
26•vinhnx•8h ago•21 comments

It's not always DNS

https://notes.pault.ag/its-not-always-dns/
24•todsacerdoti•5h ago•15 comments

Sieve (YC X25) is hiring engineers to build video datasets for frontier AI

https://www.sievedata.com/
1•mvoodarla•6h ago

Study finds growing social circles may fuel polarization

https://phys.org/news/2025-10-friends-division-social-circles-fuel.html
76•geox•4h ago•75 comments

PSF has withdrawn $1.5M proposal to US Government grant program

https://pyfound.blogspot.com/2025/10/NSF-funding-statement.html
408•lumpa•8h ago•334 comments

Should LLMs just treat text content as an image?

https://www.seangoedecke.com/text-tokens-as-image-tokens/
132•ingve•6d ago•80 comments

The last European train that travels by sea

https://www.bbc.com/travel/article/20251024-the-last-european-train-that-travels-by-sea
129•1659447091•14h ago•122 comments

Show HN: Dlog – Journaling and AI coach that learns what drives well-being (Mac)

https://dlog.pro/
12•dr-j•6h ago•5 comments

Show HN: Erdos – open-source, AI data science IDE

https://www.lotas.ai/erdos
41•jorgeoguerra•7h ago•21 comments

Iroh-blobs 0.95 – New features – Iroh

https://www.iroh.computer/blog/iroh-blobs-0-95-new-features
7•janandonly•6d ago•0 comments

fnox, a secret manager that pairs well with mise

https://github.com/jdx/mise/discussions/6779
101•bpierre•6h ago•22 comments

Eight Million Copies of Moby-Dick (2014)

https://thevoltablog.wordpress.com/2014/01/27/nicolas-mugaveros-eight-million-copies-of-moby-dick...
30•awalias•4d ago•10 comments

Why Nigeria accepted GMOs

https://www.asimov.press/p/nigeria-crops
37•surprisetalk•5h ago•71 comments

Let the little guys in: A context sharing runtime for the personalised web

https://arjun.md/little-guys
55•louisbarclay•5h ago•11 comments