
Show HN: Tinker with Meta's "tokenizer-free" patcher

https://huggingface.co/spaces/lucalp/blt-entropy-patcher
3•lucalp•8h ago
If the future tends towards replacing current tokenisation, I wanted to build intuitions around one of the core contributions of Meta's Byte Latent Transformer: entropy-based patching.

What are its strengths and weaknesses? No better way to find out than by tinkering with visualisations in a HF Space, so I thought I'd share!
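
For anyone who wants the mechanics before clicking through: here's a minimal Python sketch of the global-threshold variant of entropy-based patching as I understand it from the BLT paper - not the Space's actual code. A small byte-level LM gives a next-byte entropy at each position, and a new patch starts wherever that entropy crosses a threshold. The function names and the threshold value are made up for illustration.

    import math

    def byte_entropy(probs):
        # Shannon entropy (in nats) of a next-byte distribution;
        # in BLT this comes from a small autoregressive byte LM.
        return -sum(p * math.log(p) for p in probs if p > 0)

    def entropy_patches(byte_seq, entropies, threshold=1.5):
        # Start a new patch wherever entropy spikes above the threshold:
        # hard-to-predict bytes open fresh patches (more latent-model
        # steps), while long low-entropy runs merge into one cheap patch.
        patches, current = [], [byte_seq[0]]
        for b, h in zip(byte_seq[1:], entropies[1:]):
            if h > threshold:
                patches.append(bytes(current))
                current = [b]
            else:
                current.append(b)
        patches.append(bytes(current))
        return patches

Fewer patches means fewer steps for the big latent transformer, which is where the compute story below comes from.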

A few things that emerge as a result that you can try yourself:

1. robustness - high entropy means more compute gets dedicated to those bytes, which covers cases like low-resource languages, spelling tasks, etc.

2. compute efficiency

2a. low entropy means less compute spent on those bytes

2b. in-context learning applies to tokenisation! Repeated patterns induce low-entropy regions later in the sequence, so the model wastes less compute on them (see the toy demo after this list)
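
Here's that toy demo of 2b, reusing entropy_patches from the sketch above with hand-picked entropies (no real model): once a pattern has appeared in context, its later occurrences are low-entropy, so they collapse into one long patch.

    seq = b"abcabcabc"
    # Hypothetical entropies: high when each byte is first seen,
    # low once the pattern is predictable from context.
    ents = [3.0, 2.5, 2.4, 0.4, 0.3, 0.3, 0.2, 0.2, 0.2]
    print(entropy_patches(seq, ents, threshold=1.0))
    # [b'a', b'b', b'cabcabc'] -> 3 patches for 9 bytes

Three latent-transformer steps instead of nine, purely because the context made the tail predictable.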

I'm writing an expanded blog post on this; updates via https://lucalp.dev or https://x.com/lucalp__