The missed opportunity of constrained decoding

https://michaelorenstein.com/blog/zero-entropy-tokens/

2•killcoder•3w ago

Comments

killcoder•3w ago

I was working on a speculative decoding optimisation and its accompanying blog post. Explaining the more basic concepts filled so much of the post I decided to pull them out, forming this article.

I had a bit too much fun with the tokenisation diagrams / animations. The raw text is provided to an Astro component, which tokenises it, and forms the individual DOM elements of the tokens. I find it really hard to read 'tokenised' text, I figured some consistent colouring would help. The 'Probabilities' component is a trivial grid, but all the other components support 'word wrap'.

I ended up writing a 'responsive design aware graph colouring solver'.

Multiple screen widths, 'desktop' and 'mobile' are 'simulated', forming an adjacency graph of tokens that touch. Colours are then greedily allocated, then optimised per page over a few hundred iterations, swapping allocations to enforce minimum hue distance between touching tokens at those common screen sizes. The optimising value function prioritises even distribution of colours, because it looks nicer than maximal hue difference.

Originally I naively outputted the palette styles per component, but found the css post processing optimisers didn't handle that as well as I'd have thought. So then I wrote a little 'CSS compiler' that takes the high level palette and timing concepts of the animations, and optimally merges rule declarations.

The start of the post really relies on the animation occurring while fully in view, so I set up some IntersectionObservers that do the 'please scroll' text.

I tried my best to have it all work when JS is disabled on the client. I tried to get the 'hovering' to be CSS-only, but found the JS solution much more performant.

The DAG diagrams are formed with this neat Needleman-Wunsch algorithm from the bioinformatics field. The Astro component accepts several 'examples' then aligns common subsequences, producing the CSS grid and the 'basic SVG' on the server. The responsive nature meant I had to move the final 'allow' generation to the client.

Some browsers seem to throttle the token animations sometimes but I haven't figured out what causes that. This is my first time leaning hard on CSS variables.

EVs Are a Failed Experiment

MemAlign: Building Better LLM Judges from Human Feedback with Scalable Memory

CCC (Claude's C Compiler) on Compiler Explorer

Homeland Security Spying on Reddit Users

Actors with Tokio (2021)

Can graph neural networks for biology realistically run on edge devices?

Deeper into the shareing of one air conditioner for 2 rooms

Weatherman introduces fruit-based authentication system to combat deep fakes

Why Embedded Models Must Hallucinate: A Boundary Theory (RCC)

A Curated List of ML System Design Case Studies

Pony Alpha: New free 200K context model for coding, reasoning and roleplay

Show HN: Tunbot – Discord bot for temporary Cloudflare tunnels behind CGNAT

Open Problems in Mechanistic Interpretability

Bye Bye Humanity: The Potential AMOC Collapse

Dexter: Claude-Code-Style Agent for Financial Statements and Valuation

Digital Iris [video]

Essential CDN: The CDN that lets you do more than JavaScript

They Hijacked Our Tech [video]

Vouch

HRL Labs in Malibu laying off 1/3 of their workforce

Show HN: High-performance bidirectional list for React, React Native, and Vue

Show HN: I built a Mac screen recorder Recap.Studio

Ask HN: Codex 5.3 broke toolcalls? Opus 4.6 ignores instructions?

Vectors and HNSW for Dummies

Sanskrit AI beats CleanRL SOTA by 125%

'Washington Post' CEO resigns after going AWOL during job cuts

Claude Opus 4.6 Fast Mode: 2.5× faster, ~6× more expensive

TSMC to produce 3-nanometer chips in Japan

Quantization-Aware Distillation

List of Musical Genres