I built this a while back for some mech interp research. It faithfully represents Claude's token splitting: hidden tokens, real token boundaries, and so on. It's not cheap to run (essentially O(n^2) cost); you could optimise for longer sequences, but you're not guaranteed a faithful representation if you do.
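To make the quadratic cost concrete, here's a minimal sketch of one way boundary recovery can go: tokenize every prefix of the text and mark a boundary wherever the token count ticks up. The toy tokenizer below (roughly one token per 4 characters) is a hypothetical stand-in, not the real Claude tokenizer; the point is only the cost shape, since n prefixes of average length n/2 give O(n^2) total work.

```python
# Sketch only: why probing every prefix to recover token boundaries
# is quadratic. toy_token_count is a hypothetical stand-in tokenizer.

def toy_token_count(text: str) -> int:
    # Hypothetical rule: roughly one token per 4 characters.
    return (len(text) + 3) // 4

def find_boundaries(text: str) -> list[int]:
    """Mark position i as a token boundary when the token count rises
    between prefix i-1 and prefix i. One full tokenization per prefix
    means n calls over prefixes averaging n/2 chars -> O(n^2) work."""
    boundaries = []
    prev = 0
    for i in range(1, len(text) + 1):
        count = toy_token_count(text[:i])  # cost grows with prefix length
        if count > prev:
            boundaries.append(i - 1)
            prev = count
    return boundaries

print(find_boundaries("hello world!"))  # -> [0, 4, 8]
```

Shortcuts (e.g. binary-searching for the next boundary) cut the call count, but as noted above they lean on assumptions about the tokenizer that can break faithfulness on edge cases.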
Open Source: https://github.com/R0bk/claude-tokenizer
Feedback welcome, let me know if there are any edge cases that look wrong.
P.S. I'd expect this to face a similar fate to the streaming-chunk and prefill-based token extraction methods before it. I do worry about the ability to do independent research once it's fully closed off, and would love it if there were more public frontier tokenizers.