Show HN: Wordchipper – Rust BPE tokenizer, 9x faster than tiktoken

2•antimora•1h ago
Hey HN,

We're ZSpaceLabs, Burn framework contributors working to make Rust a first-class AI/ML stack.

We just released wordchipper, a Rust-native BPE tokenizer for the OpenAI GPT-2 tokenizer family (r50k, cl100k, o200k). On a 64-core machine with the o200k vocab (GPT-4o / GPT-5 tokenizer), we measured 2.4 GiB/s, about 9.2× faster than tiktoken-rs. Through the Python bindings it is typically 2–4× faster than tiktoken, depending on thread count.

The main design goal was to make the internals easy to swap. The tokenizer is split into two parts: pre-tokenization (lexer) and BPE span encoding. Each part can be replaced independently, which makes it easy to experiment with different combinations of lexer backends and span encoding algorithms.
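To make the two-stage split concrete, here is a minimal sketch of how a swappable lexer and span encoder could be wired together. This is not wordchipper's actual API: the `Lexer` and `SpanEncoder` traits, the toy lexers, `NaiveBpe`, and `toy_merges` are all hypothetical names made up for illustration. The merge loop is the naive quadratic one, not an optimized span encoder.

```rust
use std::collections::HashMap;

/// Stage 1 (hypothetical trait): pre-tokenization, splitting text into spans.
trait Lexer {
    fn spans<'a>(&self, text: &'a str) -> Vec<&'a str>;
}

/// Stage 2 (hypothetical trait): BPE-encode one span into token ids.
trait SpanEncoder {
    fn encode_span(&self, span: &str, out: &mut Vec<u32>);
}

/// Toy lexer: splits on whitespace. Real lexers follow the tiktoken regex patterns.
struct WhitespaceLexer;

impl Lexer for WhitespaceLexer {
    fn spans<'a>(&self, text: &'a str) -> Vec<&'a str> {
        text.split_whitespace().collect()
    }
}

/// Toy lexer: treats the whole input as one span (no pre-tokenization).
struct WholeTextLexer;

impl Lexer for WholeTextLexer {
    fn spans<'a>(&self, text: &'a str) -> Vec<&'a str> {
        vec![text]
    }
}

/// Toy byte-level BPE. Ids 0..=255 are raw bytes; each merge maps an adjacent
/// pair of ids to a new id, and a lower new id means the merge was learned earlier.
struct NaiveBpe {
    merges: HashMap<(u32, u32), u32>,
}

impl SpanEncoder for NaiveBpe {
    fn encode_span(&self, span: &str, out: &mut Vec<u32>) {
        let mut ids: Vec<u32> = span.bytes().map(u32::from).collect();
        loop {
            // Pick the mergeable pair with the lowest merged id (leftmost on ties).
            let best = ids
                .windows(2)
                .enumerate()
                .filter_map(|(i, w)| self.merges.get(&(w[0], w[1])).map(|&id| (id, i)))
                .min();
            match best {
                Some((id, i)) => {
                    ids[i] = id;
                    ids.remove(i + 1);
                }
                None => break,
            }
        }
        out.extend(ids);
    }
}

/// Generic over both stages, so either one can be replaced independently.
struct Tokenizer<L: Lexer, E: SpanEncoder> {
    lexer: L,
    encoder: E,
}

impl<L: Lexer, E: SpanEncoder> Tokenizer<L, E> {
    fn encode(&self, text: &str) -> Vec<u32> {
        let mut out = Vec::new();
        for span in self.lexer.spans(text) {
            self.encoder.encode_span(span, &mut out);
        }
        out
    }
}

/// Made-up two-entry merge table: "ab" -> 256, then (256, "c") -> 257.
fn toy_merges() -> HashMap<(u32, u32), u32> {
    HashMap::from([
        ((u32::from(b'a'), u32::from(b'b')), 256),
        ((256, u32::from(b'c')), 257),
    ])
}
```

Building `Tokenizer { lexer: WhitespaceLexer, encoder: NaiveBpe { merges: toy_merges() } }` and then swapping in `WholeTextLexer` changes only the first stage; the encoder is untouched. Using generics rather than trait objects lets the compiler monomorphize each lexer/encoder combination, which is one plausible way to keep the swap free at runtime.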

Right now there are three lexer implementations. One uses fancy-regex and is fully compatible with tiktoken. Another uses regex-automata with a runtime DFA and is about 4–8× faster. The third uses logos with a compile-time DFA and is about 14–21× faster on cl100k and o200k.

Write-up with more details: https://zspacelabs.ai/wordchipper/articles/substitutable/

GitHub: https://github.com/zspacelabs/wordchipper

Happy to hear feedback, especially from people working on tokenization, large-scale inference pipelines, or Rust ML tooling.
