Show HN: Splintr – Rust BPE tokenizer, 12x faster than tiktoken for batches

https://github.com/farhan-syah/splintr
1•fs90•2mo ago
Hi HN,

I built Splintr, a BPE tokenizer in Rust (with Python bindings), because I found existing Python-based tokenizers were bottlenecking my data processing pipelines.

While OpenAI's tiktoken is the gold standard for correctness, I found I could get significantly better throughput on modern multi-core CPUs by rethinking how parallelism is applied.

Splintr achieves ~111 MB/s batch throughput (vs ~9 MB/s for tiktoken).

The Design Choice: "Sequential by Default"

One of the most interesting findings during development was that naive parallelism actually hurts performance for typical LLM inputs. Thread pool overhead is significant for texts under 1 MB.

I implemented a hybrid strategy:

Single Text (encode): Purely sequential. It’s 3-4x faster than tiktoken simply by using pcre2 with JIT instead of standard regex handling.

Batch Processing (encode_batch): Parallelizes across texts using Rayon, rather than within a text. This saturates all cores without the overhead of splitting small strings.
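The hybrid strategy above can be sketched in a few lines. This is only an illustration of the shape of the idea, not Splintr's code: the byte-level `encode` below is a hypothetical stand-in for real BPE, and a standard-library thread pool stands in for Rayon. In CPython the GIL means this toy will not show a real speedup; the point is where the parallel boundary sits (between texts, never inside one).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def encode(text: str) -> List[int]:
    """Encode a single text sequentially.

    Hypothetical stand-in for BPE: emits one token id per UTF-8 byte.
    """
    return list(text.encode("utf-8"))


def encode_batch(texts: List[str], max_workers: Optional[int] = None) -> List[List[int]]:
    """Encode many texts, parallelizing across texts rather than within one.

    Each worker handles whole texts, so no text is ever split into pieces.
    """
    if len(texts) < 2:
        # Nothing to parallelize: skip the pool and its startup overhead.
        return [encode(t) for t in texts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(encode, texts))


if __name__ == "__main__":
    print(encode_batch(["hello", "world", "tokenizers"]))
```

Keeping the single-text path free of any pool is what "sequential by default" amounts to in this sketch: the caller only pays for scheduling when there are several texts to spread across workers.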

Other Features:

Safety: Strict UTF-8 compliance, including a streaming decoder that correctly buffers incomplete multi-byte characters.

Compatibility: Drop-in support for cl100k_base (GPT-4), o200k_base (GPT-4o), and llama3 vocabularies.
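To show why a streaming decoder needs to buffer, here is a minimal sketch. Token boundaries in BPE do not have to line up with character boundaries, so one token can end in the middle of a multi-byte UTF-8 character. The `StreamingDecoder` class and its method names are hypothetical and are not Splintr's API; the sketch leans on Python's standard-library incremental decoder to do the buffering.

```python
import codecs


class StreamingDecoder:
    """Hypothetical sketch: decode UTF-8 arriving in arbitrary byte chunks."""

    def __init__(self) -> None:
        # The incremental decoder holds back trailing bytes that do not yet
        # form a complete character. errors="strict" rejects invalid input.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

    def feed(self, chunk: bytes) -> str:
        """Return the text completed so far; incomplete tails stay buffered."""
        return self._decoder.decode(chunk, final=False)

    def finish(self) -> str:
        """Flush at end of stream; raises UnicodeDecodeError if bytes are left dangling."""
        return self._decoder.decode(b"", final=True)


if __name__ == "__main__":
    decoder = StreamingDecoder()
    # "é" is 2 bytes and "€" is 3 bytes; the chunks split both characters.
    chunks = [b"\xc3", b"\xa9\xe2", b"\x82", b"\xac"]
    pieces = [decoder.feed(c) for c in chunks]
    pieces.append(decoder.finish())
    print(pieces, "".join(pieces))
```

A chunk that ends mid-character yields an empty string rather than a replacement character, and the held-back bytes are emitted once the rest of the character arrives.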

The repo is written in Rust with PyO3 bindings. I’d love feedback on the implementation or other potential optimization tricks for BPE.

Thanks!

I turned myself into an AI-generated deathbot – here's what I found

https://www.bbc.com/news/articles/c93wjywz5p5o
1•cmsefton•4m ago•0 comments

Management style doesn't predict survival

https://orchidfiles.com/management-style-doesnt-predict-survival/
1•theorchid•5m ago•0 comments

One Generation Runs the Country. The Next Cashed in on Crypto

https://www.wsj.com/finance/currencies/trump-sons-crypto-billions-1e7f1414
1•impish9208•6m ago•1 comments

"I Was Wrong": Why the Civil War Is Running Late [video][2h21m]

https://www.youtube.com/watch?v=RDmkKZ7vAkI
1•Bender•7m ago•0 comments

Show HN: A sandboxed execution environment for AI agents via WASM

https://github.com/Parassharmaa/agent-sandbox
1•paraaz•10m ago•0 comments

Wine-Staging 11.2 Brings More Patches to Help Adobe Photoshop on Linux

https://www.phoronix.com/news/Wine-Staging-11.2
2•doener•10m ago•0 comments

The Nature of the Beast

https://cinemasojourns.com/2026/02/07/the-nature-of-the-beast/
1•jjgreen•11m ago•0 comments

From Prediction to Compilation: A Manifesto for Intrinsically Reliable AI

1•JanusPater•11m ago•0 comments

Show HN: Curated list of 1000 open source alternatives to proprietary software

https://opensrc.me
1•ZenithSoftware•13m ago•0 comments

AI's Real Problem Is Illegitimacy, Not Hallucination

1•JanusPater•14m ago•1 comments

'I fell into it': ex-criminal hackers urge UK pupils to use web skills for good

https://www.theguardian.com/technology/2026/feb/08/i-fell-into-it-ex-criminal-hackers-urge-manche...
1•robaato•14m ago•0 comments

Why 175-Year-Old Glassmaker Corning Is Suddenly an AI Superstar

https://www.wsj.com/tech/corning-fiber-optics-ai-e045ba3b
1•bookofjoe•15m ago•1 comments

Keeping WSL Alive

https://shift1w.com/blog/keeping-wsl-alive/
1•jakesocks•16m ago•0 comments

Unlocking core memories with GoldSrc engine and CS 1.6 (2025)

https://www.danielbrendel.com/blog/43-unlocking-core-memories-with-goldsrc-engine
2•foxiel•17m ago•0 comments

Gtrace an advanced network path analysis tool

https://github.com/hervehildenbrand/gtrace
2•jimaek•17m ago•0 comments

America does not trust Putin or Trump

https://re-russia.net/en/review/809/
1•mnky9800n•21m ago•0 comments

Let's Do Music in Linux [video]

https://www.youtube.com/watch?v=IHgsOdoLuBU
1•mariuz•22m ago•0 comments

"Nothing" is the secret to structuring your work

https://www.vangemert.dev/blog/nothing
1•spmvg•25m ago•0 comments

AI Makes the Easy Part Easier and the Hard Part Harder

https://www.blundergoat.com/articles/ai-makes-the-easy-part-easier-and-the-hard-part-harder
1•birdculture•27m ago•0 comments

Show HN: Fine-tuned Qwen2.5-7B on 100 films for probabilistic story graphs

https://cinegraphs.ai/
1•graphpilled•27m ago•1 comments

A failed wantrepreneur's view on common startup advice

https://developerwithacat.com/blog/202602/startup-advice/
1•mmarian•27m ago•0 comments

Show HN: BestClaw Simple OpenClaw/MoltBot for non tech people

https://bestclaw.host/
2•nihey•28m ago•0 comments

AI is making me anxious and stupid

https://tom.so/posts/ai-is-making-me-anxious-and-stupid
1•tomupom•31m ago•0 comments

Show HN: Real-time path tracing of medical CT volumes in the browser via WebGPU

https://grenzwert.net/
2•MickGorobets•35m ago•1 comments

United States – Crypto Scam Help – Intelligence Cyber Wizard Safe Guide

1•Forensics•38m ago•0 comments

What to Do After a Crypto Scam (USA) Intelligence Cyber Wizard Explained

1•Forensics•39m ago•0 comments

The Physics of 588: A 17.64μm Isolation Barrier Strategy for 5nm Process

https://github.com/eggpine84-del/NHE-CODING
1•eggpine84•39m ago•0 comments

My Eighth Year as a Bootstrapped Founder

https://mtlynch.io/bootstrapped-founder-year-8/
1•mtlynch•40m ago•0 comments

Data Modelling Open Source

https://github.com/sqlmodel/sqlmodel
2•Sean766•43m ago•0 comments

Mid-life transitions

https://blogs.gnome.org/chergert/2026/02/06/mid-life-transitions/
2•pabs3•43m ago•0 comments