Show HN: Splintr – Rust BPE tokenizer, 12x faster than tiktoken for batches

https://github.com/farhan-syah/splintr
1•fs90•9m ago
Hi HN,

I built Splintr, a BPE tokenizer in Rust (with Python bindings), because I found existing Python-based tokenizers were bottlenecking my data processing pipelines.

While OpenAI's tiktoken is the gold standard for correctness, I found I could get significantly better throughput on modern multi-core CPUs by rethinking how parallelism is applied.

Splintr achieves ~111 MB/s batch throughput (vs ~9 MB/s for tiktoken), which is roughly a 12x speedup.

The Design Choice: "Sequential by Default"

One of the most interesting findings during development was that naive parallelism actually hurts performance for typical LLM inputs. Thread pool overhead is significant for texts under 1 MB.

I implemented a hybrid strategy:

Single Text (encode): Purely sequential. It’s 3-4x faster than tiktoken simply by using pcre2 with JIT instead of standard regex handling.

Batch Processing (encode_batch): Parallelizes across texts using Rayon, rather than within a text. This saturates all cores without the overhead of splitting small strings.
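To illustrate the hybrid strategy, here is a minimal sketch, not Splintr's actual code. It assumes a placeholder `encode` that maps bytes to token ids instead of doing real BPE, and it uses `std::thread::scope` in place of Rayon so that it compiles with no dependencies. The point is only the shape: each worker receives whole texts, and no single text is split across threads.

```rust
use std::thread;

/// Placeholder encoder (an assumption for illustration): maps each byte to a
/// token id. A real BPE encoder would do regex pre-splitting and merges here.
/// It runs purely sequentially, like the single-text path described above.
fn encode(text: &str) -> Vec<u32> {
    text.bytes().map(u32::from).collect()
}

/// Encodes a batch by handing each worker thread a contiguous run of whole
/// texts. Individual texts are never split across threads.
fn encode_batch(texts: &[&str]) -> Vec<Vec<u32>> {
    if texts.is_empty() {
        return Vec::new();
    }
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // Ceiling division so every text lands in exactly one chunk.
    let chunk_len = ((texts.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn all workers first, then join, so they run concurrently.
        let handles: Vec<_> = texts
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || chunk.iter().map(|&t| encode(t)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order keeps outputs aligned with inputs.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `encode_batch(&["hello", "world"])` returns one token vector per input, in input order. A work-stealing pool such as Rayon would balance uneven text lengths better than the fixed chunks used here.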

Other Features:

Safety: Strict UTF-8 compliance, including a streaming decoder that correctly buffers incomplete multi-byte characters.

Compatibility: Drop-in support for cl100k_base (GPT-4), o200k_base (GPT-4o), and llama3 vocabularies.
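For readers wondering what "correctly buffers incomplete multi-byte characters" can look like, here is a rough sketch using only the standard library. The `StreamingDecoder` type and its `push` method are hypothetical names for illustration, not Splintr's API, and replacing genuinely invalid bytes with U+FFFD is an assumption of this sketch rather than documented behavior.

```rust
/// Hypothetical streaming decoder (illustrative only): accumulates bytes and
/// emits complete UTF-8 characters, holding back a trailing partial sequence.
#[derive(Default)]
struct StreamingDecoder {
    pending: Vec<u8>,
}

impl StreamingDecoder {
    /// Feeds a chunk of bytes and returns whatever text is now decodable.
    fn push(&mut self, bytes: &[u8]) -> String {
        self.pending.extend_from_slice(bytes);
        let mut out = String::new();
        loop {
            match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    // Everything buffered is valid: emit it all.
                    out.push_str(s);
                    self.pending.clear();
                    return out;
                }
                Err(e) => {
                    let valid = e.valid_up_to();
                    // This prefix was just validated, so it cannot fail.
                    out.push_str(std::str::from_utf8(&self.pending[..valid]).unwrap());
                    match e.error_len() {
                        // None: input ended mid-character. Keep the tail
                        // until the remaining bytes arrive.
                        None => {
                            self.pending = self.pending.split_off(valid);
                            return out;
                        }
                        // Some(n): bytes that can never be valid. Assumption:
                        // substitute U+FFFD and keep decoding after them.
                        Some(n) => {
                            out.push('\u{FFFD}');
                            self.pending = self.pending.split_off(valid + n);
                        }
                    }
                }
            }
        }
    }
}
```

Feeding the bytes of "é" (`0xC3 0xA9`) one at a time yields an empty string on the first call and "é" on the second, so a consumer never sees half a character.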

The repo is written in Rust with PyO3 bindings. I’d love feedback on the implementation or other potential optimization tricks for BPE.

Thanks!