Using 260-d skip-gram vectors (256 semantic + 4 entropy dimensions) trained on FineWeb-Edu, I projected them into the GPT-2 and Qwen3-14B embedding spaces and substituted low-entropy tail tokens (rare, predictable, function-like) vs high-entropy common tokens (frequent, polysemous).
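The post doesn't spell out how the projection was done; a common choice is a least-squares linear map fit on the vocabulary shared between the two spaces. A minimal sketch of that approach, with synthetic stand-ins for the skip-gram vectors and the model's embedding matrix (all names and shapes here are illustrative, not from the repo):

```python
import numpy as np

def fit_linear_projection(src, tgt):
    """Least-squares map W minimizing ||src @ W - tgt||_F over shared-vocab rows."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

# Toy stand-ins: 1000 shared tokens, 260-d skip-gram vectors -> 768-d (GPT-2-sized) space.
rng = np.random.default_rng(0)
src = rng.normal(size=(1000, 260))   # skip-gram vectors (256 semantic + 4 entropy dims)
true_map = rng.normal(size=(260, 768))
tgt = src @ true_map                 # pretend these are the model's input embeddings
W = fit_linear_projection(src, tgt)
projected = src @ W                  # rows that could be swapped into the embedding matrix
print(np.abs(projected - tgt).max() < 1e-6)  # → True (exact linear system recovers the map)
```

Once fitted, the map lets you replace any token's native embedding row with its projected skip-gram counterpart before running the frozen model.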
Surprising result: low-entropy tail substitutions incur a near-constant +0.101 to +0.102 perplexity increase on both models (identical to three decimal places), despite major differences in architecture, embedding dimension (768 vs 5120), tokenizer, and norms. This suggests the effect is intrinsic to the token class.
High-entropy tokens are far more sensitive: ~500–530 swaps add +36 PPL on GPT-2 and +9 on Qwen3 (per-token impact roughly 356× and 91× the low-entropy baseline, respectively).
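The ΔPPL numbers above come from comparing perplexity before and after the swap; perplexity is just exp of the mean negative log-likelihood over the evaluation tokens. A self-contained sketch with hypothetical per-token log-probs (the shift magnitude and counts here are made up for illustration):

```python
import numpy as np

def perplexity(logprobs):
    """exp of the mean negative log-likelihood over a token sequence."""
    return float(np.exp(-np.mean(logprobs)))

# Hypothetical per-token log p(token | context) before and after swapping ~500 embeddings.
rng = np.random.default_rng(1)
base = rng.normal(-3.0, 0.5, size=10_000)
swapped = base.copy()
hit = rng.choice(base.size, size=500, replace=False)
swapped[hit] -= 0.8   # affected positions become less likely under the swapped embeddings

delta = perplexity(swapped) - perplexity(base)
print(f"dPPL = {delta:+.3f}")  # positive: the swap raises perplexity
```

Note how a few hundred affected positions out of 10k still move the aggregate PPL noticeably, since perplexity averages in log space before exponentiating.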
Residual stream analysis shows nearly identical mean convergence trajectories, but low-entropy tokens exhibit transient mid-layer variance spikes (heterogeneous paths), while high-entropy ones propagate perturbations monotonically.
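One way to surface those mid-layer variance spikes is to collect per-layer hidden states (e.g. via `output_hidden_states=True` in transformers) and track, per layer, the norm of the mean state and the across-token variance. A sketch on a synthetic residual stream with an injected spike (the shapes and the spike itself are illustrative assumptions):

```python
import numpy as np

def layer_stats(hidden):
    """hidden: (num_layers, num_tokens, d_model).
    Returns per-layer mean-state norm (convergence trajectory)
    and mean across-token variance (path heterogeneity)."""
    mean_norm = np.linalg.norm(hidden.mean(axis=1), axis=-1)  # (num_layers,)
    variance = hidden.var(axis=1).mean(axis=-1)               # (num_layers,)
    return mean_norm, variance

# Synthetic residual stream: 12 layers, 64 positions, 768 dims,
# with a transient mid-layer spike mimicking the low-entropy pattern.
rng = np.random.default_rng(2)
hidden = rng.normal(size=(12, 64, 768))
hidden[6] *= 3.0  # inject a variance spike at layer 6
mean_norm, variance = layer_stats(hidden)
print(int(np.argmax(variance)))  # → 6
```

Comparing these two curves between the low-entropy and high-entropy substitution runs is what distinguishes "heterogeneous paths" (variance spikes, similar means) from monotone perturbation growth.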
Training the skip-gram natively on Qwen3's subword vocabulary weakens the signal (subwords see narrower contexts) and reverses the pattern.
Repo with reproducible experiments: https://github.com/maykef/entropy2vec
Results summary (tables): https://github.com/maykef/entropy2vec/blob/main/results/summary_table.md
Conclusion: the low/high-entropy distinction is real and model-agnostic, but post-hoc exploitation for inference speed or memory is negligible on frozen models; the embedding lookup and forward pass still run at full cost. Meaningful gains would require conditional compute built in from pretraining.
Curious about similar observations in larger models or alternative uses (e.g., uncertainty detection, curriculum design).