(In the TinyStories paper, the 22M model reaches a loss of ~2.4 after 20k steps; for the 33M model it is, as expected, lower, ~1.8–2.0.)
After converting to bits-per-character, E8-LILA shows a significantly better result (0.128 bpc vs. 0.742 bpc for TinyStories-33M). (bpc calculation: loss / (ln(2) × average token length); the average token length is ≈ 4.5 characters for BPE‑2048 and ≈ 3.5 characters for a 10k vocabulary.)
(All of these are approximate values obtained by averaging over the corpus; the average token length may vary slightly depending on the specific corpus.)
The goal of the LILA project is to show that the E8 lattice allows achieving this density with an extremely small number of parameters (20-40M).
Today I started training a new model with geometric attention (Leech Lattice Lila, ~20M parameters, work in progress). At step 40,000 the best validation loss is 0.4018, which gives PPL = exp(0.4018) ≈ 1.49. This is almost identical to E8 (1.43), but E8 reaches that loss only after 100,000+ steps, while Leech gets there at just 40k. Leech trains faster with fewer parameters (≈20M vs. 40M for E8).
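As a quick sanity check on the loss-to-perplexity conversion (perplexity is just the exponential of the mean cross-entropy in nats):

```python
import math

# Validation loss is mean cross-entropy per token, in nats.
loss = 0.4018              # Leech-Lila best validation loss at step 40k
ppl = math.exp(loss)       # perplexity = e^loss
print(round(ppl, 2))       # 1.49
```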
Converting to bits-per-character for objectivity:
TinyStories-33M (estimate): loss ≈ 1.8, average token length for 10k vocab ≈ 3.5 characters. bpc = 1.8 / (0.6931 * 3.5) ≈ 1.8 / 2.426 ≈ 0.742 bits/character.
Leech-Lila: loss = 0.4018, average token length for BPE-2048 ≈ 4.5 characters. bpc = 0.4018 / (ln(2) * 4.5) ≈ 0.4018 / (0.6931 * 4.5) ≈ 0.4018 / 3.119 ≈ 0.129 bits/character.
E8-LILA (estimate): loss = 0.36, average token length for BPE-2048 ≈ 4.5. bpc = 0.36 / (0.6931 * 4.5) ≈ 0.36 / 3.119 ≈ 0.115 bits/character.
Thus, Leech‑Lila (0.129 bpc) is nearly catching up to E8 (0.115 bpc), but with fewer parameters and faster. Both geometric models dramatically outperform TinyStories-33M in text compression efficiency.
Therefore, the geometric models (E8, Leech) compress text roughly 6× better (bpc 0.115–0.129 vs. 0.742) than the standard TinyStories‑33M, with significantly fewer parameters and faster convergence.
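The conversions above can be reproduced in a few lines. Note that the average token lengths (3.5 and 4.5 characters) are the corpus-level estimates quoted in the post, not measured values:

```python
import math

def bits_per_char(loss_nats: float, avg_token_len: float) -> float:
    """Convert per-token cross-entropy (nats) to bits per character."""
    return loss_nats / (math.log(2) * avg_token_len)

# TinyStories-33M: 10k vocab, ~3.5 chars/token
print(round(bits_per_char(1.8, 3.5), 3))     # 0.742
# Leech-Lila: BPE-2048, ~4.5 chars/token
print(round(bits_per_char(0.4018, 4.5), 3))  # 0.129
# E8-LILA: BPE-2048, ~4.5 chars/token
print(round(bits_per_char(0.36, 4.5), 3))    # 0.115
```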
bootstraptor•1h ago
LeechConfig – holds hyperparameters (vocab size, model dimension, number of layers/heads, etc.) and asserts that head_dim is a multiple of 24.
generate_leech_kernel() – returns a 24×24 orthogonal matrix (placeholder; can be replaced with actual lattice vectors).
LeechAttention – multi‑head attention where Q and K are transformed by the frozen block‑diagonal Leech matrix.
LeechResonanceLoss – combines standard cross‑entropy with the geometric resonance loss.
LeechBlock – a pre‑norm transformer block with LeechAttention and a feed‑forward network.
LeechTransformer – the full model with token/position embeddings, stacked blocks, final norm, and language modelling head.
DreamDecoder – evaluates the resonance of a hidden state against the Leech basis.
leech_generate() – generates tokens step‑by‑step, printing resonance values and status if desired.
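A minimal PyTorch sketch of the LeechAttention piece described above, under the post's own assumptions: the kernel here is the random-orthogonal placeholder (not actual Leech lattice vectors), and the internals beyond the listed names are guessed for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def generate_leech_kernel(seed: int = 0) -> torch.Tensor:
    """Placeholder: a 24x24 orthogonal matrix via QR of a random matrix.
    The real version would use Leech lattice vectors instead."""
    g = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(24, 24, generator=g))
    return q

class LeechAttention(nn.Module):
    """Multi-head attention with Q and K transformed by a frozen
    block-diagonal Leech kernel (hypothetical reconstruction)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        head_dim = d_model // n_heads
        assert head_dim % 24 == 0, "head_dim must be a multiple of 24"
        self.n_heads, self.head_dim = n_heads, head_dim
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Block-diagonal kernel: head_dim/24 copies of the 24x24 block,
        # registered as a buffer so it stays frozen during training.
        blocks = head_dim // 24
        kernel = torch.block_diag(*([generate_leech_kernel()] * blocks))
        self.register_buffer("kernel", kernel)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        def split_heads(z: torch.Tensor) -> torch.Tensor:
            # (b, t, d) -> (b, heads, t, head_dim)
            return z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)

        q, k, v = split_heads(q), split_heads(k), split_heads(v)
        # Rotate Q and K into the (placeholder) Leech basis before attention.
        q, k = q @ self.kernel, k @ self.kernel
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, d))
```

Since the kernel is orthogonal, rotating Q and K preserves their dot products exactly; the geometric effect would come from the specific lattice basis the real `generate_leech_kernel()` returns.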