LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language?

https://dnhkng.github.io/posts/rys-ii/

38•realberkeaslan•5h ago

Comments

JPLeRouzic•1h ago

Has anyone started to implement this technique in Llama.cpp or a similar inference tool?

dnhkng•1h ago

There was some work done on this a while back, during the FrankenMerge craze of '23.

I am working with TurboDerp to integrate this into the Exllama v3 format.

_lex•1h ago

We've discovered the language. It changes the economics of computing.

As in, this entire cloud buildout is unnecessary because it becomes like using a calculator.

Reach out to chat.

cjameskeller•1h ago

Would you be willing to elaborate? I would be curious to hear more.

lostmsu•1h ago

How reproducible are the results? For example, the average score over 10 runs versus the original model.

dnhkng•46m ago

Author here: The code is up on GitHub.

The probes I used seem to help identify good configurations, but they are quite noisy. A small probe set was initially used to make the scan tractable, and then the higher-ranked models were retested on a set ~10x larger.
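The two-stage procedure described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: `noisy_score`, `two_stage_scan`, the Gaussian noise model, and all parameter names are assumptions made up for the example. The point is only that a cheap, noisy pass ranks every configuration, and a probe set about 10x larger is spent on the shortlist alone.

```python
import random
import statistics


def noisy_score(true_quality, n_probes, rng, noise=1.0):
    """Mean of n_probes noisy measurements (hypothetical Gaussian noise model).

    The error of the mean shrinks as n_probes grows, which is why the
    larger second-stage probe set gives a more trustworthy ranking.
    """
    samples = [true_quality + rng.gauss(0.0, noise) for _ in range(n_probes)]
    return statistics.fmean(samples)


def two_stage_scan(true_qualities, small_n=10, top_k=5, scale=10, seed=0):
    """Rank all candidates cheaply, then retest only the top_k on more probes."""
    rng = random.Random(seed)
    # Stage 1: small probe set over every candidate configuration.
    coarse = {name: noisy_score(q, small_n, rng) for name, q in true_qualities.items()}
    shortlist = sorted(coarse, key=coarse.get, reverse=True)[:top_k]
    # Stage 2: probe set `scale` times larger, shortlisted candidates only.
    fine = {
        name: noisy_score(true_qualities[name], small_n * scale, rng)
        for name in shortlist
    }
    return sorted(fine.items(), key=lambda kv: kv[1], reverse=True)


# Toy usage: 20 made-up configurations with made-up "true" qualities.
candidates = {f"cfg{i}": i * 0.1 for i in range(20)}
ranked = two_stage_scan(candidates)
```

With this split, the total number of probe evaluations is dominated by the cheap first pass, while the final ranking rests on the less noisy second pass.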

yodon•1h ago

If you look at convolutional neural nets used in image processing, it's super common for the first layer or so to learn a family of wavelet basis functions. Later layers then do recognition in wavelet space, without that space ever being explained or communicated to the training algorithm.

This work here is obviously more complex than that, but it suggests something similar is going on, with early layers transforming the input into some sort of generalized basis functions that define a universal language representation.
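For readers who have not seen such filters, a Gabor kernel is a common hand-written stand-in for the oriented, wavelet-like filters that first convolutional layers tend to learn. The sketch below builds one in plain Python; the function name and default parameter values are illustrative choices, and the kernel is a textbook formula, not weights taken from any trained network.

```python
import math


def gabor_kernel(size=7, wavelength=4.0, theta=0.0, sigma=2.0, gamma=0.5, psi=0.0):
    """Return a size x size Gabor kernel as a list of rows (size should be odd).

    The kernel is a cosine wave (the carrier) windowed by a Gaussian
    (the envelope), rotated by the angle theta.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the wave runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# One kernel per orientation gives a small filter bank.
bank = [gabor_kernel(theta=k * math.pi / 4) for k in range(4)]
```

Convolving an image with a bank like this responds to edges at each orientation, which is roughly the role the learned first-layer filters play.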

yodon•1h ago

Apologies if I missed this in the article (or in the first article in the series) - what happens if you add two copies of the layer set? Does performance improve over adding one copy of the layer set?

dnhkng•48m ago

Author here: That was done in this blog post, in the beam search. I started with the best re-layer configs and iteratively added more blocks, including the same block multiple times, during a long beam search.

It turns out this does not help (somewhat surprisingly).
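A beam search of the kind described above could look roughly like this. It is a sketch under stated assumptions, not the code from the blog post: a configuration is modelled as a tuple of `(start, end)` blocks to repeat, and `layer_order`, `beam_search`, and `toy_score` are hypothetical names. The scoring function in particular is invented so the example runs; a real search would score each configuration by evaluating the re-layered model on probes.

```python
def layer_order(n_layers, blocks):
    """Execution order after repeating each (start, end) block once per occurrence."""
    order = []
    for layer in range(n_layers):
        order.append(layer)
        for start, end in blocks:
            if end == layer + 1:
                # The block has just finished running: run it again.
                order.extend(range(start, end))
    return order


def beam_search(n_layers, seeds, score, beam_width=3, steps=2):
    """Grow configurations one block at a time, keeping the best beam_width."""
    all_blocks = [(i, j) for i in range(n_layers) for j in range(i + 1, n_layers + 1)]

    def key(state):
        return score(layer_order(n_layers, state))

    beam = [tuple(s) for s in seeds]
    best = max(beam, key=key)
    for _ in range(steps):
        # Extend every state in the beam by one more block (repeats allowed).
        expanded = {tuple(sorted(s + (b,))) for s in beam for b in all_blocks}
        beam = sorted(sorted(expanded), key=key, reverse=True)[:beam_width]
        best = max([best] + beam, key=key)
    return best


def toy_score(order):
    """Made-up score: bonus for repeating layers 2 and 3, small cost per extra layer."""
    bonus = 1.0 if order.count(2) >= 2 and order.count(3) >= 2 else 0.0
    return bonus - 0.1 * (len(order) - len(set(order)))


best = beam_search(6, seeds=[((2, 4),)], score=toy_score)
```

Because the toy score charges for every extra layer, no extension beats the seed here; that outcome is built into the toy function and says nothing about real models.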

skyde•5m ago

Actually, I'm not surprised. I guess this is for the same reason “say it twice” [1] works: because LLMs are trained as causal language models, past tokens cannot attend to future tokens. One copy of the layer set solves this. [1] https://arxiv.org/html/2512.14982v1
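The causal-masking point can be illustrated with a small sketch. It only demonstrates the masking argument for repeating a prompt; whether the same reasoning explains the layer-duplication result is the commenter's conjecture. The function names are made up for the example.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_prompt_tokens(prompt_len, repeats):
    """For each position, the set of distinct prompt tokens it can attend to.

    The sequence is the prompt written `repeats` times back to back, so
    position j holds prompt token j % prompt_len.
    """
    total = prompt_len * repeats
    mask = causal_mask(total)
    return [
        {j % prompt_len for j in range(total) if mask[i][j]}
        for i in range(total)
    ]


single = visible_prompt_tokens(4, repeats=1)
double = visible_prompt_tokens(4, repeats=2)
```

In `single`, the first token sees only itself. In `double`, every position in the second copy can attend to all four prompt tokens, including the ones that come "later" in the prompt.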

saidnooneever•58m ago

It sometimes makes me think of a video I saw at some point of a guy (Daniel Tammet) who had some brain difference, which caused him to be extremely fast at language learning. He said all languages carry the same patterns for him, which he sees through synesthesia or whatever.

He learnt Icelandic in a week and had a fluent conversation on their national TV to prove it. (This is nuts; that language is extremely difficult to pick up, with nasal sounds etc.)

Of course, I guess it's not even close to average for a human to have such abilities, but I wonder if at some point LLMs and AI algorithms and models might shed light on these kinds of abstractions (like some mentioned in the comments about image recognition algorithms), which might help humans actually learn these things themselves, train on them, and perhaps even be taught them as a skill.
