How Taalas "prints" LLM onto a chip?

https://www.anuragk.com/blog/posts/Taalas.html

56•beAroundHere•12h ago

Comments

rustyhancock•1h ago

Edit: judging by the replies below, I'm quite wrong here, but I've left the comment up...

The single transistor multiply is intriguing.

I'd assume they are layers of FMA (fused multiply-add) operating in the log domain.

But everything tells me that would be too noisy and error prone to work.

On the other hand, my mind is completely biased toward the digital world.

If they stay in the log domain, use a resistor network for multiplication, and the transistor just exponentiates for the addition, that seems genuinely ingenious.
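A minimal numeric sketch of the log-domain idea, purely to show the arithmetic (the replies below suggest this is not what Taalas actually does). It assumes strictly positive inputs and weights, since signs would need separate handling, and the function name log_domain_dot is hypothetical. Each multiply becomes an addition of logarithms, and exponentiating brings the product back to the linear domain so it can be accumulated.

```python
import math

# Hypothetical sketch: multiply by adding logs, then exponentiate to accumulate.
# Assumes all inputs and weights are strictly positive (log of <= 0 is undefined).

def log_domain_dot(inputs, weights):
    """Dot product where each multiply is performed as an addition of logs."""
    acc = 0.0
    for x, w in zip(inputs, weights):
        log_product = math.log(x) + math.log(w)  # "resistor network" step: add logs
        acc += math.exp(log_product)             # "transistor" step: exponentiate, then sum
    return acc

inputs = [0.5, 2.0, 3.0]
weights = [4.0, 0.25, 1.5]
print(log_domain_dot(inputs, weights))  # approximately 0.5*4 + 2*0.25 + 3*1.5 = 7.0
```

In floating point the result matches the ordinary dot product only up to rounding error; an analog circuit would add device noise on top of that.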

Mulling it over, the noise probably doesn't matter; it'll average out to zero.
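A quick simulation of the "it'll average out" intuition, under the loud assumption that each operation contributes independent, zero-mean Gaussian noise. Biased or correlated noise (for example, systematic device mismatch) would not cancel this way. The helper mean_noise is hypothetical and seeded so the output is reproducible.

```python
import random
import statistics

# Illustrative only: assumes independent, zero-mean Gaussian noise per operation.

def mean_noise(n_terms, sigma=1.0, seed=0):
    """Average of n_terms independent noise samples drawn from N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    samples = [rng.gauss(0.0, sigma) for _ in range(n_terms)]
    return statistics.fmean(samples)

for n in (10, 1_000, 100_000):
    print(n, mean_noise(n))
```

The average shrinks roughly as sigma / sqrt(n), which is the standard error of a mean of independent samples.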

It's essentially compute and memory baked together.

I don't know much about this area of research, so I can't tell if it's innovative, but it does seem compelling!

generuso•59m ago

The document referenced in the blog does not say anything about the single transistor multiply.

However, [1] provides the following description: "Taalas’ density is also helped by an innovation which stores a 4-bit model parameter and does multiplication on a single transistor, Bajic said (he declined to give further details but confirmed that compute is still fully digital)."

[1] https://www.eetimes.com/taalas-specializes-to-extremes-for-e...
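For context on what "a 4-bit model parameter" means: a 4-bit parameter can take only 16 distinct values. Below is a generic sketch of uniform 4-bit quantization that maps float weights to integer codes 0..15. Taalas has not disclosed its quantization scheme, so this is an assumed, textbook min/max scheme, and the function names are hypothetical.

```python
# Hypothetical sketch: uniform min/max quantization of float weights to 4-bit codes.
# Not Taalas' disclosed scheme; just shows that 4 bits means 16 levels.

def quantize_4bit(weights):
    """Map each float weight to an integer code in 0..15."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 if hi > lo else 1.0  # 16 levels -> 15 steps between them
    codes = [min(15, max(0, round((w - lo) / scale))) for w in weights]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [lo + c * scale for c in codes]

weights = [-1.0, -0.3, 0.0, 0.42, 1.0]
codes, lo, scale = quantize_4bit(weights)
print(codes)
print(dequantize(codes, lo, scale))
```

Each recovered weight is within half a quantization step of the original.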

rustyhancock•43m ago

That's much more informative, I think my original comment is quite off the mark then.

Hello9999901•1h ago

This would be a very interesting future. I can imagine Gemma 5 Mini running locally on hardware, or a hard-coded "AI core" like an ALU or media processor that supports particular encoding mechanisms like H.264, AV1, etc.

Other than the obvious costs (though Taalas seems to be bringing back the structured ASIC era, so costs shouldn't be that high [1]), I'm curious why this isn't getting much attention from larger companies. Of course, this wouldn't be useful for training models, but as the models further improve, I can totally see this inside fully local + ultrafast + ultra-efficient processors.

[1] https://en.wikipedia.org/wiki/Structured_ASIC_platform

owenpalmer•55m ago

> Kinda like a CD-ROM/Game cartridge, or a printed book, it only holds one model and cannot be rewritten.

Imagine a slot on your computer where you physically pop out and replace the chip with different models, sort of like a Nintendo DS.

beAroundHere•50m ago

That's the kind of hardware I'm rooting for, since it would encourage open-weights models and be much more private.

In fact, I was thinking that future robots could have such slots and swap in different models depending on the task they're given. Like a hardware MoE (mixture of experts).

8cvor6j844qw_d6•37m ago

A cartridge slot for models is a fun idea. Instead of one chip running any model, you get one model or maybe a family of models per chip at (I assume) much better perf/watt. Curious whether the economics work out for consumer use or if this stays in the embedded/edge space.

rustybolt•34m ago

Note that this doesn't answer the question in the title, it merely asks it.

beAroundHere•31m ago

Yeah, I wrote the blog post to wrap my head around questions like "How would someone even print weights onto a chip?" and "How do you even start thinking in that direction?"

I didn't explore the actual manufacturing process.

pixelmelt•12m ago

You should add an RSS feed so I can follow it!

beAroundHere•7m ago

I don't post blog entries much, so I haven't added RSS there yet, but I will. I mostly post to my linkblog [1], which does have RSS.

[1] https://www.anuragk.com/linkblog

sargun•20m ago

Isn’t the highly connected nature of the model layers problematic to build into a physical layer?

abrichr•11m ago

ChatGPT Deep Research dug through Taalas' WIPO patent filings and public reporting to piece together a hypothesis. Next Platform notes at least 14 patents filed [1]. The two most relevant:

"Large Parameter Set Computation Accelerator Using Memory with Parameter Encoding" [2]

"Mask Programmable ROM Using Shared Connections" [3]

The "single transistor multiply" could be multiplication by routing, not arithmetic. Patent [2] describes an accelerator where, if weights are 4-bit (16 possible values), you pre-compute all 16 products (input × each possible value) with a shared multiplier bank, then use a hardwired mesh to route the correct result to each weight's location. The abstract says it directly: multiplier circuits produce a set of outputs, readable cells store addresses associated with parameter values, and a selection circuit picks the right output. The per-weight "readable cell" would then just be an access transistor that passes through the right pre-computed product. If that reading is correct, it's consistent with the CEO telling EE Times compute is "fully digital" [4], and explains why 4-bit matters so much: 16 multipliers to broadcast is tractable; 256 (8-bit) is not.
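A toy software model of the "multiplication by routing" reading, to make the dataflow concrete. Everything here is an assumption for illustration: the signed integer codebook -8..7, and the hypothetical helpers shared_multiplier_bank and routed_matvec. Each input activation is multiplied by all 16 possible weight values exactly once, and every weight in that input's column then only selects one precomputed product by its stored 4-bit code.

```python
# Hypothetical model of "multiply by routing": compute 16 candidate products per
# input once, then each weight site merely selects one of them by its 4-bit code.
# The codebook (signed integers -8..7) is an assumption, not from the patent.

CODEBOOK = list(range(-8, 8))  # the 16 values a 4-bit weight code can stand for

def shared_multiplier_bank(x):
    """All 16 candidate products for one input activation (the broadcast bus)."""
    return [x * v for v in CODEBOOK]

def routed_matvec(weight_codes, inputs):
    """y[i] = sum_j W[i][j] * x[j], where weight_codes[i][j] indexes CODEBOOK."""
    buses = [shared_multiplier_bank(x) for x in inputs]  # computed once per input
    # Per-weight "multiply" is just a selection from the shared bus.
    return [sum(buses[j][row[j]] for j in range(len(inputs))) for row in weight_codes]

codes = [[0, 15, 8], [9, 3, 12]]
x = [2, -1, 5]
print(routed_matvec(codes, x))  # [-23, 27]

for bits in (4, 8):
    print(bits, "bit weights ->", 2 ** bits, "products per input")
```

The multiplier work scales with the number of inputs times 2^bits, not with the number of weights, which is why going from 4-bit (16 products) to 8-bit (256 products) hurts so much under this reading.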

The same patent reportedly describes the connectivity mesh as configurable via top metal masks, referred to as "saving the model in the mask ROM of the system." If so, the base die is identical across models, with only top metal layers changing to encode weights-as-connectivity and dataflow schedule.

Patent [3] covers high-density multibit mask ROM using shared drain and gate connections with mask-programmable vias, possibly how they hit the density for 8B parameters on one 815 mm² die.
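Some back-of-the-envelope arithmetic on what that density implies, taking the figures from this thread (8B parameters, 4-bit weights per the EE Times quote, an 815 mm² die). This treats the whole die as weight storage, so it is an upper bound on area per parameter; in reality part of the die goes to other logic, such as the SRAM sidecar and multiplier banks.

```python
# Back-of-the-envelope density, using figures quoted in this thread.
PARAMS = 8e9          # 8B parameters
BITS_PER_PARAM = 4    # 4-bit weights (EE Times quote)
DIE_AREA_MM2 = 815    # die area in mm^2

bits_per_mm2 = PARAMS * BITS_PER_PARAM / DIE_AREA_MM2
um2_per_param = DIE_AREA_MM2 * 1e6 / PARAMS  # 1 mm^2 = 1e6 um^2

print(f"{bits_per_mm2 / 1e6:.1f} Mbit/mm^2")        # 39.3 Mbit/mm^2
print(f"{um2_per_param:.3f} um^2 per parameter")    # 0.102 um^2 per parameter
```

Roughly 0.1 µm² per 4-bit parameter, even before subtracting non-storage area, is why a dense multibit mask ROM cell would matter.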

If roughly right, some testable predictions: performance very sensitive to quantization bitwidth; near-zero external memory bandwidth dependence; fine-tuning limited to what fits in the SRAM sidecar.

Caveat: the specific implementation details beyond the abstracts are based on Deep Research's analysis of the full patent texts, not my own reading, so could be off. But the abstracts and public descriptions line up well.

[1] https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...

[2] https://patents.google.com/patent/WO2025147771A1/en

[3] https://patents.google.com/patent/WO2025217724A1/en

[4] https://www.eetimes.com/taalas-specializes-to-extremes-for-e...

kinduff•1m ago

Very nice read; thank you for sharing something so well written.
