
T5Gemma 2: The next generation of encoder-decoder models

https://blog.google/technology/developers/t5gemma-2/
70•milomg•3h ago

Comments

minimaxir•2h ago
> Note: we are not releasing any post-trained / IT checkpoints.

I get not trying to cannibalize Gemma, but that's weird. A 540M multimodal model that performs well on queries would be useful, and "just post-train it yourself" is not always an option.

jeffjeffbear•1h ago
Isn't fine-tuning the point of T5-style models, since they perform better at smaller parameter counts?
refulgentis•4m ago
It'll be a major pain in the ass to replicate exactly what they did to make it long-context and multimodal. Sucks too because the smol Gemma 3s with the same parameter count were neither.
davedx•1h ago
What is an encoder-decoder model? Is it some kind of LLM, or a subcomponent of an LLM?
wongarsu•1h ago
The announcement of the original T5Gemma goes into more detail [1]. I'd describe it as two LLMs stacked on top of each other: the first understands the input, the second generates the output. "Encoder-decoder models often excel at summarization, translation, QA, and more due to their high inference efficiency, design flexibility, and richer encoder representation for understanding input"

1: https://developers.googleblog.com/en/t5gemma/
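
A minimal sketch of that two-stage flow with the Hugging Face transformers library, using the original t5-small checkpoint purely as a stand-in (the T5Gemma 2 checkpoints and prompt format may differ):

    # Encoder-decoder inference sketch; assumes `pip install transformers torch`.
    # t5-small is a stand-in here, not a T5Gemma 2 checkpoint.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    # The encoder reads ("understands") the whole prompt at once...
    inputs = tokenizer(
        "summarize: Encoder-decoder transformers split the model into a part "
        "that reads the input and a part that writes the output.",
        return_tensors="pt",
    )

    # ...and generate() then runs the decoder token by token against the encoder's output.
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))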

canyon289•1h ago
Hi, I'm not on the T5Gemma team but work on Gemma in general.

Encoder-decoder comes from the original transformer paper way back in 2017. If you look at Figure 1 you'll see what the first transformer ever looked like.

Since then, different implementations of transformers use just the encoder portion, just the decoder portion, or both. It's a deep topic, so hard to summarize here, but Gemini explains it really well! Hope this gets you started on some prompting to learn more.

https://arxiv.org/pdf/1706.03762

wood_spirit•1h ago
A decoder predicts the next word (token) to iteratively generate a whole sentence. An encoder masks a word in the middle of a sentence and tries to predict that masked word.

The original transformer paper from Google was encoder-decoder, but then the encoder-only BERT was hot, then the decoder-only GPT was hot; now encoder-decoder is hot again!

Decoders are good at generative tasks - chatbots etc.

Encoders are good at summarization.

Encoder-decoders are better at summarization. They're a step towards "understanding" (quotes needed).
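
A rough way to see the two objectives side by side with off-the-shelf models (BERT as the encoder-style masked predictor, GPT-2 as the decoder-style next-token generator); a toy sketch, not how T5Gemma itself is trained:

    # Toy contrast of masked prediction vs. next-token prediction.
    # Assumes `pip install transformers torch`; the model names are illustrative.
    from transformers import pipeline

    # Encoder-style (BERT): fill in a masked word in the middle of the sentence.
    fill = pipeline("fill-mask", model="bert-base-uncased")
    print(fill("The cat sat on the [MASK].")[0]["token_str"])

    # Decoder-style (GPT-2): predict the continuation one token at a time.
    gen = pipeline("text-generation", model="gpt2")
    print(gen("The cat sat on the", max_new_tokens=5)[0]["generated_text"])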

nodja•1h ago
It's an alternate LLM architecture; encoder-decoder models actually predate modern LLMs. An encoder-decoder model was the architecture used in the "Attention Is All You Need" paper that introduced the transformer and essentially gave birth to modern LLMs.

An encoder-decoder model splits input and output. This makes sense for translation tasks, summarization, etc. They're good when there's a clear separation between "understand the task" and "complete the task", but you can use them for anything really. An example would be sending "Translate to English: Le chat est noir." to the encoder: the encoder processes everything in a single step, i.e. it understands the task as a whole, then the output of the encoder is fed to the decoder, and the decoder runs one token at a time.

GPT ditches the encoder altogether and just runs the decoder with some slight changes. This makes it more parameter-efficient, but it tends to hallucinate more because past tokens contain information that might be wrong. You can see it as the encoder running on each token as it is read/generated.

Edit: On re-read I noticed it might not be clear what I mean by past tokens containing wrong information. I mean that for each token the model generates a hidden state, and those states don't change. So, for example, an input of 100 tokens will have 100 hidden states; the states are generated all at once in an encoder model, and one token at a time in decoder models. Since the decoder doesn't have the full information yet, the hidden state will contain extra information that may have nothing to do with the task, or may even confuse it.

For example, say you give the model the task "Please translate this to Chinese: Thanks for the cat, he's cute. I'm trying to send it to my friend in Hong Kong." An enc-dec model would read the whole thing at once and understand that you mean Cantonese. But a decoder-only model, "reading" it one token at a time, could trip in several places: 1. assume Chinese means Mandarin Chinese, not Cantonese; 2. assume the text after "cute." is something to also translate rather than a clarification. That would leave several tokens' worth of extra information that could confuse the model. Models are trained with this in mind, so they're used to tokens having lots of different meanings embedded in them and to later tokens narrowing those meanings down, but it can still cause models to ignore certain tokens, or hallucinate.
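
To make that encode-once / decode-step-by-step split concrete, here's a minimal sketch against t5-small (a stand-in checkpoint; note the original T5 was trained on English-to-French, so the direction is flipped relative to the example above):

    # Running the encoder once, then letting generate() drive only the decoder.
    # t5-small is a stand-in; T5Gemma 2's own interface may differ.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    inputs = tokenizer("translate English to French: The cat is black.",
                       return_tensors="pt")

    # 1) The encoder processes the whole prompt in a single forward pass.
    encoder_outputs = model.get_encoder()(**inputs)

    # 2) The decoder generates one token at a time, attending to those fixed encoder states.
    output_ids = model.generate(
        encoder_outputs=encoder_outputs,
        attention_mask=inputs["attention_mask"],
        max_new_tokens=20,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))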

potatoman22•7m ago
What's the use case of models like T5 compared to decoder-only models like Gemma? More traditional ML/NLP tasks?
sigmoid10•3m ago
They trained it to be used like any other decoder-only model, so text generation essentially. But you could use the encoder part for things like classification without much effort. Then again, you can also slap a classifier head on any decoder model.
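
A minimal sketch of the "use the encoder for classification" idea, with t5-small standing in for the encoder and an untrained linear head that you would still have to fine-tune on labeled data:

    # Encoder-as-feature-extractor sketch; the classifier head is untrained and illustrative.
    import torch
    from transformers import AutoTokenizer, T5EncoderModel

    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    encoder = T5EncoderModel.from_pretrained("t5-small")
    head = torch.nn.Linear(encoder.config.d_model, 2)  # e.g. positive / negative

    inputs = tokenizer("I loved this movie.", return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (batch, seq_len, d_model)
    pooled = hidden.mean(dim=1)                        # mean-pool over tokens
    print(head(pooled).softmax(dim=-1))                # class scores from the untrained head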

Beginning January 2026, all ACM publications will be made open access

https://dl.acm.org/openaccess
1145•Kerrick•7h ago•129 comments

We pwned X, Vercel, Cursor, and Discord through a supply-chain attack

https://gist.github.com/hackermondev/5e2cdc32849405fff6b46957747a2d28
433•hackermondev•3h ago•173 comments

GPT-5.2-Codex

https://openai.com/index/introducing-gpt-5-2-codex/
294•meetpateltech•4h ago•170 comments

Texas is suing all of the big TV makers for spying on what you watch

https://www.theverge.com/news/845400/texas-tv-makers-lawsuit-samsung-sony-lg-hisense-tcl-spying
324•tortilla•2d ago•184 comments

How China built its ‘Manhattan Project’ to rival the West in AI chips

https://www.japantimes.co.jp/business/2025/12/18/tech/china-west-ai-chips/
126•artninja1988•4h ago•112 comments

Skills for organizations, partners, the ecosystem

https://claude.com/blog/organization-skills-and-directory
212•adocomplete•5h ago•135 comments

Classical statues were not painted horribly

https://worksinprogress.co/issue/were-classical-statues-painted-horribly/
509•bensouthwood•10h ago•253 comments

Two kinds of vibe coding

https://davidbau.com/archives/2025/12/16/vibe_coding.html
31•jxmorris12•1h ago•13 comments

1.5 TB of VRAM on Mac Studio – RDMA over Thunderbolt 5

https://www.jeffgeerling.com/blog/2025/15-tb-vram-on-mac-studio-rdma-over-thunderbolt-5
9•rbanffy•39m ago•0 comments

T5Gemma 2: The next generation of encoder-decoder models

https://blog.google/technology/developers/t5gemma-2/
70•milomg•3h ago•10 comments

Delty (YC X25) Is Hiring an ML Engineer

https://www.ycombinator.com/companies/delty/jobs/MDeC49o-machine-learning-engineer
1•lalitkundu•2h ago

The Legacy of Nicaea

https://hedgehogreview.com/web-features/thr/posts/the-legacy-of-nicaea
17•diodorus•5d ago•1 comment

How did IRC ping timeouts end up in a lawsuit?

https://mjg59.dreamwidth.org/73777.html
99•dvaun•1d ago•11 comments

Show HN: Picknplace.js, an alternative to drag-and-drop

https://jgthms.com/picknplace.js/
72•bbx•2d ago•47 comments

The Scottish Highlands, the Appalachians, Atlas are the same mountain range

https://vividmaps.com/central-pangean-mountains/
59•lifeisstillgood•3h ago•15 comments

FunctionGemma 270M Model

https://blog.google/technology/developers/functiongemma/
117•mariobm•4h ago•33 comments

Firefox will have an option to disable all AI features

https://mastodon.social/@firefoxwebdevs/115740500373677782
187•twapi•4h ago•173 comments

TRELLIS.2: state-of-the-art large 3D generative model (4B)

https://github.com/microsoft/TRELLIS.2
50•dvrp•2d ago•10 comments

Show HN: Stop AI scrapers from hammering your self-hosted blog (using porn)

https://github.com/vivienhenz24/fuzzy-canary
86•misterchocolat•2d ago•53 comments

Your job is to deliver code you have proven to work

https://simonwillison.net/2025/Dec/18/code-proven-to-work/
566•simonw•8h ago•481 comments

Meta Segment Anything Model Audio

https://ai.meta.com/samaudio/
110•megaman821•2d ago•14 comments

Oliver Sacks put himself into his case studies – what was the cost?

https://www.newyorker.com/magazine/2025/12/15/oliver-sacks-put-himself-into-his-case-studies-what...
22•barry-cotter•2h ago•61 comments

How to hack Discord, Vercel and more with one easy trick

https://kibty.town/blog/mintlify/
74•todsacerdoti•3h ago•14 comments

I've been writing ring buffers wrong all these years (2016)

https://www.snellman.net/blog/archive/2016-12-13-ring-buffers/
40•flaghacker•2d ago•18 comments

Using TypeScript to obtain one of the rarest license plates

https://www.jack.bio/blog/licenseplate
125•lafond•8h ago•133 comments

AI Vending Machine Was Tricked into Giving Away Everything

https://kottke.org/25/12/this-ai-vending-machine-was-tricked-into-giving-away-everything
17•duggan•1h ago•1 comment

Please just try HTMX

http://pleasejusttryhtmx.com/
393•iNic•8h ago•331 comments

The <time> element should do something

https://nolanlawson.com/2025/12/14/the-time-element-should-actually-do-something/
52•birdculture•2d ago•16 comments

Launch HN: Pulse (YC S24) – Production-grade unstructured document extraction

31•sidmanchkanti21•7h ago•34 comments

The immortality of Microsoft Word

https://theredline.versionstory.com/p/on-the-immortality-of-microsoft-word
33•jpbryan•7h ago•48 comments