
T5Gemma 2: The next generation of encoder-decoder models

https://blog.google/technology/developers/t5gemma-2/
45•milomg•1h ago

Comments

minimaxir•40m ago
> Note: we are not releasing any post-trained / IT checkpoints.

I get not wanting to cannibalize Gemma, but that's weird. A 540M multimodal model that performs well on queries would be useful, and "just post-train it yourself" is not always an option.

jeffjeffbear•27m ago
Isn't finetuning the point of the T5 style models, since they perform better for smaller parameter counts?
davedx•31m ago
What is an encoder-decoder model? Is it some kind of LLM, or a subcomponent of one?
wongarsu•20m ago
The announcement of the original T5Gemma goes into some more detail [1]. I'd describe it as two LLMs stacked on top of each other: the first understands the input, the second generates the output. "Encoder-decoder models often excel at summarization, translation, QA, and more due to their high inference efficiency, design flexibility, and richer encoder representation for understanding input"

1: https://developers.googleblog.com/en/t5gemma/

canyon289•18m ago
Hi, I'm not on the T5Gemma team but work on Gemma in general.

Encoder-decoder comes from the original transformer implementation way back in 2017. If you look at figure 1 you'll see what the first transformer ever looked like.

Since that time, different implementations of transformers use either just the encoder portion, just the decoder portion, or both. It's a deep topic, so it's hard to summarize here, but Gemini explains it really well! Hope this gets you started on some prompting to learn more.

https://arxiv.org/pdf/1706.03762

wood_spirit•14m ago
A decoder predicts the next word (token), iteratively generating a whole sentence. An encoder masks a word in the middle of a sentence and tries to predict the masked word.

The original transformer paper from google was encoder-decoder, but then encoder BERT was hot and then decoder GPT was hot; now encoder-decoder is hot again!

Decoders are good at generative tasks - chatbots etc.

Encoders are good at summarization.

Encoder-decoders are better at summarization. It's a step towards "understanding" (quotes needed).
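The decoder-vs-encoder distinction above boils down to the attention mask. Here's a toy sketch (plain Python, no real model, all names made up): a decoder uses a causal mask, so token i can only look at tokens 0..i, which is what forces left-to-right generation; an encoder uses a bidirectional mask, so every token sees the whole sentence, which is what lets it fill in a masked middle word.

```python
# Toy illustration of attention masks; 1 means "position i may
# attend to position j". Not any particular library's API.

n = 4  # sentence length in tokens

# Causal (decoder-style) mask: each token sees only itself and
# earlier tokens, so generation must go left to right.
causal = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

# Bidirectional (encoder-style) mask: every token sees every other,
# so a masked middle token can use context from both sides.
bidirectional = [[1] * n for _ in range(n)]

for row in causal:
    print(row)
# [1, 0, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```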

nodja•10m ago
It's an alternate LLM architecture, and it actually predates modern LLMs. An encoder-decoder model was the architecture used in the "Attention Is All You Need" paper that introduced the transformer and essentially gave birth to modern LLMs.

An encoder-decoder model splits input and output. This makes sense for translation tasks, summarization, etc. They're good when there's a clear separation between "understand the task" and "complete the task", but you can use them for anything, really. An example: send "Translate to English: Le chat est noir." to the encoder; the encoder processes everything in a single step, understanding the task as a whole; the output of the encoder is then fed to the decoder, which runs one token at a time.

GPT ditches the encoder altogether and just runs the decoder with some slight changes. This makes it more parameter-efficient, but it tends to hallucinate more, since past tokens can contain information that might be wrong. You can think of it as the encoder running on each token as it is read/generated.
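The encode-once / decode-step-by-step control flow described above can be sketched structurally. This is a toy, not a neural net: the "model" just reverses the input tokens, and every function name here is invented for illustration. The point is the shape of the loop: one encoder pass over the whole input, then repeated decoder steps that each see the encoder's output plus everything generated so far.

```python
# Structural sketch of an encoder-decoder generation loop.
# Toy task: "translate" by reversing the input tokens.

def encode(input_tokens):
    # Encoder: processes the entire input in a single pass and
    # returns a representation of it (here, just the token list).
    return list(input_tokens)

def decode_step(encoder_output, generated_so_far):
    # Decoder: emits the next token given the encoder representation
    # and its own previous outputs (toy rule: reverse the input).
    i = len(generated_so_far)
    if i >= len(encoder_output):
        return None  # end of sequence
    return encoder_output[-1 - i]

def generate(input_tokens):
    enc = encode(input_tokens)   # one encoder pass
    out = []
    while True:                  # many decoder passes, one per token
        tok = decode_step(enc, out)
        if tok is None:
            break
        out.append(tok)
    return out

print(generate(["le", "chat", "est", "noir"]))
# ['noir', 'est', 'chat', 'le']
```

A decoder-only model like GPT collapses this into a single loop: there is no separate `encode` pass, and the input tokens are simply the first entries of `generated_so_far`.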

Building a Code Review system that uses prod data to predict bugs

https://blog.sentry.io/building-a-code-review-system-that-uses-prod-data-to-predict-bugs/
1•jshchnz•35s ago•0 comments

Business Learnings in 2025?

1•rjmtax•2m ago•0 comments

Naughty Dog Studio Orders Employee Overtime for 'Intergalactic'

https://www.bloomberg.com/news/articles/2025-12-18/sony-s-naughty-dog-studio-orders-employee-over...
5•HelloUsername•9m ago•0 comments

A TS library for connecting videos in your Mux account to multi-modal LLMs

https://github.com/muxinc/ai
1•tilt•12m ago•0 comments

Plaintext Casa First Release

https://github.com/nkoehring/plaintext.casa/releases/tag/v0.3
1•koehr•12m ago•1 comments

The Art of Vibe Design

https://www.ivan.codes/blog/the-art-of-vibe-design
1•dohguy•13m ago•0 comments

Starlink 35956 suffered a failure with venting of the propulsion tank

https://bsky.app/profile/planet4589.bsky.social/post/3mac4a3owxs2c
2•perihelions•13m ago•0 comments

CVSS 10.0 HPE OneView RCE bug identified

https://www.scworld.com/news/10-0-hpe-oneview-rce-bug-identified-patch-now
1•Bender•13m ago•0 comments

Wyoming Blasted by 123 MPH Winds on Wednesday and More Wind to Come

https://cowboystatedaily.com/2025/12/17/wyoming-blasted-by-123-mph-winds-and-fierce-mountain-snow...
1•Bender•13m ago•0 comments

Token-Count-Based Batching: Faster, Cheaper Embedding Inference for Queries

https://www.mongodb.com/company/blog/engineering/token-count-based-batching-faster-cheaper-embedd...
1•fzliu•15m ago•0 comments

New X-ray images show interstellar comet as it makes closest approach to Earth

https://www.cnn.com/2025/12/18/science/interstellar-comet-3i-atlas-xray-earth
2•Bender•15m ago•0 comments

Bill Gates and Sergey Brin Among Newly Released Epstein Photos

https://www.ft.com/content/96d65675-f4c2-4b70-aede-2e77a8648fe8
3•aanet•15m ago•0 comments

Trump media group agrees $6B merger with Google-backed fusion energy company

https://www.ft.com/content/1e1978d5-535b-4241-872f-38db778df694
2•perihelions•16m ago•0 comments

A Starlink Satellite Exploded

https://twitter.com/Starlink/status/2001691802911289712
4•wmf•16m ago•0 comments

LionsOS Design, Implementation and Performance

https://arxiv.org/abs/2501.06234
2•indolering•16m ago•0 comments

Mitsubishi Electric Technology Detects Intoxication During Driving

https://us.mitsubishielectric.com/en/pr/global/2025/1216/
2•geox•17m ago•0 comments

LLMs' impact on science: Booming publications, stagnating quality

https://arstechnica.com/science/2025/12/llms-impact-on-science-booming-publications-stagnating-qu...
2•pseudolus•19m ago•0 comments

GIJN's Top Investigative Tools of 2025

https://gijn.org/stories/gijn-top-investigative-tools-2025/
1•runningmike•20m ago•1 comments

BoltCache: A High-Performance Redis Alternative Built in Go

https://github.com/wutlu/boltcache
1•spotlayn•20m ago•0 comments

2005 Elon Musk Sounded Like Satoshi Nakamoto

https://old.reddit.com/r/conspiracy/comments/1pp2is1/2005_elon_musk_sounded_like_satoshi_nakamoto/
1•tokenmemory•21m ago•1 comments

Two Kinds of Vibe Coding

https://davidbau.com/archives/2025/12/16/vibe_coding.html
5•jxmorris12•21m ago•0 comments

Control Panel for Twitter

https://soitis.dev/control-panel-for-twitter
1•xnx•22m ago•1 comments

Model hallucinations aren't random. They have geometric structure

https://arxiv.org/abs/2512.13771
2•devy•25m ago•0 comments

Analytical dashboards and AI chat: local dev to prod (Vercel and Boreal)

https://www.fiveonefour.com/blog/chat-analytical-dashboards-guide
1•oatsandsugar•28m ago•0 comments

Most Top-Achieving Adults Weren't Elite Specialists in Childhood, New Study Finds

https://www.wsj.com/science/elite-high-performance-adults-children-sports-study-ae8d6bed
3•achristmascarl•28m ago•0 comments

FAA Warns of Military Aircraft Flying Undetected in Caribbean

https://www.bloomberg.com/news/articles/2025-12-18/faa-warns-of-military-aircraft-flying-undetect...
2•toomuchtodo•30m ago•1 comments

GitHub delays GHA price increase

https://twitter.com/github/status/2001372894882918548
2•timvdalen•35m ago•2 comments

Ask HN: Is there an open source "turbopuffer"?

1•koconder•39m ago•0 comments

Calculate founder dilution across funding rounds

https://angelmatch.io/resources/cap-table-calculator
2•educated_panda•39m ago•0 comments

Ask HN: How to spend L&D/Training funds before the end of the year?

2•jamestimmins•40m ago•1 comments