- Vision Transformers can be parallelized to reduce latency and improve optimization without sacrificing accuracy (see the code sketches after this list).
- Fine-tuning only the attention layers is often sufficient for adapting ViTs to new tasks or resolutions, saving compute and memory.
- Using MLP-based patch preprocessing improves performance in masked self-supervised learning by preserving patch independence.
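Roughly, the first two points look like the sketch below. This is my own minimal PyTorch rendering, not the paper's code: `ParallelViTBlock` and `freeze_all_but_attention` are made-up names, the dimensions are the usual ViT-Base defaults, and the `"attn"` name filter is an assumption about how a given model labels its attention modules.

```python
import torch
import torch.nn as nn


class ParallelViTBlock(nn.Module):
    """Two attention branches and two MLP branches applied to the same input
    and summed, standing in for two sequential transformer blocks (point 1)."""

    def __init__(self, dim: int = 768, heads: int = 12, mlp_ratio: int = 4):
        super().__init__()
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])
        self.attns = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(2)]
        )
        self.mlps = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(dim, dim * mlp_ratio),
                    nn.GELU(),
                    nn.Linear(dim * mlp_ratio, dim),
                )
                for _ in range(2)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Both attention branches read the same input, so they can run concurrently.
        x = x + sum(
            attn(n(x), n(x), n(x), need_weights=False)[0]
            for attn, n in zip(self.attns, self.norms[:2])
        )
        # Likewise for the two MLP branches.
        x = x + sum(mlp(n(x)) for mlp, n in zip(self.mlps, self.norms[2:]))
        return x


def freeze_all_but_attention(model: nn.Module) -> None:
    """Point 2: fine-tune only the attention layers by freezing every parameter
    whose name does not mention attention (the "attn" filter is an assumption
    about how the model names its modules)."""
    for name, param in model.named_parameters():
        param.requires_grad = "attn" in name


if __name__ == "__main__":
    block = ParallelViTBlock()
    tokens = torch.randn(1, 197, 768)   # 196 patch tokens + 1 class token
    print(block(tokens).shape)          # torch.Size([1, 197, 768])

    freeze_all_but_attention(block)
    trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
    total = sum(p.numel() for p in block.parameters())
    print(f"trainable (attention-only): {trainable} / {total}")
```

The point of the parallel form is that the two attention calls (and the two MLP calls) have no dependency on each other, so they can be fused into wider matrix multiplies or run concurrently instead of forming a longer sequential chain, which is where the latency reduction comes from.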
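Point 3 can be illustrated the same way: instead of a single linear patch projection, each patch is run through a small MLP on its own, so masking a patch before the stem is equivalent to masking its token after the stem. Again a hedged sketch under my own assumptions, not the paper's implementation; `PatchMLPStem` is a made-up name and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn


class PatchMLPStem(nn.Module):
    """Patch embedding that applies a small MLP to each patch independently.

    No operation mixes pixels across patches, so masking a patch before the
    stem and masking its token after the stem give the same result, which is
    the property masked self-supervised methods rely on.
    """

    def __init__(self, patch: int = 16, in_ch: int = 3, dim: int = 768):
        super().__init__()
        self.patch = patch
        self.mlp = nn.Sequential(
            nn.Linear(in_ch * patch * patch, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, imgs: torch.Tensor) -> torch.Tensor:
        # imgs: (B, C, H, W) with H and W divisible by the patch size
        B, C, H, W = imgs.shape
        p = self.patch
        # Cut the image into non-overlapping p x p patches and flatten each one.
        patches = imgs.unfold(2, p, p).unfold(3, p, p)   # (B, C, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        return self.mlp(patches)                         # (B, num_patches, dim)


if __name__ == "__main__":
    stem = PatchMLPStem()
    out = stem(torch.randn(2, 3, 224, 224))
    print(out.shape)  # torch.Size([2, 196, 768])
```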
If you’ve already decided you’re interested in the paper, then the Introduction and/or Conclusion sections are what you’re looking for.
pixl97•5h ago
Had to throw some Jurassic Park humor in here.
adultSwim•4h ago
For modest incremental improvements, I greatly prefer boring technical titles. Not everything needs to be a stochastic parrot. We see this dynamic with building luxury condos: on any individual project, making that pick will help juice profit, but when the whole city follows that logic, it leads to a less desirable outcome.