I’ve been working on a small project to map the evolution of large language model research.
I collected around 8,000 papers, embedded their abstracts, and projected the embeddings with t-SNE to visualize clusters such as instruction tuning, RAG, agents, and evaluation.
One interesting detail: the earliest “proto-LLM” paper that shows up is “Natural Language Processing (Almost) from Scratch” (Collobert et al., 2011), which already hinted at joint representations and multitask learning.
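In case it helps anyone reproduce it, the pipeline is roughly the following. This is a sketch using sentence-transformers and scikit-learn; the model name, input file format, and t-SNE parameters are placeholders, not necessarily what the live demo uses.

```python
import json

import matplotlib.pyplot as plt
from sentence_transformers import SentenceTransformer
from sklearn.manifold import TSNE

# Assumed input: papers.json, a list of {"title": ..., "abstract": ..., "year": ...}
with open("papers.json") as f:
    papers = json.load(f)

abstracts = [p["abstract"] for p in papers]

# Embed each abstract into a dense vector.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(abstracts, show_progress_bar=True)

# Project the high-dimensional embeddings down to 2-D.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
coords = tsne.fit_transform(embeddings)

plt.figure(figsize=(10, 10))
plt.scatter(coords[:, 0], coords[:, 1], s=3, alpha=0.5)
plt.title("t-SNE of ~8,000 LLM paper abstracts")
plt.savefig("tsne.png", dpi=200)
```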
Interactive version here: https://awesome-llm-papers.github.io/tsne.html
Would love feedback — especially on what other dimensions or embeddings might be interesting to explore (e.g., by year, model type, or dataset).
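The simplest of those, coloring by year, would be just a few lines on top of the sketch above (reusing `papers` and `coords` from it, and assuming each paper record carries a `year` field):

```python
import matplotlib.pyplot as plt
import numpy as np

# `papers` and `coords` come from the pipeline sketch above; `year` is an assumed field.
years = np.array([p["year"] for p in papers])

plt.figure(figsize=(10, 10))
sc = plt.scatter(coords[:, 0], coords[:, 1], c=years, s=3, cmap="viridis", alpha=0.6)
plt.colorbar(sc, label="publication year")
plt.title("Same projection, colored by publication year")
plt.savefig("tsne_by_year.png", dpi=200)
```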