frontpage.


LLM Year in Review

https://karpathy.bearblog.dev/year-in-review-2025/
43•swyx•4h ago

Comments

swyx•4h ago
xposted to https://x.com/karpathy/status/2002118205729562949
TheAceOfHearts•4h ago
I think one thing missing from this post is an attempt to answer: what are the highest-priority AI-related problems that the industry should seek to tackle?

Karpathy hints at one major capability unlock being UI generation, so instead of interacting with text the AI can present different interfaces depending on the kind of problem. That seems like a severely underexplored problem domain. Who are the key figures innovating in this space so far?

In the most recent Demis interview, he suggests that one of the key problems that must be solved is online / continuous learning.

Aside from that, another major issue is probably reducing hallucinations and increasing reliability. Ideally you should be able to deploy an LLM to work on a problem domain, and if it encounters an unexpected scenario it reaches out to you to figure out what to do. But for standard problems it should function reliably 100% of the time.

victorbuilds•4h ago
Notable omission: 2025 is also when the ghosts started haunting the training data. Half of X replies are now LLMs responding to LLMs. The call is coming from inside the dataset.
vlod•40m ago
Any tips to spot this? I want to avoid arguing with an X bot.
thoughtpeddler•2h ago
I appreciate Andrej’s optimistic spirit, and I am grateful that he dedicates so much of his time to educating the wider public about AI/LLMs. That said, it would be great to hear his perspective on how 2025 changed the concentration of power in the industry, what’s happening with open-source, local inference, hardware constraints, etc. For example, he characterizes Claude Code as “running on your computer”, but no, it’s just the TUI that runs locally, with inference in the cloud. The reader is left to wonder how that might evolve in 2026 and beyond.
D-Machine•1h ago
The section on Claude Code is very ambiguously and confusingly written; I think he meant that the agent runs on your computer (not inference), and that this is in contrast to agents running "on a website" or in the cloud:

> I think OpenAI got this wrong because I think they focused their codex / agent efforts on cloud deployments in containers orchestrated from ChatGPT instead of localhost. [...] CC got this order of precedence correct and packaged it into a beautiful, minimal, compelling CLI form factor that changed what AI looks like - it's not just a website you go to like Google, it's a little spirit/ghost that "lives" on your computer. This is a new, distinct paradigm of interaction with an AI.

However, if so, this is definitely a distinction that needs to be made far more clearly.

magicalhippo•1h ago
From what I can gather, llama.cpp supports Anthropic's message format now[1], so you can use it with Claude Code[2].

[1]: https://github.com/ggml-org/llama.cpp/pull/17570

[2]: https://news.ycombinator.com/item?id=44654145
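If the linked PR works as described, the wiring would look roughly like this: llama-server exposes an Anthropic-style messages endpoint, and Claude Code can be pointed at it via its ANTHROPIC_BASE_URL environment variable. A sketch only; the model file and flags below are illustrative, not from the thread:

```shell
# Start a local llama.cpp server (model filename is a placeholder;
# check your build's flags and the PR for the exact endpoint support).
llama-server -m your-model.gguf --port 8080 &

# Point Claude Code at the local server instead of Anthropic's API.
export ANTHROPIC_BASE_URL=http://localhost:8080
claude
```

Whether this works end to end depends on how completely the server implements Anthropic's message format, so treat it as a starting point rather than a recipe.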

karpathy•1h ago
The CC point is more about the data, environment, and general configuration context, not compute and where it happens to run today. The cloud setups are clunky because of context and UI/UX user-in-the-loop considerations, not because of compute considerations.
bgwalter•1h ago
Vibe coding is sufficient for job hoppers who never finish anything and leave when the last 20% have to be figured out. Much easier to promote oneself as an expert and leave the hard parts to other people.
zingar•1h ago
I’ve found incredible productivity gains writing (vibe coding) tools for myself that will never need to be “productionised” or even used by another person. Heck even I will probably never use the latest log retrieval tool, which exists purely for Claude code to invoke it. There is a ton of useful software yet to be written for which there _is_ no “last 20%”.
delichon•1h ago
> I like this version of the meme for pointing out that human intelligence is also jagged in its own different way.

The idea of jaggedicity seems useful to advancing epistemology. If we could identify the domains that have useful data we fail to extract, we could fill those holes and eventually become a general intelligence ourselves. The task may be as hard as making a list of your own blind spots. But now we have an alien intelligence with an outside perspective: while we make AI less jagged, it might return the favor.

If we keep inventing different kinds of intelligence the sum of the splats may eventually become well rounded.

mvkel•41m ago
> In this world view, nano banana is a first early hint of what that might look like.

What is he referring to here? Is nano banana not just an image gen model? Is it because it's an LLM-based one, and not diffusion?

dragonwriter•27m ago
I think he is referring to capability, not architecture, and saying that NB is at the point where it is suggestive of the near-future capability of using GenAI models to create their own UI as needed.

NB (Gemini 2.5 Flash Image) isn't the first major-vendor LLM-based image gen model, after all; GPT Image 1 was first.

starchild3001•28m ago
The distinction Karpathy draws between "growing animals" and "summoning ghosts" via RLVR is the mental model I didn't know I needed to explain the current state of jagged intelligence. It perfectly articulates why trust in benchmarks is collapsing; we aren't creating generally adaptive survivors, but rather over-optimizing specific pockets of the embedding space against verifiable rewards.

I’m also sold on his take on "vibe coding" leading to ephemeral software; the idea of spinning up a custom, one-off tokenizer or app just to debug a single issue, and then deleting it, feels like a real shift.

CSS Grid Lanes

https://webkit.org/blog/17660/introducing-css-grid-lanes/
221•frizlab•3h ago•71 comments

Mistral OCR 3

https://mistral.ai/news/mistral-ocr-3
373•pember•1d ago•68 comments

Garage – An S3 object store so reliable you can run it outside datacenters

https://garagehq.deuxfleurs.fr/
447•ibobev•9h ago•90 comments

A Better Zip Bomb

https://www.bamsoftware.com/hacks/zipbomb/
78•kekqqq•3h ago•27 comments

TP-Link Tapo C200: Hardcoded Keys, Buffer Overflows and Privacy

https://www.evilsocket.net/2025/12/18/TP-Link-Tapo-C200-Hardcoded-Keys-Buffer-Overflows-and-Priva...
215•sibellavia•7h ago•62 comments

PBS News Hour West to go dark after ASU discontinues contract

https://www.statepress.com/article/2025/12/politics-pbs-newshour-west-closure
15•heavyset_go•1h ago•0 comments

8-bit Boléro

https://linusakesson.net/music/bolero/index.php
165•Aissen•13h ago•29 comments

Amazon will allow ePub and PDF downloads for DRM-free eBooks

https://www.kdpcommunity.com/s/article/New-eBook-Download-Options-for-Readers-Coming-in-2026?lang...
534•captn3m0•15h ago•277 comments

GotaTun – Mullvad's WireGuard Implementation in Rust

https://mullvad.net/en/blog/announcing-gotatun-the-future-of-wireguard-at-mullvad-vpn
536•km•14h ago•112 comments

Graphite is joining Cursor

https://cursor.com/blog/graphite
168•fosterfriends•9h ago•195 comments

Brown/MIT shooting suspect found dead, officials say

https://www.washingtonpost.com/nation/2025/12/18/brown-university-shooting-person-of-interest/
91•anigbrowl•22h ago•97 comments

Qwen-Image-Layered: transparency and layer aware open diffusion model

https://huggingface.co/papers/2512.15603
60•dvrp•22h ago•7 comments

Performance Hints (2023)

https://abseil.io/fast/hints.html
48•danlark1•8h ago•24 comments

Show HN: TinyPDF – 3kb pdf library (70x smaller than jsPDF)

https://github.com/Lulzx/tinypdf
108•lulzx•1d ago•15 comments

Rust's Block Pattern

https://notgull.net/block-pattern/
114•zdw•20h ago•50 comments

NOAA deploys new generation of AI-driven global weather models

https://www.noaa.gov/news-release/noaa-deploys-new-generation-of-ai-driven-global-weather-models
84•hnburnsy•2d ago•56 comments

The FreeBSD Foundation's Laptop Support and Usability Project

https://github.com/FreeBSDFoundation/proj-laptop
134•mikece•10h ago•42 comments

Believe the Checkbook

https://robertgreiner.com/believe-the-checkbook/
115•rg81•9h ago•50 comments

The pitfalls of partitioning Postgres yourself

https://hatchet.run/blog/postgres-partitioning
46•abelanger•3d ago•5 comments

Buteyko Method

https://en.wikipedia.org/wiki/Buteyko_method
34•rzk•3h ago•14 comments

Lite^3, a JSON-compatible zero-copy serialization format

https://github.com/fastserial/lite3
131•cryptonector•6d ago•33 comments

Reverse Engineering US Airline's PNR System and Accessing All Reservations

https://alexschapiro.com/security/vulnerability/2025/11/20/avelo-airline-reservation-api-vulnerab...
85•bearsyankees•7h ago•40 comments

Response Healing: Reduce JSON defects by 80%+

https://openrouter.ai/announcements/response-healing-reduce-json-defects-by-80percent
36•numlocked•1d ago•36 comments

Language Immersion, Prison-Style (2017)

https://www.themarshallproject.org/2017/12/14/my-do-it-yourself-language-immersion-prison-style
6•johnny313•5d ago•0 comments

The scariest boot loader code

http://miod.online.fr/software/openbsd/stories/boot_hppa.html
25•todsacerdoti•4h ago•1 comments

Man Made Troubles (1953) [video]

https://www.youtube.com/watch?v=AW-dvD2ZLZY
6•CaliforniaKarl•4d ago•0 comments

Monumental snake engravings of the Orinoco River (2024)

https://www.cambridge.org/core/journals/antiquity/article/monumental-snake-engravings-of-the-orin...
12•bryanrasmussen•1w ago•1 comments

Show HN: Misata – synthetic data engine using LLM and Vectorized NumPy

https://github.com/rasinmuhammed/misata
11•rasinmuhammed•3d ago•0 comments

LLM Year in Review

https://karpathy.bearblog.dev/year-in-review-2025/
43•swyx•4h ago•14 comments

History LLMs: Models trained exclusively on pre-1913 texts

https://github.com/DGoettlich/history-llms
760•iamwil•1d ago•375 comments