
Show HN: Terminal UI for AWS

https://github.com/huseyinbabal/taws
229•huseyinbabal•8h ago•112 comments

California residents can now request all data brokers delete personal info

https://consumer.drop.privacy.ca.gov/
55•memalign•47m ago•9 comments

Lessons from 14 years at Google

https://addyosmani.com/blog/21-lessons/
954•cdrnsf•13h ago•434 comments

Why does a least squares fit appear to have a bias when applied to simple data?

https://stats.stackexchange.com/questions/674129/why-does-a-linear-least-squares-fit-appear-to-ha...
183•azeemba•8h ago•47 comments

During Helene, I just wanted a plain text website

https://sparkbox.com/foundry/helene_and_mobile_web_performance
65•CqtGLRGcukpy•2h ago•38 comments

The baffling purple honey found only in North Carolina

https://www.bbc.com/travel/article/20250417-the-baffling-purple-honey-found-only-in-north-carolina
28•rmason•4d ago•7 comments

The unbearable joy of sitting alone in a café

https://candost.blog/the-unbearable-joy-of-sitting-alone-in-a-cafe/
494•mooreds•14h ago•297 comments

Street Fighter II, the World Warrier (2021)

https://fabiensanglard.net/sf2_warrier/
333•birdculture•14h ago•57 comments

The year of the 3D printed miniature and other lies we tell ourselves

https://matduggan.com/the-year-of-the-3d-printed-miniature-and-other-lies-we-tell-ourselves/
131•sagacity•6d ago•80 comments

Linear Address Spaces: Unsafe at any speed (2022)

https://queue.acm.org/detail.cfm?id=3534854
129•nithssh•4d ago•89 comments

I charged $18k for a Static HTML Page (2019)

https://idiallo.com/blog/18000-dollars-static-web-page
195•caminanteblanco•2d ago•49 comments

Show HN: An interactive guide to how browsers work

https://howbrowserswork.com/
196•krasun•13h ago•31 comments

Eurostar AI vulnerability: When a chatbot goes off the rails

https://www.pentestpartners.com/security-blog/eurostar-ai-vulnerability-when-a-chatbot-goes-off-t...
107•speckx•7h ago•30 comments

Ripple, a puzzle game about 2nd and 3rd order effects

https://ripplegame.app/
95•mooreds•10h ago•25 comments

The Showa Hundred Year Problem

https://www.dampfkraft.com/showa-100.html
31•polm23•5d ago•10 comments

Millennium Challenge: A corrupted military exercise and its legacy (2015)

https://warontherocks.com/2015/11/millennium-challenge-the-real-story-of-a-corrupted-military-exe...
26•lifeisstillgood•5h ago•22 comments

Web development is fun again

https://ma.ttias.be/web-development-is-fun-again/
327•Mojah•13h ago•405 comments

Six Harmless Bugs Lead to Remote Code Execution

https://mehmetince.net/the-story-of-a-perfect-exploit-chain-six-bugs-that-looked-harmless-until-t...
34•ozirus•3d ago•3 comments

How to translate a ROM: The mysteries of the game cartridge [video]

https://www.youtube.com/watch?v=XDg73E1n5-g
4•zdw•5d ago•0 comments

Agentic Patterns

https://github.com/nibzard/awesome-agentic-patterns
91•PretzelFisch•9h ago•11 comments

The great shift of English prose

https://www.worksinprogress.news/p/english-prose-has-become-much-easier
43•dsubburam•4d ago•30 comments

Bison return to Illinois' Kane County after 200 years

https://phys.org/news/2025-12-bison-illinois-kane-county-years.html
134•bikenaga•5d ago•40 comments

Moiré Explorer

https://play.ertdfgcvb.xyz/#/src/demos/moire_explorer
140•Luc•15h ago•18 comments

Show HN: An LLM-Powered PCB Schematic Checker (Major Update)

https://traceformer.io/
36•wafflesfreak•7h ago•15 comments

Show HN: Hover – IDE style hover documentation on any webpage

https://github.com/Sampsoon/hover
43•sampsonj•10h ago•18 comments

Anti-aging injection regrows knee cartilage and prevents arthritis

https://scitechdaily.com/anti-aging-injection-regrows-knee-cartilage-and-prevents-arthritis/
230•nis0s•13h ago•85 comments

FreeBSD Home NAS, part 3: WireGuard VPN, routing, and Linux peers

https://rtfm.co.ua/en/freebsd-home-nas-part-3-wireguard-vpn-linux-peer-and-routing/
151•todsacerdoti•16h ago•8 comments

Trellis AI (YC W24) is hiring engineers to build AI agents for healthcare access

https://www.ycombinator.com/companies/trellis-ai/jobs/ngvfeaq-member-of-technical-staff-full-time
1•macklinkachorn•11h ago

Claude Code On-the-Go

https://granda.org/en/2026/01/02/claude-code-on-the-go/
240•todsacerdoti•8h ago•167 comments

Using Hinge as a Command and Control Server

https://mattwie.se/hinge-command-control-c2
97•mattwiese•14h ago•46 comments

Neural Networks: Zero to Hero

https://karpathy.ai/zero-to-hero.html
715•suioir•23h ago

Comments

suioir•23h ago
I saw this in a comment [0] and thought it deserved a post.

[0] https://news.ycombinator.com/item?id=46483776

butanyways•17h ago
Maybe we can create one ourselves. I posted this a few days ago here. A decent read: https://zekcrates.quarto.pub/deep-learning-library/
m-hodges•22h ago
A couple of years ago I wrote a tutorial on how to build a neural network in NumPy from scratch.¹

¹ https://matthodges.com/posts/2022-08-06-neural-network-from-...

bariswheel•22h ago
This new? Hasn't the zero-to-hero course been around for a while?
fragmede•21h ago
https://xkcd.com/1053/
sh3rl0ck•20h ago
Is it weird that I now know exactly which xkcd it will be just from the conversational context?

Granted, I'm a bit of a Randall Munroe content addict, but it's become second nature now.

messe•20h ago
You're not alone. At this point I'm starting to recognise some by number as well.
ojo-rojo•19h ago
Ha, you made me think of casually referring to xkcds by number just as we did with RFCs back in the day. "I don't know, the socket states seem to follow RFC 793, but remember it's a 1918 address on the south side of the NAT."

I'm gonna keep a lookout for doing this with xkcds now :)

fragmede•19h ago
There are a few that pop out, but the one that has managed to stick (aside from 1053, which just came up) is 927, for standards, which you can remember as 3^2 for 9 and 3^3 for 27. Or Yoda's age + the 27 club.
jll29•18h ago
Communicating XKCD comic numbers, especially in binary, is a very efficient and energy-preserving way to get a laugh.

A: 10000011101 !

B: ACK. LOL !

amenhotep•15h ago
A newly convicted criminal arrived in prison, and on the first night he was puzzled to hear his fellow inmates yelling numbers to each other. "36!" one would yell, and the rest would chuckle. "19!" went another, to uproarious laughter. "50," remarked a third wryly, which provoked groans and ironic cheers. Eventually his cellmate sat up and cried out "114" and it brought the house down.

In a lull, he asked his cellmate what on earth was going on. The cellmate explained that most of them had been in prison so long that they already knew all the jokes, so to save time they just referred to them by number. "Oh," said the man, "that makes sense. Can I try?"

His cellmate encouraged him to go ahead, so he stood up and went to the bars and shouted as loud as he could "95!"

Absolutely no reaction. His cellmate looked at him and shook his head. "You didn't tell it right."

tzot•11h ago
And some time later, someone shouts “72!” Everyone chuckles except the one in the corner cell, who laughs so loudly and for so long that people think he'll have a heart attack. When he eventually stops laughing, someone yells: “Hey Fred, why did you laugh so much?” “I'd never heard that one!”
improbableinf•20h ago
So you are not part of the lucky 10,000 today…
pylotlight•19h ago
I feel like the same top ~5 are often repeated, so it becomes easy to guess.
Marciplan•18h ago
I think, in the spirit of the xkcd, you were supposed to pretend you have never heard of it
OJFord•17h ago
I know exactly what you mean. It broke my workflow too.
rsanek•21h ago
should have a (2022) label
apetrov•18h ago
It's an ongoing project; the last lecture is about a year old.
mcapodici•21h ago
A bit of a shameless plug: I wrote two articles about this after doing the course a while ago.

https://martincapodici.com/2023/07/15/no-local-gpu-no-proble...

https://martincapodici.com/2023/07/19/modal-com-and-nanogpt-...

Flere-Imsaho•20h ago
I'm not sure how it compares, but another option is the Hugging Face learning portal [0]. I'm doing the Deep RL Course and so far it's pretty straightforward (although when it gets math-heavy I'm going to suffer).

[0] - https://huggingface.co/learn

canpan•17h ago
I found the Karpathy videos very approachable. While I did study CS, I never went deep into ML. My main knowledge of matrices is from graphics development, so only vectors and matrices up to 4x4 in size. But following the videos, starting to learn about backprop and building the tiny GPT was understandable to me.

Karpathy's lessons are great for really grokking the background and underlying basics. They do not go into the many libraries available; the course you link might be more practically applicable.

BinaryMachine•11h ago
Meh, I took a couple of Hugging Face courses; I might not take them again.

The grading system forces you to write specifically to pass their LLM grading system: terrible design. Maybe it's gotten better, but I had to constantly look up how to write the correct answer just to pass their automatic grading system. Not a good way to learn, and a waste of time.

Karpathy videos posted here are GOLD.

webdevver•19h ago
It's fun seeing HN articles with huge upvotes but no comments, similar to when some super esoteric maths gets posted: everyone upvotes out of a common understanding of its genius, but, by virtue of that genius, most of us are not sufficiently cognitively gifted to provide any meaningful commentary.

The Karpathy vids are very cool, but having watched them, for me the takeaway was "I had better leave this for the clever guys". Thankfully digital carpentry and plumbing are still in demand, for now!

larodi•17h ago
Everyone understood overnight what vibe-coding was, but few dared to go through the looking glass and try to grok what the mirror is made of.
apetrov•17h ago
Actually it's quite the opposite: the lectures are as approachable as one could possibly make them, with no fancy math and a walkthrough of "Attention Is All You Need".
pbd•19h ago
What next now, though? I coincidentally finished watching his last vid, training up GPT-2, today :-)
butanyways•17h ago
Maybe creating a simple "PyTorch-like library" and training models using that? No?
kirurik•18h ago
Saving this
esafak•12h ago
You just click 'favorite' and it appears in https://news.ycombinator.com/favorites?id=kirurik
cube2222•18h ago
I went through this series of videos earlier this year.

In the past I’ve gone through many “educational resources” about deep neural networks - books, coursera courses (yeah, that one), a university class, the fastai course - but I don’t work with them at all in my day to day.

This series of videos was by far the best, most “intuition building”, highest signal-to-noise ratio, and least “annoying” content to get through. Could of course be that his way of teaching just clicks with me, but in general - very strong recommend. It’s the primary resource I now recommend when someone wants to get into lower level details of DNNs.

3abiton•15h ago
Karpathy has a great intuitive style, but sometimes it's too dumbed down. If you come from adjacent fields it might drag a bit, but it's always entertaining.
ronbenton•15h ago
>Karpathy has a great intuitive style, but sometimes it's too dumbed down

As someone who has tried some teaching in the past, I'd say it's basically impossible to teach to an audience with a wide array of experience and knowledge. I think you need to define your intended audience as narrowly as possible, teach them, and just accept that more knowledgeable folk may be bored and less knowledgeable folk may be lost.

miki123211•6h ago
I think this is where LLM-assisted education is going to shine.

An LLM is the perfect tool to fill the little gaps that you need to fill to understand that one explanation that's almost at your level, but not quite.

mlmonkey•5h ago
When I was an instructor for courses like "Intro to Programming", this was definitely the case. The students ranged from "have never programmed before" to "I've been writing games in my spare time", but because it was a prerequisite for other courses, they all had to do it.

Teaching the class was a pain in the ass! What seemed to work was to do the intro stuff, and periodically throw a bone to the smartasses. Once I had them on my side, it became smooth sailing.

lazarus01•17h ago
I like Karpathy; we come from the same lineage, and I am very proud of him for what he's accomplished. He's a very impressive guy.

As for deep learning, building deep learning architectures is one of my greatest joys in finding insights from perceptual data. Right now, I'm working on spatiotemporal data modeling to build prediction systems for urban planning, to improve public transportation systems. I build ML infrastructure too, and I plan to release an app that deploys the model in the wild within event streams of transit systems.

It took me a month to master the basics, and I've spent a lot of time with online learning, with Deeplearning.ai and skills.google. Deeplearning.ai is OK, but I felt the concepts were a bit dated. The ML path at skills.google is excellent and gives a practical understanding of ML infrastructure, optimization, and how to work with GPUs and TPUs (15x faster than GPUs).

But the best source of learning for me personally, and the one that makes me a confident practitioner, is the book by Francois Chollet, the creator of Keras. His book, "Deep Learning with Python", really removed any ambiguity I had about deep learning and AI in general. Francois is extremely generous in how he explains how deep learning works, over the backdrop of 70 years of deep learning research. He keeps it updated, and the third edition was released in September 2025; it's available online for free if you don't want to pay for it. He gives you the recipe for building GPT and diffusion models, but starts from the ground-floor basics of tensor operations and computation graphs. I would go through it again from start to finish; it is so well written and enjoyable to follow.

The most important lesson he discusses is that "deep learning is more of an art than a science". Getting something working takes a good amount of practice, and why things work can't always be explained.

He includes notebooks with detailed code examples, with TensorFlow, PyTorch and JAX as backends.

Deep learning is a great skill to have. After reading this book, I can recreate scientific abstracts and deploy the models into production systems. I am very grateful to have these skills and I encourage anyone with deep curiosity like me to go all in on deep learning.

nemil_zola•16h ago
The project you mentioned you are working sounds interesting. Do you have more to share ?

I’m curious how ML/AI is leveraged in the domain of public transport. And what can it offer when compared to agent based models.

lazarus01•16h ago
The project I'm working on emulates a scientific abstract. I'm not a scientist by any means, but I am adapting an abstract to the public transit system in NYC. I will publish the project on my website when it's done; I think it's a few weeks away. I built the dataset, and I'm now doing experimental model training. If I can get acceptable accuracy, I will deploy it in a production system and build a UI.

Here is the scientific abstract that inspired me to start building this system -> https://arxiv.org/html/2510.03121

I am unfamiliar with agent-based models, so sorry, I can't offer any personal insight there, but I ran your question through Gemini and here is the AI response:

Based on the scientific abstract of the paper "Real Time Headway Predictions in Urban Rail Systems and Implications for Service Control: A Deep Learning Approach" (arXiv:2510.03121), agent-based models (ABMs) and deep learning (DL) approaches compare as follows:

1. Computational Efficiency and Real-Time Application

* Deep Learning (DL): The paper proposes a ConvLSTM (Convolutional Long Short-Term Memory) framework designed for high computational efficiency. It is specifically intended to provide real-time predictions, enabling dispatchers to evaluate operational decisions instantly.

* Agent-Based Models (ABM): While the paper does not use ABMs, it contrasts its DL approach with traditional "computationally intensive simulations", a category that includes microscopic agent-based models. ABMs often require significant processing time to simulate individual train and passenger interactions, making them less suitable for immediate, real-time dispatching decisions during operations.

2. Modeling Methodology

* Deep Learning (DL): The approach is data-driven, learning spatiotemporal patterns and the propagation of train headways from historical datasets. It captures spatial dependencies (between stations) and temporal evolution (over time) through convolutional filters and memory states, without needing explicit rules for train behavior.

* Agent-Based Models (ABM): These are typically rule-based and bottom-up, modeling the movement of each train "agent" based on signaling rules, spacing, and train-following logic. While highly detailed, they require precise calibration of individual agent parameters.

3. Handling Operational Control

* Deep Learning (DL): A key innovation in this paper is the direct integration of target terminal headways (dispatcher decisions) as inputs. This allows the model to predict the downstream impacts of a specific control action (like holding a train) by processing it as a data feature.

* Agent-Based Models (ABM): To evaluate a dispatcher's decision in an ABM, the entire simulation must typically be re-run with new parameters for the affected agents, which is time-consuming and difficult to scale across an entire metro line in real time.

4. Use Case Scenarios

* Deep Learning (DL): Optimized for proactive operational control and real-time decision-making. It is most effective when large amounts of historical tracking data are available to train the spatiotemporal relationships.

* Agent-Based Models (ABM): Often preferred for off-line evaluation of complex infrastructure changes, bottleneck mitigation strategies, or microscopic safety analyses, where the "why" behind individual train behavior is more important than prediction speed.

zingar•17h ago
I have lots of non-AI software experience but nothing with AI (apart from using LLMs like everyone else). Also I did an introductory university course in AI 20 years ago that I’ve completely forgotten.

Where do I get to if I go through this material?

Enough to build… what? Or contribute on… ? Enough knowledge to have useful conversations on …? Enough knowledge to understand where to … is useful and why?

Where are the limits, what is it that the AI researchers have that this wouldn’t give?

p1esk•10h ago
Strange question. If you don’t know why you need this, you probably don’t. It will be the same as with the introductory AI course you did 20 years ago.
HarHarVeryFunny•8h ago
Well, no ... For a start, any "AI" course 20 years ago probably wouldn't have even mentioned neural nets, and certainly not as a mainstream technique.

A 20-year-old "AI" curriculum would have looked more like the 3rd edition of Russell & Norvig's "Artificial Intelligence - A Modern Approach".

https://github.com/yanshengjia/ml-road/blob/master/resources...

Karpathy's videos aren't an AI course (except in the modern sense of AI=LLMs), or a machine learning course, or even a neural network course for that matter (despite the title) - it's really just "From Zero to LLMs".

ruraljuror•5h ago
I think they meant the result—not the content—would be the same.
eps•5h ago
Neural nets were taught at my uni in the late 90s. They were presented as the AI technique, which was, however, computationally infeasible at the time. Moreover, it was clearly stated that all the supporting ideas had been developed and researched 20 years prior, and that the field had basically stagnated because the hardware wasn't there.
CamperBob2•8m ago
Anyone who watches the videos and follows along will indeed come up to speed on the basics of neural nets, at least with respect to MLPs. It's an excellent introduction.
baxuz•17h ago
A bit of a tangential topic — what would you recommend to someone who wants to get into computer vision and 3D (NeRFs, photogrammetry, 3DGS etc.)?

For someone who has a middling amount of math knowledge, what would you recommend?

I went to uni 15 years ago, but only had "proper" math in the first 2 semesters, let's say something akin to Calculus 1 and Linear Algebra 1. I hated math back then, plus I had horrible habits.

a_r41•15h ago
I've been working in the novel view synthesis domain since 2019 and I would recommend starting with "nerfstudio". The documentation does a good job of explaining all the components involved (from dataset to final learned representation), the code is readable and it's relatively simple to set up and run. I think it's a nice place to start from before diving deeper into the latest that is going on in the 3D space.
jaccola•14h ago
For learning 3DGS (and its derivatives) I would recommend grabbing the original 3D Gaussian Splatting paper + repository, going through it, and using an LLM to ask many questions.

LLMs aren't that great at explaining concepts a lot of the time, so when you get stuck, google around and learn that subtopic. E.g. you will come across "Jacobian", which you may or may not have seen before, but you can search YouTube and find a great Khan Academy/3b1b collab explaining it.

Get the code running also, play around with parameters, try to implement the whole thing from scratch, making sure you intuitively understand each part with the above method.

Obviously time scales vary for everyone. That having been said: I'd guess that if you have a decent technical background, are OK feeling uncomfortable with the maths for a while (it is all understandable after a bit of pain), and are willing to keep plugging away for a few hours a day, you will have a very decent understanding in 6 months, and probably be "cutting edge" in a year or so (obviously the learning never ends, it is an active area of research after all!)

chronicler•15h ago
I don't even have enough knowledge to grasp the first video. Is there a list of knowledge requirements to look at?
jsight•14h ago
3blue1brown videos are great if you want to go deep on the math behind it.

If you are struggling with the neural network mechanics themselves, though, I'd recommend just skimming them once and then going back for a second watch later. The high-level overview will make some of the early setup work much clearer on a second viewing.

HarHarVeryFunny•10h ago
IMO that's a bit of a strange video for Karpathy to start with, perhaps even to include at all.

Let me explain why ...

Neural nets are trained by giving them lots of example inputs and outputs (the training data) and incrementally tweaking their initially random weights until they do better and better at matching these desired outputs. The way this is done is by expressing the difference between the desired and current (during training) outputs as an error function, parameterized by the weights, and finding the values of the weights that correspond to the minimum value of this error function (minimum errors = fully trained network!).

The way the minimum of the error function is found is simply by following its gradient (slope) downhill until you can't go down any more, which is hopefully the global minimum. This requires that you have the gradient (derivative) of the error function available so you know what direction (+/-) to tweak each of the weights to go in the downhill error direction, which will bring us to Karpathy's video ...

Neural nets are mostly built out of lego-like building blocks - individual functions (sometimes called nodes, or layers) that are chained/connected together to incrementally transform the neural network's input into its output. You can then consider the entire neural net as a single giant function outputs = f(inputs, weights), and from this network function you can create the error function needed to train it.

One way to create the derivative of the network/error function is to use the "chain rule" of calculus to derive the combined derivative of all these chained functions from their own individual pre-defined derivative functions. This is the way that most machine learning frameworks, such as TensorFlow and the original Torch (pre-PyTorch), worked. If you were using a machine learning framework like this, then you would not need Karpathy's video to understand how it is working under the hood (if indeed that is something you care about at all!).

The alternative, PyTorch, way of deriving the derivative of the neural network function is more flexible, and doesn't require you to build the network only out of nodes/layers that you already have derivative functions for. PyTorch lets you use regular Python code to define your neural network function, recording this code as it runs to capture what it does as the definition of the network function. Given this dynamically created network function, PyTorch (and other similar machine learning frameworks) then uses a built-in "autograd" (automatic gradient) capability to create the derivative (gradient) of your network function automatically, without someone having had to do that manually, as was the case for each of the lego building blocks in the old approach.

What that first video of Karpathy's is explaining is how this "autograd" capability works, which would help you build your own machine learning framework if you wanted to, or at least understand how PyTorch is working under the hood to create the network/error function derivative for you, that it will be using to train the weights. I'm sure many PyTorch users happily use it without caring how it's working under the hood, just as most developers happily use compilers without caring about exactly how they are working. If all you care about is understanding generally what PyTorch is doing under the hood, then this post may be enough!
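To make the autograd idea concrete, here is a minimal sketch of a scalar autograd engine in the spirit of Karpathy's micrograd (a sketch of the general technique, not his actual code; the Value class here is my own illustration). Each operation records its parents and a local chain-rule step, and backward() replays those steps in reverse topological order:

    # Minimal scalar autograd sketch (illustrative; in the spirit of micrograd, not Karpathy's code).
    class Value:
        def __init__(self, data, parents=()):
            self.data = data               # the scalar value
            self.grad = 0.0                # d(output)/d(self), filled in by backward()
            self._parents = parents        # nodes this value was computed from
            self._backward = lambda: None  # local chain-rule step, set by each op

        def __add__(self, other):
            out = Value(self.data + other.data, (self, other))
            def _backward():
                self.grad += out.grad               # d(a+b)/da = 1
                other.grad += out.grad              # d(a+b)/db = 1
            out._backward = _backward
            return out

        def __mul__(self, other):
            out = Value(self.data * other.data, (self, other))
            def _backward():
                self.grad += other.data * out.grad  # d(a*b)/da = b
                other.grad += self.data * out.grad  # d(a*b)/db = a
            out._backward = _backward
            return out

        def backward(self):
            # Topologically sort the recorded graph, then apply the chain rule in reverse.
            topo, visited = [], set()
            def build(v):
                if v not in visited:
                    visited.add(v)
                    for p in v._parents:
                        build(p)
                    topo.append(v)
            build(self)
            self.grad = 1.0
            for v in reversed(topo):
                v._backward()

    # Usage: y = w * x + b, then read off the gradients dy/dw, dy/dx, dy/db.
    x, w, b = Value(3.0), Value(2.0), Value(1.0)
    y = w * x + b
    y.backward()
    print(y.data, w.grad, x.grad, b.grad)  # 7.0 3.0 2.0 1.0

A real engine adds more operations (tanh, exp, pow, ...) and works on tensors rather than scalars, but the record-then-replay structure is the same.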

For an introduction to machine learning, including neural networks, that assumes no prior knowledge other than hopefully being able to program a bit in some language, I'd recommend Andrew Ng's Introduction to ML courses on Coursera. He's modernized this course over the years, so I can't speak for the latest version, but he is a great educator and I trust that the current version is just as good as his old one that was my intro to ML (building neural nets just using MATLAB rather than using any framework!).

nobodyistaken•15h ago
This is great, but if I'm starting ML from scratch, what would you recommend? I'm coming from a webdev background and have used LLMs but nothing about ML, might even need the refresher on math, I think.
lazarus01•15h ago
https://deeplearningwithpython.io/
nobodyistaken•14h ago
Is it wise to start with deep learning without knowing machine learning?
lazarus01•13h ago
That's a great question. Machine learning is the overarching space, and deep learning is a subspace of it. So if you grasp some basic concepts of machine learning, you can apply them to deep learning.

All the exciting innovation over the past 13 years comes from deep learning, mainly in working with images and natural language.

Machine learning is good for tabular data problems, particularly with decision trees, which work well at reducing uncertainty for business outcomes, like sales and marketing as one example.

Machine Learning Basics:

Linear regression: Y = M*x + B (predicts a future value)

Classification (logistic regression): Y = 1 / (1 + e^-(b0 + b1*x)) (predicts the probability of a class or future event)

There is a common learning process between the two called gradient descent. It starts with the loss function, which measures the error between predictions and ground truth; you backpropagate the errors as a feedback signal to update the learned weights, which are the parameters of your ML model, a more meaningful representation of the dataset you train on.
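To make that concrete, here is a minimal sketch of the gradient-descent loop for the linear regression case above, in plain NumPy with made-up toy data (illustrative only, not from any of the courses mentioned):

    import numpy as np

    # Toy data roughly following y = 3x + 2, plus noise.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=100)
    y = 3 * x + 2 + 0.1 * rng.normal(size=100)

    m, b = 0.0, 0.0  # initial parameters of Y = M*x + B
    lr = 0.1         # learning rate

    for _ in range(500):
        y_pred = m * x + b               # forward pass: predictions
        error = y_pred - y
        loss = (error ** 2).mean()       # loss function: mean squared error
        grad_m = 2 * (error * x).mean()  # gradient of the loss w.r.t. m
        grad_b = 2 * error.mean()        # gradient of the loss w.r.t. b
        m -= lr * grad_m                 # update weights downhill
        b -= lr * grad_b

    print(m, b)  # should land close to 3 and 2

Each iteration computes predictions, measures the error with the loss function, and feeds the gradients back to update the weights; deep learning scales this same loop up to millions of parameters.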

Deep learning is more appropriate for perception problems, like vision, language and time sequences. It gets more complex: you are dealing with significantly more parameters, in the millions, organized in hierarchical layer representations.

There are different layers for different types of representation learning: convolutions for images, RNNs for sequence-to-sequence learning, and many more; these layers are the basis of all deep learning models.

So there is a small conceptual overlap, but I would say deep learning has a wider variety of interesting applications and is much more challenging to learn, though not impossible by any stretch.

There is no harm in giving it a try and diving in. If you get lost and drown in complexity, start with machine learning. It took me 3 years to grasp, so it's a marathon, not a sprint.

Hope this helps

meken•15h ago
Has anyone gone through cs231n and this as well?

I went through the former and it was one of the best classes I’ve ever taken. But I’ve been procrastinating on going through this because it seems like there’s a lot of overlap and the benefit seems marginal (I guess transformers are covered here?).

misiti3780•13h ago
I just finished this series and found it very useful. Especially the back-propagation lectures.
lfliosdjf•12h ago
I hope Karpathy's Starfleet Academy becomes a huge success.
ed4bb9fb7c•11h ago
Is there a text tutorial of this approach to building a NN from scratch? As a dad I simply don't have a chance to watch this. Also, maybe something for the more math-inclined? (MS in math.) "Deep Learning with Python", which is recommended in other comments, is way too basic, slow and hand-wavy IMO.
npalli•11h ago
This is a good resource; however, for about 99.99% of people, you are most likely to just use a foundation model like ChatGPT, Claude, Gemini, etc., so this knowledge/training will get you neither here nor there. I would suggest you look into another of Karpathy's videos -- Deep Dive into LLMs like ChatGPT.

https://www.youtube.com/watch?v=7xTGNNLPyMI

kamranjon•10h ago
"Prerequisites: ... intro-level math (e.g. derivative, gaussian)"

Anyone got recommendations for learning resources for this type of math? Realizing now that I might be a bit behind on my intro-level math.

chandureddyvari•10h ago
The 3b1b YouTube channel's calculus & linear algebra series

https://explained.ai/matrix-calculus/

Khan Academy's Multivariable Calculus course, by Grant Sanderson (of 3b1b fame)

nickpsecurity•10h ago
Coursera and Udemy have math-for-machine-learning courses. Udemy is self-paced; if you need to, you can pause to learn an unforeseen prerequisite.

I bought Jon Krohn's Mathematical Foundations and Krista King's Statistics and Probability.

cjamsonhn•10h ago
Highly recommend this as well. It does a great job of helping you build intuition for why things like gradient descent and normalization work. It also gets into the weeds on training dynamics and how to ensure they are behaving properly.
m3kw9•8h ago
Does learning this still matter now?
shwaj•8h ago
Matter to who? If you want to deeply understand how this technology works, this is still relevant. If you want to vibe code, maybe not.
lazarus01•6h ago
Yes, the current technology cannot replace an engineer.

The easiest way to understand why is by understanding natural language. A natural language like English is very messy and doesn't follow formal rules. It's also not specific enough to provide instructions to a computer; that's why code was created.

The AI is incredibly dumb when it comes to complex tasks with long-range contexts. It needs an engineer who understands how to write and execute code to give it precise instructions, or it is useless.

Natural language processing is so complex that, although it started around the end of World War Two, we are just now seeing innovation in AI where we can mimic humans, where the AI can do certain things faster than humans. But thinking is not one of them.

CamperBob2•4m ago
LOL. Figuring out how to solve IMO-level math problems without "thinking" would be even more impressive than thinking itself. Now there's a parrot I'd buy.