
Your URL Is Your State

https://alfy.blog/2025/10/31/your-url-is-your-state.html
1•thm•2m ago•0 comments

Nubian Translation for Childhood Songs by Hamza El Din

https://nubianfoundation.org/translations/
1•tzury•8m ago•0 comments

Show HN: Hephaestus – Autonomous Multi-Agent Orchestration Framework

https://github.com/Ido-Levi/Hephaestus
1•idolevi•8m ago•0 comments

AWS emerges as 'sole bidder' for HMRC's £500M datacentre migration project

https://www.computerweekly.com/news/366633603/AWS-emerges-as-sole-bidder-for-HMRCs-500m-datacentr...
1•latein•10m ago•0 comments

Show HN: Auto-Adjust Keyboard and LCD Brightness via Ambient Light Sensor[Linux]

https://github.com/donjajo/als-led-backlight
1•donjajo•11m ago•0 comments

Using FreeBSD to make self-hosting fun again

https://jsteuernagel.de/posts/using-freebsd-to-make-self-hosting-fun-again/
3•todsacerdoti•14m ago•0 comments

Alta Router marketed as IPS, not IPS

https://forum.alta.inc/t/ids-ips-automatically-block-attempting-ip-after-x-number-of-alerts/4886?...
1•anon-moose•14m ago•0 comments

Why I love my Boox Palma e-reader

https://minimal.bearblog.dev/why-i-love-my-boox-palma-e-reader/
2•pastel5•17m ago•0 comments

Scientists Generate Matter Directly from Light (2021)

https://scitechdaily.com/scientists-generate-matter-directly-from-light-physics-phenomena-predict...
2•ciconia•22m ago•0 comments

California Wildfire Map/Tracker (2025)

https://www.latimes.com/wildfires-map/
1•hamonrye•22m ago•0 comments

Social media apps are getting worse so I created one by myself

https://www.sweatbuzz.app
2•icleanuc•24m ago•0 comments

30x30 pixels grayscale camera made out of an optical mouse sensor

https://old.reddit.com/r/3Dprinting/comments/1olyzn6/i_made_a_camera_from_an_optical_mouse_30x30/
2•eps•25m ago•0 comments

Application is down – it is always your fault

https://www.ufried.com/blog/it_is_your_fault/
1•BinaryIgor•32m ago•0 comments

SRI and Arc

https://www.abortretry.fail/p/sri-and-arc
2•klelatti•34m ago•0 comments

How AI browsers sneak past blockers and paywalls

https://www.cjr.org/analysis/how-ai-browsers-sneak-past-blockers-and-paywalls.php
1•thm•41m ago•0 comments

Grand Egyptian Museum

https://en.wikipedia.org/wiki/Grand_Egyptian_Museum
1•tosh•42m ago•0 comments

Xi Jinping Joked About Espionage

https://www.nytimes.com/2025/11/02/world/asia/xi-jinping-china-south-korea-spying.html
1•fleahunter•42m ago•0 comments

Vibecoding my way to a crit on GitHub

https://furbreeze.github.io/2025/10/28/vibecoding-my-way-to-a-crit-on-github.html
1•jgeralnik•46m ago•0 comments

Show HN: Emdash – Coding Agent Orchestrator Powered by Git Worktrees

https://github.com/generalaction/emdash
1•arnestrickmann•48m ago•0 comments

Show HN: Neustream – Multistream to all platforms from one place

https://neustream.app/
1•thefarseen•50m ago•0 comments

Rewilding the Internet

https://www.protein.xyz/rewilding-the-internet/
1•thinkingemote•51m ago•0 comments

In a First, AI Models Analyze Language as Well as a Human Expert

https://www.quantamagazine.org/in-a-first-ai-models-analyze-language-as-well-as-a-human-expert-20...
1•nsoonhui•52m ago•0 comments

Revisiting Interface Segregation in Go

https://rednafi.com/go/interface-segregation/
1•ingve•58m ago•0 comments

Columnar and the ADBC Driver Foundry

https://columnar.tech/blog/announcing-columnar/
1•refset•1h ago•0 comments

The Great American Soybean Con Job [video]

https://www.youtube.com/watch?v=PYEMuzss1Ys
1•xbmcuser•1h ago•0 comments

Comparison Traits – Understanding Equality and Ordering in Rust

https://itsfoxstudio.substack.com/p/comparison-traits-understanding-equality
1•rpunkfu•1h ago•0 comments

Leaving the Freedesktop.org Community

https://vt.social/@lina/115431232807081648
8•birdculture•1h ago•7 comments

A Death Train Is Haunting South Florida

https://www.theatlantic.com/technology/2025/10/brightline-train-florida/684624/
1•raw_anon_1111•1h ago•0 comments

Parsing with zippers improves parsing with derivatives

https://dl.acm.org/doi/10.1145/3408990
2•fanf2•1h ago•0 comments

Trump threatens to go into Nigeria 'guns-a-blazing' over attacks on Christians

https://www.theguardian.com/us-news/2025/nov/01/trump-nigeria-christian-persecution
3•prmph•1h ago•0 comments

Backpropagation is a leaky abstraction (2016)

https://karpathy.medium.com/yes-you-should-understand-backprop-e2f06eab496b
149•swatson741•5h ago

Comments

joshdavham•4h ago
Given that we're now in the year 2025 and AI has become ubiquitous, I'd be curious to estimate what percentage of developers now actually understand backprop.

It's a bit snarky of me, but whenever I see some web developer or product person with a strong opinion about AI and its future, I like to ask "but can you at least tell me how gradient descent works?"

I'd like to see a future where more developers have a basic understanding of ML even if they never go on to do much of it. I think we would all benefit from being a bit more ML-literate.
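
For what it's worth, the whole answer to that interview question fits in a few lines. A minimal numpy sketch (illustrative, not from the article; the least-squares toy problem and numbers are made up):

  import numpy as np

  # Minimize the mean squared error f(w) = mean((Xw - y)^2)
  # by repeatedly stepping against its gradient.
  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 3))
  true_w = np.array([1.0, -2.0, 0.5])
  y = X @ true_w

  w = np.zeros(3)
  lr = 0.1                                   # step size (learning rate)
  for _ in range(500):
      grad = 2 * X.T @ (X @ w - y) / len(y)  # exact gradient of the loss
      w -= lr * grad                         # small step downhill
  print(w)                                   # converges toward true_w

Everything fancier (momentum, Adam, etc.) is built around that loop.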

kojoru•3h ago
I'm wondering: how can understanding gradient descent help in building AI systems on top of LLMs? To me it feels like the skills of building "AI" are almost orthogonal to the skills of building on top of "AI".

joshdavham•3h ago
I take your point that they are mostly orthogonal in practice, but with that being said, I think understanding how these AIs were created is still helpful.

For example, I believe that if we were to ask the average developer why LLMs behave randomly, they would not be able to answer. This to me exposes a fundamental hole in their knowledge of AI. Obviously one shouldn't feel bad about not knowing the answer, but I think we'd benefit from understanding the basic mathematical and statistical underpinnings of these things.

Al-Khwarizmi•23m ago
You can still understand that quite well without understanding backprop, though.

All you need is:

- Basic understanding of how a Markov chain can generate text (generating each word using corpus statistics on the previous few words).

- Understanding that you can then replace the Markov chain with a neural model which gives you more context length and more flexibility (words are now in a continuous space so you don't need to find literally the same words, you can exploit synonyms, similarity, etc., plus massive training data also helps).

- Finally, you add the instruction tuning (among all the plausible continuations the model could choose, teach it to prefer the ones humans prefer, e.g. answering a question rather than continuing with a list of similar questions. You give the model cookies or slaps so it learns to prefer the answers humans prefer).

- But the core is still like in the Markov chain (generating each word using corpus statistics on the previous words).

I often give dissemination talks on LLMs to the general public, and I have the feeling that with this mental model you basically know everything a lay user needs to know about how they work (you can explain hallucinations, the stochastic nature, the relevance of training data, the relevance of instruction tuning, dispel myths like "they always choose the most likely word", etc.) without any calculus at all; although of course this is subjective, and maybe some people will think that explaining it this way is heresy.
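
That mental model is easy to make concrete. A toy sketch of the Markov-chain step (one-word context, tiny made-up corpus):

  import random
  from collections import defaultdict

  corpus = "the cat sat on the mat and the cat ran".split()

  # Corpus statistics: which words follow each word, and how often.
  follows = defaultdict(list)
  for prev, nxt in zip(corpus, corpus[1:]):
      follows[prev].append(nxt)

  # Generate: repeatedly sample a next word given the previous one.
  word, out = "the", ["the"]
  for _ in range(8):
      if not follows[word]:                  # dead end: no observed continuation
          break
      word = random.choice(follows[word])    # sampling is the stochastic part
      out.append(word)
  print(" ".join(out))

The neural model and the instruction tuning upgrade how the continuations are estimated and ranked; the generate-one-word-at-a-time loop stays the same.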

lock1•3h ago

> I'd like to see a future where more developers have a basic understanding of ML even if they never go on to do much of it. I think we would all benefit from being a bit more ML-literate.

Why "ML-literate" specifically? Also, some people are against calculus and statistics in the CS curriculum because they're not "useful" or "practical"; why does ML get special treatment here?

Plus, I don't think a "gotcha" question like "what is gradient descent" will give you a good signal about someone if it gets popularized. It will probably lead to the present-day OOP cargo cult, where everyone just memorizes whatever their lecturer/bootcamp/etc. says and repeats it to you without actually understanding what it does or why it's the preferred method over other strategies.

joshdavham•38m ago
> Why "ML-literate" specifically?

We could also say AI-literate, I suppose. I guess I just like to focus on ML generally because 1) most modern AI is possible only due to ML and 2) it's narrower and emphasizes the low-level workings of AI.

confirmmesenpai•2h ago
So if you want to have a strong opinion on electric cars, you need to be able to explain how an electric motor works, right?
augment_me•2h ago
Impossible requirement. The inherent quality of abstractions is that they allow us to get more done without understanding everything. We don't write raw assembly for the same reason; you don't make fire by rubbing sticks; you don't go hunting for food in the woods; etc.

There is no need for the knowledge you propose in a world where this is solved; you will achieve more by using higher-level tools.

joshdavham•34m ago
I get your point, and this certainly applies to most modern computing, where each new layer of abstraction becomes so solid and reliable that devs can usually afford to just build on top of it without worrying about how it works. I don't believe this applies to modern AI/ML, however. Knowing the chain rule, gradient descent, and basic statistics is IMO not on the same level of solidity as other abstractions in computing. We can't afford not to know these things. (At least not yet!)
gchadwick•3h ago
Karpathy's contribution to teaching around deep learning is just immense. He's got a mountain of fantastic material, from short articles like this to longer writing like https://karpathy.github.io/2015/05/21/rnn-effectiveness/ (on recurrent neural networks) and all of the stuff on YouTube.

Plus his GitHub. The recently released nanochat https://github.com/karpathy/nanochat is fantastic. Having minimal, understandable and complete examples like that is invaluable for anyone who really wants to understand this stuff.

throwaway290•3h ago
And to all the LLM heads here, this is his work process:

> Yesterday I was browsing for a Deep Q Learning implementation in TensorFlow (to see how others deal with computing the numpy equivalent of Q[:, a], where a is an integer vector — turns out this trivial operation is not supported in TF). Anyway, I searched “dqn tensorflow”, clicked the first link, and found the core code. Here is an excerpt:

Notice how it's "browse" and "search", not just "I asked ChatGPT". Notice how it made him notice a bug.

stingraycharles•3h ago
First of all, this is not a competition over whether LLMs are better than search.

Secondly, the article is from 2016; ChatGPT didn't exist back then.

code51•2h ago
I doubt he's letting LLMs creep into his decision-making in 2025, aside from fun side projects (vibes). We never see Karpathy going to an LLM, or saying an LLM helped, in any of his YouTube videos about building LLMs.

He's just test-driving LLMs, nothing more.

Nobody's asking this core question in podcasts: "How much, and how exactly, are you using LLMs in your daily flow?"

I'm guessing it's like actors not wanting to watch their own movies.

danielbln•2h ago
https://news.ycombinator.com/item?id=45788753

mquander•2h ago
Karpathy talking for 2 hours about how he uses LLMs:

https://www.youtube.com/watch?v=EWvNQjAaOHw

code51•1h ago
Vibing, not firing at his ML problems.

He's doing a capability check in this video (for a general audience, which is good of course), not attacking a hard problem in the ML domain.

Despite this tweet: https://x.com/karpathy/status/1964020416139448359 , I've never seen him cite a case where an LLM helped him in ML work.

confirmmesenpai•2h ago
> Continuing the journey of optimal LLM-assisted coding experience. In particular, I find that instead of narrowing in on a perfect one thing my usage is increasingly diversifying

https://x.com/karpathy/status/1959703967694545296

confirmmesenpai•2h ago
what you did here is called confirmation bias.

> I think congrats again to OpenAI for cooking with GPT-5 Pro. This is the third time I've struggled on something complex/gnarly for an hour on and off with CC, then 5 Pro goes off for 10 minutes and comes back with code that works out of the box. I had CC read the 5 Pro version and it wrote up 2 paragraphs admiring it (very wholesome). If you're not giving it your hardest problems you're probably missing out.

https://x.com/karpathy/status/1964020416139448359

kubb•2h ago
I was slightly surprised that my colleagues, who are extremely invested in capabilities of LLMs, didn’t show any interest in Karpathy’s communication on the subject when I recommended it to them.

Later I understood that they don’t need to understand LLMs, and they don’t care how they work. Rather they need to believe and buy into them.

They’re more interested in science fiction discussions — how would we organize a society where all work is done by intelligent machines — than what kinds of tasks are LLMs good at today and why.

teiferer•1h ago
Which is terrible. That's the root of all the BS around LLMs: people lacking understanding of what they are and ascribing capabilities that LLMs, by design, just don't have. Even HN discussions are full of that, even though this page literally has "hacker" in its name.

kubb•1h ago
I’m trying not to be disappointed by people; I’d rather understand what’s going on in their minds, and how to navigate that.

tim333•1m ago
I see your point, but on the other hand a lot of conversations go: A: "What will we do when AI does all the jobs?" B: "That's silly, LLMs can't do the jobs." The thing is, A didn't say LLM; they said AI, as in whatever that will be a short while into the future.

Al-Khwarizmi•1h ago
What's wrong or odd about that? You can like a technology as a user and not want to delve into how it works (sentence written by a human despite the use of "delve"). Everyone should have some notion of what LLMs can and cannot do, in order to use them successfully and not be misled by their limitations, but we don't need everyone to understand what backpropagation is, just as most of us use cars without knowing much about how an internal combustion engine works.

And the issue you mention in the last paragraph is very relevant: the scenario is plausible, so it is something we should definitely be discussing.

Marazan•41m ago
Because if you don't understand how a tool works, you can't use the tool to its full potential.

Imagine using single-layer perceptrons without understanding separability and going "just a few more tweaks and it will approximate XOR!"
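
That failure is easy to reproduce. A small numpy sketch (illustrative): the classic perceptron learning rule converges on linearly separable data, but on XOR it just cycles forever.

  import numpy as np

  X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
  y = np.array([0, 1, 1, 0])            # XOR: not linearly separable

  w, b = np.zeros(2), 0.0
  for epoch in range(1000):
      errors = 0
      for xi, target in zip(X, y):
          pred = int(w @ xi + b > 0)
          if pred != target:            # classic perceptron update
              w += (target - pred) * xi
              b += target - pred
              errors += 1
      if errors == 0:                   # never happens for XOR
          break
  print(epoch, errors)                  # still misclassifying after 1000 epochs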

kubb•22m ago
You hit the nail on the head, in my opinion.

There are things that you just can’t expect from current LLMs that people routinely expect from them.

They start out projects with those expectations. And that’s fine. But they don’t always learn from the outcomes of those projects.

Al-Khwarizmi•6m ago
I don't think that's a good analogy, because if you're trying to train a single-layer perceptron to approximate XOR, you're not the end user.

Archelaos•8m ago
> What's wrong or odd about that? You can like a technology as a user and not want to delve into how it works

The question here is whether the details are important for the major issues, or whether they can be abstracted away with a vague understanding. To what extent abstracting away is okay depends greatly on the individual case. Abstractions can work over a large area or for a long time, but then suddenly collapse and fail.

The calculator, which has always delivered sufficiently accurate results, can produce nonsense when one approaches the limits of its numerical representation or combines numbers with very different levels of precision. This can be seen, for example, when one rearranges commutative operations; due to rounding problems, it suddenly delivers completely different results.

The 2008 financial crisis was based, among other things, on models that treated certain market risks as independent of one another. Risk could then be spread by splitting and recombining portfolios. However, this only worked as long as the interdependence of the different portfolios was actually quite small. An entire industry, with the exception of a few astute individuals, had abstracted away this interdependence, acted on this basis, and ultimately failed.

As individuals, however, we are completely dependent on these abstractions. Our entire lives are permeated by things whose functioning we simply have to rely on without truly understanding them. Ultimately, it is the nature of modern, specialized societies that this process continues and becomes even more differentiated.

But somewhere there should be people who work at the limits of detailed abstractions and are concerned with researching and evaluating the real complexity hidden behind them, and thus correcting the abstraction if necessary, sending this new knowledge upstream.

The role of an expert is to operate with less abstraction and more detail in his or her field of expertise than a non-expert, and the more so, the better an expert he or she is.

arisAlexis•1h ago
Obviously they are more focused on making something that works.

spwa4•29m ago
Wow. Definitely NOT management material then.

CuriouslyC•53m ago
I think there are a lot of people who just don't care about stuff like nanochat because it's exclusively pedagogical; a lot of people want to learn by building something cool, not taking a ride on a kiddie bike with training wheels.

android521•38m ago
Do you go deep into molecular biology to see how it works? It is much more interesting and important.

drivebyhooting•3h ago
I have a naive question about backprop and optimizers.

I understand how SGD is just taking a step proportional to the gradient and how backprop computes the partial derivative of the loss function with respect to each model weight.

But with more advanced optimizers the gradient is not really used directly. It gets per weight normalization, fudged with momentum, clipped, etc.

So really, how important is computing the exact gradient using calculus, vs just knowing the general direction to step? Would that be cheaper to calculate than full derivatives?

mgh95•3h ago
> But with more advanced optimizers the gradient is not really used directly. It gets per weight normalization, fudged with momentum, clipped, etc.

Why would these things be "fudging"? Vanishing gradients (see the initial batch norm paper) are a real thing, and ensuring that the relative magnitudes are in some sense "smooth" between layers allows for an easier optimization problem.

> So really, how important is computing the exact gradient using calculus, vs just knowing the general direction to step? Would that be cheaper to calculate than full derivatives?

Very. In high dimensional space, small steps can move you extremely far from a proper solution. See adversarial examples.

ssivark•3h ago
> So really, how important is computing the exact gradient using calculus, vs just knowing the general direction to step? Would that be cheaper to calculate than full derivatives?

Yes, absolutely -- a lot of ideas inspired by this have been explored in the field of optimization, and also in machine learning. The very idea of "stochastic" gradient descent using mini-batches is basically a cheap (hardware-compatible) approximation to the gradient at each step.

For a relatively extreme example of how we might circumvent the computational effort of backprop, see Direct Feedback Alignment: https://towardsdatascience.com/feedback-alignment-methods-7e...

Ben Recht has an interesting survey of how various learning algorithms used in reinforcement learning relate with techniques in optimization (and how they each play with the gradient in different ways): https://people.eecs.berkeley.edu/~brecht/l2c-icml2018/ (there's nothing special about RL... as far as optimization is concerned, the concepts work the same even when all the data is given up front rather than generated on-the-fly based on interactions with the environment)

danielmarkbruce•3h ago
Calculus isn't that complicated, at least not what's done in backprop.

How do you propose calculating the "general direction"?

And an example "advanced optimizer", AdamW, absolutely uses gradients: it just does more, not less.
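
For illustration, one simplified AdamW step (a sketch following the published update rule; the defaults are the usual ones): the raw gradient g is still the input, with momentum and per-weight normalization layered on top.

  import numpy as np

  def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
      m = b1 * m + (1 - b1) * g        # momentum: running mean of gradients
      v = b2 * v + (1 - b2) * g**2     # per-weight scale: running mean of g^2
      m_hat = m / (1 - b1**t)          # bias correction early in training
      v_hat = v / (1 - b2**t)
      w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)  # decoupled decay
      return w, m, v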

blackbear_•3h ago
Two thoughts:

> how important is computing the exact gradient using calculus

Normally the gradient is computed with a small "minibatch" of examples, meaning that on average over many steps the true gradient is followed, but each individual step never moves exactly along the true gradient. This noisy walk is actually quite beneficial for the final performance of the network https://arxiv.org/abs/2006.15081 , https://arxiv.org/abs/1609.04836 , so much so that people started wondering about the best way to "corrupt" this approximate gradient even more to improve performance https://arxiv.org/abs/2202.02831 (and many other works relating to SGD noise).

> vs just knowing the general direction to step

I can't find the relevant papers now, but I seem to recall that the Hessian eigenvalues of the loss function decay rather quickly, which means that taking a step in most directions will not change the loss very much. That is to say, you have to know quite precisely which direction to go for an SGD-like method to work. People have been trying to visualize the loss and the trajectory taken during optimization: https://arxiv.org/pdf/1712.09913 , https://losslandscape.com/

raindeer2•2h ago
The first bit is why it is called stochastic gradient descent: you follow the gradient of a randomly chosen minibatch at each step. It basically makes you "vibrate" down along the gradient.

imtringued•54m ago
All first order methods use the gradient or Jacobian of a function. Calculating the first order derivatives is really cheap.

Non-stochastic gradient descent has to optimize over the full dataset. This doesn't matter for non-machine learning applications, because often there is no such thing as a dataset in the first place and the objective has a small fixed size. The gradient here is exact.

With stochastic gradient descent you're turning gradient descent into an online algorithm, where you process a finite subset of the dataset at a time. Obviously the gradient is no longer exact, you still have to calculate it though.

Seems like "exactness" is not that useful a property for optimization. Also, I can't stress this enough: calculating first-order derivatives is so cheap there is no need to bother; it's roughly 2x the cost of evaluating the function in the first place.

It's second order derivatives that you want to approximate using first order derivatives. That's how BFGS and Gauss-Newton work.

macleginn•30m ago
It is possible to compute an approximate gradient (a direction to step) without the formulas: change the value of each parameter individually, compute the loss, set all the parameters in the direction that reduces the loss, and repeat. This means, however, that we have to do number-of-parameters forward passes for one optimization step, which is very expensive. With the formulas, we get all these values in one backward pass.
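
What this describes is essentially a finite-difference approximation of the gradient. A small sketch of the cost trade-off (the toy loss stands in for a real forward pass):

  import numpy as np

  def loss(w):                          # stand-in for a full forward pass
      return np.sum((w - 3.0) ** 2)

  def numerical_grad(f, w, h=1e-5):
      # Wiggle each parameter in turn: one extra forward pass per
      # parameter, versus a single backward pass for the exact gradient.
      grad = np.zeros_like(w)
      base = f(w)
      for i in range(len(w)):
          w[i] += h
          grad[i] = (f(w) - base) / h   # forward difference
          w[i] -= h
      return grad

  w = np.zeros(4)
  print(numerical_grad(loss, w))        # close to the analytic gradient 2*(w - 3)
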
emil-lp•3h ago
... (2016)

9 years ago, 365 points, 101 comments

https://news.ycombinator.com/item?id=13215590

alyxya•3h ago
More generally, it's often worth learning and understanding things one step deeper. A more fundamental understanding explains more of the "why" behind why things are the way they are, or why we do things a certain way. There's probably a cutoff point for balancing how much you actually need to know, though. You could take things a step further by writing the backward pass without using matrix multiplication, or by spending some time understanding what the numerical value of a gradient means.

phplovesong•3h ago
Sidenote: why are people still using Medium?

evbogue•3h ago
The article is from 2016.

joaquincabezas•3h ago
I took a course in my Master's (URV.cat) where we had to do exactly this: implement backpropagation (forward and backward passes) from a paper explaining it, using just basic math operations in a language of our choice.

I told everyone this was the single best exercise of the whole year for me. It's exactly the kind of activity that I benefit from immensely but won't do by myself, so this push was perfect.

If you are teaching, please consider this kind of assignment.

P.S. Just checked now and it's still in the syllabus :)
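
For anyone curious what such an assignment boils down to, a compressed sketch (numpy only, no biases, hyperparameters illustrative): a two-layer net fit to XOR, with the backward pass written out by hand.

  import numpy as np

  rng = np.random.default_rng(0)
  X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
  y = np.array([[0.0], [1.0], [1.0], [0.0]])
  W1, W2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 1))

  for _ in range(2000):
      # forward pass
      h = np.maximum(0, X @ W1)         # ReLU hidden layer
      pred = h @ W2
      loss = np.mean((pred - y) ** 2)

      # backward pass: the chain rule, one layer at a time
      d_pred = 2 * (pred - y) / len(y)  # dLoss/dpred
      d_W2 = h.T @ d_pred               # dLoss/dW2
      d_h = d_pred @ W2.T               # dLoss/dh
      d_h[h <= 0] = 0                   # ReLU gate: dead units pass no gradient
      d_W1 = X.T @ d_h                  # dLoss/dW1

      W1 -= 0.1 * d_W1
      W2 -= 0.1 * d_W2
  print(loss)                           # should approach zero (seed permitting)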

LPisGood•3h ago
I did this in high school from some online textbook, in plain Java. I recall implementing matrix multiplication myself being the hardest part.

I made a UI that showed how the weights and biases changed throughout the training iterations.

blitzar•2h ago
The difference in understanding (for me and how my brain works) between reading a paper in what appears to be a future or past alien language and working through a minimal paper/code example is massive.
joaquincabezas•2h ago
Same here, even more so if I'm doing it over a few days and from different angles.

littlestymaar•2h ago
I was happy to see Karpathy writing a new blog post instead of just Twitter threads, but when I opened the link I was disappointed to realize it's from 9 years ago…

I really hate what Twitter did to blogging…

Geee•1h ago
He has a new blog at https://karpathy.bearblog.dev/blog/

jamesblonde•2h ago
I have to be contrarian here. The students were right: you didn't need to learn to implement backprop in NumPy. Any leakiness in backprop is addressed by researchers who introduce new optimizers. As a developer, you just pick the best one and find good hparams for it.
_diyar•2h ago
From the university's perspective, the students are being trained to become researchers, not engineers.

PeterStuer•2h ago
The problem with your reasoning is that you never tackle your "unknown unknowns"; you just assume they are "known unknowns".

Diving through the abstraction reveals some of those.

gchadwick•2h ago
It's a CS course at Stanford, not a PyTorch boot camp. It seems reasonable to expect some level of academic rigour and a need to learn and demonstrate understanding of the fundamentals. If researchers aren't learning the fundamentals in courses like these, where are they learning them?

You've also missed the point of the article: if you're building novel model architectures, you can't magic away the leakiness. You need to understand the backprop behaviour of the building blocks you use to achieve a good training run. Ignore it, and what could be a good model architecture with some tweaks will either entirely fail to train or produce disappointing results.

Perhaps you're working at the level of bolting pre-built models together or training existing architectures on new datasets, but this course operates below that level, to teach you how things actually work.

froobius•2h ago
> Any leakiness in BackProp is addressed by researchers who introduce new optimizers

> As a developer, you just pick the best one and find good hparams for it

It would be more correct to say: "As a developer (not researcher) whose main goal is to get a good model working, just pick a proven architecture, hyperparameters, and training loop."

Because just picking the best optimizer isn't enough. Some of the issues in the article come from model design (e.g. sigmoids, ReLUs, RNNs), and some need to be addressed in the training loop (e.g. gradient clipping isn't enabled by default in most DL frameworks).

And it should be noted that the article is addressing people on the academic / research side, who would benefit from a deeper understanding.
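
On the gradient-clipping point: in PyTorch, for example, clipping is a call you make yourself between backward() and step() (a minimal sketch; the model and numbers are arbitrary):

  import torch

  model = torch.nn.Linear(10, 1)
  opt = torch.optim.SGD(model.parameters(), lr=0.1)

  loss = model(torch.randn(8, 10)).pow(2).mean()
  loss.backward()
  torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # opt-in
  opt.step()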

brcmthrowaway•2h ago
Do LLMs still use backprop?
ForceBru•2h ago
Are LLMs still trained by (variants of) stochastic GRADIENT descent? AFAIK what used to be called "backprop" is nowadays known as "automatic differentiation"; it's widely used in PyTorch, JAX, etc.

imtringued•32m ago
Gradient descent doesn't matter here; second-order and higher methods still use lower-order derivatives.

Backpropagation is reverse-mode automatic differentiation. They are the same thing.

And for those who don't know what backpropagation is: it is just an efficient method to calculate the gradient for all parameters at once.
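
In PyTorch terms, that looks like this (a minimal illustration):

  import torch

  # One backward sweep fills in dLoss/dparam for every parameter at once;
  # that reverse-mode sweep is exactly what "backprop" names.
  w = torch.randn(3, requires_grad=True)
  x = torch.tensor([1.0, 2.0, 3.0])
  loss = torch.sigmoid(w @ x).pow(2)

  loss.backward()
  print(w.grad)    # gradients for all of w from a single backward pass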

stared•1h ago
The original title is "Yes you should understand backprop", which is good and descriptive.

WithinReason•55m ago
Karpathy suggests the following error function:

  def clipped_error(x):
    return tf.select(tf.abs(x) < 1.0,
                     0.5 * tf.square(x),
                     tf.abs(x) - 0.5)  # condition, true, false

Following the same principles that he outlines in this post, the "- 0.5" part is unnecessary: the gradient of a constant is 0, so subtracting 0.5 doesn't change the backpropagated gradient. In addition, a nicer formula that achieves the same goal is √(x² + 1).

macleginn•35m ago
If we don't subtract 0.5 in the second branch, there will be a discontinuity at |x| = 1, so the derivative will not be well-defined there. Also, the value of the loss will jump at that point, which will make it harder to inspect the errors, for one thing.

WithinReason•26m ago
No, that's not how backprop works: there will be no discontinuity in the backpropagated gradient.
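
Both claims are easy to check numerically. A small sketch (illustrative): away from |x| = 1 the gradients agree with or without the constant, the loss value itself jumps by 0.5, and √(x² + 1) is the smooth alternative.

  import numpy as np

  def huber(x):                  # Karpathy's clipped error
      return np.where(np.abs(x) < 1.0, 0.5 * x**2, np.abs(x) - 0.5)

  def huber_no_const(x):         # same, without the -0.5 term
      return np.where(np.abs(x) < 1.0, 0.5 * x**2, np.abs(x))

  def grad(f, x, h=1e-6):        # numerical derivative
      return (f(x + h) - f(x - h)) / (2 * h)

  xs = np.array([0.5, 3.0])
  print(grad(huber, xs))                        # [0.5, 1.0]
  print(grad(huber_no_const, xs))               # identical gradients...
  print(huber_no_const(xs) - huber(xs))         # ...but the loss jumps by 0.5
  print(grad(lambda x: np.sqrt(x**2 + 1), xs))  # pseudo-Huber: smooth everywhere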