Elevating "memoization+recursion" to "dynamic programming" seems like a development that Bellman didn't anticipate
https://cs.stackexchange.com/questions/99513/dynamic-program...
There is another mid-century term which is oddly specific for something oddly general (or is it the other way around?): "Operations Research".
Guess Bellman should have called it "Reactive Optimization", then we would have the totally appropriate "Recursive Memoization" or "Memoized Recursion"
Indeed, even in layman's terms, thinking of it as "programming" in the television sense is more accurate than thinking it is related to computer programming (as is mentioned in TFA).
Can't think of a universe in which you'd learn about these things before learning "computer programming".
Computing predates computers.
As an analogy, it's not unusual to learn a good chunk of calculus before learning that "calculus" is the thing it's part of - for example, by having an over-eager physics teacher who teaches you some of it so you can understand the physics material more deeply, but without ever mentioning the branch of math you're now using.
(Except, well, I guess I understand what "dynamic programming" is better than I understand the other forms of programming you mention; "dynamic programming" is solving certain classes of recursive problems by using arrays, sometimes nested arrays, to avoid re-computation, and somehow that's supposed to be more "dynamic" than not doing that.)
Well, it also isn't a development that happened. You can (always) implement dynamic programming as recursion with memoization. But an algorithm that is recursive and memoized isn't necessarily an example of dynamic programming.
The point of dynamic programming is that you set up the recursion so that recursive calls do hit the cache, not that there is a cache and if you're lucky they might hit it.
I aced the competition in Sri Lanka, but I failed to win a medal the first time I competed internationally. I solved several problems with recursion, but these failed to complete within the strict time limits allowed.
One of the contestants from another country told me: "You needed dynamic programming to solve problems 1 and 5." I then spent the next year trying to figure out what dynamic programming was. The folks over here weren't familiar with the term. In fact, I was often asked "did you mean dynamic memory allocation?"
After a while, I managed to find a book that talked about dynamic programming, and I was like "Oh, it's recursion + storing results" and "Ah, you can also do this iteratively in certain circumstances"
Bottom line, armed with this new knowledge, I ended up winning a gold medal at IOI 2001. Fun times.
Another fun anecdote from that time is that we had to take special precautions with the scoring program which connected to the participants machine over the serial port. The previous year one participant had hacked the serial port interface to manipulate the results. :-)
I used to know the ZA team leaders as well (after competing, I was a team leader for several more years).
In the Fibonacci example, it seems more like common sense to avoid recomputing expensive things. Instead we should give derogatory names to the inefficient implementations, not an opaque name to the only sane implementation.
I think it's because access to knowledge has become a lot easier, so contestants know more; plus, the competitions themselves have become more organized.
For example, last I checked, the IOI had a syllabus outlining the various algorithms contestants needed to know. During my time, it was more of a wild west.
I came to programming after I graduated, and I never really understood Dynamic Programming until I watched one of Stanford’s algorithm courses (kudos to MOOCs).
But honestly, I’ve never done anything truly useful with it—except that it gave me a new perspective on ordinary things. For example, I found that there may be great wisdom in the Taiji symbol (image here: https://commons.wikimedia.org/wiki/File:Yin_yang.svg). We can divide the current problem and conquer it at the lower levels (like one of the dots in yin and yang). And of course, those lower levels still contain their own divisions and computations—until there are none left.
At the same time, we can also start from the base and build upward (bottom-up), moving toward higher-level computations—where possible divisions await, until eventually one big unified answer emerges.
Pretty useless for its actual usage here, isn’t it? ^^
I do feel dynamic programming influences how I solve problems in my work: how, if you compose the problem correctly, it can never reach a fault state, in a sense. Not sure if I'm able to explain what I mean, but for instance I recently refactored an old application at work that had lots of bugs, was hard to grasp, and had loads of error-checking code / asserts. However, by composing it a bit differently and modeling it properly, I could remove swaths of code / defensive programming, because with the help of the compiler I can now guarantee the base cases hold, etc.
Edit: a somewhat contrived example might be an Order. If you have one big model tracking the lifetime of an order, some fields will have to be null until they have a value, for instance a field sent_at_time. Then you have a function that generates an email to the customer, which uses that field. How do you handle that field possibly being null? Do you say "I know that at this point in time, this field is always set, since the email is generated after sending" and tell the compiler to ignore the possibly null value? Or do you handle the possible null with some error handling / fallback value that in practice is dead, cluttered code and will just confuse a future developer seeing it with questions like "how can this happen"?
With the lessons from dynamic programming in mind, I think both are bad solutions. You want to be able to go from state n to n+1 with safety (order purchased, order sent, email sent). So model it in a way that the email-sending code can only ever get an order in the correct previous state as input. Then if the assumption were to break in the future (maybe some orders are digital and the email is sent without the order having been physically sent), that breaking assumption will immediately be obvious in how the state/models are composed, and not at runtime when either the not-null assert fails or the emails start containing the fallback value.
https://fsharpforfunandprofit.com/posts/designing-with-types... https://inside.java/2024/06/03/dop-v1-1-illegal-states/
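To make that concrete, a minimal Python sketch of the idea (the PlacedOrder/SentOrder types are hypothetical names I'm inventing here; the linked posts show the same pattern in F# and Java):

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass(frozen=True)
    class PlacedOrder:
        order_id: int

    @dataclass(frozen=True)
    class SentOrder:
        order_id: int
        sent_at_time: datetime   # always present by construction

    def mark_sent(order: PlacedOrder) -> SentOrder:
        # the only way to obtain a SentOrder is to actually send the order
        return SentOrder(order.order_id, datetime.now())

    def shipping_email(order: SentOrder) -> str:
        # no null check and no "can this even happen?" comment needed here
        return f"Your order {order.order_id} was sent at {order.sent_at_time}."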
IMO it doesn't have much relation to dynamic programming though.
I'll report back once I find the persons involved.
I remember getting an "aha" moment, writing a program, and then submitting (scored 100% too!). Then, I met a guy who also cracked the problem and realized that he just needed to paste the test data into a spreadsheet, do some basic sorting there, and then paste the output into a text file; no coding necessary.
I felt pretty stupid afterwards.
There are only two hard problems in computer science:
1. Cache invalidation
2. Naming things
3. Off-by-one errors
[Welcome to Coffee Talk, with Linda Richman. Today's topic: Wave Function Collapse is neither quantum physics, particles, waves, nor collapsing functions. Discuss amongst yourselves, while I am constrained to solve all these Sudoku puzzles. Oh, I am so verklempt!]
https://news.ycombinator.com/item?id=42749675
DonHopkins, 6 months ago, on: Generating an infinite world with the Wave Functio...
Ha ha, great comment! You're right, WFC's name just refers to quantum mechanics because it sounds cool, not because the algorithm has anything to do with physics, waves, or collapsing functions (it's more just a constraint-based Sudoku solver). But I perceive it more as a playful buzzword for a useful algorithm, and not as maliciously deceptive like Deepak Chopra, who I was just commenting on recently (Quantum was cool until Deepak Chopra ruined it for everybody):
https://news.ycombinator.com/item?id=42714477
I do think it's useful combined with other techniques like procedural programming, cellular automata, Voronoi diagrams / jump flooding, noise, etc, to steer the higher and lower level structures. Check out Maxim Gumin's work on Markov Junior!
https://twitter.com/ExUtumno/status/1531992699997507586
Maxim Gumin @ExUtumno: I published a new project about combining rewrite rules and solving constraint problems made of rewrite rules
https://github.com/mxgmn/MarkovJunior
I think the really nice property of WFC is that it's very naturally and easily artist-driven. It doesn't require programming skills to use, you just supply it with a bunch of concrete before and after examples. (Like "Programming by Example".) That makes it much easier for a wide range of people to use, which is an important feature for democratizing custom world generation. Especially when you can construct or discover the examples in-world, and have it operate kind of like a 2-d tile or 3-d cube auto-complete, that looks at the rest of the stuff you've built or seen, like Canvas does with 1-d code.
https://news.ycombinator.com/item?id=42751176
DonHopkins, 6 months ago, on: Generating an infinite world with the Wave Functio...
The difference between Deepak Chopra's abuse of Quantum Physics terminology and WFC's is that WFC actually works and is useful for something, and its coiner publishes his results for free as open source software and papers, so he deserves more poetic license than a pretentious new-age shill hawking books and promises of immortality for cash like Deepak.
Here are some notes I wrote and links I found when researching WFC (which is admittedly a catchier name than "Variable State Independent Decaying Sum (VSIDS) branching heuristics in conflict-driven clause-learning (CDCL) Boolean satisfiability (SAT) solvers"):
https://donhopkins.com/home/wfc-notes.txt
Here are some notes I wrote and links I found when researching Wave Function Collapse (WFC). -Don Hopkins
Wave Function Collapse
Maxim Gumin
Paul Merrell
https://paulmerrell.org/research/
https://paulmerrell.org/model-synthesis/
Liang et al.
Jia Hui Liang, Vijay Ganesh, Ed Zulkoski, Atulan Zaman, and Krzysztof Czarnecki. 2015. Understanding VSIDS branching heuristics in conflict-driven clause-learning SAT solvers. In Haifa Verification Conference. Springer, 225–241.
WaveFunctionCollapse is constraint solving in the wild
https://escholarship.org/content/qt1f29235t/qt1f29235t.pdf?t=qwp94i
Constraint Satisfaction Problem (CSP)
Machine Learning (ML)
[...lots more stuff...]

Even more fun stuff about WFC, AgentSheets, and defining cellular automata rules by example:
https://news.ycombinator.com/item?id=42749578
And now about "Extreme Learning Machines" and "Liquid State Machines":
https://news.ycombinator.com/item?id=42750857
DonHopkins, 6 months ago, on: Generating an infinite world with the Wave Functio...
Like calling random vector functional link networks and single-layer feed-forward networks with random hidden weights an "Extreme Learning Machine".
https://en.wikipedia.org/wiki/Extreme_learning_machine#Contr...
>Controversy
>There are two main complaints from academic community concerning this work, the first one is about "reinventing and ignoring previous ideas", the second one is about "improper naming and popularizing", as shown in some debates in 2008 and 2015.[33] In particular, it was pointed out in a letter[34] to the editor of IEEE Transactions on Neural Networks that the idea of using a hidden layer connected to the inputs by random untrained weights was already suggested in the original papers on RBF networks in the late 1980s; Guang-Bin Huang replied by pointing out subtle differences.[35] In a 2015 paper,[1] Huang responded to complaints about his invention of the name ELM for already-existing methods, complaining of "very negative and unhelpful comments on ELM in neither academic nor professional manner due to various reasons and intentions" and an "irresponsible anonymous attack which intends to destroy harmony research environment", arguing that his work "provides a unifying learning platform" for various types of neural nets,[1] including hierarchical structured ELM.[28] In 2015, Huang also gave a formal rebuttal to what he considered as "malign and attack."[36] Recent research replaces the random weights with constrained random weights.[6][37]
But at least it's easier to say, rolls off the tongue smoothly, and makes better click bait for awesome blog postings!
I also love how the cool buzzwords "Reservoir Computing" and "Liquid State Machines" sound like such deep stuff.
https://news.ycombinator.com/item?id=40903302
>"I'll tell you why it's not a scam, in my opinion: Tide goes in, tide goes out, never a miscommunication." -Bill O'Reilly
How about rebranding WFC as "Extreme Liquid Quantum Sudoku Machines"? ;)
Then there's "Crab Computing"!
https://news.ycombinator.com/item?id=42701560
[...] If billiard balls aren't creepy enough for you, live soldier crabs of the species Mictyris guinotae can be used in place of the billiard balls.
https://web.archive.org/web/20160303091712/https://www.newsc...
https://www.wired.com/2012/04/soldier-crabs/
http://www.complex-systems.com/abstracts/v20_i02_a02.html
Robust Soldier Crab Ball Gate
Yukio-Pegio Gunji, Yuta Nishiyama. Department of Earth and Planetary Sciences, Kobe University, Kobe 657-8501, Japan.
Andrew Adamatzky. Unconventional Computing Centre. University of the West of England, Bristol, United Kingdom.
Abstract
Soldier crabs Mictyris guinotae exhibit pronounced swarming behavior. Swarms of the crabs are tolerant of perturbations. In computer models and laboratory experiments we demonstrate that swarms of soldier crabs can implement logical gates when placed in a geometrically constrained environment.
https://www.futilitycloset.com/2017/02/26/crab-computing/
And of course "Moveable Feast Machines":
https://news.ycombinator.com/item?id=15560845
https://news.ycombinator.com/item?id=24157104
https://news.ycombinator.com/item?id=34561910
The most amazing mind blowing MFM demo is Robust-first Computing: Distributed City Generation:
You'd hardly see people rallying behind concepts like pure functions or clean code if they were called brown functions or moist code.
For example, let's look at the Fibonacci sequence, F(n) = F(n-1) + F(n-2): naively you just fill in a (1-dimensional) table with the results [1, 1, 2, 3, 5, 8, 13...] and it feels static because there's nothing to choose or decide.
In contrast for example the longest common subsequence looks like this:
lcs[m, n] = 0 if m = 0 or n = 0
lcs[m, n] = lcs[m-1, n-1] + 1 if X[m] = Y[n]
lcs[m, n] = max(lcs[m-1, n], lcs[m, n-1]) if X[m] != Y[n]
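Spelled out as a rough Python sketch (illustrative names only), that recurrence fills a 2-D table in one pass, where lcs[i][j] is the LCS length of the first i characters of X and the first j of Y:

    def lcs_length(X: str, Y: str) -> int:
        m, n = len(X), len(Y)
        lcs = [[0] * (n + 1) for _ in range(m + 1)]   # row/column 0 are the base cases
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if X[i - 1] == Y[j - 1]:
                    lcs[i][j] = lcs[i - 1][j - 1] + 1
                else:
                    lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])
        return lcs[m][n]

    print(lcs_length("dynamic", "programming"))   # 3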
At every step there's a "choice" to be made depending on the situation. Functions like "max()" are quite typical of dynamic programming. At least this is how I interpret the "dynamic" part.

I still have no idea why it's called "dynamic".
Neither did I – until I read TFA :)
Our industry has a bad habit of this, where we name things in such a way that make them harder to explain.
See for example one of my other pet peeves: "lambdas". Sounds fancy and difficult! Sounds like a concept you may read about in a research paper! The reality is that it's really quite simple, and we do ourselves a disservice when we name things like this.
According to wikipedia: https://en.wikipedia.org/wiki/Dynamic_programming#History_of...
The word dynamic was chosen by Bellman to capture the time-varying aspect of the problems, and because it sounded impressive. The word programming referred to the use of the method to find an optimal program, in the sense of a military schedule for training or logistics. This usage is the same as that in the phrases linear programming and mathematical programming, a synonym for mathematical optimization.
Also, Harold J. Kushner stated in a speech that, "On the other hand, when I asked [Bellman] the same question, he replied that he was trying to upstage Dantzig's linear programming by adding dynamic."
I ended up solving a problem using DP a couple of minutes before the deadline, which was enough to get us to 3rd and to ICPC. Fun times.
Trades memory for CPU when it's worth it. Dynamic programming takes the same idea but does it at runtime.
https://m.youtube.com/watch?v=oBt53YbR9Kk
Yes, it is 5 hours long. At least 4 of those are worth it.
Many people confuse caching with DP. I’ve had that conversation too often. I think it’s down to the memoization examples people choose being too toy and just looking like caching. Caching is global shared state, whereas memoization can be scoped to the current calculation and still be useful.
But they always skip over tabulation, which is where I believe DP distinguishes itself.
Caching is global shared state. It brings nearly every problem that global shared state brings with it. If you conflate the two you miss all of the advantages of DP and “just use caches”.
Memoization in the scope of DP (some languages misuse the word in their APIs) is acknowledging that a problem is self recursive and eliminating the duplicate work by either inverting the problem or storing previously calculated values. These are usually call graph scoped. Nobody else sees the value. For instance in the case of solving Fibonacci with iteration, you need only remember the previous two values, which you store in registers. There is no table. There is no O(n) storage space.
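For example, the keep-only-two-values version of Fibonacci looks roughly like this (a sketch, using the convention F(0) = 0, F(1) = 1):

    def fib(n: int) -> int:
        prev, curr = 0, 1
        for _ in range(n):
            prev, curr = curr, prev + curr   # only the last two values are ever kept
        return prev

    print(fib(10))   # 55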
Again, tabulation is when the contrast becomes stark. You’re creating an array where the relationship between the values contains information that makes the calculations cheap.
As a concrete example, my coworker was trying to walk a DAG with many leaf nodes that were repeated (think dependency tree). He used a cache to do this. It had bugs (2 I knew, but 6 I found). It was one of three large calculations we made at startup, and it should have been the least of these. But it ate memory like candy and added a lot to our start time which was slowing down deployments.
So I replaced it with a DP solution that just looked like normal code, reducing startup memory by ~80 MB and start time by half. By walking the tree and making a list of all the nodes in dependency order and then processing them once, after the list was compiled. And for the microservices and dev tools where we ran this same code the startup time became negligible.
Part of the problem is that the library we used did not like being called recursively, so you had to dance around reentrancy bugs and that cost time and space. But a linearized run has no recursion.
Edit to add: I realize now that I’ve left the conclusion ambiguous.
His two main sins were one, that the cache lived beyond the useful lifetime, and two, that even with the cache some amount of post-processing was required. By unrolling the tree and running each calculation once I reduced the post processing by about 3/4 and climbing as the tree grew in width and height.
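A rough sketch of that "linearize once, then process each node exactly once" shape (the graph representation and compute function here are made up for illustration, not the actual code in question):

    from graphlib import TopologicalSorter   # Python 3.9+

    def process_all(deps, compute):
        """deps: node -> iterable of nodes it depends on; compute(node, dep_results) -> value."""
        results = {}
        for node in TopologicalSorter(deps).static_order():
            # every dependency is already in `results`, so nothing is computed twice
            results[node] = compute(node, [results[d] for d in deps.get(node, ())])
        return results

    # e.g. process_all({"b": ["a"], "c": ["a", "b"]}, lambda n, rs: 1 + sum(rs))

Nothing lives beyond the call, and there is no recursion to fight the library over.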
    SOME_FUN = dict(base_cases)

    def some_fun(a, b, c):
        if (a, b, c) in SOME_FUN:
            return SOME_FUN[(a, b, c)]
        result = some_fun(a - 1, b - 1, c) + some_fun(a, b, c - 1)  # or whatever the recursive calls would be
        SOME_FUN[(a, b, c)] = result
        return result
That's it. Some languages (Python, Lisp) make it easy to take the naive version and wrap it (Python decorators are useful here) so that you don't have to write that code above. That's what the @cache decorator in functools does for you. You can memoize the naive version without needing to actually write a specific memoized version.

> In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts (and for purposes other than speed gains), such as in simple mutually recursive descent parsing.[1] It is a type of caching, distinct from other forms of caching such as buffering and page replacement. In the context of some logic programming languages, memoization is also known as tabling.[2]
It includes citations so I won't bother repeating them. Memoization is caching, there is no confusion about this in the literature.
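For concreteness, here is roughly what the decorator approach mentioned above looks like (functools.cache exists since Python 3.9; functools.lru_cache(maxsize=None) is the older spelling):

    from functools import cache

    @cache
    def fib(n: int) -> int:
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(200))   # instant; the undecorated version would take astronomically long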
The distinction I am making is that caching is not memoization.
> It is a type of caching, distinct from other forms of caching
Distinct. As in not to be confused with.
Even your quote doesn't support your quibbling. It says that memoization is distinct from other forms of caching. Which does not mean that it is not a form of caching.
EDIT: Removed a particularly rude bit.
Yeah, dogs are great. But so are cats, and llamas.
Note that in the case of memoization, the guarantee of determinism (and associated lifespan) is attached to the function, not the caching structure. Unlike some other caching approaches that rely on a replacement policy.
I can assure you I'm not.
There are absolute claims to make, they just aren't the ones you're making. Determinism is an absolute claim of memoization. "Local rather than shared state" simply is not, no matter how many times you repeat it.
Excluding shared state from memoization implies that you can't rely on the technique in parallel scenarios, which is just not true. I can think of more than a handful of DP problems where you can further optimize with parallelism if the function is modified to rely on shared state memoization. For example, the slew of problems that rely on the tabulation to simply fetch one value in the "previous row" and the value at the "previous column" of the current row. In practice, instead of iterating through rows and waiting to completely process them one at a time, you could simply start processing another row in parallel once you have just enough of the requisite "previous" data. This is absolutely valid DP. Some DP solutions even receive their tabulation. How does that not qualify as global?
If you thought "memoization is never shared", then in practice not only would your possibly very deterministic functions end up solving the same problems over and over each time they're called, you would also deny yourself the opportunity of tremendous optimization techniques given by a guarantee of determinism.
Determinism is key. Not shared state.
This is the beginning of my thesis. Without this no other conversation is productive.
The problem with the DP space in specific and coding in general is that a substantial fraction of 'everyone' hears 'memoization' and immediately substitutes caching into their thought processes, "oh that's just caching, got it," and then believes they are productively contributing to the followup conversation when really all they are doing is confusing everyone else and themselves.
That's my beef, frustration, and frankly, institutional trauma. Equating them in your brain makes everything you say from then on practically useless.
You have a fair point about parallelism, but even here the sharing you illustrate is between parallel sub-problems, the 'sub' is IMO doing all of the work in that sentence. Versus me caching a header because you and I have the same account settings, only there's a bug because it's now showing you my unread message status due to incorrect sharing.
> This Haskell fan's contender is "dynamic typing" :P
Nothing is impossible!
You can find defenders of dynamic typing, but dynamic scope is now widely seen as a mistake. Or at least dynamic scope by default; it has specialised uses--and in Haskell the Reader Monad is basically isomorphic to dynamic scoping and no one complains about it.
Right you are. Even more horrible. The tcl language still has it!
Perl's the only non-lisp I'm aware of that does dynamic binding well-enough to get the idea, but too many things just don't work:

    local *int = sub { return 69 }; print int("42");

is a particularly annoying one: You just can't do it. Other names might have better chances, e.g.

    package foo; sub int { 42 };
    package main; sub a{local *foo::int = sub { return 69 };&b(@_);} sub b{goto \&foo::int};
    print b("x"), a("x"), b("x"), a("x");
but this is a little unsatisfying.

Well, Haskell does dynamic binding in the form of the Reader Monad. It's well received there.
But yeah, by that standard, Haskell implements both dynamic scoping and mutable state via getters and setters. (Though, of course, you can go all out on these via lenses.)
And yes, dynamic scopes can be useful in specific cases. What's wrong is having your variables scoped dynamically by default.
And they are all much more enticing than the things deserve.
'Idiot' and 'retard' didn't start out as pejorative either.
I prefer "cached recursion".
We had a different characterisation in one of our mathematical optimisation classes: dynamic programming is basically every problem that is isomorphic to finding the longest path in a graph, with suitable 'overloading' of 'maximum' (for picking among multiple alternative paths) and 'addition' (for combining multiple sub-paths).
First, to illustrate what I mean by this overloading: matrix multiplication usually has multiplication (*) and addition (+). However, in the min-plus algebra you overload (+) with minimum and (*) with plus, and then multiplying matrices becomes equivalent to hopping two paths in the adjacency matrix of a graph. (Sorry, if this is a bit confusing.) Specifically, taking an adjacency matrix A and calculating A* := 1 + A + AA + AAA + ... is equivalent to taking the transitive closure of edge hopping in your graph, i.e. the shortest paths between all pairs of vertices.

If you overload (+) with maximum and (*) with plus instead, you can find longest paths. For that to make sense, you should not have cycles in your graph.

Now let's go back to dynamic programming: the shared optimal sub-structure property of dynamic programming is exactly the same as what you have in finding longest paths.
The structure might sound overly restrictive, but you'll find that it works for all (most?) examples of dynamic programming.
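A tiny sketch of that overloaded matrix product (min-plus for shortest paths; swap min for max and use -inf entries for the longest-path variant):

    import math

    def minplus_matmul(A, B):
        # (+) is min, (*) is +; entries are edge weights, math.inf where there is no edge
        n = len(A)
        return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    INF = math.inf
    A = [[0, 1, INF],
         [INF, 0, 2],
         [1, INF, 0]]
    print(minplus_matmul(A, A))   # cheapest i -> j paths using at most two hops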
P.S. My recollection was a bit vague, so I asked ChatGPT for some help, and apparently I wasn't completely wrong: https://chatgpt.com/share/687de24c-c674-8009-9984-0fda56d1c1...
P.P.S. Despite everything I said above, I agree that 'cached recursion' is still a better name than dynamic programming.
The memoized recursive function is an implementation methodology on a digital computer that specifically side-steps the question of in which order one should do the tabulation. I would argue that recursive memoization is a possible solution technique to dynamic programming problems (defined as the universe of problems that admit a dynamic programming solution,) but strictly speaking, in and of itself, is not dynamic programming.
Memoisation is a more general tactic that hands more of the busy work over to the computer.
1) the recursion solution is often too slow, but uses little memory.
2) the memoization solution can make the algorithms from 1 a lot faster, but blows up memory use.
3) the dynamic programming solution only keeps the previous partial solutions in memory that will be needed for future partial solutions. Therefore it is the "Minimal Memory Memoized" solution. It often requires a smart ordering of partial solutions that allows for the earliest cache eviction.
Your "cached recursion" sounds like number 2 to me, and the crux about dynamic programming is to figure out when to remove an entry from your cache.
Can you give an example of this? I don't think that is correct. But we might be saying the same thing, because in practice by the time you achieve minimal memory, you will have a bottom-up solution.
Is your comment that this is higher than the memory requirement of the recursive solution, because that doesn't seem correct to me? The recursive solution also has O(N) memory, which is the depth of the tree it is searching.
If you go bottom up, you don't need any memoization. You just keep the paths and compare them, discarding the ones that are guaranteed to be suboptimal. This way, initially you will have N paths of length 1 (the leaves), and at each step up you will have one path fewer to keep, while the paths grow by 1 element.
If you go recursively from the top, then you don't know anything about the lower levels and thus must "fork" into left and right branch at each step, until you reach the bottom, from which you start filling the cache with the results of subproblems.
Of course one problem of using a global cache is that parallelizing this becomes ugly and requires a mutex, while in contrast with the bottom-up idea you can cleanly separate the work, because you don't need any global state at all.
I didn't say that recursion and caching are the opposite of dynamic, I said they are essentially orthogonal concepts to it.
Generally speaking, dynamical systems are systems that have some sort of dependence on time [1]. In dynamic programming we have an optimization/decision problem in which time plays an essential role. The optimality is measured with respect to the whole time evolution. Bellman showed that you can break this intertemporal optimization down into individual time steps.
So the key observation of dynamic programming is how to turn an "all times at once" problem into a "one time at a time" problem.
The reason that Hamilton's name shows up next to Bellman here is that physicists and mathematicians (Lagrange, Hamilton, Jacobi) figured out that you can do the inverse: You can write the dynamical equations of physics as an intertemporal (all times at once) optimization problem. This has been one of the most influential ideas in theoretical physics _ever_, especially after Noether's theorems leveraged this structure to connect symmetries and conserved quantities. In many fields of theoretical physics you no longer write down differential equations to model a dynamical system, you write down a "Lagrangian", and mean that your system follows those trajectories that minimize the integral of the Lagrangian over all time. So the central ideas of dynamic programming are extremely tightly related to physics and dynamical systems.
[1] https://en.wikipedia.org/wiki/Dynamical_system
Edit: After reading a bit more I think I understand why people focus on this. The algorithmic examples with shortest paths sort of show the point. To me the core point of Dynamic Programming is before you start thinking about recursion or memoization. It's when you establish that you have optimal substructures:
https://en.wikipedia.org/wiki/Optimal_substructure
what is dynamical is that the subproblems depend on each other, in the way that what happens at time t+1 depends on what happens at time t, or, more to the point, what is the optimal decision at time t depends on what the optimal decision at time t+1 is. If your subproblems are ordered by time, then of course you don't need recursion to solve them, you just iterate over time steps, hence my confusion.
If we set aside the fact that the term "dynamic" was apparently chosen here basically because it sounded cool, and for no good reason, and are trying to retroactively redefine it in a way where the word has meaning, then:
With recursion the computation order is predefined and deterministic.
With dynamic programming the computation order is more flexible since the core of the technique is just memoizing prior work. Your DP solution could be organized in recursive top-down fashion, or more optimally in bottom-up fashion if the algorithm is strictly recursive by nature, or neither of the above.
i.e. With dynamic programming in general you are just reusing results if/when they exist, else calculating them - it's a more "go with the flow" dynamic approach than recursion, where the order of calls is strictly prescribed.
Also: the core of dynamic programming is breaking down a global optimization problem into many interacting local pieces. The Bellman equation, which is the core mathematical result of dynamic programming, features neither recursion nor memoization.
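For reference, one common discrete-time form of the Bellman equation (my notation, not from the thread) expresses the value of a state through the optimal value of its successors:

    V(s) = max_a [ r(s, a) + V(s') ]    where s' is the state that action a leads to from s

(infinite-horizon versions add a discount factor multiplying V(s')). Nothing in it says "recurse" or "memoize"; those are just ways of computing its fixed point.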
It reminds also of "blackboard-based" AI systems where a bunch of cooperating agents with different capabilities jointly work on a problem by posing sub-problems and sharing them on a common blackboard where other agents can grab them and post a solution back on the board.
As per Wikipedia, that origin story is somewhat disputed:
"According to Russell and Norvig, the above story "cannot be strictly true, because his first paper using the term (Bellman, 1952) appeared before Wilson became Secretary of Defense in 1953." Also, Harold J. Kushner stated in a speech that, "On the other hand, when I asked [Bellman] the same question, he replied that he was trying to upstage Dantzig's linear programming by adding dynamic. Perhaps both motivations were true."
This said, the "programming" part is pretty accurate, but IMO the author misses the point about what "computer programming" means, after all.
Even though I knew it wasn't referring to programming I did wonder why it was called that, interesting!
That reminds me of, much later, getting my first basic programming book. In communist Romania. It was titled "Let's learn interactive microelectronics" because "programming" books were considered a waste of resources and did not get approved.
The only other use of "program" I remember from back then is "tv program".
Also I don't think Soviet examples help because I don't think the last few soviet dictators before the iron curtain's fall were against computers, while Ceausescu definitely was.
[1] https://www.reddit.com/r/learnprogramming/comments/1ac9zbl/d...
A program can use dynamic programming to dynamically figure out the programme required to perform a computation.
https://en.wikipedia.org/wiki/Gosling_Emacs
It computed the minimal cost path through a cost matrix of string edit operations (the costs depended, e.g., on the number of characters to draw, the length of the escape codes to insert/delete lines/characters, padding for slow terminals, etc).
The algorithm used is a dynamic programming one, where
M[i,j] = MIN (M[i-1,j]+dcost, # UP, implies delete
M[i,j-1]+icost+redraw cost, # LEFT, implies ins+redraw
M[i-1,j-1]+rewrite cost) # BACK, implies rewrite
Each terminal type would configure the display driver according to its supported escape codes and screen update times (some terminals were REALLY SLOW inserting and deleting lines or even scrolling, and you had to send null padding to wait for it to finish).

It's infamous both for its skull-and-crossbones warning comment [designed by Brian Reid] and correspondingly poisonous complex code, and also RMS's battle with UniPress software, incorporating it into Gnu Emacs, getting threatened by UniPress, then rewriting it from scratch.
https://features.slashdot.org/story/13/01/06/163248/richard-...
>Q: Give me your best hack. [...]
>RMS: I can't remember all the hacks that I was proud of, so I can't pick the best. But here's something I remember fondly. The last piece of Gosmacs code that I replaced was the serial terminal scrolling optimizer, a few pages of Gosling's code which was preceded by a comment with a skull and crossbones, meaning that it was so hard to understand that it was poison. I had to replace it, but worried that the job would be hard. I found a simpler algorithm and got it to work in a few hours, producing code that was shorter, faster, clearer, and more extensible. Then I made it use the terminal commands to insert or delete multiple lines as a single operation, which made screen updating far more efficient.
There's some more discussion about it here that ended up being pretty accurate once all was said and done:
https://www.reddit.com/r/emacs/comments/bek5b2/til_emacs_was...
Emacs's infamous "Ultra-hot screen management package" with its "Skull and Crossbones" warning was definitely black magic!
The algorithm worked great and was well worth it over a 300 / 1200 / 2400 baud connection to a Vax 780 / 750 / etc, and as modems and computers got faster it was still useful, but at today's network bandwidths and cpu speeds it's thunderous overkill.
https://news.ycombinator.com/item?id=38996713
A redisplay algorithm, by James Gosling (ACM SIGPLAN Notices, April 1981):
https://donhopkins.com/home/documents/EmacsRedisplayAlgorith...
>7. Acknowledgements: The people who did the real work behind this paper are Mike Kazar, Charles Leiserson and Craig Everhart; all from CMU.
>Bibliography: 1. Kevin Q. Brown. Dynamic Programming in Computer Science. CMU, February, 1979.
That code was also used in Maryland Windows (a text based overlapping/tiled window system developed at the University of Maryland by Chris Torek, like Emacs - Text Editor + Window System, kind of like "screen" or "mux" or "mgr").
https://donhopkins.com/home/archive/emacs/mw/display.c
https://en.wikipedia.org/wiki/Dynamic_programming
https://wiki.c2.com/?DynamicProgramming
https://en.wikipedia.org/wiki/ManaGeR
https://news.ycombinator.com/item?id=22849522
>Gosling Emacs was especially noteworthy because of the effective redisplay code, which used a dynamic programming technique to solve the classical string-to-string correction problem. The algorithm was quite sophisticated; that section of the source was headed by a skull-and-crossbones in ASCII art, warning any would-be improver that even if they thought they understood how the display code worked, they probably did not.
https://donhopkins.com/home/archive/emacs/mw/display.c
/* 1 2 3 4 .... Each Mij represents the minumum cost of
+---+---+---+---+----- rearranging the first i lines to map onto
1 | | | | | the first j lines (the j direction
+---+---+---+---+----- represents the desired contents of a line,
2 | | \| ^ | | i the current contents). The algorithm
+---+---\-|-+---+----- used is a dynamic programming one, where
3 | | <-+Mij| | M[i,j] = min( M[i-1,j],
+---+---+---+---+----- M[i,j-1]+redraw cost for j,2
4 | | | | | M[i-1,j-1]+the cost of
+---+---+---+---+----- converting line i to line j);
. | | | | | Line i can be converted to line j by either
. just drawing j, or if they match, by moving
. line i to line j (with insert/delete line)
*/
Trivia: That "Skull and Crossbones" ASCII art is originally from Brian Reid's Scribe program, and is not copyrighted.

https://donhopkins.com/home/archive/emacs/skull-and-crossbon...
/-------------\
/ \
/ \
/ \
| XXXX XXXX |
| XXXX XXXX |
| XXX XXX |
\ X /
--\ XXX /--
| | XXX | |
| | | |
| I I I I I I I |
| I I I I I I |
\ /
-- --
\-------/
XXX XXX
XXXXX XXXXX
XXXXXXXXX XXXXXXXXXX
XXXXX XXXXX
XXXXXXX
XXXXX XXXXX
XXXXXXXXX XXXXXXXXXX
XXXXX XXXXX
XXX XXX
**************
* BEWARE!! *
**************
All ye who enter here:
Most of the code in this module
is twisted beyond belief!
Tread carefully.
If you think you understand it,
You Don't,
So Look Again.
But if you're not trying to support old terminals (but might still have a slow "thin wire" network connection), there is an orders of magnitude better approach: The Emacs NeWS display driver (for both UniPress and Gnu Emacs) downloaded PostScript code to define an efficient application specific network protocol with instantaneous local input tracking and feedback (unlike how X-Windows uses a fixed protocol, but like how AJAX uses JavaScript).

The source code to Gosling's UniPress Emacs 2.20 just recently surfaced, and the display code is well commented (still including the skull and crossbones and ascii art diagrams):
https://github.com/SimHacker/NeMACS/blob/main/src/DspVScreen...
And the lower level terminal driver layer:
https://github.com/SimHacker/NeMACS/blob/main/src/DspTrm.c
The NeWS terminal drivers I worked on for UniPress and Gnu Emacs were layered on top of that dynamic programming screen update code. But instead of sending escape codes to a dumb terminal, it downloaded PostScript code into the NeWS server to implement drawing, mouse tracking, text selection feedback, pie menus, tabbed windows, etc, and sent binary PostScript tokens back and forth (a practice now called "AJAX" for X = XML or JSON text instead of binary PostScript data and code):
TrmPS.c: https://github.com/SimHacker/NeMACS/blob/main/src/D.term/Trm...
TrmPS.cps: https://github.com/SimHacker/NeMACS/blob/main/src/D.term/Trm...
PostScript stuff for Emacs: https://github.com/SimHacker/NeMACS/tree/main/ps
Emacs 2.20 Demo (NeWS, multiple frames, tabbed windows, pie menus, hypermedia authoring):
https://www.youtube.com/watch?v=hhmU2B79EDU
Here's a brochure from February 1988 about UniPress Emacs 2.20 and "SoftWire" (NeWS without graphics, kind of like Node with PostScript instead of JavaScript).
What is Emacs: https://www.donhopkins.com/home/ties/scans/WhatIsEmacs.pdf
I also worked on the Gnu Emacs 18 NeWS display driver (supporting a single tabbed windows and pie menus in The NeWS Toolkit 2.0):
tnt.ps: https://www.donhopkins.com/home/code/emacs18/src/tnt.ps
tnt.cps: https://www.donhopkins.com/home/code/emacs18/src/tnt_cps.cps
tnt.c: https://www.donhopkins.com/home/code/emacs18/src/tnt.c
tnt-win.el: https://www.donhopkins.com/home/code/emacs18/lisp/term/tnt-w...
More on the NeWS versions of Emacs with links to code here:
https://en.wikipedia.org/wiki/Wagner%E2%80%93Fischer_algorit...
The name "dynamic programming" was chosen to impress bureaucrats, and AI is where it is at today.
"Subtask training" maybe, as the idea is to train the algorithm on smaller tasks to to make it more efficient at solving larger tasks. Ok, it is just storing intermediate results, but it sounds more AI-ish like that.
Bottom-up memoization?
Iterative memoization?
We have an interpreted language in which you can call "subroutines" by name. But you can build the name in your program "on the fly", for example out of data.
It's called programming because programming means scheduling. And most of the problems the field initially concerned itself with were logistics scheduling problems.
Computer programming is called programming because it involves scheduling instructions.
Dynamic optimization makes just as much sense. Same as Mathematical optimization does for the wider field of study.
Then, how much fuel should the pilot buy at each stop to minimize expected cost for the trip? I.e., in simple terms, buy extra fuel where it's cheap and carry (tanker) it to where it's expensive but in so doing burn extra fuel from the extra weight of the extra fuel.
Solution: Stochastic dynamic programming.
Hint: In Canada, the solution is illegal!
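To make the flavor of the problem concrete, here is a toy sketch with made-up prices, a discretized tank, and deterministic prices (the real problem is stochastic, hence stochastic DP); the state is (stop, fuel on board), and a fuller tank burns a little more as a crude stand-in for the weight penalty:

    from functools import lru_cache

    PRICES = [3.0, 5.0, 2.0, 6.0, 4.0]   # $/unit of fuel at each stop (made up)
    CAPACITY = 10                        # tank size, in fuel units
    BASE_BURN = 2                        # units burned per leg with a light tank

    def burn(fuel_on_board):
        return BASE_BURN + fuel_on_board // 5   # heavier tank burns slightly more

    @lru_cache(maxsize=None)
    def min_cost(stop, fuel):
        if stop == len(PRICES) - 1:
            return 0.0                           # arrived at the final stop
        best = float("inf")
        for buy in range(CAPACITY - fuel + 1):   # units bought at this stop
            f = fuel + buy
            if f >= burn(f):                     # enough to reach the next stop
                best = min(best, buy * PRICES[stop] + min_cost(stop + 1, f - burn(f)))
        return best

    print(min_cost(0, 0))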
Was meeting with Nemhauser (see references below); my cab back to the airport was waiting; and he gave me a 15 second lecture. On the plane thought about his lecture, ..., and stood for my Ph.D. orals and passed (extra credit for guessing my employer).
Some references (with TeX markup):
Stuart E.\ Dreyfus and Averill M.\ Law, {\it The Art and Theory of Dynamic Programming,\/} ISBN 0-12-221860-4, Academic Press, New York.\ \
Dimitri P.\ Bertsekas, {\it Dynamic Programming: Deterministic and Stochastic Models,\/} ISBN 0-13-221581-0, Prentice-Hall.\ \
George L.\ Nemhauser, {\it Dynamic Programming,\/} ISBN 0-471-63150-7, John Wiley and Sons, New York\ \
E.\ B.\ Dynkin and A.\ A.\ Yushkevich, {\it Controlled Markov Processes,\/} ISBN 0-387-90387-9, Springer-Verlag, Berlin.\ \
Wendell H.\ Fleming and Raymond W.\ Rishel, {\it Deterministic and Stochastic Optimal Control,\/} ISBN 0-387-90155-8, Springer-Verlag, Berlin.\ \
Michael Athans and Peter L.\ Falb, {\it Optimal Control:\ \ An Introduction to the Theory and Its Applications,\/} McGraw-Hill Book Company, New York.\ \
Dimitri P.\ Bertsekas and Steven E.\ Shreve, {\it Stochastic Optimal Control: The Discrete Time Case,\/} ISBN 0-12-093260-1, Academic Press, New York.\ \
E.\ B.\ Lee and L.\ Markus, {\it Foundations of Optimal Control Theory,\/} ISBN 0471-52263-5, John Wiley, New York,
What’s interesting in terms of the theoretical properties and classification of a problem is the recursive nature with overlapping recursion.
https://en.wikipedia.org/wiki/Dynamic_programming#Fibonacci_...
If you need to solve one problem, bottom-up is often most efficient. But if you need to solve many related problems (say trialing many different initial conditions) then you may want the memoized solution.
On that note, I really struggle to understand what's special about "bottom up" versus "top down" DP or what categorizes it as unique from any other algorithms.
Someone posted an Euler problem with finding the maximal sum along any path (binary choice at each step) down a pyramid. The bottom-up solution only ever generates a "cache" of row elements (starting the row count from 1). The 100th row has 100 elements. When you go up to the 99th row you only need 99 elements by the end of calculating it, and so on. This is more efficient in memory than the n*(n+1)/2 (approx.) elements you'd need doing a top-down recursive, memoized version. Though in that case it's not awful on modern computers, it is a consideration for complex and large problems.
The bottom-up version also lends itself to not needing to bring the entire pyramid into memory, only one row + the cache at a time so you only have |row| + |row| + 1 items in memory at any given time (plus some temp variables or index variables). Again, it can be very memory efficient. Technically, you don't even need the entire current row in memory, only one element of it and then read each new element as you step across the row which further reduces the memory overhead.
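A sketch of that bottom-up pyramid walk, keeping only one row of partial sums at a time (numbers made up):

    def max_path_sum(pyramid):
        """pyramid: list of rows, row k has k+1 numbers."""
        cache = list(pyramid[-1])               # best sums starting from the bottom row
        for row in reversed(pyramid[:-1]):
            cache = [v + max(cache[i], cache[i + 1]) for i, v in enumerate(row)]
        return cache[0]                          # best total for the whole pyramid

    print(max_path_sum([[3],
                        [7, 4],
                        [2, 4, 6],
                        [8, 5, 9, 3]]))          # 23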
With regard to building a full table, I realize Fib is a poor example in general but it is illustrative and fits in a comment. Let's say you memoize the naive version, then you have to include this check either before the recursive calls or at the start of each call:
    if n in FIB:  # either return here if at the top of the function, or retrieve rather than make the recursive call

This is unneeded if you build the table bottom-up:

    def fib_bottom_up(n):
        FIB = {1: 0, 2: 1}
        i = 3
        while i <= n:
            FIB[i] = FIB[i - 1] + FIB[i - 2]
            i += 1
        return FIB
Bottom-up construction of the table skips those checks, which can add up for larger table constructions, or if you're building up many different tables. Consider a more complex optimization problem where different initial parameters result in different tables being generated (but by the same method). This bottom-up approach will be more efficient than the top-down approach.

Of course, a benefit for top-down is when an input only needs some subset of the table. If the generated table will be sparse, but may not be easily constructed in a sparse manner bottom-up, then top-down can be better.
https://github.com/duckdb/duckdb/blob/61f07c3221e01674d8fe40...
Richard Bellman on the Birth of Dynamic Programming (2002) [pdf] - https://news.ycombinator.com/item?id=42482289 - Dec 2024 (9 comments)
Dynamic programming is not black magic - https://news.ycombinator.com/item?id=38988948 - Jan 2024 (202 comments)
Dynamic Programming vs. Divide-and-Conquer (2018) - https://news.ycombinator.com/item?id=26930667 - April 2021 (115 comments)
Real-world dynamic programming: seam carving - https://news.ycombinator.com/item?id=20285242 - June 2019 (66 comments)
Graphical Introduction to Dynamic Programming - https://news.ycombinator.com/item?id=19682258 - April 2019 (11 comments)
Dynamic Programming for Technical Interviews - https://news.ycombinator.com/item?id=19395578 - March 2019 (218 comments)
Solving dynamic programming interview problems - https://news.ycombinator.com/item?id=17102604 - May 2018 (164 comments)
Dynamic Progamming: First Principles - https://news.ycombinator.com/item?id=15545686 - Oct 2017 (98 comments)
What is difference between memoization and dynamic programming? - https://news.ycombinator.com/item?id=13281884 - Dec 2016 (1 comment)
Dynamic Programming: The Name (1984) - https://news.ycombinator.com/item?id=12476597 - Sept 2016 (61 comments)
Fibonacci series and Dynamic programming - https://news.ycombinator.com/item?id=5893237 - June 2013 (28 comments)
Dynamic Programming versus Memoization - https://news.ycombinator.com/item?id=4434671 - Aug 2012 (26 comments)
Visualizing dynamic programming - https://news.ycombinator.com/item?id=2243297 - Feb 2011 (11 comments)
Ask HN: Have you ever used Dynamic programming in your dev job? - https://news.ycombinator.com/item?id=1796861 - Oct 2010 (11 comments)
Why the name "dynamic programming"? - https://news.ycombinator.com/item?id=1290397 - April 2010 (25 comments)
The Edit Distance Problem - https://news.ycombinator.com/item?id=571570 - April 2009 (13 comments)
Introduction to Dynamic Programming - https://news.ycombinator.com/item?id=329953 - Oct 2008 (12 comments)
It's Bellman's autobiography and it's very good, though he doesn't go into as much depth as I wanted on how he came up with DP and his inequality.
You have to understand that Eisenhower believed that the military industrial complex was out of control. He brought Wilson in to help rein it in and reorganize the defense department.
The military and military contractors were conducting "research" in the form of doing chemical and biological weapons experimentation on soldiers and civilians without their knowledge or consent. Wilson was trying to stop that.
And of course many of the think tanks were trying to lead the United States into an offensive war against the Soviet Union, which Wilson and Eisenhower did not want or think was necessary. It's entirely probable that the only reason Bellman got funding to do this work at RAND was because they intended to use it for running their desired war.
For intranet routing, however, shortest-path algorithms based on Dijkstra's approach underlie the OSPF and IS-IS protocols.
In the old days a distance-vector protocol, the Routing Information Protocol or RIP, was also used, but it's now becoming obsolete and is humorously known as the Rest In Peace protocol.
Distance-vector is now making a comeback for intranets in the form of mobile ad hoc and mesh networks (guerrilla networks for routing without infrastructure) with the Ad hoc On-Demand Distance Vector Routing protocol or AODV [2]. This setup can be very beneficial for disasters and also for mobile group or platoon communication.
[1] Distance-vector routing protocol:
https://en.wikipedia.org/wiki/Distance-vector_routing_protoc...
[2] Ad hoc On-Demand Distance Vector Routing:
https://en.wikipedia.org/wiki/Ad_hoc_On-Demand_Distance_Vect...
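Distance-vector routing is essentially a distributed Bellman-Ford relaxation, which is itself a textbook dynamic-programming recurrence; a minimal centralized sketch (not how routers actually exchange tables):

    import math

    def bellman_ford(n, edges, source):
        """edges: list of (u, v, weight). After k rounds, dist[v] is the cheapest
        path from source to v using at most k hops - the DP recurrence."""
        dist = [math.inf] * n
        dist[source] = 0
        for _ in range(n - 1):                   # a shortest path has at most n-1 hops
            for u, v, w in edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        return dist

    print(bellman_ford(4, [(0, 1, 1), (1, 2, 2), (0, 2, 5), (2, 3, 1)], 0))   # [0, 1, 3, 4]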
In the former it's for optimizing a relatively small deterministic system, in the latter it is for optimizing a stochastic process.
Pretty confusing.
There is nothing derived here. It is exactly the same meaning.
Do you mean to say that these two concepts also have exactly the same meaning?