I have kind of grown to prefer Clojure's loop/recur construct, since it gives you something more or less akin to tail recursion but it doesn't pretend to be actually recursive.
It is true that TCO messes up your stack traces. It is also true that loops mess up your stack traces, because they don't even create a stack trace. In many languages that support TCO, TCO can be turned off for debugging and enabled for production, so Guido's characterization of the issue is ridiculous.
I don't agree that being able to turn off TCO in debugging is enough to call Guido's characterization "ridiculous". I often have to debug running production systems, and those are (hopefully) not deployed with debug compilations.
You're not wrong about loops not creating a stack trace; I guess what bothers me (and maybe Guido) is that with loops there's no expectation of a stack trace, while with a function call there is.
No, because an indirect tail-call eliminates the calling frame. Afterwards, it looks like the caller of that function called a function it did not call. In fact, with tail-calls a whole pile of steps gets telescoped down to nothing [1]. This is not the case with a loop: a loop doesn't jump to itself indirectly, and loops don't jump into the middle of other loops in other functions. We can still follow the chain of control flow from the start of the main function, through (non-tail) calls, i.e. the stack, through the start of the loop, and then from some backedge in the loop body we're looking at.
Tail-calls are absolutely harder to debug in general than loops.
[1] In the limit, if the entire program was transformed into CPS with tail-calls everywhere, the original execution stack is now strewn throughout the heap as a chain of closures.
Notable exceptions to the above are Python 3 generators, which I believe truly use O(1) memory with map and fold. Haskell has lists that are lazy by default, but if you fold or map over them, it still "forces the thunk" for each element, and thus you still end up using O(N) memory.
Yeah, this was a big help when I started F#. Basically "if you're using the rec keyword, you're probably missing something", and hell, that even goes for a lot of uses of fold, from the beginner's perspective.
I actually like the compromise. I get to write safe functional code while getting all the benefits of a highly optimised iterative operation.
https://pragtob.wordpress.com/2016/06/16/tail-call-optimizat...
It is true that every tail recursive function can be converted into a semantically equivalent loop via a transformation like CPS (which the author mentions). However, for mutually tail-recursive functions, this conversion loses control flow information. This is because after the CPS transformation, calls to the other function become calls to a continuation; this call usually must be implemented as an indirect jump. On the other hand, mutually tail-recursive functions can call each other with direct/statically-known jumps.
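Concretely, here's a minimal sketch (in C++; any pair of mutually tail-recursive functions would do). Each tail call below targets a statically known function and can compile to a direct jump, whereas after a CPS rewrite the same transfer goes through a continuation, i.e. a function pointer and an indirect jump:

    bool is_odd(unsigned n);  // forward declaration for the mutual recursion

    // Each tail call names its target statically, so a compiler doing
    // tail-call elimination can lower it to a direct jump.
    bool is_even(unsigned n) { return n == 0 ? true  : is_odd(n - 1); }
    bool is_odd (unsigned n) { return n == 0 ? false : is_even(n - 1); }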
This loss of information might appear trivial, but in practice it has some important consequences. Classic examples are interpreter loops. It is well-known that computed gotos can result in modest to large speedups for interpreters [1]. The reason is that computed gotos create an indirect jump per opcode, so a branch predictor can take advantage of correlations between opcodes. For example, looking at Python disassembly, the header of a standard range for loop compiles down to three opcodes in sequence: GET_ITER, FOR_ITER, STORE_FAST [2]. A branch predictor can recognize that the target of the "FOR_ITER" indirect jump will likely be the "STORE_FAST" instruction pointer; it cannot predict this in the naive implementation where jumps for all instructions are "merged" into a single indirect jump / switch at the top of the loop body. In this case, computed goto is effectively equivalent to a CPS transformation whose closures require no storage on the heap.
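A computed-goto dispatch loop looks something like this (a toy three-opcode VM of my own, using the GCC/Clang labels-as-values extension; the opcodes are made up):

    #include <cstdio>

    enum Op : unsigned char { PUSH1, ADD, HALT };

    int run(const unsigned char* pc) {
        // Table of label addresses (GCC/Clang "labels as values" extension).
        static void* dispatch[] = { &&op_push1, &&op_add, &&op_halt };
        int acc = 0;
        // Each opcode body ends in its own indirect jump, so the branch
        // predictor can learn per-opcode correlations.
        goto *dispatch[*pc++];
    op_push1:
        acc += 1;
        goto *dispatch[*pc++];
    op_add:
        acc += acc;
        goto *dispatch[*pc++];
    op_halt:
        return acc;
    }

    int main() {
        const unsigned char prog[] = { PUSH1, PUSH1, ADD, HALT };
        std::printf("%d\n", run(prog));  // prints 4
    }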
Suppose, however, we know even more information about the instruction sequence; for example, we know ahead of time that every FOR_ITER opcode will be followed by a STORE_FAST opcode. We could completely replace the indirect jump with a direct jump to the instruction pointer for the STORE_FAST opcode. Because modern branch predictors are very good, this will have about the same performance in practice as the computed goto loop.
However, consider the limiting case where we know the entire instruction sequence beforehand. If we write our interpreter as many mutually tail-recursive functions, with one function for every instruction, an optimizing compiler can replace every indirect call with a direct (tail-recursive) call to the function that implements the next instruction's opcode. With a good enough optimizer / partial evaluator, you can turn an interpreter into a compiler! This is known as the first Futamura projection [3].
To see an example of this in action, I wrote a prototype of a Brainfuck compiler via the Futamura projection; it uses LLVM as a partial evaluator [4]. The main interesting function is `interpret`, which is templated on the program counter / instruction. That is, `interpret` is really a family of mutually tail-recursive functions which statically call each other as described above. For short Brainfuck programs, the LLVM optimizer is able to statically compute the output of the Brainfuck program. (The one in the Godbolt link compiles to a loop, likely because LLVM does not want to unroll the mutual recursion too much.) You can play around with different Brainfuck programs by modifying the `program` string on line 5.
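Schematically, the handlers look something like this (a toy C++ sketch, assuming Clang's [[clang::musttail]] attribute; the three "instructions" are made up and this is not the actual prototype). When the next handler is statically known, every transfer is a direct tail call the optimizer can collapse:

    static int op_halt(int acc) { return acc; }
    static int op_double(int acc);

    // Each handler tail-calls the handler for the next instruction
    // directly; the whole chain is visible statically.
    static int op_inc(int acc) {
        [[clang::musttail]] return op_double(acc + 1);
    }
    static int op_double(int acc) {
        [[clang::musttail]] return op_halt(acc * 2);
    }

    int start() { return op_inc(0); }  // the optimizer folds this to "return 2"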
[1] https://eli.thegreenplace.net/2012/07/12/computed-goto-for-e...
[2] https://godbolt.org/z/rdhMvPo36
[3] https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_pr...
To the contrary, I'd argue that immutability isn't the only alternative to universal mutability: many newer imperative languages (such as Rust) have various forms of controlled mutability, where one must explicitly declare which variables may be modified by imperative assignments.
IME, controlled mutability captures just about all the benefit of immutable languages, without requiring any special data structures or sufficiently-smart compiler analyses to ensure good performance. I've never really understood the desire for tail-recursive versions of iterative algorithms, except for a prior commitment to functional programming.
- V8 JavaScript Engine
- Python
hinkley•2h ago
Tail recursion is meant to fix the latter. But what we mean to happen and what actually happens ain't ever exactly similar.
Tail recursion IME is a bigger foot gun than relying on someone to add a new conditional branch at the end of a block in an iterative algorithm without fucking it up in the process. And iteration generally responds better to Extract Function. And while I can think of counter cases easily enough, in the large, iteration is less work and less vigilance. And you cannot scale a project up unless the vigilance required amortizes to basically zero per line of code.
tombert•2h ago
At least in theory, a tail recursive call will be converted into a dumb jump, so there shouldn't be a performance penalty; and since from your code's perspective you're just passing in the values for the next iteration, you can keep using the pretty and easy-to-reason-about structures without creating any kind of mutable reference.
I'm not 100% sold on tail recursion in a broader sense, but at least with Clojure's loop/recur stuff it is kind of cool to be able to keep using persistent data structures across iterations of loops.
weitendorf•1h ago
Of course, this means that the same implementation could also be directly expressed logically in a way that is mutable/iterative.
    func pow(uint base, uint n):
        n == 0 ? return 1 : return base * pow(base, n-1)
is just
    func pow(uint base, uint n):
        uint res = 1
        for (i = 0; i < n; i++) { res *= base }
        return res
There is no real "advantage" to, or reason to "sell" anybody on, tail call recursion if you are able to easily and clearly represent both implementations, IMO. It is just a compiler/runtime optimization, which might make your code more "elegant" at the cost of obfuscating how it actually runs, plus new footguns from the possibility that code you think should use TCO actually doesn't (because not all immutable + recursive functions can use TCO, only certain kinds, and your runtime may not even implement TCO in all cases where it theoretically should).
As an aside, in C++ there is something very similar to TCO called copy elision / return-value optimization (RVO) [1]. As with TCO, it is IMO not something to "buy into" or sell yourself on; it is just an optimization you can leverage when structuring your code in a way similar to what the article calls "continuation passing style". And just like TCO, RVO is neat but IMO slightly dangerous because it relies on implicit compiler/runtime optimizations that can be accidentally disabled or made non-applicable as code changes: if someone wanders in and makes small semantic changes to code of mine that relies on RVO/TCO for performance, they could silently break something important.
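For instance (a minimal sketch of the fragility; function names are made up): the first function below is eligible for NRVO, and a small refactor like the second can quietly disable it:

    #include <vector>

    std::vector<int> make_table() {
        std::vector<int> v(1024, 0);
        return v;  // NRVO: v can be constructed directly in the caller's storage
    }

    std::vector<int> make_table_refactored(bool flag) {
        std::vector<int> a(1024, 0);
        std::vector<int> b(1024, 1);
        // Two candidate objects inhibit NRVO; this now copies (or moves) instead.
        return flag ? a : b;
    }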
[0] EXCEPT in practice all implementation differences/optimizations introduce observable side effects that can impact program correctness or semantics. For example, a program could (perhaps implicitly) rely on the fact that it errors out due to stack overflow when recursing more than X times, and so enabling TCO could cause the program to enter new/undesirable states; or a program could rely on a function F performing X floating point operations in at least Y cycles / Z microseconds, and not function properly when F takes less than Z microseconds after enabling vectorization. This is Hyrum's Law [2].
[1] https://en.wikipedia.org/wiki/Copy_elision#Return_value_opti...
[2] https://www.hyrumslaw.com/
gleenn•1h ago
weitendorf•33m ago
That said, just as I'd expect experienced C++ programmers to be able to recognize others' code using RVO and be careful not to restructure things in a way that breaks it, I'd concede that experienced FP programmers might be unlikely to accidentally break others' TCO. It's just that, taken ad absurdum, you cannot expect every developer to be able to read every other developer's mind and recognize/work around all the implicit behavior they encounter.
ramchip•1h ago
tombert•39m ago
`for (var i = 0; i < n; i++)`, for example, requires that `i` mutate in order to keep track of the value so that the loop can eventually terminate. You could also do something like:
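    // (sketch) the same kind of loop, with explicit mutation of i
    function printUpTo(n) {
      var i = 0;
      while (i < n) {
        console.log(i);  // hypothetical body
        i++;
      }
    }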
There, the reference to `i` is being mutated. For tail recursion, it's a little different. If I did something like Factorial:
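    // (sketch) tail-recursive factorial with an accumulator; call as factorial(n, 1)
    function factorial(n, acc) {
      if (n <= 1) return acc;
      return factorial(n - 1, acc * n);  // nothing is reassigned; new values each call
    }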
Doing this, there's no explicit mutation. It might be happening behind the scenes, but everything there, from the code perspective, is creating a new value. In something like Clojure, you might do something like:
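    ;; (the vrange example from [1])
    (defn vrange [n]
      (loop [i 0 v []]
        (if (< i n)
          (recur (inc i) (conj v i))
          v)))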
(stolen from [1]) In this case, we're creating "new" structures on every iteration of the loop, so our code is "more pure", in that there's no mutation. It might be being mutated in the implementation, but not from the author's perspective.
I don't think I'm confusing anything.
[1] https://clojure.org/reference/transients
charcircuit•49m ago
The parameter essentially becomes a mutable reference.
tombert•38m ago
charcircuit•36m ago
tombert•31m ago
If I pass a persistent structure into a tail call, it looks like I'm passing a copy into there, in the same way that I might if I did something like `Math.abs(myValue * 3)`; the compiler converts it to a mutable reference but I see it as the same as passing a value.
weitendorf•31m ago
CalChris•2h ago
pmg101•2h ago
jandrese•2h ago
kylec•2h ago
electroly•1h ago
bjoli•1h ago
Jtsummers•1h ago
In tail recursive algorithms, there is no stack, it's just a straight loop: the tail-recursive function is simply the equivalent loop (first sketch below). If it's not tail recursive, you introduce an explicit stack, for instance in a DFS on a binary tree (second sketch below). Note the insertion order is reversed from the recursive calls in a typical DFS. We want left to be searched first and then its children, and then we "recur" back to right and deal with its children, so we need to push right onto the stack first and then left. When you have multiple mutually recursive functions (as is likely the case in a recursive descent parser) then things become more complicated, but it's still feasible.
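Illustrative sketches in C++ (names are mine). First, a tail-recursive function and its loop form:

    // Tail recursive: the recursive call is the very last thing that happens.
    int sum_to(int n, int acc) {
        if (n == 0) return acc;
        return sum_to(n - 1, acc + n);
    }

    // The same computation as a straight loop; no stack growth either way.
    int sum_to_loop(int n) {
        int acc = 0;
        while (n != 0) { acc += n; n -= 1; }
        return acc;
    }

And the non-tail-recursive case, DFS with an explicit stack:

    #include <stack>

    struct Node { int value; Node* left; Node* right; };

    void dfs(Node* root, void (*visit)(Node*)) {
        std::stack<Node*> pending;
        if (root) pending.push(root);
        while (!pending.empty()) {
            Node* node = pending.top();
            pending.pop();
            visit(node);
            // Push right first so that left is popped, and searched, first.
            if (node->right) pending.push(node->right);
            if (node->left)  pending.push(node->left);
        }
    }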
LegionMammal978•1h ago
Jtsummers•1h ago
That's basically what continuations give you. Scheme, SML, and others provide them.
LegionMammal978•46m ago
fellowniusmonk•58m ago
Recursion isn't physically real; any book that teaches the abstraction before explaining either the call stack (for non-TCO recursion) or the GOTO context is a book actively trying to gatekeep CS and weed out readers. (Not that any of those old Turbo Pascal/Borland books from the 90s actually shipped code that compiled.)
I had several people on HN of all places try to "teach me" recursion after this simple line inside a larger comment:
> It's like acting like recursion is physically real (it's not) when it smuggles in the call stack.
Recursion IS real abstractly. It's just not physically real; it was developed before we knew how DNA/RNA encoding handles the same issues in a more performant way.
achierius•38m ago
That was a sharp left turn -- how do you figure DNA/RNA are relevant here? I feel like iteration pre-dates our modern understanding of RNA in particular (though I could be mistaken) so I struggle to imagine how DNA/RNA were particularly informative in this regard.
javcasas•2h ago
When you use one to process the other, you get into trouble. For example, you need to manage a stack to do iteration on your binary tree. Or you need to construct slices to recurse on your arrays.
It's all about ergonomics.
dreamcompiler•1h ago
Is a binary tree recursive from the perspective of a type declaration? Yes.
Is it recursive from the point of view of search? Depends. If you're doing depth-first search, then you need a stack and a recursive algorithm seems natural. If you're doing breadth-first search, then you need a queue and the algorithm is less obviously recursive.
In either case the program could be expressed recursively or as an explicit loop. When a stack is needed, the underlying hardware always provides automatic stack management so recursion feels natural in that case.
When a queue is needed it's more of a tossup since hardware and programming languages don't generally provide automatic queue management, so you're going to need to manage that data structure manually anyway. In which case whether you choose to use recursion or not is more of a personal preference.
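For instance, a BFS sketch with a hand-managed queue (illustrative C++; the Node type is assumed):

    #include <queue>

    struct Node { int value; Node* left; Node* right; };

    // The frontier lives in an explicit queue; neither the hardware nor the
    // language manages it for you the way the call stack is managed.
    void bfs(Node* root, void (*visit)(Node*)) {
        std::queue<Node*> frontier;
        if (root) frontier.push(root);
        while (!frontier.empty()) {
            Node* node = frontier.front();
            frontier.pop();
            visit(node);
            if (node->left)  frontier.push(node->left);
            if (node->right) frontier.push(node->right);
        }
    }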
javcasas•1h ago
Kinda shows that recursive data structures are usually better dealt with via recursion. Worst case, you end up with the same level of difficulty as iteration.
busterarm•30m ago
Just look at the myriad ways people implement something like checking win conditions in Rock, Paper, Scissors: switch statements, conditional hell, etc.
...you can just do something as simple as sticking Rock, Scissors, Paper into a circular list and comparing to the next element (sketch below). It's practically readable as plain English. No programming skill required.
If you need to code golf you can use modulo arithmetic, for a different kind of "simple", but then you lose readability.
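A sketch of the circular-list version (mine, in C++; the names are arbitrary):

    #include <map>
    #include <string>

    // The circle: each move beats the move that follows it.
    // Rock -> Scissors -> Paper -> Rock
    const std::map<std::string, std::string> next_in_circle = {
        { "Rock", "Scissors" },
        { "Scissors", "Paper" },
        { "Paper", "Rock" },
    };

    // "a beats b" reads as: b is the element right after a in the circle.
    bool beats(const std::string& a, const std::string& b) {
        return next_in_circle.at(a) == b;
    }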
Spivak•1h ago
busterarm•36m ago
bjoli•1h ago
I just feel that a recursive call makes state much more clear, but then again I am no fan of mutation in general. In my old Python days, I think a good 60% of my bugs were due to me being bad at managing state.
iamevn•1m ago