I think it's a better idea to write a function that applies a function to the elements of a tree. Ideally you'd make a function for each traversal order. This makes it obvious what is happening.
map_tree_pre(my_tree, my_func);
map_tree_in(my_tree, my_func);
map_tree_post(my_tree, my_func);
map_tree_bfs(my_tree, my_func);
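Concretely, each of those is only a few lines. A minimal Python sketch (the `Node` shape with `value`/`left`/`right` is my assumption, not anything from the thread):

```python
from collections import deque

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def map_tree_pre(node, func):
    if node is None:
        return
    func(node.value)                 # visit before the children
    map_tree_pre(node.left, func)
    map_tree_pre(node.right, func)

def map_tree_in(node, func):
    if node is None:
        return
    map_tree_in(node.left, func)
    func(node.value)                 # visit between the children
    map_tree_in(node.right, func)

def map_tree_post(node, func):
    if node is None:
        return
    map_tree_post(node.left, func)
    map_tree_post(node.right, func)
    func(node.value)                 # visit after the children

def map_tree_bfs(root, func):
    queue = deque([root] if root else [])
    while queue:
        node = queue.popleft()
        func(node.value)             # visit level by level
        queue.extend(c for c in (node.left, node.right) if c)
```

Keeping four named functions rather than one parameterized traversal is exactly the point: the call site says which order you get.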
foreach (Node node in EnumerateNodes(root, x => x != null, x => [x.Left, x.Right]))
where EnumerateNodes uses `yield return` (i.e. is a generator) and calls itself recursively. Though it'd probably be easier / better performance to write an implementation specific to each node type.

Encoding the semantics of a tree traversal operator likewise is difficult in the general case. What exactly would the order be? What if I want to traverse in a non-standard ordering? What about skipping branches? All would be difficult to represent cleanly.
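The same generator shape is short in Python as well; a sketch with hypothetical names, where the caller supplies the validity test and the child accessor:

```python
def enumerate_nodes(node, is_valid, get_children):
    # Pre-order: yield the node itself, then recurse into each child.
    if not is_valid(node):
        return
    yield node
    for child in get_children(node):
        yield from enumerate_nodes(child, is_valid, get_children)
```

For a binary node type this would be called as `enumerate_nodes(root, lambda n: n is not None, lambda n: (n.left, n.right))`.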
I have seen it done where you return actions with key ones being recurse, stop, replace, and replace & then carry out some function, but again, this is pretty simple to implement.
Candidates: Racket, Scheme, Rust.
Defining your own iterator would be enough for most cases.
For example if you literally don’t care about the order then your code can map over a default iterator that is efficient (DFS).
Then, for some cases, depth-first traversal is needed; for others, breadth-first.
Then, there's parallelism, and even the plain old for loops aren't parallel by default.
By the time you specify exactly what you need from a tree traversal, you've written code to do it.
And if you're fine with some default choice — you already can use the default iterator with the for_each loop.
I don't see what need there is for adding an extra for_tree syntax to do that.
Trees are often considered too diverse to include even in the standard library, let alone as a primitive. Even Python doesn't have trees in the standard library. I'm sure it's been proposed and rejected at some point.
I hate to be a broken record, but this again goes back to the assumption of a specific sequential computation model. Modelling relations as in SQL automatically supports N-ary child relations. There are other computation models! e.g. logic, relational, etc.
Let's be practical about it: the reason you might want to say map_tree is so it could potentially be optimized to run more quickly, e.g. via automatic parallelization.
But to even parallelize a tree read operation we'd have to pull in other threads, split work, we'd want metadata on each node (child counts), and at the end of the day within each worker you'd still have a serial execution order.
For map_tree to have practical benefit your language would need to make a ton of opinionated choices about trees and parallel programming. Feasible? Yes. Worth making it a basic primitive? Well, even LISPs don't offer map_tree, despite building the whole language on top of car/cdr.
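To make the "opinionated choices" concrete, here is one possible split, sketched in Python: the root is visited in the caller and each top-level subtree goes to its own worker, which then runs serially within itself, exactly as described above. The names and the splitting policy are mine:

```python
from concurrent.futures import ThreadPoolExecutor

class Node:
    def __init__(self, value, children=()):
        self.value, self.children = value, list(children)

def map_subtree(node, func):
    # Serial pre-order within a single worker.
    results = [func(node.value)]
    for child in node.children:
        results.extend(map_subtree(child, func))
    return results

def map_tree_parallel(root, func):
    # One opinionated choice of split: the root is handled here,
    # each top-level subtree is handed to its own worker thread.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(map_subtree, c, func) for c in root.children]
        results = [func(root.value)]
        for f in futures:
            results.extend(f.result())
    return results
```

Even in this toy, the results come back in a fixed order only because the futures are collected in submission order; every such decision is a policy the language would have to pick for everyone.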
Because "trees" and "graphs" aren't actually single concepts, they're families of concepts that can vary along many dimensions. Most programming languages offer one or a small handful of sequence types, because most sequence use patterns are the same. Most languages don't offer a tree or graph type, and as soon as you try to implement one you see why: there are multiple different kinds of trees and graphs, and multiple different representations that are appropriate for different use cases, and any one you might pick would be unsuitable for most programs people want to write.
I think it's more because the abstract interfaces for trees and graphs that can support multiple representations aren't as well known [1]. An iterator/sequence interface has a simple abstract structure that everyone immediately understands, but the structures needed to abstract over graphs and trees are trickier.
[1] https://www.cs.tufts.edu/~nr/cs257/archive/andrey-mokhov/alg...
Major tools that exist today for partial structure traversal and focused manipulation:
- Optics (Lenses, Prisms, Traversals)
Elegant, composable ways to zoom into, modify, and rebuild structures.
Examples: Haskell's `lens`, Scala's Monocle, Clojure's Specter.
Think of these as programmable accessors and updaters.
- Zippers Data structures with a "focused cursor" that allow local edits without manually traversing the whole structure.
Examples: Huet’s original Zipper (1997), Haskell’s `Data.Tree.Zipper`, Clojure’s built-in zippers.
- Query Languages (for semantic traversal and deep search) When paths aren't enough and you need semantic conditionals:
- SPARQL (semantic web graph querying)
- Datalog (logic programming and query over facts)
- Cypher (graph traversal in Neo4j)
- Prolog (pure logic exploration)
These approaches let you declaratively state what you want instead of manually specifying traversal steps.
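For the optics entry in particular, the core idea fits in a few lines. A toy Python sketch of lenses over nested dicts (the names `Lens`, `field`, `compose`, and `over` are mine; real libraries like Haskell's `lens` or Monocle are far more general):

```python
from collections import namedtuple

# A lens is just a (getter, setter) pair; the setter returns an
# updated copy of the whole structure rather than mutating it.
Lens = namedtuple("Lens", ["get", "set"])

def field(name):
    """Lens focusing on one key of an immutable dict."""
    return Lens(
        get=lambda whole: whole[name],
        set=lambda whole, part: {**whole, name: part},
    )

def compose(outer, inner):
    """Compose two lenses into one that focuses deeper."""
    return Lens(
        get=lambda whole: inner.get(outer.get(whole)),
        set=lambda whole, part: outer.set(
            whole, inner.set(outer.get(whole), part)),
    )

def over(lens, whole, f):
    """Apply f to the focused part and rebuild the whole."""
    return lens.set(whole, f(lens.get(whole)))
```

So `over(compose(field("person"), field("address")), acc, fix)` reads as "update acc.person.address with fix", and the original `acc` is left untouched.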
The problem is that 'lens', 'monocle', etc. are famously abstract and difficult for people to apply to their actual problems. IMO, the solution would be for standard libraries to specify interfaces called 'BreadthFirstTraverse', 'DepthFirstTraverse', etc.
I think people are often too enamored by general purpose languages that can express such abstractions natively. I don't see an issue with a language that provides this as a primitive without being able to express it itself, constraints can be useful for other properties. Once you can traverse trees, most programming problems can be tackled even in such constrained languages, eg. SQL with CTE.
Optics are famously abstract in implementation, but I don't think people have trouble applying them - people seem to like JQuery/CSS selectors, and insist on `object.field` syntax; it's kind of wild that no mainstream language has a first-class way to pass around the description of a location in an arbitrary data structure.
[1] https://ghc-proposals.readthedocs.io/en/latest/proposals/002...
[2] https://ghc-proposals.readthedocs.io/en/latest/proposals/015...
Account acc = getAccount();
QVERIFY(acc.person.address.house == 20);
auto houseLens = personL() to addressL() to houseL();
std::function<int(int)> modifier = [](int old) { return old + 6; };
Account newAcc2 = over(houseLens, newAcc1, modifier);
These also use templating to get something that still feels maybe a little less ergonomic than it could be, though.

[1] https://github.com/graninas/cpp_lenses
[2] https://github.com/jonsterling/Lens.hpp
[1] https://www.cs.ox.ac.uk/jeremy.gibbons/publications/iterator...
Think of these as programmable accessors and updaters.
How is iterating through something already not 'programmable'?
If you read a bit more about them, I think you will be surprised to see the breadth of what these abstractions can be used for. To start, they've been used to build a new compositional foundation for game theory[1], and they can be used to model gradient-based learning in machine learning[2].
As for their simpler applications as getters and setters, they are programmable in the sense that you can think of lens/prism types as interfaces that can be implemented arbitrarily. So you can create your own lenses and combine them like legos to construct new, "bigger" getters and setters from the smaller components.
[1] https://arxiv.org/pdf/1603.04641 [2] https://arxiv.org/abs/2103.01931
EDIT: fixed typo
When does someone give up on the pageantry of programming and just get something done by looping through data, instead of "constructing getters and setters to model gradient-based machine learning"?
It really seems like the straightforward way of doing things is to make simple iteration work first and worry about the game theory after that is all figured out.
I think your comment is valuable in spirit because it could bring about a more thorough discussion of creating or popularizing a syntactically clean solution to traversing trees, and of making such a solution idiomatic, leaving the more abstract and general optics discussion as a side dish for commenters as well.
Also, just a nitpick, but traversing a tree is a broader topic than iteration. Iteration implies, both colloquially and formally, performing a computation on each element of a data structure, while traversal is a more general algorithm that could be the basis of iteration, but may also be more akin to a search through the data structure, with no computation expected until the searched-for 'member' is returned from the traversal. So it's not an apples-to-oranges comparison, your conflation of iteration and traversal, but it does seem a little like a Canis familiaris-to-Mammal comparison.
Definitely can be done to streaming data… protobufs in a way is this, given it’s a sort of BNF from a bird’s eye.
https://wiki.haskell.org/index.php?title=Research_papers/Gen...
For C++, you could define yourself a template that expands to the two functions you listed.
For any language, you could write yourself a pre-processor that adds for_tree notation and expands it, either with pre-processor semantics or working on the abstract syntax tree (which is more "proper" but also more work). I would recommend the former to test notations; then you can live with them for a while to experiment and see if and how often you really need the construct (recall that this is how C++ evolved from C via "C with classes": it was first a pre-processor). Once you and others are 100% convinced of the new syntax, go to your preferred language's working group/ISO committee and propose inclusion.
My own feeling is that this is not something for which I need more than recursion. Calling traverse(left) and traverse(right) inside traverse() for binary trees, or using a normal for loop to iterate over this->children() from 0 to this->children.size(), is something that occurs in graph libraries and parsers/compilers once in a while, but not often enough to warrant its own notation. Rather, when I look at languages like C++, I'd prefer a language that is simpler, cleaner, more homogeneous and more orthogonal; C++ is complicated, convoluted, and too large to implement correctly by a single person in reasonable time (compared to beauties like C, Pascal, Scheme), so I stand on the side of minimalism.
Be that as it may, for C++, Eric Niebler's [Boost.Proto](https://www.boost.org/doc/libs/1_84_0/doc/html/proto.html) could likely help conveniently connect syntax to implementation to achieve something similar to what the author is talking about.
# Expects:
# init plus a tuple of iterateOn functions, where iterateOn(node) may return None
def fancyIt(init, *options):
    if init is not None:
        yield init
        for f in options:
            newVal = f(init)
            yield from fancyIt(newVal, *options)
class Tree:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

    # The instance attributes shadow these methods, so Tree.left(node)
    # (looked up on the class) still returns node's left child.
    def left(self):
        return self.left

    def right(self):
        return self.right
myTree = Tree(
    1,
    Tree(2,
         Tree(3),
         Tree(4, None, Tree(5))),
    Tree(6, None, Tree(7)))

for node in fancyIt(myTree, Tree.left, Tree.right):
    print(node.val)
which prints the numbers 1 through 7 in order. Breadth-first is slightly trickier, but only slightly, and you only write it once.
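For completeness, here is the breadth-first counterpart under the same conventions as `fancyIt` above (a sketch; the deque-backed queue is the one extra ingredient):

```python
from collections import deque

def fancyItBfs(init, *options):
    # Same contract as the depth-first generator, but nodes come out
    # level by level instead of branch by branch.
    queue = deque([init] if init is not None else [])
    while queue:
        node = queue.popleft()
        yield node
        for f in options:
            child = f(node)
            if child is not None:
                queue.append(child)
```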
While easy, I think bisect would be a good addition to every stdlib too.
I don't think this is a large problem in practice because you shouldn't be using dozens of tree types in a given code base, so adding iterators to a tree is no big deal. In general there aren't enough types of iteration available to a given data structure that you need to describe how to iterate on it from the "outside". (Generally when you are doing that, it's too non-trivial to fit into this pattern anyhow; see the Visitor pattern in general.)

This strikes me as maybe the sort of default tool you might slap in a library somewhere, but it should be a niche tool. If you're using it all the time you're probably doing something wrong. By default your data structures should be providing iteration packaged with them and it should generally be what you need.

And your language should support aborting iteration, in whatever way that looks like normally. I'm not sure I know a language that doesn't; it's a fairly basic element of iterator support when you get into implementation.
There are also many cases where a tree iterator will perform significantly better, including CPython. I don't have enough experience with PyPy to know if it could inline the Tree.left and Tree.right calls down to zero penalty at JIT time. Rust and C++ and the other static languages with sophisticated compilers might be able to get that down to fully inlined and zero-cost, but even if they can it's probably better not to push that on to the optimizer as the optimizers will eventually give up if this is composed with enough other stuff. Better to just have an efficient implementation in the first place.
(this is for iterating over nested JSON-like objects, which are just weird trees)
There are a lot of ways you could avoid the recursion, but that's a particularly nice way!
[1] https://doc.rust-lang.org/std/vec/struct.Vec.html#method.bin...
Yes, students should absolutely implement the classic algorithms to learn.
Yes, there are some occasions when you need to home grow one at $work.
BUT, in my opinion, most of the time, professional code should use a battle tested, vuln hardened library or builtin version. These things are VERY HARD to get exactly right. Jon Bentley's Programming Pearls famously had a latent bug in its binary search for 20 years before someone caught it.
https://research.google/blog/extra-extra-read-all-about-it-n...
So yeah, it looks easy but don't do it. Stand on some giant's shoulders instead.
Anyone who copies and pastes it is welcome to both pieces when it breaks. Others have already alluded to possible improvements that could be made, and I already have my own analysis in a grandchild reply as to why I don't think this is a terribly pressing need or necessarily even a good idea.
The reason I provide code is that it gets past the "oh, you say it's just an iterator, but I still don't believe you, since you haven't spelled it out to the n'th degree". When code is provided, belief ceases to be an issue. It is clearly something an iterator can implement, in existing languages, with existing iterator support.
Unless you're going to claim it is somehow impossible to provide this functionality in a tested manner, you're completely changing the topic in an uninteresting direction, since it is always true that functionality generally needs testing and bits of code slammed into an HN conversation just to make a particular point probably shouldn't be copied wholesale into your production code.
> Well a range based for loop requires that your tree exist in memory AND that you have an iterator defined for your tree. With for_tree you could operate on an entirely imperative tree, without needing to define any iterators or generator functions. Here's an example where I'm checking every single string composed of "a", "b", and "c" of length 8 or less.
for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
print(x);
}
You could definitely find every string composed of "a", "b", and "c" of length 8 or less by defining a custom iterator, but it would be a verbose and unpleasant way of writing it:

#include <cstddef>
#include <iostream>
#include <iterator>
#include <stdexcept>
#include <string>

class StringIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = std::string;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::string*;
    using reference = const std::string&;

    StringIterator(bool begin = false) : is_end_(!begin) { if (begin) s_ = ""; }

    const std::string& operator*() const {
        if (is_end_) throw std::out_of_range("End iterator");
        return s_;
    }

    StringIterator& operator++() {
        if (is_end_) return *this;
        if (s_.size() < 8) return s_.push_back('a'), *this;
        while (!s_.empty() && s_.back() == 'c') s_.pop_back();
        if (s_.empty()) is_end_ = true;
        else s_.back() = s_.back() == 'a' ? 'b' : 'c';
        return *this;
    }

    StringIterator operator++(int) { auto tmp = *this; ++(*this); return tmp; }

    bool operator==(const StringIterator& other) const {
        return is_end_ == other.is_end_ && (is_end_ || s_ == other.s_);
    }
    bool operator!=(const StringIterator& other) const { return !(*this == other); }

private:
    std::string s_;
    bool is_end_;
};

int main() {
    StringIterator begin(true), end;
    int count = 0;
    for (auto it = begin; it != end; ++it) ++count;
    std::cout << (count == 9841 ? "Pass" : "Fail") << std::endl;
    return 0;
}
def itWithStop(init, stop, *options):
    if init is not None and not stop(init):
        yield init
        for f in options:
            newVal = f(init)
            yield from itWithStop(newVal, stop, *options)
for s in itWithStop("",
                    lambda x: len(x) > 2,
                    lambda x: x + "a",
                    lambda x: x + "b",
                    lambda x: x + "c"):
    print(s)
yields all strings of length 0 to 2 over a, b, and c. Python has a number of ways to achieve this depending on exactly how you want to pass the arguments: multiple functions, optional arguments, etc. How nice the final call looks is mostly a question of how your language's closures look.
The main point here is that this will happily iterate on things that don't "exist".
module Tmp where
iter :: forall a. (a -> Bool) -> [a -> a] -> a -> [a]
iter p opts x = if p x then x:concatMap (iter p opts) (opts <*> [x]) else []
ghci> :l tmp.hs
[1 of 1] Compiling Tmp ( tmp.hs, interpreted )
Ok, one module loaded.
ghci> iter (\x -> length x < 3) [(++ "a"), (++ "b"), (++ "c")] ""
["","a","aa","ab","ac","b","ba","bb","bc",
"c","ca","cb","cc"]
(Since things are lazy in Haskell, functions that return lists effectively are iterators. There's probably something in the standard library somewhere for (opts <*> [x]) to avoid wrapping x in an unnecessary list, but my Haskell is rusty.) And yes, Haskell is amazing at this sort of thing.
If the poster wants to particularize this to C++ because C++'s syntax can't support it in any reasonable manner, that's fine, but that's a C++ problem, not a "Programming languages..." problem. Which would be perfectly understandable and I'm not really complaining, more clarifying that most of the rest of the world can just rub together three or four existing constructs in a pretty reasonable manner to get this.
Ceterum censeo this would be a family of simple macros in LISP.
IMO the thing that would be really nice is if control flow like `for` was actually the same as using an iterator. This would really help in Rust too where handling errors inside iterator callbacks is a right pain.
I've seen a few languages try this but it seems to not be very popular. I think it can get a bit confusing how control flow keywords like `return` and `break` work if you turn `if` into syntactic sugar for a function call taking a closure etc.
In PHP you loop through an iterator with the foreach control structure.
In JavaScript you use for of.
In rust it’s for in.
What am I missing?
The same from another angle: there are a lot of trees in the indices of SQL databases (example [1]) but we don't zoom in to that level of detail very often when defining our tables.
To implement Brown's algorithm to optimize class-based language models I had to implement a complex forest (DAG, actually) in Python using lists of fixed length. That was not especially nice to work with.
type Node = { value: Any, left: Node, right: Node }
type Direction = Left | Right
type TreePosition = { root: Node, currentNode: Node = root, position: Direction[] = [] }
# Implementation left as an exercise but should be obvious and run in O(1), I believe. Returns Nothing when we're out of nodes.
function nextPosition(position: TreePosition): Option<TreePosition>
# The tree you want to iterate through
const myTree: Node = ...
# The loop
for(let position: TreePosition? = TreePosition(root: myTree); position != Nothing; position = nextPosition(position)) {
node = position!.currentNode
# Your loop code
}
I'd argue this doesn't belong as a language-level feature, but maybe an API/stdlib-level feature.

// Comments for the non-Rust native reader, regarding this function declaration:
// successors is a function that accepts an `Option` container for some Value of type T, called `first`
// and a Closure called `succ`, constrained below:
pub fn successors<T, F>(first: Option<T>, succ: F) -> Successors<T, F>
where
// `succ` must receive the iterated state, and return the next iterated state
F: FnMut(&T) -> Option<T>,
// Each time the `next()` function is called on the returned Iterator (a Successors-flavored iterator),
// the state of `first` is yielded, and then
// `succ` is called to progress
// until a `None` type is reported by `succ`
I'm not sure where the concept came from, but it's not dissimilar to the author's implementation; instead of the ControlFlow enum, it relies simply on the Option enum. I know, though, that it was initially built in the Itertools crate as unfold and then upstreamed some time later. Essentially you use `first` to contain a Queue, Stack, or Level for the different traversals, and define traversal or activities from there.
It's fairly ergonomic in practice, ergonomic enough for Leetcode.
Here's a BFS: https://leetcode.com/problems/course-schedule-iv/solutions/6...
[0] https://doc.rust-lang.org/std/iter/fn.successors.html
[1] https://docs.rs/itertools/latest/itertools/fn.unfold.html
> This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Also I’m slightly confused by this example.
for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
    print(x);
}

So, our "next node" operation is to concatenate to x. Won't we either have to have a method for modifying x to go "up" a node, or we'll have to keep a record of what x was upon entering each node? Like in this example we'll end up with x="aaaaaaaa" and then go up a node, over to a "b" node, and get x="aaaaaaab", right?
I guess we can delete the a node’s copy of x after all of a node’s child nodes are visited, at least.
Perhaps it can be optimized to be a little better than the recursive version, depending on how much overhead your language uses for a stack frame that it won't need for this special case.
But tree traversal doesn't have this universal property. There are too many methods and purposes for traversing a tree, sufficient that IMHO no single primitive embodiment could materially improve a language. Also, modern compilers efficiently break down high-level traversal code so well that expressing the idea at a high level incurs no serious penalty compared to having a primitive for that purpose, or a series of them.
[0] https://www.hillelwayne.com/post/graph-types/ and https://news.ycombinator.com/item?id=39592444
One great reason not to use recursive functions for traversing trees is that you can allocate your own stack data structure rather than relying on the call stack itself. In most languages/runtimes, the call stack has a maximum depth which limits the depth of trees you can process, usually on the order of thousands of stack frames.
Managing your own stack usually produces weirder looking code (personally I find "naive" recursive approaches more readable) - but having it as a first-class language feature could solve that!
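For comparison, the explicit-stack version in Python: slightly weirder-looking than the recursive one, but depth is bounded by available memory rather than the interpreter's recursion limit (the node shape is my assumption):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def preorder_iterative(root):
    """Pre-order DFS with an explicit heap-allocated stack, so tree
    depth is limited by memory rather than by the call-stack limit."""
    stack = [root] if root else []
    while stack:
        node = stack.pop()
        yield node.value
        # Push right first so the left subtree is visited first.
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)
```

A degenerate 10,000-deep chain of left children works fine here, where a naive recursive version would hit the default recursion limit in CPython.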
(defn walk [inner outer form]
  (cond
    (list? form) (outer (with-meta (apply list (map inner form)) (meta form)))
    (instance? clojure.lang.IMapEntry form)
      (outer (clojure.lang.MapEntry/create (inner (key form)) (inner (val form))))
    (seq? form) (outer (with-meta (doall (map inner form)) (meta form)))
    (instance? clojure.lang.IRecord form)
      (outer (reduce (fn [r x] (conj r (inner x))) form form))
    (coll? form) (outer (into (empty form) (map inner form)))
    :else (outer form)))

(defn postwalk [f form]
  (walk (partial postwalk f) f form))

(defn prewalk [f form]
  (walk (partial prewalk f) identity (f form)))
Another reason why this perlisism holds: 9. It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.
"Let's move on." (defn tree-seq
"Returns a lazy sequence of the nodes in a tree, via a depth-first walk.
branch? must be a fn of one arg that returns true if passed a node
that can have children (but may not). children must be a fn of one
arg that returns a sequence of the children. Will only be called on
nodes for which branch? returns true. Root is the root node of the
tree."
{:added "1.0"
:static true}
[branch? children root]
(let [walk (fn walk [node]
(lazy-seq
(cons node
(when (branch? node)
(mapcat walk (children node))))))]
(walk root)))
(defn tree-seq-breadth
  "Like tree-seq, but in breadth-first order"
  [branch? children root]
  (let [walk (fn walk [node]
               (when (branch? node)
                 (let [cs (children node)]
                   (lazy-cat cs (mapcat walk cs)))))]
    (cons root (walk root))))
Complex data structures absorb lot of the complexity of the problem and reduce the complexity of the rest of the code.
If every ounce of performance matters, e.g. in a database, you want 10000 functions, 100 for each data structure.
I guess all you really need are dynamically allocated arrays. A cons cell is an array of two. A struct with N fields is an array of N. Everything else is built on that.
If this becomes a C++ feature, imagine how many data structures we would need to support?
Many other languages, especially the FP languages, let you do that as a library. Even languages that are merely inspired by FP. For example, Ruby:
class BinTree
  include Enumerable

  def initialize v, l, r
    @v, @l, @r = v, l, r
  end

  def each &block
    @l.each(&block) unless @l.nil?
    yield @v
    @r.each(&block) unless @r.nil?
  end
end
Using the Enumerable mixin includes many FP-based methods, such as map, filter and reduce, by defining only each, which in this case is DFS. Then we can proceed to define a binary tree:
tree = BinTree.new(
  1,
  BinTree.new(
    2,
    BinTree.new(4, nil, nil),
    BinTree.new(5, nil, nil)
  ),
  BinTree.new(
    3,
    BinTree.new(6, nil, nil),
    BinTree.new(7, nil, nil)
  )
)
Iterate over all the elements:

tree.each{|v| puts v}

Iterate over the even elements:

tree.filter{|v| v.even?}.each{|v| puts v}

Stop iteration when finding a value:

tree.each do |v|
  break if v == 1
  puts v
end
And so on. The same can be done in Python, Kotlin and many others. C++ already solved that problem: iterators are designed so that an algorithm can be written once and used with multiple data structures.
I really don't see your point
1. BFS is not supported.
2. Traversal type (inorder, preorder, postorder, or even mixed order!) is also not handled.
3. The syntax doesn't make it apparent that stack overflow can occur, e.g. by doing DFS on a linked list.
Thank you... I'll see myself out... lol =3
For example the most basic operations of a pointer are to advance and dereference.
std::map is actually implemented as a tree. To iterate over its members you can do
for (const auto &pair : map)
The only requirement for your custom data structure to work is to implement begin() and end(), which return iterators: "pointer-like" objects.

for (const auto& node : await depth_first_tree(node))
And generators have the added advantage that walking trees is just a special case of the more general case of traversing a directed graph of elements, which is equally easy. Are generators not a thing in other languages?
The site cppreference has an example of walking trees in C++ using them.
https://en.cppreference.com/w/cpp/coroutine/generator
But I have a question: why is it that in your example, you write `await` before the generator function, but it's not in the example given on cppreference? Also, did you mean `co_await`?
Do NOT use the c++20 co-routine APIs (a half-implemented nightmare of an API, although what is there does miraculously work, contrary to expectations). Probably better to wait for c++23 generators, which are so far available as an experimental feature on GCC 14.0 (which makes the feature unusable in any of my projects).
All of which, I guess, answers my question about why nobody has brought up c++ generators yet. C# has nice generators.
Sorry I brought it up. :-(
Iterators for trees are not implemented with call stack recursion.
> I'm surprised nobody's brought up C++ generators yet
You cannot modify, insert, or use standard algorithms with a generator.
Not as ergonomic as a direct tree-iterator, but I can't see an elegant way to introduce that in an imperative language while keeping the forking/recursion aspect clear
Here is how one could write it in C: https://godbolt.org/z/P3ErP4T4d or possibly with the setup code moved into the helper function for convenience: https://godbolt.org/z/soz7W5z1G
Programming languages should have fewer primitives like this; instead we should have better foundation libraries for the languages, i.e., containing iterators/interfaces like Rust and Python (or Smalltalk).
var todo = new List<T>();
todo.append(root);
while (var item = todo.pop_front()) {
    todo.append(item.left);  // or .prepend() for depth-first
    todo.append(item.right); // or .prepend()
    // do stuff...
}
Or for a possibly more useful comment, constant space tree traversal is genuinely quite difficult to implement. I don't know a general solution to it (other than walk from the start N times, quadratic style), would be interested to hear of one.
Naively I would expect something like left-hand-rule maze traversal to be constant space. The tree may need additional structures to support this sort of travel.
Other thoughts: most state machines are constant space.... there may be something there.
That constant space used to store the size could instead be a pointer to the root, but all the use cases I have for trees involve aliasing subtrees in which case there can be multiple "parents" for a given subtree and one cannot point to it.
In the spirit of completeness, my standard tree traversal recurses to a fixed depth and then falls back to the walk-repeatedly-from-the-root, which is constant space of a sort, but feels suboptimal.
It might be a competitive approach if the tree is very unbalanced or there are enough concurrent traversals, but in my experience the main reason to use parent pointers or the like is operations that do not visit each node once starting from the root.
There's Morris tree traversal. It basically turns an ordinary binary tree into a threaded binary tree and fixes it up after it's done. But, this is mainly possible because of the space wasted by the null pointers at the leaves of the tree. You can hide a stack within the tree.
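A sketch of Morris in-order traversal in Python, to make the "hide a stack within the tree" trick concrete; note that it temporarily mutates the tree (the threads) and then restores it:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def morris_inorder(root):
    """In-order traversal in O(1) extra space: temporarily thread each
    left subtree's rightmost node back to its root, then undo it."""
    out = []
    cur = root
    while cur:
        if cur.left is None:
            out.append(cur.value)
            cur = cur.right
        else:
            # Find the in-order predecessor of cur in its left subtree.
            pred = cur.left
            while pred.right and pred.right is not cur:
                pred = pred.right
            if pred.right is None:
                pred.right = cur        # create the thread
                cur = cur.left
            else:
                pred.right = None       # remove the thread
                out.append(cur.value)
                cur = cur.right
    return out
```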
This kind of imperative iteration seems better served by the traditional visitor design pattern: more verbose (more explicit, not more complex) and more general.
class InOrderTreeIterator {
    Stack<TreeNode> stack;
    TreeNode cursor;

    InOrderTreeIterator(TreeNode root) {
        cursor = root;
        stack = new Stack<>();
    }

    boolean hasNext() {
        return cursor != null || !stack.empty();
    }

    TreeNode next() {
        if (cursor != null) {
            while (cursor.left != null) {
                stack.push(cursor);
                cursor = cursor.left;
            }
        } else if (!stack.empty()) {
            cursor = stack.pop();
        } else {
            throw new NoSuchElementException();
        }
        TreeNode ret = cursor;
        cursor = cursor.right;
        return ret;
    }
}
Iterable<TreeNode> inOrder(TreeNode node) sync* {
  if (node.left != null) yield* inOrder(node.left!);
  yield node;
  if (node.right != null) yield* inOrder(node.right!);
}
But it's not obvious to me how you'd write a generic iterator that supports something like finding all free variables in an expression tree, or even just express a tree mapping function that constructs a new tree with the same structure (for example, "make all variables in the expression uppercase"). It's been a while since I worked with iterators so correct me if I'm wrong and this is in fact easy with iterators.
With something like Haskell's recursion-schemes library [0] these operations are all just a few lines long, guaranteed to terminate, and don't need to be updated if your data structure changes (for example you add new expression nodes). I'm not aware of any non-functional programming language that can do that. See for example the freeVars function at the bottom of the linked recursion-schemes doc.
[0] https://hackage.haskell.org/package/recursion-schemes-5.2.3
At the same time, the way it's implemented in Haskell's recursion-schemes library might be hard to wrap one's head around at first, kind of like how the list functions "foldr" and "foldl" are also often confusing to newbies, even though they're like the go-to, default way to make list functions in Haskell.
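For readers without Haskell, the core idea of a recursion scheme can be sketched in Python: write the traversal once as a generic fold, then express each operation as one function per node kind. The `Var`/`Lam`/`App` expression type and the `fold` helper below are illustrative names, not from any library.

```python
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Lam:
    param: str
    body: object

@dataclass
class App:
    fn: object
    arg: object

def fold(expr, on_var, on_lam, on_app):
    """One generic traversal; callers supply one case per node kind."""
    if isinstance(expr, Var):
        return on_var(expr.name)
    if isinstance(expr, Lam):
        return on_lam(expr.param, fold(expr.body, on_var, on_lam, on_app))
    return on_app(fold(expr.fn, on_var, on_lam, on_app),
                  fold(expr.arg, on_var, on_lam, on_app))

# "Find all free variables" as a fold producing a set:
def free_vars(expr):
    return fold(expr,
                on_var=lambda n: {n},
                on_lam=lambda p, body: body - {p},
                on_app=lambda f, a: f | a)

# "Make all variables uppercase" as a fold producing a new tree:
def upper_vars(expr):
    return fold(expr,
                on_var=lambda n: Var(n.upper()),
                on_lam=lambda p, body: Lam(p.upper(), body),
                on_app=App)
```

What recursion-schemes adds on top of a hand-rolled fold like this is that the fold itself is derived from the data type, so adding a new node kind changes one place rather than every traversal.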
Bonus: Java's HotSpot magick will NOP (most?) methods of Null Objects, making this a zero cost abstraction.
I should probably write a concrete example for a blog or something.
TLDR: For every base class such as TreeNode, create a NullTreeNode that does nothing, then replace all uses of null with NullTreeNode. Voilà, no more null checks or NPEs.
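A minimal sketch of the Null Object pattern for a binary tree (the `NullNode`/`NULL` names are illustrative): the sentinel node answers every operation with a harmless default, so callers never test for null.

```python
class NullNode:
    """Sentinel that stands in for 'no child'; every operation is a no-op."""
    def sum(self):
        return 0

NULL = NullNode()

class TreeNode:
    def __init__(self, value, left=NULL, right=NULL):
        self.value, self.left, self.right = value, left, right

    def sum(self):
        # No null checks needed: NullNode.sum() simply returns 0.
        return self.value + self.left.sum() + self.right.sum()
```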
We don't have syntax and semantics for recursion schemes in any programming language - it's always deferred to library support at best. As far as I'm concerned, this is an open problem in programming language design where we finally replace the vestigial recursion hack with a proper structured programming solution.
EnScript had a forall (NodeClass n in tree.GetRoot()) {} construct that was very easy. It was essentially a depth-first iterator.
for recursive (Node t = tree.root; t != NULL;) {
puts(t.value);
if (t.value == target) break;
if (t.value == dontfollow) continue;
if (t.left) continue t.left;
if (t.right) continue t.right;
}
return t;
Regular 'break' really breaks out of the structure like a regular for, and regular 'continue' does the next iteration. But if continue has a value to recurse on, it reenters the for loop like a subroutine. As a bonus, I think this is tail-call-optimization friendly.
I explored this idea with gstd, a standard library for graphs inspired by the JS Array interface: https://github.com/cdrini/gstd/
- Every HTML document is a tree structure. And CSS documents have special syntax to address nodes within those trees
- If it's any good, your routing framework probably uses some kind of tree to quickly match the request to its handler
- The database you write to uses a tree to quickly find the location it needs to write to
(And to be clear, none of these involve sitting down and writing some generic Tree<T> structure, they're all cases of "Well, there's some tree-like datastructure that already exists or is a natural fit for this system and I'm going to be traversing it depth-first or breadth-first to do something to the elements of it or calculate something based on what nodes I pass through on a given traversal")
I use it in every project for data navigation and transformation, and it's more performant than standard Clojure data manipulation, while retaining types (instead of coercing back from seqs).
E.g. if you have a map and you want to increment every value in the map:

  (require '[com.rpl.specter :as S])

  (->> {:a 5, :b 6, :c 7}
       (S/transform [S/MAP-VALS] inc))
  => {:a 6, :b 7, :c 8}

^ try to write that in normal clojure.

Now let's say you have a map of vectors and want to increment all of those?

  (->> {:a [1 2], :b [3 4], :c [5 6]}
       (S/transform [S/MAP-VALS S/ALL] inc))  ;; note the navigator just got another level of nesting
  => {:a [2 3], :b [4 5], :c [6 7]}

Works for all clj data types, and of course it has navigators for recursive walking.
It took me a while to get good at Specter, but it was worth it. I hear Rama uses Specter navigators internally.
https://www.metosin.fi/blog/transforming-data-with-malli-and...
I think allowing for starting this kind of search from a given node would cover most (though not all) of what you'd want to expose the trees structure for directly.
But I think grafting those two together is the right answer, not inventing a new loop construct.
Trees are easy to write iterators for. DAGs are a bit harder and full graphs are an advanced interview question.
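The gap between the three cases is smaller than it sounds: the same iterative DFS handles a DAG or full graph once a visited set is added. A sketch in Python (`children` is a function returning a node's successors; the names are illustrative):

```python
def dfs_tree(root, children):
    """DFS over a tree: no cycle/sharing protection needed."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(children(node))

def dfs_graph(root, children):
    """DFS over a DAG or graph: a visited set prevents revisits and loops."""
    stack, seen = [root], {root}
    while stack:
        node = stack.pop()
        yield node
        for child in children(node):
            if child not in seen:    # the only extra logic a DAG/graph needs
                seen.add(child)
                stack.append(child)
```

On a DAG, the tree version visits shared nodes once per path to them, while the graph version visits each node exactly once.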
Both looping and tree traversal can be done with library functions.
In general, everything that can be done with library functions, should be done with library functions.
> Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?
Eh, in Haskell you can derive many of these things automatically. And even Rust has some automatic derivation.
[1]: https://lean-lang.org/doc/reference/latest//The-Type-System/...
> Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?
No it does not
A TXR Lisp macro for-tree for traversing a tree, given a variable, an expression for the root node, and expressions for how to go left and right relative to the variable.
A node structure that for-tree knows nothing about. No iterator abstraction or API, nothing.
(defstruct node ()
value
left
right)
Example tree (a balanced binary tree whose numeric values happen to be sorted):

(defvar example-tree #S(node value 5
left #S(node value 3
left #S(node value 1)
right #S(node value 4))
right #S(node value 7
left #S(node value 6)
right #S(node value 9)))
Walk it:

(for-tree nod example-tree nod.left nod.right
(prinl nod.value))
1
3
4
5
6
7
9
Code:

(defmacro go-leftmost (var root left-expr stack)
^(for ((,var ,root)) (,var) ((set ,var ,left-expr))
(push ,var ,stack)))
(defmacro for-tree (var root left-expr right-expr . body)
(with-gensyms (stack node left right)
^(let (,stack)
(go-leftmost ,var ,root ,left-expr ,stack)
(for ((,var (pop ,stack)))
(,var)
((iflet ((,right ,right-expr))
(go-leftmost ,var ,right ,left-expr ,stack))
(set ,var (pop ,stack)))
,*body))))
The left-expr and right-expr must be expressions that involve the var variable; the construct evaluates these to find the left or right child. When the value is nil it means there is no child. This is almost exactly what the blog is asking for, transliterated to Lisp syntax.
Original concept in fantasy C syntax:
for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right})
{
print(N->value);
}
Lispified:

(for-tree nod example-tree nod.left nod.right
(prinl nod.value))
As required, the construct specifies the node variable to be the loop dummy, the root expression giving its initial value and the two accessor expressions for left and right traversal. The termination test N != NULL is made implicit in the Lisp version, so early termination requires a break out of the loop. It could be arranged.
The fantasy C syntax specifies a variable name in the navigation declarator: N : { N->left, N->right }. Presumably, the variable here can be renamed to anything you want; it just arbitrarily happens to have the same name as the loop dummy.
I didn't replicate this feature exactly because it just adds verbosity for nothing. It's okay if the navigation declarator just refers to the loop's one and only dummy variable.
Anyway, programming languages should definitely not have a tree traversal primitive. Rather, languages should be Lisp.
The left, right and key fields of a node object are accessed with like-named functions. We must call tree-root on a tree object to get to the encapsulated root node:
1> (for-tree n (tree-root #T(() 1 2 3 4 5 6 7 8)) (left n) (right n)
(prinl (key n)))
1
2
3
4
5
6
7
8
How about a tree of integers. Let's say the root integer is 1, and to go left is to multiply by 2, and to go right is to multiply by 2 and mask in a 1. And let's say the values cap out below 16:

2> (for-tree n 1 (if (< n 8) (* 2 n)) (if (< n 8) (+ 1 (* 2 n)))
     (prinl n))
8
4
9
2
10
5
11
1
12
6
13
3
14
7
15
nil
Let's see that in binary:

3> (for-tree n 1 (if (< n 8) (* 2 n)) (if (< n 8) (+ 1 (* 2 n)))
     (format t "~b\n" n))
1000
100
1001
10
1010
101
1011
1
1100
110
1101
11
1110
111
1111
nil
Let's right-align digits to see different patterns in it:

2> (for-tree n 1 (if (< n 8) (* 2 n)) (if (< n 8) (+ 1 (* 2 n)))
     (format t "~6b\n" n))
  1000
   100
  1001
    10
  1010
   101
  1011
     1
  1100
   110
  1101
    11
  1110
   111
  1111
nil
Nah, left-aligned wins.

3> (for-tree n 1 (if (n < 8) (n << 1)) (if (n < 8) (n << 1 | 1))
     (format t "~b\n" n))
1000
100
1001
10
[...]
For exposition, let's separate loop initialization from the rest, make `continue` mandatory to repeat the loop with another specified value (i.e. the loop implicitly ends without a continue), and let's say `descend` to recurse into the tree.
sum = 0;
loop (node = tree.root) {
sum += node.value;
if (node.left) descend node.left;
if (node.right) descend node.right;
}
Compare with a linear loop:

sum = 0;
loop (i = 0) {
if (i >= array.length) break;
sum += array[i];
continue i + 1;
}
The `descend` keyword differs from `continue` in that it's more of a continuation - control flow will continue from that point once it is done. You could make this more functional and less imperative (and I'd be surprised if such things don't exist as libraries in some functional languages). Yes, ultimately we can transform this to recursive functions and compile it that way... but for imperative languages (especially those without closures) something like the above might be useful.
Arguably the above loops are _more_ imperative and less declarative than the standard ones from the C family. You could add a functional twist by breaking with a value (which is the result of the overall `loop` expression).
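The tree-sum example above can be approximated today with an explicit stack, sketched in Python (the `Node` class is illustrative). One caveat: a real `descend` continuation would resume the body mid-way after the subtree finishes, whereas the stack version simply queues children at the end of each body.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def tree_sum(root):
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += node.value                       # the loop body
        if node.right: stack.append(node.right)   # "descend node.right"
        if node.left:  stack.append(node.left)    # "descend node.left"
    return total
```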
func Search(Root : Tree; Desired_Value : Tree_Value)
  -> optional Tree_Id is
    var Identifier : optional Tree_Id := null;
    for T => Root then Left(T) || Right(T)
      while T not null concurrent loop
        if Value(T) == Desired_Value then
            // Found desired node,
            // exit with its identifier
            exit loop with Identifier => Key(T);
        end if;
    end loop;
    return Identifier;
end func Search;
I actually don't use this feature much, because usually there is a key of some sort that determines which side of the tree is of interest.

Web browsers have this. It's the DOM and it scares the shit out of most JavaScript developers. The fun part is confronting people about their irrational fears, watching them spin their wheels with all the imagined excuses.
I am a tremendous fan of tree systems. Tree systems should completely replace inheritance in programming to enforce the composition over inheritance principle.
Here's the range-based for loop counterexample from the blog post, as a Python generator:

import itertools

def iterate_tree():
    for length in range(1, 8):
        for combo in itertools.product("abc", repeat=length):
            yield ''.join(combo)

for s in iterate_tree():
    print(s)
It allows you to specify traversal in a structure-shy way. Personally, I think it goes a little too far, but it's definitely a good inspiration.
[1] https://www2.ccs.neu.edu/research/demeter/
Presentation: https://www2.ccs.neu.edu/research/demeter/talks/overview/qua...
OTOH I read/heard that the beauty of array languages is that they have few data types, which can be easily worked on. So maybe the answer is not easier tree traversal primitives, but better literature/training on how to transform a tree into a more manageable data type.
Sometimes the answer is to adapt our vision to our world, sometimes the answer is to adapt the world to our vision.
So something similar to `break` and `continue` should be the method of branching (descending into the tree).
I think Scala might be a great language to prototype implementation of such construct.
Const km = 1
Const velocity = 11

And doing km + velocity = 12 is all kinds of silly.

Const true = False
Const false = True

And doing if (true) "won't happen" else if (false) "will happen" is all kinds of silly.
Edit: Assuming your example is even statically typed.
I guess/expect that it will have to expose the implementation detail of the branching factor (yes, a binary tree is a special case of a https://en.wikipedia.org/wiki/B-tree for B = 2, but algorithms tend to be different)
On top of that, build a library of tree traversal routines (breadth first, depth first with a few variants), allowing the user to specify code that decides, on each node visited, whether to navigate further down and whether to stop the traversal.
Now, whether the language proper needs a generic way to specify tree traversals? I doubt it, and doubt even more that it is possible to do so in a way such that a) most graph traversal algorithms can use the primitive and b) the result is easier to use/looks significantly nicer than just calling those routines.
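One possible shape for such a library routine, sketched in Python (the `walk` name, the `"prune"`/`"stop"` protocol, and the `children` callback are all illustrative assumptions): the caller's visit function decides per node whether to navigate further down or end the traversal.

```python
def walk(root, children, visit):
    """Depth-first pre-order walk.

    visit(node) may return "prune" to skip that node's subtree,
    or "stop" to end the whole traversal; any other return value
    means "keep going".
    """
    stack = [root]
    while stack:
        node = stack.pop()
        action = visit(node)
        if action == "stop":
            return
        if action != "prune":
            # reversed() so children are visited in left-to-right order.
            stack.extend(reversed(children(node)))
```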
public static IEnumerable<TNode> DepthFirstTraverse<TNode>(
TNode root,
Func<TNode, IEnumerable<TNode>> children
)
You can then just ‘foreach’ over it.This doesn’t require the whole tree to be in-memory. Optimal implementation would recurse via programmer-controlled stack (instead of language-controlled) to avoid nested iterators.
Here is a TypeScript version:
function* traverseDepthFirst<T>(
root: T,
children: (parent: T) => Iterable<T>
)
A similar approach should be possible in any language which supports iterators/generators.I will be transparent here, I do not know of a non recursive, tree traversal alogorthim that uses constant storage but doesn't modify the tree.. (Morris Traversal satisfies 2 but makes temporary changes to the tree)
Most primitives do not allocate hidden data (even C++ iterators require you to instantiate the iteration state as a variable) and operate within a known stack/execution context.
Having an actual primitive that implements arbitrary tree traversal would require a substantial compiler-generated, but non-transparent, framework for that execution. It would either need to be recursive, have localized dynamic data, or would require tree structures well defined enough to use something like Morris traversal. (And Morris is natively non-concurrent, even for reads.)
All this means that such a simple-looking primitive would actually do a lot of potentially problematic things under the hood. How many junior programmers would call it concurrently for reads because it's not "modifying the data"? Why do you need to lock? Things like break and continue and other control mechanisms wouldn't actually work as assumed from the flat context... i.e. what happens if somebody calls a goto inside of that block? Also, in an embedded context where you may not have a dynamic allocator, how would a version written as an iterative function with an expanding stack data structure work? From a language-standard point of view, there are so many complexities to building something like this that runs in any context the language is supported in.
I mean look at C++, they have specifically and (deliberately / arguably rightfully) failed to build copy on write strings (a simpler abstraction) because there are cases where they couldn't guarantee that they would have the expected results in all contexts.
This to me sounds more like something that should be a lambda function and a standard library, not a primitive... Languages that support dictionaries and maps as part of their standard data structures already have library support for tree traversal internally. They have solved many of the problems of making that language-exposed, so that would be a much smaller ask. And programmers rightfully assume a lot more complexity within library calls than they do within primitives. Primitives are designed to be primitive. Having primitives able to create stack frames, dynamically allocate hidden memory, and/or recurse honestly worries me.