Lesson 5 ======== - Hierarchical data - Nodes and subtrees - Roots, branches, and leaves - N-way and binary trees - Tree ADTs - Nested-list trees - Tree mapping Represent information with hierarchy, then choose a tree representation that matches the meaning of the problem. ## Goal After this lesson you should be able to describe a tree using the words node, root, branch, leaf, child, and subtree; explain why there is no single official Scheme tree representation; define a small tree abstract data type; and map a procedure over two different tree representations. This lesson is about tree shape and representation. The next lesson will focus on traversal order, search, and expression trees. Keeping those apart makes the first tree programs easier to read. ## A Tree Is A Hierarchy A tree represents information arranged in levels. Each item can have smaller items below it. The top item is the root. The items below a node are its children. A node with no children is a leaf. A node is a point in the tree, but it is also a tree of its own. If you focus on the node labelled Europe below, Europe is a child of World, but Europe is also the root of a smaller tree containing countries and cities. ```scheme (tree-diagram (node world :label World :root) (node europe :label Europe) (node africa :label Africa) (node americas :label Americas) (node portugal :label Portugal) (node kenya :label Kenya) (node brazil :label Brazil) (node lisbon :label Lisbon) (node nairobi :label Nairobi) (node brasilia :label Brasilia) (edge world europe) (edge world africa) (edge world americas) (edge europe portugal) (edge africa kenya) (edge americas brazil) (edge portugal lisbon) (edge kenya nairobi) (edge brazil brasilia)) ``` Trees are useful whenever the information itself is hierarchical: directories contain files and subdirectories, a document contains sections and paragraphs, a country contains regions and cities, and a programming language expression contains subexpressions. ## Tree Vocabulary The vocabulary is small, but worth being precise about. - A node is one position in a tree. - The root is the top node of the tree currently being discussed. - A child is a node immediately below another node. - A branch is a connection from a node to a child, or informally a non-leaf part of a tree. - A leaf is a node with no children. - A subtree is any node considered together with all of the nodes below it. The same object can be described in more than one way. Lisbon is a leaf in the geography tree. Portugal is a child of Europe, a parent of Lisbon, and also the root of the subtree containing Portugal and Lisbon. ## No Single Official Tree Representation Scheme has a standard notation and representation for proper lists. A proper list is a chain of pairs ending in the empty list. Trees are different. There is no single tree shape that fits every problem. Questions you have to answer include: - Does every node have data, or only the leaves? - Can a node have any number of children, or exactly two child positions? - Does the order of children matter? - Is an empty tree allowed? - Should a branch node store a value of its own? Because the answers vary, Scheme gives you pairs and lists, and you build the tree representation that matches the problem. ## N-Way And Binary Trees An n-way tree allows a node to have any number of children. The geography tree above is n-way: World has three children in the diagram, and it could have more. A binary tree has at most two child positions, conventionally called left and right. Binary trees are common when the representation itself has a left/right meaning, as in binary search trees or arithmetic expression trees. ```scheme (tree-diagram (node root :label 8 :root) (node left :label 3) (node right :label 13) (node left-left :label 1) (node left-right :label 5) (node right-left :label 11) (node right-right :label 21) (edge root left :label left) (edge root right :label right) (edge left left-left :label left) (edge left left-right :label right) (edge right right-left :label left) (edge right right-right :label right)) ``` This lesson will mostly use n-way trees. The next lesson will use binary trees when discussing inorder traversal and search trees. ## A Tree ADT With Data At Every Node One clean representation is a tree node with a datum and a list of children. Each child is itself a tree. ```scheme (define (make-tree datum children) (cons datum children)) (define (datum node) (car node)) (define (children node) (cdr node)) (define (leaf? node) (null? (children node))) ``` In this representation, a leaf is still a tree node. It simply has no children. ```scheme (define (make-tree datum children) (cons datum children)) (define (datum node) (car node)) (define (children node) (cdr node)) (define (leaf? node) (null? (children node))) (define city-tree (make-tree 'world (list (make-tree 'europe (list (make-tree 'portugal (list (make-tree 'lisbon '())))))))) (leaf? (car (children (car (children (car (children city-tree))))))) ``` The expression above is intentionally awkward. It shows why selectors are necessary, but it also shows that deeply nested data often needs better helper procedures before programs become pleasant to read. ## A Node Is Also A Subtree With the ADT above, the Europe node from the geography tree can be selected and then treated as a complete tree. ```scheme (define europe-subtree (car (children geography-tree))) ``` The value named `europe-subtree` is a child of the world tree. It is also a tree whose root datum is `europe`. ```scheme (define (make-tree datum children) (cons datum children)) (define (datum node) (car node)) (define (children node) (cdr node)) (define geography-tree (make-tree 'world (list (make-tree 'europe (list (make-tree 'portugal (list (make-tree 'lisbon '()))))) (make-tree 'africa (list (make-tree 'kenya (list (make-tree 'nairobi '())))))))) (define europe-subtree (car (children geography-tree))) (datum europe-subtree) ``` This is the same idea as with lists: the rest of a nonempty proper list is itself a proper list. In a tree, each child is itself a tree. ## Mapping Over A Tree ADT In Lesson 2, `map` over a list applied a procedure to each element of a sequence. With a tree ADT, there is one datum at the current node and then a list of child trees. That leads to this shape: ```scheme (define (tree-map proc tree) (make-tree (proc (datum tree)) (map (lambda (child) (tree-map proc child)) (children tree)))) ``` The recursive call is not just on one rest list. It is made for each child. The ordinary list procedure `map` handles the sequence of children; `tree-map` handles each child as a tree. ```scheme (define (make-tree datum children) (cons datum children)) (define (datum node) (car node)) (define (children node) (cdr node)) (define (tree-map proc tree) (make-tree (proc (datum tree)) (map (lambda (child) (tree-map proc child)) (children tree)))) (define number-tree (make-tree 1 (list (make-tree 2 '()) (make-tree 3 (list (make-tree 4 '())))))) (tree-map (lambda (n) (* n n)) number-tree) ``` The result uses the same tree shape, with each datum transformed. ## Leaves-Only Trees Another common representation treats ordinary nested lists as trees. In this view, branch nodes do not have data of their own. The leaves are the non-list values. For example: ```scheme '(a (b c d) (e (f g) h)) ``` This can be read as a tree with three top-level children: the leaf `a`, a branch containing `b`, `c`, and `d`, and a branch containing `e`, another branch, and `h`. ```scheme (tree-diagram (node root :label list :root) (node a :label a) (node bcd :label list) (node efg :label list) (node b :label b) (node c :label c) (node d :label d) (node e :label e) (node fg :label list) (node h :label h) (node f :label f) (node g :label g) (edge root a) (edge root bcd) (edge root efg) (edge bcd b) (edge bcd c) (edge bcd d) (edge efg e) (edge efg fg) (edge efg h) (edge fg f) (edge fg g)) ``` This is not the same ADT as the previous section. There is no separate constructor saying where the datum is and where the children are. The list structure itself is being interpreted as the tree. ## Mapping Over Nested-List Trees For nested-list trees, mapping over leaves means making a choice at each step: - If the tree is empty, return the empty list. - If the tree is a pair, recursively process both the `car` and the `cdr`. - Otherwise, the tree is a leaf, so apply the procedure to that leaf. R7RS does not provide `atom?`, so define the idea directly or use the `pair?` test as below. ```scheme (define (nested-tree-map proc tree) (cond ((null? tree) '()) ((pair? tree) (cons (nested-tree-map proc (car tree)) (nested-tree-map proc (cdr tree)))) (else (proc tree)))) ``` This is tree recursion over the pair structure: one recursive call for the `car` and one recursive call for the `cdr`. ```scheme (define (nested-tree-map proc tree) (cond ((null? tree) '()) ((pair? tree) (cons (nested-tree-map proc (car tree)) (nested-tree-map proc (cdr tree)))) (else (proc tree)))) (nested-tree-map (lambda (n) (* n n)) '(1 (2 3) (4 (5 6) 7))) ``` This direct pair-structure version is useful, but be clear about what it means. It treats every pair as two subtrees. That is different from treating a node as a datum plus a list of child trees. ## Choosing A Representation Both representations are reasonable, but they answer different questions. Use a datum-plus-children ADT when every node should carry a value. This is natural for organization charts, taxonomies, file-system indexes, outlines, and syntax trees where each branch has a label. Use a nested-list tree when the branch nodes are only grouping structure and the leaves carry the real data. This is natural for “apply this to every leaf” examples and for some symbolic data structures. The important habit is to write down the representation before writing procedures over it. If you change the representation, procedures written above a clear abstraction barrier should not all need to change. ## First Complete Example The published example source for this lesson checks the datum-plus-children tree ADT, leaf detection, subtrees, tree mapping, and nested-list tree mapping. ```scheme (define (tree-map proc tree) (make-tree (proc (datum tree)) (map (lambda (child) (tree-map proc child)) (children tree)))) (define (nested-tree-map proc tree) (cond ((null? tree) '()) ((pair? tree) (cons (nested-tree-map proc (car tree)) (nested-tree-map proc (cdr tree)))) (else (proc tree)))) ``` The full source file has a highlighted view at [`/examples/r7rs/intro/lesson-05/`](/examples/r7rs/intro/lesson-05/) and a raw source link from that page. When working from the source tree used to build this site, run the example suite with: ```sh scripts/test_examples_chicken.sh ``` ## Exercises These exercises are original and deliberately small. - In the geography diagram, identify the root, three leaves, and one subtree. - Explain why the Europe node can be viewed as both a child and a tree. - Define a tree for a small table of contents: book, chapters, and sections. - Write `leaf-count` for the datum-plus-children tree representation. - Use `tree-map` to add `1` to every datum in a numeric tree. - For `'(a (b c) d)`, draw the leaves-only tree by hand. - Use `nested-tree-map` to convert every symbol in `'(a (b c))` into the symbol `x`. - Explain why the datum-plus-children representation and the nested-list representation are not interchangeable. - Give an example where sibling order matters. - Give an example where sibling order does not matter. ## What Comes Next The next material will use the tree representations from this lesson to ask how a program should visit the nodes: depth first, breadth first, preorder, postorder, and inorder for binary trees. Before moving on, make sure you can explain these ideas without guessing: - What node, root, child, leaf, and subtree mean. - Why a node can also be considered a tree. - Why Scheme does not impose one official tree representation. - How `make-tree`, `datum`, `children`, and `leaf?` work together. - Why `tree-map` uses ordinary `map` over the children. - Why `nested-tree-map` recurs on both `car` and `cdr`. - Why choosing a representation is part of designing the program.