Introductory R7RS Scheme

Lesson 5

Represent information with hierarchy, then choose a tree representation that matches the meaning of the problem.

Goal

After this lesson you should be able to describe a tree using the words node, root, branch, leaf, child, and subtree; explain why there is no single official Scheme tree representation; define a small tree abstract data type; and map a procedure over two different tree representations.

This lesson is about tree shape and representation. The next lesson will focus on traversal order, search, and expression trees. Keeping those apart makes the first tree programs easier to read.

A Tree Is A Hierarchy

A tree represents information arranged in levels. Each item can have smaller items below it. The top item is the root. The items below a node are its children. A node with no children is a leaf.

A node is a point in the tree, but it is also a tree of its own. If you focus on the node labelled Europe below, Europe is a child of World, but Europe is also the root of a smaller tree containing countries and cities.

World Europe Africa Americas Portugal Kenya Brazil Lisbon Nairobi Brasilia

Trees are useful whenever the information itself is hierarchical: directories contain files and subdirectories, a document contains sections and paragraphs, a country contains regions and cities, and a programming language expression contains subexpressions.

Tree Vocabulary

The vocabulary is small, but worth being precise about.

The same object can be described in more than one way. Lisbon is a leaf in the geography tree. Portugal is a child of Europe, a parent of Lisbon, and also the root of the subtree containing Portugal and Lisbon.

No Single Official Tree Representation

Scheme has a standard notation and representation for proper lists. A proper list is a chain of pairs ending in the empty list.

Trees are different. There is no single tree shape that fits every problem.

Questions you have to answer include:

Because the answers vary, Scheme gives you pairs and lists, and you build the tree representation that matches the problem.

N-Way And Binary Trees

An n-way tree allows a node to have any number of children. The geography tree above is n-way: World has three children in the diagram, and it could have more.

A binary tree has at most two child positions, conventionally called left and right. Binary trees are common when the representation itself has a left/right meaning, as in binary search trees or arithmetic expression trees.

left right left right left right 8 3 13 1 5 11 21

This lesson will mostly use n-way trees. The next lesson will use binary trees when discussing inorder traversal and search trees.

A Tree ADT With Data At Every Node

One clean representation is a tree node with a datum and a list of children. Each child is itself a tree.

(define (make-tree datum children)
  (cons datum children))

(define (datum node)
  (car node))

(define (children node)
  (cdr node))

(define (leaf? node)
  (null? (children node)))

In this representation, a leaf is still a tree node. It simply has no children.


The expression above is intentionally awkward. It shows why selectors are necessary, but it also shows that deeply nested data often needs better helper procedures before programs become pleasant to read.

A Node Is Also A Subtree

With the ADT above, the Europe node from the geography tree can be selected and then treated as a complete tree.

(define europe-subtree
  (car (children geography-tree)))

The value named europe-subtree is a child of the world tree. It is also a tree whose root datum is europe.


This is the same idea as with lists: the rest of a nonempty proper list is itself a proper list. In a tree, each child is itself a tree.

Mapping Over A Tree ADT

In Lesson 2, map over a list applied a procedure to each element of a sequence. With a tree ADT, there is one datum at the current node and then a list of child trees.

That leads to this shape:

(define (tree-map proc tree)
  (make-tree (proc (datum tree))
             (map (lambda (child)
                    (tree-map proc child))
                  (children tree))))

The recursive call is not just on one rest list. It is made for each child. The ordinary list procedure map handles the sequence of children; tree-map handles each child as a tree.


The result uses the same tree shape, with each datum transformed.

Leaves-Only Trees

Another common representation treats ordinary nested lists as trees. In this view, branch nodes do not have data of their own. The leaves are the non-list values.

For example:

'(a (b c d) (e (f g) h))

This can be read as a tree with three top-level children: the leaf a, a branch containing b, c, and d, and a branch containing e, another branch, and h.

list a list list b c d e list h f g

This is not the same ADT as the previous section. There is no separate constructor saying where the datum is and where the children are. The list structure itself is being interpreted as the tree.

Mapping Over Nested-List Trees

For nested-list trees, mapping over leaves means making a choice at each step:

R7RS does not provide atom?, so define the idea directly or use the pair? test as below.

(define (nested-tree-map proc tree)
  (cond
    ((null? tree) '())
    ((pair? tree)
     (cons (nested-tree-map proc (car tree))
           (nested-tree-map proc (cdr tree))))
    (else (proc tree))))

This is tree recursion over the pair structure: one recursive call for the car and one recursive call for the cdr.


This direct pair-structure version is useful, but be clear about what it means. It treats every pair as two subtrees. That is different from treating a node as a datum plus a list of child trees.

Choosing A Representation

Both representations are reasonable, but they answer different questions.

Use a datum-plus-children ADT when every node should carry a value. This is natural for organization charts, taxonomies, file-system indexes, outlines, and syntax trees where each branch has a label.

Use a nested-list tree when the branch nodes are only grouping structure and the leaves carry the real data. This is natural for “apply this to every leaf” examples and for some symbolic data structures.

The important habit is to write down the representation before writing procedures over it. If you change the representation, procedures written above a clear abstraction barrier should not all need to change.

First Complete Example

The published example source for this lesson checks the datum-plus-children tree ADT, leaf detection, subtrees, tree mapping, and nested-list tree mapping.

(define (tree-map proc tree)
  (make-tree (proc (datum tree))
             (map (lambda (child)
                    (tree-map proc child))
                  (children tree))))

(define (nested-tree-map proc tree)
  (cond
    ((null? tree) '())
    ((pair? tree)
     (cons (nested-tree-map proc (car tree))
           (nested-tree-map proc (cdr tree))))
    (else (proc tree))))

The full source file has a highlighted view at /examples/r7rs/intro/lesson-05/ and a raw source link from that page.

When working from the source tree used to build this site, run the example suite with:

scripts/test_examples_chicken.sh

Exercises

These exercises are original and deliberately small.

What Comes Next

The next material will use the tree representations from this lesson to ask how a program should visit the nodes: depth first, breadth first, preorder, postorder, and inorder for binary trees.

Before moving on, make sure you can explain these ideas without guessing: