Goal
After this lesson you should be able to define a small abstract data type, use constructor and selector procedures, explain what an abstraction barrier protects, read a box-and-pointer diagram for pairs and lists, and avoid confusing a list's meaning with its pair representation.
The examples are deliberately small. The point is not to build a complete card library, geometry library, or arithmetic package. The point is to learn the habit of giving data a vocabulary before writing procedures that depend on it.
Diagrams At The Origin
This lesson introduces box-and-pointer diagrams near the beginning because data abstraction is easier to discuss when you can see the representation that is being hidden.
The diagrams are generated from small source blocks. For pair diagrams, a cell has a left field and a right field. In Scheme terms, these correspond to the two parts selected by car and cdr.
The source above is not executable R7RS Scheme. It is a diagram notation used by this site. The important visual conventions are these:
- A box is a pair.
- The left half represents the value selected by
car. - The right half represents the value selected by
cdr. - A slash in a field represents the empty list.
- A top-level arrow shows where the structure begins.
- An arrow points to a whole object. It does not point to half of a pair.
The physical direction of an arrow does not matter. Arrows are drawn to keep the page readable; the arrowhead is what tells you the relationship.
What Data Abstraction Means
A program often needs to talk about compound data: a playing card, a rational number, a point, a line segment, an interval, a hand of cards, or a list of results.
Data abstraction means that the rest of the program talks about the data in terms of its meaning, not in terms of the current representation.
For example, a card has a rank and a suit. Those are the words that belong in procedures that score cards. The implementation may store a card as a pair, as a number, as a vector, or in some other form later. Code that uses cards should not need to know that.
A constructor builds a value of an abstract data type. A selector extracts a meaningful component from that value.
(define (make-card rank suit)
(cons rank suit))
(define (card-rank card)
(car card))
(define (card-suit card)
(cdr card))
The representation here is simple: a card is a pair whose car is the rank and whose cdr is the suit. But the rest of the program should say card-rank and card-suit, not car and cdr, when it is thinking about cards.
A Hand Of Cards
A hand is an ordered collection of cards. In this lesson we represent a hand as a proper list of card values.
(define (make-hand . cards)
cards)
(define (first-card hand)
(car hand))
(define (rest-cards hand)
(cdr hand))
Now a total procedure can use the vocabulary of cards and hands.
(define (hand-total hand)
(if (null? hand)
0
(+ (card-rank (first-card hand))
(hand-total (rest-cards hand)))))
This version is longer than writing directly with car and cdr, but it says what each operation means. The car used by card-rank means "the rank part of a card." The car used by first-card means "the first card in a hand." Those are not the same idea, even if the current implementation uses the same primitive selector for both.
Changing The Representation
Suppose a later version stores each card as a number from 1 through 52. The public operations can keep the same meaning.
(define (make-card-number rank suit)
(cond
((eq? suit 'heart) rank)
((eq? suit 'spade) (+ rank 13))
((eq? suit 'diamond) (+ rank 26))
((eq? suit 'club) (+ rank 39))
(else (error "unknown suit" suit))))
(define (card-number-rank card)
(let ((rank (remainder card 13)))
(if (= rank 0) 13 rank)))
(define (card-number-suit card)
(cond
((<= 1 card 13) 'heart)
((<= 14 card 26) 'spade)
((<= 27 card 39) 'diamond)
((<= 40 card 52) 'club)
(else (error "card out of range" card))))
The point is not that this representation is better. The point is that a representation can change while the operations keep the same external promises.
The abstraction barrier is the boundary between code that knows the representation and code that only knows the operations. Constructors and selectors sit on that boundary.
Code below the barrier may use cons, car, cdr, arithmetic, or any other representation technique. Code above the barrier should use names such as make-card, card-rank, and card-suit.
One Pair, Many Meanings
A pair is just two fields. Many abstract data types can use that same shape.
The boxes have the same representation shape, but their meanings are different. A rational number is not a point. An interval is not a rational number. The abstraction vocabulary tells the rest of the program how to interpret the pair.
Rational Numbers
Here is a small rational-number abstraction. The constructor reduces the fraction and keeps the denominator positive.
(define (integer-gcd a b)
(if (= b 0)
a
(integer-gcd b (remainder a b))))
(define (make-rat numerator denominator)
(if (= denominator 0)
(error "make-rat: zero denominator")
(let* ((sign (if (< denominator 0) -1 1))
(n (* sign numerator))
(d (* sign denominator))
(g (integer-gcd (abs n) d)))
(cons (/ n g) (/ d g)))))
(define (rat-numerator rat)
(car rat))
(define (rat-denominator rat)
(cdr rat))
Once those operations exist, arithmetic procedures should use them rather than reaching into the pair directly.
(define (rat-add a b)
(make-rat (+ (* (rat-numerator a) (rat-denominator b))
(* (rat-numerator b) (rat-denominator a)))
(* (rat-denominator a) (rat-denominator b))))
If make-rat later stores rationals differently, rat-add only needs the selectors to keep their promises.
Points, Segments, And Respecting Layers
A point can be represented as a pair of coordinates.
(define (make-point x y)
(cons x y))
(define (point-x point)
(car point))
(define (point-y point)
(cdr point))
A line segment can be represented as a pair of points.
(define (make-segment start end)
(cons start end))
(define (segment-start segment)
(car segment))
(define (segment-end segment)
(cdr segment))
The diagram has three pairs: one segment pair and two point pairs.
Respecting the abstraction means that a segment procedure uses segment-start and segment-end to get points, then uses point-x and point-y to get coordinates from those points.
(define (midpoint-segment segment)
(let ((start (segment-start segment))
(end (segment-end segment)))
(make-point (/ (+ (point-x start) (point-x end)) 2)
(/ (+ (point-y start) (point-y end)) 2))))
This is a nested abstraction. A segment contains points; it is not merely a flat pile of four numbers.
Intervals
An interval is another two-part abstraction. It can use the same pair shape with different names.
(define (make-interval lower upper)
(if (> lower upper)
(error "make-interval: lower is greater than upper" lower upper)
(cons lower upper)))
(define (interval-lower interval)
(car interval))
(define (interval-upper interval)
(cdr interval))
(define (interval-width interval)
(/ (- (interval-upper interval)
(interval-lower interval))
2))
The implementation is simple, but procedures that use intervals should still use interval words. If you see car inside interval-lower, that is representation code. If you see car inside a procedure that is supposed to reason about interval width, that procedure is crossing the abstraction barrier.
Pairs Do Not Have To Be Primitive
Scheme provides pairs as primitive data. The constructor is cons; the selectors are car and cdr.
The idea of a pair, however, can be described by behavior: if you construct a pair from x and y, car gives back x and cdr gives back y.
With first-class procedures, we can make a pair-like value as a procedure that responds to messages.
(define (functional-cons x y)
(lambda (message)
(cond
((eq? message 'car) x)
((eq? message 'cdr) y)
(else (error "unknown pair message" message)))))
(define (functional-car pair)
(pair 'car))
(define (functional-cdr pair)
(pair 'cdr))
This is not how Scheme implementations normally represent pairs. Real pairs are primitive for efficiency and interoperability. But this example shows a deeper point: an abstraction is defined by the operations that work on it.
Proper Lists As Pair Chains
A proper list is either the empty list or a pair whose cdr is a proper list. The printed list '(a b c) has a three-pair backbone.
This is why car gets the first element and cdr gets the rest of the list.
A nested list uses more pairs. The top-level list ((a b) c (d (e f))) has three top-level elements:
- the list
(a b) - the symbol
c - the list
(d (e f))
Draw the three-pair top-level backbone first. Then draw the pairs needed by the sublists.
This diagram is also a warning. When a program manipulates a list, it is often better to ask about the list structure first: is it empty, and if not, what is its first element and its rest? Only after that should you worry about whether an element is itself a list.
Lists And Sentences
Some classic Scheme courses introduce a sentence abstraction before exposing pairs. That abstraction is useful pedagogically because it lets early programs process flat sequences of words without thinking about cons cells.
R7RS does not standardize sentence, word, first, butfirst, or butlast. This course uses standard proper lists as the baseline. When we refer to older exercises that use sentence operations, we will translate the idea carefully rather than pretending those procedures are part of R7RS.
The important distinction is this: a list can contain any Scheme value, including another list. A sentence abstraction normally restricts its elements so that it stays flat.
First Complete Example
The published example source for this lesson checks cards, rational numbers, points, segments, intervals, functional pairs, and nested list structure.
(define (hand-total hand)
(if (null? hand)
0
(+ (card-rank (first-card hand))
(hand-total (rest-cards hand)))))
(define (midpoint-segment segment)
(let ((start (segment-start segment))
(end (segment-end segment)))
(make-point (/ (+ (point-x start) (point-x end)) 2)
(/ (+ (point-y start) (point-y end)) 2))))
The full source file has a highlighted view at /examples/r7rs/intro/lesson-04/ and a raw source link from that page.
When working from the source tree used to build this site, run the example suite with:
scripts/test_examples_chicken.sh
Exercises
These exercises are original and deliberately small.
- Define
make-book,book-title, andbook-authorusing pairs. - Write
same-suit?using onlycard-suit, notcdr. - Rewrite
hand-totalso it uses an internal tail-recursive helper, while still usingcard-rank,first-card, andrest-cards. - Explain which procedures are below the card abstraction barrier and which are above it.
- Define
rat-negateandrat-subtractusing the rational selectors and constructor. - Draw a pair diagram for
(cons 'x 'y). Is it a proper list? - Draw the pair backbone for
'(red green blue). - Draw the top-level backbone for
'((red green) blue), then add the sublist. - Explain why a segment represented as a pair of points should not be treated as four adjacent numbers.
- Use
functional-cons,functional-car, andfunctional-cdrto show that the pair behavior can be implemented with procedures.
What Comes Next
The next lesson is Lesson 5: hierarchical data and tree shapes. It uses the pair and list representations from this lesson to model roots, branches, leaves, and subtrees.
Before moving on, make sure you can explain these ideas without guessing:
- What constructor and selector mean.
- Why
card-rankcan be better than writingcardirectly. - What an abstraction barrier protects.
- Why the same pair shape can represent different abstract data types.
- Why a line segment is naturally a pair of points, not just four numbers.
- How a proper list is built from pairs and the empty list.
- Why the top-level arrow in a box-and-pointer diagram matters.