Lesson 4 ======== - Data abstraction - Constructors and selectors - Abstraction barriers - Pairs as representation - Box-and-pointer diagrams - Proper lists and nested structure - Functional pairs Use constructors and selectors to separate what data means from how pairs and lists represent it. ## Goal After this lesson you should be able to define a small abstract data type, use constructor and selector procedures, explain what an abstraction barrier protects, read a box-and-pointer diagram for pairs and lists, and avoid confusing a list's meaning with its pair representation. The examples are deliberately small. The point is not to build a complete card library, geometry library, or arithmetic package. The point is to learn the habit of giving data a vocabulary before writing procedures that depend on it. ## Diagrams At The Origin This lesson introduces box-and-pointer diagrams near the beginning because data abstraction is easier to discuss when you can see the representation that is being hidden. The diagrams are generated from small source blocks. For pair diagrams, a cell has a left field and a right field. In Scheme terms, these correspond to the two parts selected by `car` and `cdr`. ```scheme (pair-diagram (root value p1) (cell p1 :left a :right p2) (cell p2 :left b :right empty)) ``` The source above is not executable R7RS Scheme. It is a diagram notation used by this site. The important visual conventions are these: - A box is a pair. - The left half represents the value selected by `car`. - The right half represents the value selected by `cdr`. - A slash in a field represents the empty list. - A top-level arrow shows where the structure begins. - An arrow points to a whole object. It does not point to half of a pair. The physical direction of an arrow does not matter. Arrows are drawn to keep the page readable; the arrowhead is what tells you the relationship. ## What Data Abstraction Means A program often needs to talk about compound data: a playing card, a rational number, a point, a line segment, an interval, a hand of cards, or a list of results. Data abstraction means that the rest of the program talks about the data in terms of its meaning, not in terms of the current representation. For example, a card has a rank and a suit. Those are the words that belong in procedures that score cards. The implementation may store a card as a pair, as a number, as a vector, or in some other form later. Code that uses cards should not need to know that. A constructor builds a value of an abstract data type. A selector extracts a meaningful component from that value. ```scheme (define (make-card rank suit) (cons rank suit)) (define (card-rank card) (car card)) (define (card-suit card) (cdr card)) ``` The representation here is simple: a card is a pair whose `car` is the rank and whose `cdr` is the suit. But the rest of the program should say `card-rank` and `card-suit`, not `car` and `cdr`, when it is thinking about cards. ```scheme (define (make-card rank suit) (cons rank suit)) (define (card-rank card) (car card)) (define (card-suit card) (cdr card)) (card-rank (make-card 10 'heart)) ``` ## A Hand Of Cards A hand is an ordered collection of cards. In this lesson we represent a hand as a proper list of card values. ```scheme (define (make-hand . cards) cards) (define (first-card hand) (car hand)) (define (rest-cards hand) (cdr hand)) ``` Now a total procedure can use the vocabulary of cards and hands. ```scheme (define (hand-total hand) (if (null? hand) 0 (+ (card-rank (first-card hand)) (hand-total (rest-cards hand))))) ``` This version is longer than writing directly with `car` and `cdr`, but it says what each operation means. The `car` used by `card-rank` means "the rank part of a card." The `car` used by `first-card` means "the first card in a hand." Those are not the same idea, even if the current implementation uses the same primitive selector for both. ```scheme (define (make-card rank suit) (cons rank suit)) (define (card-rank card) (car card)) (define (make-hand . cards) cards) (define (first-card hand) (car hand)) (define (rest-cards hand) (cdr hand)) (define (hand-total hand) (if (null? hand) 0 (+ (card-rank (first-card hand)) (hand-total (rest-cards hand))))) (hand-total (make-hand (make-card 3 'heart) (make-card 10 'club) (make-card 4 'diamond))) ``` ## Changing The Representation Suppose a later version stores each card as a number from `1` through `52`. The public operations can keep the same meaning. ```scheme (define (make-card-number rank suit) (cond ((eq? suit 'heart) rank) ((eq? suit 'spade) (+ rank 13)) ((eq? suit 'diamond) (+ rank 26)) ((eq? suit 'club) (+ rank 39)) (else (error "unknown suit" suit)))) (define (card-number-rank card) (let ((rank (remainder card 13))) (if (= rank 0) 13 rank))) (define (card-number-suit card) (cond ((<= 1 card 13) 'heart) ((<= 14 card 26) 'spade) ((<= 27 card 39) 'diamond) ((<= 40 card 52) 'club) (else (error "card out of range" card)))) ``` The point is not that this representation is better. The point is that a representation can change while the operations keep the same external promises. The abstraction barrier is the boundary between code that knows the representation and code that only knows the operations. Constructors and selectors sit on that boundary. Code below the barrier may use `cons`, `car`, `cdr`, arithmetic, or any other representation technique. Code above the barrier should use names such as `make-card`, `card-rank`, and `card-suit`. ## One Pair, Many Meanings A pair is just two fields. Many abstract data types can use that same shape. ```scheme (pair-diagram (root rat rational :x 30 :y 54) (root int interval :x 30 :y 164) (root pt point :x 30 :y 274) (cell rational :label rational :left numerator :right denominator :x 110 :y 32) (cell interval :label interval :left lower :right upper :x 110 :y 142) (cell point :label point :left x :right y :x 110 :y 252)) ``` The boxes have the same representation shape, but their meanings are different. A rational number is not a point. An interval is not a rational number. The abstraction vocabulary tells the rest of the program how to interpret the pair. ## Rational Numbers Here is a small rational-number abstraction. The constructor reduces the fraction and keeps the denominator positive. ```scheme (define (integer-gcd a b) (if (= b 0) a (integer-gcd b (remainder a b)))) (define (make-rat numerator denominator) (if (= denominator 0) (error "make-rat: zero denominator") (let* ((sign (if (< denominator 0) -1 1)) (n (* sign numerator)) (d (* sign denominator)) (g (integer-gcd (abs n) d))) (cons (/ n g) (/ d g))))) (define (rat-numerator rat) (car rat)) (define (rat-denominator rat) (cdr rat)) ``` Once those operations exist, arithmetic procedures should use them rather than reaching into the pair directly. ```scheme (define (rat-add a b) (make-rat (+ (* (rat-numerator a) (rat-denominator b)) (* (rat-numerator b) (rat-denominator a))) (* (rat-denominator a) (rat-denominator b)))) ``` If `make-rat` later stores rationals differently, `rat-add` only needs the selectors to keep their promises. ```scheme (define (integer-gcd a b) (if (= b 0) a (integer-gcd b (remainder a b)))) (define (make-rat numerator denominator) (if (= denominator 0) (error "make-rat: zero denominator") (let* ((sign (if (< denominator 0) -1 1)) (n (* sign numerator)) (d (* sign denominator)) (g (integer-gcd (abs n) d))) (cons (/ n g) (/ d g))))) (define (rat-numerator rat) (car rat)) (define (rat-denominator rat) (cdr rat)) (define (rat-add a b) (make-rat (+ (* (rat-numerator a) (rat-denominator b)) (* (rat-numerator b) (rat-denominator a))) (* (rat-denominator a) (rat-denominator b)))) (rat-add (make-rat 1 2) (make-rat 1 3)) ``` ## Points, Segments, And Respecting Layers A point can be represented as a pair of coordinates. ```scheme (define (make-point x y) (cons x y)) (define (point-x point) (car point)) (define (point-y point) (cdr point)) ``` A line segment can be represented as a pair of points. ```scheme (define (make-segment start end) (cons start end)) (define (segment-start segment) (car segment)) (define (segment-end segment) (cdr segment)) ``` The diagram has three pairs: one segment pair and two point pairs. ```scheme (pair-diagram (root segment seg :x 110 :y 52) (cell seg :label segment :left p1 :right p2 :x 210 :y 32) (cell p1 :label start-point :left x1 :right y1 :x 70 :y 150) (cell p2 :label end-point :left x2 :right y2 :x 350 :y 150)) ``` Respecting the abstraction means that a segment procedure uses `segment-start` and `segment-end` to get points, then uses `point-x` and `point-y` to get coordinates from those points. ```scheme (define (midpoint-segment segment) (let ((start (segment-start segment)) (end (segment-end segment))) (make-point (/ (+ (point-x start) (point-x end)) 2) (/ (+ (point-y start) (point-y end)) 2)))) ``` This is a nested abstraction. A segment contains points; it is not merely a flat pile of four numbers. ## Intervals An interval is another two-part abstraction. It can use the same pair shape with different names. ```scheme (define (make-interval lower upper) (if (> lower upper) (error "make-interval: lower is greater than upper" lower upper) (cons lower upper))) (define (interval-lower interval) (car interval)) (define (interval-upper interval) (cdr interval)) (define (interval-width interval) (/ (- (interval-upper interval) (interval-lower interval)) 2)) ``` The implementation is simple, but procedures that use intervals should still use interval words. If you see `car` inside `interval-lower`, that is representation code. If you see `car` inside a procedure that is supposed to reason about interval width, that procedure is crossing the abstraction barrier. ## Pairs Do Not Have To Be Primitive Scheme provides pairs as primitive data. The constructor is `cons`; the selectors are `car` and `cdr`. The idea of a pair, however, can be described by behavior: if you construct a pair from `x` and `y`, `car` gives back `x` and `cdr` gives back `y`. With first-class procedures, we can make a pair-like value as a procedure that responds to messages. ```scheme (define (functional-cons x y) (lambda (message) (cond ((eq? message 'car) x) ((eq? message 'cdr) y) (else (error "unknown pair message" message))))) (define (functional-car pair) (pair 'car)) (define (functional-cdr pair) (pair 'cdr)) ``` This is not how Scheme implementations normally represent pairs. Real pairs are primitive for efficiency and interoperability. But this example shows a deeper point: an abstraction is defined by the operations that work on it. ```scheme (define (functional-cons x y) (lambda (message) (cond ((eq? message 'car) x) ((eq? message 'cdr) y) (else (error "unknown pair message" message))))) (define (functional-car pair) (pair 'car)) (define (functional-cdr pair) (pair 'cdr)) (functional-cdr (functional-cons 'left 'right)) ``` ## Proper Lists As Pair Chains A proper list is either the empty list or a pair whose `cdr` is a proper list. The printed list `'(a b c)` has a three-pair backbone. ```scheme (pair-diagram (root value n1 :x 38 :y 54) (cell n1 :label first-pair :left a :right n2 :x 120 :y 32) (cell n2 :label second-pair :left b :right n3 :x 270 :y 32) (cell n3 :label third-pair :left c :right empty :x 420 :y 32)) ``` This is why `car` gets the first element and `cdr` gets the rest of the list. ```scheme (car '(a b c)) ``` ```scheme (cdr '(a b c)) ``` A nested list uses more pairs. The top-level list `((a b) c (d (e f)))` has three top-level elements: - the list `(a b)` - the symbol `c` - the list `(d (e f))` Draw the three-pair top-level backbone first. Then draw the pairs needed by the sublists. ```scheme (pair-diagram (root value top1 :x 36 :y 54) (cell top1 :label top-1 :left ab1 :right top2 :x 118 :y 32) (cell top2 :label top-2 :left c :right top3 :x 278 :y 32) (cell top3 :label top-3 :left dlist1 :right empty :x 438 :y 32) (cell ab1 :label "(a b)" :left a :right ab2 :x 118 :y 170) (cell ab2 :label rest :left b :right empty :x 278 :y 170) (cell dlist1 :label "(d (e f))" :left d :right dlist2 :x 438 :y 170) (cell dlist2 :label rest :left ef1 :right empty :x 598 :y 170) (cell ef1 :label "(e f)" :left e :right ef2 :x 438 :y 308) (cell ef2 :label rest :left f :right empty :x 598 :y 308)) ``` This diagram is also a warning. When a program manipulates a list, it is often better to ask about the list structure first: is it empty, and if not, what is its first element and its rest? Only after that should you worry about whether an element is itself a list. ## Lists And Sentences Some classic Scheme courses introduce a `sentence` abstraction before exposing pairs. That abstraction is useful pedagogically because it lets early programs process flat sequences of words without thinking about `cons` cells. R7RS does not standardize `sentence`, `word`, `first`, `butfirst`, or `butlast`. This course uses standard proper lists as the baseline. When we refer to older exercises that use sentence operations, we will translate the idea carefully rather than pretending those procedures are part of R7RS. The important distinction is this: a list can contain any Scheme value, including another list. A sentence abstraction normally restricts its elements so that it stays flat. ```scheme (list 'a '(b c) 'd) ``` ## First Complete Example The published example source for this lesson checks cards, rational numbers, points, segments, intervals, functional pairs, and nested list structure. ```scheme (define (hand-total hand) (if (null? hand) 0 (+ (card-rank (first-card hand)) (hand-total (rest-cards hand))))) (define (midpoint-segment segment) (let ((start (segment-start segment)) (end (segment-end segment))) (make-point (/ (+ (point-x start) (point-x end)) 2) (/ (+ (point-y start) (point-y end)) 2)))) ``` The full source file has a highlighted view at [`/examples/r7rs/intro/lesson-04/`](/examples/r7rs/intro/lesson-04/) and a raw source link from that page. When working from the source tree used to build this site, run the example suite with: ```sh scripts/test_examples_chicken.sh ``` ## Exercises These exercises are original and deliberately small. - Define `make-book`, `book-title`, and `book-author` using pairs. - Write `same-suit?` using only `card-suit`, not `cdr`. - Rewrite `hand-total` so it uses an internal tail-recursive helper, while still using `card-rank`, `first-card`, and `rest-cards`. - Explain which procedures are below the card abstraction barrier and which are above it. - Define `rat-negate` and `rat-subtract` using the rational selectors and constructor. - Draw a pair diagram for `(cons 'x 'y)`. Is it a proper list? - Draw the pair backbone for `'(red green blue)`. - Draw the top-level backbone for `'((red green) blue)`, then add the sublist. - Explain why a segment represented as a pair of points should not be treated as four adjacent numbers. - Use `functional-cons`, `functional-car`, and `functional-cdr` to show that the pair behavior can be implemented with procedures. ## What Comes Next The next lesson is [Lesson 5: hierarchical data and tree shapes](/learn/intro-r7rs/lesson-05/). It uses the pair and list representations from this lesson to model roots, branches, leaves, and subtrees. Before moving on, make sure you can explain these ideas without guessing: - What constructor and selector mean. - Why `card-rank` can be better than writing `car` directly. - What an abstraction barrier protects. - Why the same pair shape can represent different abstract data types. - Why a line segment is naturally a pair of points, not just four numbers. - How a proper list is built from pairs and the empty list. - Why the top-level arrow in a box-and-pointer diagram matters.