Derivation of the Y-Combinator

Douglas Blank

Based on work by Jim Marshall

This is the derivation of the applicative-order Y-combinator from scratch, in Scheme. The following derivation is similar in flavor to the derivation found in The Little LISPer by Friedman/Felleisen, but uses a slightly different starting approach (for one thing, I begin with the "length" function). Maybe this version will be a little easier to follow.

Step 0. We wish to write the recursive function "length" without having to give it a name. Then we could invoke it on a list as follows:

((lambda (l)
   (if (null? l)
       0
       (+ 1 (??? (cdr l)))))
 '(any old list you like))

The problem, of course, is that we can't plug in the function expression itself directly in place of the ??? because that immediately leads to an infinite regress. It's like trying to quote an entire sentence inside the sentence itself.

Step 1. This is the only step in the entire derivation that requires any real thought. We can get around the above problem by passing in a copy of the length function as an extra argument and then using that in place of the ???. But we must ensure that the copy of our function looks exactly like the function itself at all times. WE WILL ADHERE TO THIS REQUIREMENT IN EVERY STEP THAT FOLLOWS. Notice that since f is a copy of the function, and the function takes a copy of itself as a second argument, we must also pass f to f (as a second argument). Passing a copy of the function to itself is the whole secret of the Y-combinator. The following expression will evaluate to 5 (the length of the example list):

In [10]:
((lambda (l f)
   (if (null? l)
       0
       (+ 1 (f (cdr l) f))))
 '(any old list you like)
 (lambda (l f)
   (if (null? l)
       0
       (+ 1 (f (cdr l) f)))))
Out[10]:
5

Step 2. We just switch the order of the function arguments, l and f. "(f (cdr l) f)" changes to "(f f (cdr l))", and the arguments to the top-level invocation also switch places:

In [11]:
((lambda (f l)
   (if (null? l)
       0
       (+ 1 (f f (cdr l)))))
 (lambda (f l)
   (if (null? l)
       0
       (+ 1 (f f (cdr l)))))
 '(any old list you like))
Out[11]:
5

Step 3. We simply curry the function so that it takes its two arguments one at a time. Note the extra set of ()'s around the top-level invocation as well. This expression still evaluates to 5:

In [12]:
(((lambda (f)
    (lambda (l)
      (if (null? l)
          0
          (+ 1 ((f f) (cdr l))))))
  (lambda (f)
    (lambda (l)
      (if (null? l)
          0
          (+ 1 ((f f) (cdr l)))))))
 '(any old list you like))
Out[12]:
5

Step 4. The above expression is now of the form "([function] '(any old list you like))", where [function] is now a self-contained recursive version of "length", although still in a clumsy form. We can forget about '(any old list you like) for the remainder of the derivation and concentrate on just the [function] part, since that's what we're interested in. So here it is by itself:

In [13]:
((lambda (f)
   (lambda (l)
     (if (null? l)
         0
         (+ 1 ((f f) (cdr l))))))
 (lambda (f)
   (lambda (l)
     (if (null? l)
         0
         (+ 1 ((f f) (cdr l)))))))
Out[13]:
#<procedure>

Step 5. Notice that in the above expression (f f) returns a function which gets applied to (cdr l). In the same way that the + 1 function is equivalent to the function (lambda (a) (+ 1 a)), the "(f f) function" is equivalent to the function (lambda (a) ((f f) a)). [This is just an inverse eta step to you lambda-calculus pros out there]. This step is necessary to avoid infinite loops, since we're assuming applicative order (i.e, Scheme).

In [14]:
((lambda (f)
   (lambda (l)
     (if (null? l)
         0
         (+ 1 ((lambda (a) ((f f) a)) (cdr l))))))
 (lambda (f)
   (lambda (l)
     (if (null? l)
         0
         (+ 1 ((lambda (a) ((f f) a)) (cdr l)))))))
Out[14]:
#<procedure>

Step 6. Here we just give the (lambda (a) ((f f) a)) function the name "r" using a let-expression. Simple. (Notice how every change to our function requires an identical change to the copy of the function, as mentioned earlier).

In [15]:
((lambda (f)
   (let ([r (lambda (a) ((f f) a))])
     (lambda (l)
       (if (null? l)
           0
           (+ 1 (r (cdr l)))))))
 (lambda (f)
   (let ([r (lambda (a) ((f f) a))])
     (lambda (l)
       (if (null? l)
           0
           (+ 1 (r (cdr l))))))))
Out[15]:
#<procedure>

Step 7. Now we just expand the let-expressions into their equivalent lambda-forms. In general, "(let ([x val]) body)" is equivalent to "((lambda (x) body) val)". This may look complicated but it's not. Just match up the ()'s carefully.

In [16]:
((lambda (f)
   ((lambda (r)
      (lambda (l)
        (if (null? l)
            0
            (+ 1 (r (cdr l))))))
    (lambda (a) ((f f) a))))
 (lambda (f)
   ((lambda (r)
      (lambda (l)
        (if (null? l)
            0
            (+ 1 (r (cdr l))))))
    (lambda (a) ((f f) a)))))
Out[16]:
#<procedure>

Step 8. Now we can give the (lambda (r) (lambda (l) ...)) expression a name also ("m") using a let-expression, since it has no free variables (except for primitives, but they're bound globally anyway). This step is just like Step 6.

In [17]:
(let ([m (lambda (r)
           (lambda (l)
             (if (null? l)
                 0
                 (+ 1 (r (cdr l))))))])
  ((lambda (f)
     (m (lambda (a) ((f f) a))))
   (lambda (f)
     (m (lambda (a) ((f f) a))))))
Out[17]:
#<procedure>

Step 9. Now we replace the let-expression for "m" by its equivalent lambda-form, just like in Step 7, and out pops the applicative-order Y-combinator! The expression below still represents the self-contained recursive length function, but now it's in a nicer form. In particular, the (lambda (m) ...) sub-expression is Y:

In [18]:
((lambda (m)
   ((lambda (f) (m (lambda (a) ((f f) a))))
    (lambda (f) (m (lambda (a) ((f f) a))))))
 (lambda (r)
   (lambda (l)
     (if (null? l)
         0
         (+ 1 (r (cdr l)))))))
Out[18]:
#<procedure>

Step 10. We just pull out the (lambda (m) ...) sub-expression and call it Y, since all of its variables are bound (after all, it's a combinator). Then the expression above for the recursive length function can be rewritten as shown below. The expression passed to Y is a "template" for the recursive length function. Instead of "???", we call the recursive invocation "r", wrap the whole thing with (lambda (r) ...), and hand it over to Y, which returns a self-contained recursive function. You can give it a name with define if you want, but you don't have to.

In [19]:
(define Y
  (lambda (m)
    ((lambda (f) (m (lambda (a) ((f f) a))))
     (lambda (f) (m (lambda (a) ((f f) a)))))))
In [20]:
(Y (lambda (r)
     (lambda (l)
       (if (null? l)
           0
           (+ 1 (r (cdr l)))))))
Out[20]:
#<procedure>
In [21]:
((Y (lambda (r)
     (lambda (l)
       (if (null? l)
           0
           (+ 1 (r (cdr l))))))) '(any old list you like))
Out[21]:
5