P2. Variables, Control Flows, and Functions (Intermediate)

01. Introduction

In the last chapter we implemented a simple calculator. To make it look like a programming language we need to add 3 more aspects:

  1. Variables. For manipulating states.
  2. Control flows. Like conditionals and loops.
  3. Functions. For reusing code.

Here are some sample programs that demonstrate the syntax of our language:

;; define the function `fib` with an argument `n`
(def fib (n)
    ;; if-then-else
    (if (le n 0)
        (then 0)
        (else (+ n (call fib (- n 1))))))   ;; function call

(call fib 5)
;; another version of the `fib` function
(def fib (n) (do
    (var r 0)           ;; variable declaration
    (loop (gt n 0) (do  ;; loop
        (set r (+ r n)) ;; variable assignment
        (set n (- n 1))
    ))
    (return r)          ;; return from a function call
))

(call fib 5)

I’ve also added comments to the language. Comments handled by augmenting the skip_space function:

def skip_space(s, idx):
    while True:
        save = idx
        # try to skip spaces
        while idx < len(s) and s[idx].isspace():
            idx += 1
        # try to skip a line comment
        if idx < len(s) and s[idx] == ';':
            idx += 1
            while idx < len(s) and s[idx] != '\n':
                idx += 1
        # no more spaces or comments
        if idx == save:
            break
    return idx

02. Variables

A program is a sequence of operations that manipulate states; thus, we’ll add a new construct that performs a sequence of operations. The do command evaluates its arguments in order and returns the last argument.

(do a b c ...)

And we add 2 new commands for variable declaration and assignment.

(var a init)    ;; create a new variable with a initial value
(set a value)   ;; assign a new value to a variable

Variables are scoped — they can only be accessed by their sibling expressions. We’ll use a map to store variables while evaluating an expression.

The problem is subexpressions can define variables whose names collide with a parent scope. This is solved by using per-scope mappings rather than a global mapping.

The pl_eval function got a new argument (env) for storing variables.

def pl_eval(env, node):
    # read a variable
    if not isinstance(node, list):
        assert isinstance(node, str)
        return name_loopup(env, node)[node]
    ...

The env argument is a linked list, it contains the map for the current scope and the link to the parent scope. The variable lookup function traverses the list upwards until the name is found.

def name_loopup(env, key):
    while env:
        current, env = env
        if key in current:
            return current
    raise ValueError('undefined name')

The code for evaluating a new scope: create a new map and link it to the current scope. The then and else commands are just syntactic sugars.

def pl_eval(env, node):
    ...
    # new scope
    if node[0] in ('do', 'then', 'else') and len(node) > 1:
        new_env = (dict(), env)
        for val in node[1:]:
            val = pl_eval(new_env, val)
        return val

The code for the var and set commands is now straightforward.

def pl_eval(env, node):
    ...
    # new variable
    if node[0] == 'var' and len(node) == 3:
        _, name, val = node
        scope, _ = env
        if name in scope:
            raise ValueError('duplicated name')
        val = pl_eval(env, val)
        scope[name] = val
        return val
    # write a variable
    if node[0] == 'set' and len(node) == 3:
        _, name, val = node
        scope = name_loopup(env, name)
        val = pl_eval(env, val)
        scope[name] = val
        return val

03. Control Flows

The handling of if-then-else is almost the same as in the last chapter. Except that I created a new scope before evaluating the condition, this allows a variable declaration in the condition, for example: (if (var aaa bbb) (then use aaa here)). This is also just syntactic sugar.

def pl_eval(env, node):
    ...
    # conditional
    if len(node) in (3, 4) and node[0] in ('?', 'if'):
        _, cond, yes, *no = node
        no = no[0] if no else ['val', None]
        new_env = (dict(), env)
        if pl_eval(new_env, cond):
            return pl_eval(new_env, yes)
        else:
            return pl_eval(new_env, no)

The loop is a new command. Its evaluation translates directly into a Python loop. Again, the new scope is for syntactic sugar.

def pl_eval(env, node):
    ...
    # loop
    if node[0] == 'loop' and len(node) == 3:
        _, cond, body = node
        ret = None
        while True:
            new_env = (dict(), env)
            if not pl_eval(new_env, cond):
                break
            try:
                ret = pl_eval(new_env, body)
            except LoopBreak:
                break
            except LoopContinue:
                continue
        return ret
    # break & continue
    if node[0] == 'break' and len(node) == 1:
        raise LoopBreak
    if node[0] == 'continue' and len(node) == 1:
        raise LoopContinue

We are (ab)using Python exceptions for the (break) and (continue) control flows. You can also propagate them explicitly via the return value of the pl_eval if you don’t like such hacks.

class LoopBreak(Exception):
    def __init__(self):
        super().__init__('`break` outside a loop')

class LoopContinue(Exception):
    def __init__(self):
        super().__init__('`continue` outside a loop')

04. Functions

The code for function definition does nothing significant. It just performs some sanity checks and puts the function name into the map. Note that I added the number of arguments to the key, this distinguishes function names from variable names and allows a form of “overloading”.

def pl_eval(env, node):
    ...
    # function definition
    if node[0] == 'def' and len(node) == 4:
        _, name, args, body = node
        for arg_name in args:
            if not isinstance(arg_name, str):
                raise ValueError('bad argument name')
        if len(args) != len(set(args)):
            raise ValueError('duplicated arguments')
        dct, _ = env
        key = (name, len(args))
        if key in dct:
            raise ValueError('duplicated function')
        dct[key] = (args, body, env)
        return

Now the call command is handled. Function arguments are treated as new variables. Just create a new scope, put the arguments in it, and evaluate the body.

def pl_eval(env, node):
    ...
    # function call
    if node[0] == 'call' and len(node) >= 2:
        _, name, *args = node
        key = (name, len(args))
        fargs, fbody, fenv = name_loopup(env, key)[key]
        # args
        new_env = dict()
        for arg_name, arg_val in zip(fargs, args):
            new_env[arg_name] = pl_eval(env, arg_val)
        # call
        try:
            return pl_eval((new_env, fenv), fbody)
        except FuncReturn as ret:
            return ret.val
    # return
    if node[0] == 'return' and len(node) == 1:
        raise FuncReturn(None)
    if node[0] == 'return' and len(node) == 2:
        _, val = node
        raise FuncReturn(pl_eval(env, val))

Special care: the parent scope is not the scope of the caller, but the scope in which the function was defined.

The (return) command is handled like the (break) or (continue).

class FuncReturn(Exception):
    def __init__(self, val):
        super().__init__('`return` outside a function')
        self.val = val

05. Done

Congratulations. We have built an interpreter for a mini programming language.

def test_eval():
    def f(s):
        return pl_eval(None, pl_parse_prog(s))
    assert f('''
        (def fib (n)
            (if (le n 0)
                (then 0)
                (else (+ n (call fib (- n 1))))))
        (call fib 5)
    ''') == 5 + 4 + 3 + 2 + 1
    assert f('''
        (def fib (n) (do
            (var r 0)
            (loop (gt n 0) (do
                (set r (+ r n))
                (set n (- n 1))
            ))
            (return r)
        ))
        (call fib 5)
    ''') == 5 + 4 + 3 + 2 + 1

06. Closing Remarks

Our interpreter is not much more difficult than the calculator in the last chapter. Variables are solved by extra states, control flows are interpreted by Python control flows — it’s pretty much still simple recursion.

The interpreter still depends on an existing language for execution. How do we compile our language into machine code and run it natively on the CPU? That is the challenge of the next chapter.