P1. Expression Evaluation (Beginner)

01. Introduction

What you will learn:

  • Parsing recursive grammars.
  • Expression evaluation.

The first step in building a compiler or interpreter is to parse the source code into tree structures. Most parsing can be done recursively, and you can parse most computer languages once you learn how to do it.

Parsing is the boring part of a compiler, and most production programming languages require a lot of labor work to parse. Fortunately, there is a type of grammar that can be parsed with minimal effort: the S-expression (sexpr). We’ll use it to save the time.

02. The Sexpr

An sexpr is either a nested list enclosed in parentheses, or an indivisible element called an “atom”. Some examples:

;; an atom
a
;; lists
(a)
(a b c d)
(a ((b) c))

It naturally represents a tree, and most parsing is converting a string representation into a tree structure, so why not just use the sexpr for syntax? Especially when the sexpr is simpler and easier to parse than alternatives.

Besides the obvious tree structure, sexpr-based syntax often uses prefix operators instead of infix operators. Consider the following cases:

Infix operators:

1 + 2 * 3

Prefix operators:

(+ 1 (* 2 3))

Infix operators are complicated by different operator precedence — a problem that doesn’t exist with prefix operators. Prefix operators are also easier to handle when we need further parsing.

We’ll implement the following operators:

  1. Binary operators: (op a b).

    Arithmetics:

    • (+ 1 2)
    • (- 1 2)
    • (* 1 2)
    • (/ 1 2)

    Comparisons:

    • (lt 1 2), less than.
    • (gt 1 2), greater than.
    • (eq 1 2), equal.
    • (ne 1 2), not equal.

    Booleans:

    • (and a b)
    • (or a b)
  2. Unary operators:

    • (- a), negative.
    • (not a), boolean not.
  3. Conditional: (? expr-cond expr-then expr-else).

    e.g., (? (lt 1 2) "yes" "no") yields “yes”.

  4. (print a b c)

03. Start Coding

To parse an sexpr, the first step is to determine whether it’s an atom or a list, this is done by looking at the first non-whitespace character.

If it’s a list, parse recursively until the closing parenthesis is met.

def parse_expr(s, idx):
    idx = skip_space(s, idx)
    if s[idx] == '(':
        # a list
        idx += 1
        l = []
        while True:
            idx = skip_space(s, idx)
            if idx >= len(s):
                raise Exception('unbalanced parenthesis')
            if s[idx] == ')':
                idx += 1
                break

            idx, v = parse_expr(s, idx)
            l.append(v)
        return idx, l
    elif s[idx] == ')':
        raise Exception('bad parenthesis')
    else:
        # an atom
        start = idx
        while idx < len(s) and (not s[idx].isspace()) and s[idx] not in '()':
            idx += 1
        if start == idx:
            raise Exception('empty program')
        return idx, parse_atom(s[start:idx])

Any non-whitespace character is considered part of an atom.

def skip_space(s, idx):
    while idx < len(s) and s[idx].isspace():
        idx += 1
    return idx

Since we are going to evaluate the parsed sexpr, it makes sense to extract values such as numbers and strings from atoms.

# bool, number or string
def parse_atom(s):
    # TODO: actually implement this
    import json
    try:
        return ['val', json.loads(s)]
    except json.JSONDecodeError:
        return s

The above function is just a placeholder, we’ll ditch it in later chapters.

As we can see, the pattern of parsing is:

  1. Look at what’s at the beginning and decide on the next step.
  2. Recurse if necessary.

04. Expression Evaluation

Evaluating an sexpr is dead simple:

  1. Look at the first item on the list and decide what the operation is.
  2. Evaluate the rest of the items recursively.
  3. Perform the operation.
def pl_eval(node):
    if len(node) == 0:
        raise ValueError('empty list')

    # bool, number, string and etc
    if len(node) == 2 and node[0] == 'val':
        return node[1]

    # binary operators
    import operator
    binops = {
        '+': operator.add,
        '-': operator.sub,
        '*': operator.mul,
        '/': operator.truediv,
        'eq': operator.eq,
        'ne': operator.ne,
        'ge': operator.ge,
        'gt': operator.gt,
        'le': operator.le,
        'lt': operator.lt,
        'and': operator.and_,
        'or': operator.or_,
    }
    if len(node) == 3 and node[0] in binops:
        op = binops[node[0]]
        return op(pl_eval(node[1]), pl_eval(node[2]))

    # unary operators
    unops = {
        '-': operator.neg,
        'not': operator.not_,
    }
    if len(node) == 2 and node[0] in unops:
        op = unops[node[0]]
        return op(pl_eval(node[1]))

    # conditionals
    if len(node) == 4 and node[0] == '?':
        _, cond, yes, no = node
        if pl_eval(cond):
            return pl_eval(yes)
        else:
            return pl_eval(no)

    # print
    if node[0] == 'print':
        return print(*(pl_eval(val) for val in node[1:]))

    raise ValueError('unknown expression')

The expression evaluator can be used as an (unconventional) calculator.

def test_eval():
    def f(s):
        return pl_eval(pl_parse(s))
    assert f('1') == 1
    assert f('(+ 1 3)') == 4
    assert f('(? (lt 1 3) "yes" "no")') == "yes"
    assert f('(print 1 2 3)') is None

05. Closing Remarks

We have implemented a simple calculator, this is the basis of the next chapter project — a programming language with variables, control flows, and functions.