P1. Expression Evaluation (Beginner)
01. Introduction
What you will learn:
- Parsing recursive grammars.
- Expression evaluation.
The first step in building a compiler or interpreter is to parse the source code into tree structures. Most parsing can be done recursively, and you can parse most computer languages once you learn how to do it.
Parsing is the boring part of a compiler, and most production programming languages require a lot of labor work to parse. Fortunately, there is a type of grammar that can be parsed with minimal effort: the S-expression (sexpr). We’ll use it to save the time.
02. The Sexpr
An sexpr is either a nested list enclosed in parentheses, or an indivisible element called an “atom”. Some examples:
;; an atom
a
;; lists
(a)
(a b c d)
(a ((b) c))
It naturally represents a tree, and most parsing is converting a string representation into a tree structure, so why not just use the sexpr for syntax? Especially when the sexpr is simpler and easier to parse than alternatives.
Besides the obvious tree structure, sexpr-based syntax often uses prefix operators instead of infix operators. Consider the following cases:
Infix operators:
1 + 2 * 3
Prefix operators:
(+ 1 (* 2 3))
Infix operators are complicated by different operator precedence — a problem that doesn’t exist with prefix operators. Prefix operators are also easier to handle when we need further parsing.
We’ll implement the following operators:
Binary operators:
(op a b)
.Arithmetics:
(+ 1 2)
(- 1 2)
(* 1 2)
(/ 1 2)
Comparisons:
(lt 1 2)
, less than.(gt 1 2)
, greater than.(eq 1 2)
, equal.(ne 1 2)
, not equal.
Booleans:
(and a b)
(or a b)
Unary operators:
(- a)
, negative.(not a)
, boolean not.
Conditional:
(? expr-cond expr-then expr-else)
.e.g.,
(? (lt 1 2) "yes" "no")
yields “yes”.(print a b c)
03. Start Coding
To parse an sexpr, the first step is to determine whether it’s an atom or a list, this is done by looking at the first non-whitespace character.
If it’s a list, parse recursively until the closing parenthesis is met.
def parse_expr(s, idx):
= skip_space(s, idx)
idx if s[idx] == '(':
# a list
+= 1
idx = []
l while True:
= skip_space(s, idx)
idx if idx >= len(s):
raise Exception('unbalanced parenthesis')
if s[idx] == ')':
+= 1
idx break
= parse_expr(s, idx)
idx, v
l.append(v)return idx, l
elif s[idx] == ')':
raise Exception('bad parenthesis')
else:
# an atom
= idx
start while idx < len(s) and (not s[idx].isspace()) and s[idx] not in '()':
+= 1
idx if start == idx:
raise Exception('empty program')
return idx, parse_atom(s[start:idx])
Any non-whitespace character is considered part of an atom.
def skip_space(s, idx):
while idx < len(s) and s[idx].isspace():
+= 1
idx return idx
Since we are going to evaluate the parsed sexpr, it makes sense to extract values such as numbers and strings from atoms.
# bool, number or string
def parse_atom(s):
# TODO: actually implement this
import json
try:
return ['val', json.loads(s)]
except json.JSONDecodeError:
return s
The above function is just a placeholder, we’ll ditch it in later chapters.
As we can see, the pattern of parsing is:
- Look at what’s at the beginning and decide on the next step.
- Recurse if necessary.
04. Expression Evaluation
Evaluating an sexpr is dead simple:
- Look at the first item on the list and decide what the operation is.
- Evaluate the rest of the items recursively.
- Perform the operation.
def pl_eval(node):
if len(node) == 0:
raise ValueError('empty list')
# bool, number, string and etc
if len(node) == 2 and node[0] == 'val':
return node[1]
# binary operators
import operator
= {
binops '+': operator.add,
'-': operator.sub,
'*': operator.mul,
'/': operator.truediv,
'eq': operator.eq,
'ne': operator.ne,
'ge': operator.ge,
'gt': operator.gt,
'le': operator.le,
'lt': operator.lt,
'and': operator.and_,
'or': operator.or_,
}if len(node) == 3 and node[0] in binops:
= binops[node[0]]
op return op(pl_eval(node[1]), pl_eval(node[2]))
# unary operators
= {
unops '-': operator.neg,
'not': operator.not_,
}if len(node) == 2 and node[0] in unops:
= unops[node[0]]
op return op(pl_eval(node[1]))
# conditionals
if len(node) == 4 and node[0] == '?':
= node
_, cond, yes, no if pl_eval(cond):
return pl_eval(yes)
else:
return pl_eval(no)
# print
if node[0] == 'print':
return print(*(pl_eval(val) for val in node[1:]))
raise ValueError('unknown expression')
The expression evaluator can be used as an (unconventional) calculator.
def test_eval():
def f(s):
return pl_eval(pl_parse(s))
assert f('1') == 1
assert f('(+ 1 3)') == 4
assert f('(? (lt 1 3) "yes" "no")') == "yes"
assert f('(print 1 2 3)') is None
05. Closing Remarks
We have implemented a simple calculator, this is the basis of the next chapter project — a programming language with variables, control flows, and functions.