P2. Variables, Control Flows, and Functions (Intermediate)
01. Introduction
In the last chapter we implemented a simple calculator. To make it look like a programming language we need to add 3 more aspects:
- Variables. For manipulating states.
- Control flows. Like conditionals and loops.
- Functions. For reusing code.
Here are some sample programs that demonstrate the syntax of our language:
;; define the function `fib` with an argument `n`
(def fib (n)
;; if-then-else
(if (le n 0)
(then 0)
(else (+ n (call fib (- n 1)))))) ;; function call
(call fib 5);; another version of the `fib` function
(def fib (n) (do
(var r 0) ;; variable declaration
(loop (gt n 0) (do ;; loop
(set r (+ r n)) ;; variable assignment
(set n (- n 1))
))
(return r) ;; return from a function call
))
(call fib 5)I’ve also added comments to the language. Comments handled by
augmenting the skip_space function:
def skip_space(s, idx):
while True:
save = idx
# try to skip spaces
while idx < len(s) and s[idx].isspace():
idx += 1
# try to skip a line comment
if idx < len(s) and s[idx] == ';':
idx += 1
while idx < len(s) and s[idx] != '\n':
idx += 1
# no more spaces or comments
if idx == save:
break
return idx02. Variables
A program is a sequence of operations that manipulate states; thus,
we’ll add a new construct that performs a sequence of operations. The
do command evaluates its arguments in order and returns the
last argument.
(do a b c ...)And we add 2 new commands for variable declaration and assignment.
(var a init) ;; create a new variable with a initial value
(set a value) ;; assign a new value to a variableVariables are scoped — they can only be accessed by their sibling expressions. We’ll use a map to store variables while evaluating an expression.
The problem is subexpressions can define variables whose names collide with a parent scope. This is solved by using per-scope mappings rather than a global mapping.
The pl_eval function got a new argument
(env) for storing variables.
def pl_eval(env, node):
# read a variable
if not isinstance(node, list):
assert isinstance(node, str)
return name_loopup(env, node)[node]
...The env argument is a linked list, it contains the map
for the current scope and the link to the parent scope. The variable
lookup function traverses the list upwards until the name is found.
def name_loopup(env, key):
while env:
current, env = env
if key in current:
return current
raise ValueError('undefined name')The code for evaluating a new scope: create a new map and link it to
the current scope. The then and else commands
are just syntactic sugars.
def pl_eval(env, node):
...
# new scope
if node[0] in ('do', 'then', 'else') and len(node) > 1:
new_env = (dict(), env)
for val in node[1:]:
val = pl_eval(new_env, val)
return valThe code for the var and set commands is
now straightforward.
def pl_eval(env, node):
...
# new variable
if node[0] == 'var' and len(node) == 3:
_, name, val = node
scope, _ = env
if name in scope:
raise ValueError('duplicated name')
val = pl_eval(env, val)
scope[name] = val
return val
# write a variable
if node[0] == 'set' and len(node) == 3:
_, name, val = node
scope = name_loopup(env, name)
val = pl_eval(env, val)
scope[name] = val
return val03. Control Flows
The handling of if-then-else is almost the same as in the last
chapter. Except that I created a new scope before evaluating the
condition, this allows a variable declaration in the condition, for
example: (if (var aaa bbb) (then use aaa here)). This is
also just syntactic sugar.
def pl_eval(env, node):
...
# conditional
if len(node) in (3, 4) and node[0] in ('?', 'if'):
_, cond, yes, *no = node
no = no[0] if no else ['val', None]
new_env = (dict(), env)
if pl_eval(new_env, cond):
return pl_eval(new_env, yes)
else:
return pl_eval(new_env, no)The loop is a new command. Its evaluation translates
directly into a Python loop. Again, the new scope is for syntactic
sugar.
def pl_eval(env, node):
...
# loop
if node[0] == 'loop' and len(node) == 3:
_, cond, body = node
ret = None
while True:
new_env = (dict(), env)
if not pl_eval(new_env, cond):
break
try:
ret = pl_eval(new_env, body)
except LoopBreak:
break
except LoopContinue:
continue
return ret
# break & continue
if node[0] == 'break' and len(node) == 1:
raise LoopBreak
if node[0] == 'continue' and len(node) == 1:
raise LoopContinueWe are (ab)using Python exceptions for the (break) and
(continue) control flows. You can also propagate them
explicitly via the return value of the pl_eval if you don’t
like such hacks.
class LoopBreak(Exception):
def __init__(self):
super().__init__('`break` outside a loop')
class LoopContinue(Exception):
def __init__(self):
super().__init__('`continue` outside a loop')04. Functions
The code for function definition does nothing significant. It just performs some sanity checks and puts the function name into the map. Note that I added the number of arguments to the key, this distinguishes function names from variable names and allows a form of “overloading”.
def pl_eval(env, node):
...
# function definition
if node[0] == 'def' and len(node) == 4:
_, name, args, body = node
for arg_name in args:
if not isinstance(arg_name, str):
raise ValueError('bad argument name')
if len(args) != len(set(args)):
raise ValueError('duplicated arguments')
dct, _ = env
key = (name, len(args))
if key in dct:
raise ValueError('duplicated function')
dct[key] = (args, body, env)
returnNow the call command is handled. Function arguments are
treated as new variables. Just create a new scope, put the arguments in
it, and evaluate the body.
def pl_eval(env, node):
...
# function call
if node[0] == 'call' and len(node) >= 2:
_, name, *args = node
key = (name, len(args))
fargs, fbody, fenv = name_loopup(env, key)[key]
# args
new_env = dict()
for arg_name, arg_val in zip(fargs, args):
new_env[arg_name] = pl_eval(env, arg_val)
# call
try:
return pl_eval((new_env, fenv), fbody)
except FuncReturn as ret:
return ret.val
# return
if node[0] == 'return' and len(node) == 1:
raise FuncReturn(None)
if node[0] == 'return' and len(node) == 2:
_, val = node
raise FuncReturn(pl_eval(env, val))Special care: the parent scope is not the scope of the caller, but the scope in which the function was defined.
The (return) command is handled like the
(break) or (continue).
class FuncReturn(Exception):
def __init__(self, val):
super().__init__('`return` outside a function')
self.val = val05. Done
Congratulations. We have built an interpreter for a mini programming language.
def test_eval():
def f(s):
return pl_eval(None, pl_parse_prog(s))
assert f('''
(def fib (n)
(if (le n 0)
(then 0)
(else (+ n (call fib (- n 1))))))
(call fib 5)
''') == 5 + 4 + 3 + 2 + 1
assert f('''
(def fib (n) (do
(var r 0)
(loop (gt n 0) (do
(set r (+ r n))
(set n (- n 1))
))
(return r)
))
(call fib 5)
''') == 5 + 4 + 3 + 2 + 106. Closing Remarks
Our interpreter is not much more difficult than the calculator in the last chapter. Variables are solved by extra states, control flows are interpreted by Python control flows — it’s pretty much still simple recursion.
The interpreter still depends on an existing language for execution. How do we compile our language into machine code and run it natively on the CPU? That is the challenge of the next chapter.