P2. Variables, Control Flows, and Functions (Intermediate)
01. Introduction
In the last chapter we implemented a simple calculator. To make it look like a programming language we need to add 3 more aspects:
- Variables. For manipulating states.
- Control flows. Like conditionals and loops.
- Functions. For reusing code.
Here are some sample programs that demonstrate the syntax of our language:
;; define the function `fib` with an argument `n`
(def fib (n)
    ;; if-then-else
    (if (le n 0)
        (then 0)
        (else (+ n (call fib (- n 1))))))   ;; function call
(call fib 5)
;; another version of the `fib` function
(def fib (n) (do
    (var r 0)                   ;; variable declaration
    (loop (gt n 0) (do          ;; loop
        (set r (+ r n))         ;; variable assignment
        (set n (- n 1))
    ))
    (return r)                  ;; return from a function call
))
(call fib 5)
I’ve also added comments to the language. Comments are handled by augmenting the skip_space function:
def skip_space(s, idx):
    while True:
        save = idx
        # try to skip spaces
        while idx < len(s) and s[idx].isspace():
            idx += 1
        # try to skip a line comment
        if idx < len(s) and s[idx] == ';':
            idx += 1
            while idx < len(s) and s[idx] != '\n':
                idx += 1
        # no more spaces or comments
        if idx == save:
            break
    return idx
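As a quick check of the behavior described above, the sketch below repeats the function and runs it on a made-up input that starts with spaces and two line comments. The sample string and the names src, start and rest are only for illustration.

```python
def skip_space(s, idx):
    # same logic as in the text: alternate between skipping
    # whitespace and skipping a `;` comment until nothing moves
    while True:
        save = idx
        while idx < len(s) and s[idx].isspace():
            idx += 1
        if idx < len(s) and s[idx] == ';':
            idx += 1
            while idx < len(s) and s[idx] != '\n':
                idx += 1
        if idx == save:
            break
    return idx

src = '  ;; a comment\n  ; another one\n(call fib 5)'
start = skip_space(src, 0)
rest = src[start:]   # what the parser sees after spaces and comments
```

The outer loop is what lets comments and blank lines appear in any order: each pass removes one run of spaces and at most one comment, and the function stops only when a pass makes no progress.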
02. Variables
A program is a sequence of operations that manipulate states; thus, we’ll add a new construct that performs a sequence of operations. The do command evaluates its arguments in order and returns the value of the last one.
(do a b c ...)
And we add 2 new commands for variable declaration and assignment.
(var a init)    ;; create a new variable with an initial value
(set a value)   ;; assign a new value to a variable
Variables are scoped: a variable can only be accessed from the scope where it was declared, that is, by its sibling expressions and anything nested inside them. We’ll use a map to store variables while evaluating an expression.
The problem is subexpressions can define variables whose names collide with a parent scope. This is solved by using per-scope mappings rather than a global mapping.
The pl_eval function got a new argument (env) for storing variables.
def pl_eval(env, node):
    # read a variable
    if not isinstance(node, list):
        assert isinstance(node, str)
        return name_loopup(env, node)[node]
    ...
The env argument is a linked list: each node contains the map for the current scope and a link to the parent scope. The variable lookup function traverses the list upwards until the name is found, and returns the map that contains it.
def name_loopup(env, key):
    while env:
        current, env = env
        if key in current:
            return current
    raise ValueError('undefined name')
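To make the linked-list layout concrete, here is a small sketch that builds two scopes by hand and looks names up through them. The scope contents and the variable names outer, inner, shadowed and inherited are made up for the example.

```python
def name_loopup(env, key):
    # walk from the innermost scope outwards
    while env:
        current, env = env
        if key in current:
            return current
    raise ValueError('undefined name')

# each scope is a (dict, parent) pair; None terminates the chain
outer = ({'a': 1, 'b': 2}, None)
inner = ({'a': 10}, outer)

shadowed = name_loopup(inner, 'a')['a']    # found in the inner scope
inherited = name_loopup(inner, 'b')['b']   # found by following the parent link

# the lookup returns the owning map, so a write goes to the right scope
name_loopup(inner, 'b')['b'] = 20
```

Returning the map instead of the value is what lets the same function serve both reads and assignments.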
To evaluate a new scope, create a new map and link it to the current scope. The then and else commands are just syntactic sugar; they are evaluated exactly like do.
def pl_eval(env, node):
    ...
    # new scope
    if node[0] in ('do', 'then', 'else') and len(node) > 1:
        new_env = (dict(), env)
        for val in node[1:]:
            val = pl_eval(new_env, val)
        return val
The code for the var and set commands is now straightforward.
def pl_eval(env, node):
    ...
    # new variable
    if node[0] == 'var' and len(node) == 3:
        _, name, val = node
        scope, _ = env
        if name in scope:
            raise ValueError('duplicated name')
        val = pl_eval(env, val)
        scope[name] = val
        return val
    # write a variable
    if node[0] == 'set' and len(node) == 3:
        _, name, val = node
        scope = name_loopup(env, name)
        val = pl_eval(env, val)
        scope[name] = val
        return val
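The sketch below puts the do, var and set branches together in a cut-down evaluator to show shadowing in action. It is not the full pl_eval: the name mini_eval is made up, the input is a hand-built node list rather than parsed source, and I am assuming that constants are represented as ['val', x] nodes and that + takes two operands.

```python
def name_loopup(env, key):
    while env:
        current, env = env
        if key in current:
            return current
    raise ValueError('undefined name')

def mini_eval(env, node):
    # hypothetical cut-down evaluator: variables, constants, `+`, do, var, set
    if not isinstance(node, list):
        return name_loopup(env, node)[node]
    if node[0] == 'val' and len(node) == 2:     # assumed constant node
        return node[1]
    if node[0] == '+' and len(node) == 3:
        return mini_eval(env, node[1]) + mini_eval(env, node[2])
    if node[0] == 'do' and len(node) > 1:
        new_env = (dict(), env)
        for val in node[1:]:
            val = mini_eval(new_env, val)
        return val
    if node[0] == 'var' and len(node) == 3:
        _, name, val = node
        scope, _ = env
        if name in scope:
            raise ValueError('duplicated name')
        val = mini_eval(env, val)
        scope[name] = val
        return val
    if node[0] == 'set' and len(node) == 3:
        _, name, val = node
        scope = name_loopup(env, name)
        val = mini_eval(env, val)
        scope[name] = val
        return val
    raise ValueError('unknown expression')

# (do (var a 1) (do (var a 10) (set a (+ a 1))) (set a (+ a 5)))
prog = ['do',
    ['var', 'a', ['val', 1]],
    ['do',
        ['var', 'a', ['val', 10]],              # shadows the outer `a`
        ['set', 'a', ['+', 'a', ['val', 1]]]],  # writes the inner `a`
    ['set', 'a', ['+', 'a', ['val', 5]]]]       # the outer `a` is untouched
result = mini_eval(None, prog)
```

The inner scope can redeclare a without an error because the duplicate check only looks at the current map, while set goes through the lookup and may write to any enclosing scope.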
03. Control Flows
The handling of if-then-else is almost the same as in the last chapter, except that I create a new scope before evaluating the condition. This allows a variable declaration in the condition, for example: (if (var aaa bbb) (then use aaa here)). This is also just syntactic sugar.
def pl_eval(env, node):
    ...
    # conditional
    if len(node) in (3, 4) and node[0] in ('?', 'if'):
        _, cond, yes, *no = node
        no = no[0] if no else ['val', None]
        new_env = (dict(), env)
        if pl_eval(new_env, cond):
            return pl_eval(new_env, yes)
        else:
            return pl_eval(new_env, no)
The loop command is new. Its evaluation translates directly into a Python loop. Again, the new scope created for each iteration is just syntactic sugar.
def pl_eval(env, node):
    ...
    # loop
    if node[0] == 'loop' and len(node) == 3:
        _, cond, body = node
        ret = None
        while True:
            new_env = (dict(), env)
            if not pl_eval(new_env, cond):
                break
            try:
                ret = pl_eval(new_env, body)
            except LoopBreak:
                break
            except LoopContinue:
                continue
        return ret
    # break & continue
    if node[0] == 'break' and len(node) == 1:
        raise LoopBreak
    if node[0] == 'continue' and len(node) == 1:
        raise LoopContinue
We are (ab)using Python exceptions for the (break) and (continue) control flows. You can also propagate them explicitly via the return value of pl_eval if you don’t like such hacks.
class LoopBreak(Exception):
    def __init__(self):
        super().__init__('`break` outside a loop')
class LoopContinue(Exception):
    def __init__(self):
        super().__init__('`continue` outside a loop')
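For comparison, here is one way the exception-free alternative could look: every evaluation returns a (signal, value) pair, and do passes any non-normal signal up to the enclosing loop. This is only a sketch under my own assumptions. The names ev, lookup, NORMAL, BREAK, CONTINUE and BINOPS are made up, constants are assumed to be ['val', x] nodes, the if takes its branch directly without a then wrapper, and operands of binary operators are assumed never to contain a break or continue.

```python
import operator

NORMAL, BREAK, CONTINUE = 'normal', 'break', 'continue'
BINOPS = {'+': operator.add, 'gt': operator.gt, 'eq': operator.eq}

def lookup(env, key):
    while env:
        current, env = env
        if key in current:
            return current
    raise ValueError('undefined name')

def ev(env, node):
    # every call returns a (signal, value) pair instead of raising
    if not isinstance(node, list):
        return NORMAL, lookup(env, node)[node]
    head = node[0]
    if head == 'val':
        return NORMAL, node[1]
    if head == 'break':
        return BREAK, None
    if head == 'continue':
        return CONTINUE, None
    if head in BINOPS:
        (_, a), (_, b) = ev(env, node[1]), ev(env, node[2])
        return NORMAL, BINOPS[head](a, b)
    if head in ('var', 'set'):
        _, name, val = node
        _, val = ev(env, val)
        scope = env[0] if head == 'var' else lookup(env, name)
        scope[name] = val
        return NORMAL, val
    if head == 'do':
        new_env, val = (dict(), env), None
        for sub in node[1:]:
            sig, val = ev(new_env, sub)
            if sig != NORMAL:
                return sig, val     # stop early, hand the signal upwards
        return NORMAL, val
    if head == 'if':
        _, cond, yes = node
        new_env = (dict(), env)
        return ev(new_env, yes) if ev(new_env, cond)[1] else (NORMAL, None)
    if head == 'loop':
        _, cond, body = node
        ret = None
        while True:
            new_env = (dict(), env)
            if not ev(new_env, cond)[1]:
                break
            sig, val = ev(new_env, body)
            if sig == BREAK:
                break
            if sig == NORMAL:
                ret = val           # CONTINUE just starts the next iteration
        return NORMAL, ret
    raise ValueError('unknown expression')

# sum 1..5 but skip 3: the loop continues at i == 3 and breaks at i > 5
prog = ['do',
    ['var', 'i', ['val', 0]],
    ['var', 's', ['val', 0]],
    ['loop', ['gt', ['val', 10], 'i'], ['do',
        ['set', 'i', ['+', 'i', ['val', 1]]],
        ['if', ['eq', 'i', ['val', 3]], ['continue']],
        ['if', ['gt', 'i', ['val', 5]], ['break']],
        ['set', 's', ['+', 's', 'i']]]],
    's']
signal, total = ev(None, prog)
```

The cost of this style is visible in the do branch: every construct that evaluates subexpressions has to check the signal and forward it, which is exactly the bookkeeping that the exception version gets for free.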
04. Functions
The code for function definition does nothing significant. It just performs some sanity checks and puts the function into the map. Note that the key includes the number of arguments; this distinguishes function names from variable names and allows a form of “overloading”.
def pl_eval(env, node):
    ...
    # function definition
    if node[0] == 'def' and len(node) == 4:
        _, name, args, body = node
        for arg_name in args:
            if not isinstance(arg_name, str):
                raise ValueError('bad argument name')
        if len(args) != len(set(args)):
            raise ValueError('duplicated arguments')
        dct, _ = env
        key = (name, len(args))
        if key in dct:
            raise ValueError('duplicated function')
        dct[key] = (args, body, env)
        return
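The effect of the (name, number of arguments) key is easy to see in isolation. The sketch below uses a made-up helper named define that copies only the key handling from the def branch; the function bodies are placeholder nodes.

```python
def define(env, name, args, body):
    # hypothetical helper: only the key handling of the `def` branch
    dct, _ = env
    key = (name, len(args))
    if key in dct:
        raise ValueError('duplicated function')
    dct[key] = (args, body, env)

env = (dict(), None)
env[0]['f'] = 42                                # a variable named `f`
define(env, 'f', ['x'], 'x')                    # stored under ('f', 1)
define(env, 'f', ['x', 'y'], ['+', 'x', 'y'])   # stored under ('f', 2)
names = set(env[0])
```

Variables use plain strings as keys and functions use tuples, so the three entries live side by side in the same map without colliding.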
Next is the call command. Function arguments are treated as new variables. Just create a new scope, put the arguments in it, and evaluate the body.
def pl_eval(env, node):
    ...
    # function call
    if node[0] == 'call' and len(node) >= 2:
        _, name, *args = node
        key = (name, len(args))
        fargs, fbody, fenv = name_loopup(env, key)[key]
        # args
        new_env = dict()
        for arg_name, arg_val in zip(fargs, args):
            new_env[arg_name] = pl_eval(env, arg_val)
        # call
        try:
            return pl_eval((new_env, fenv), fbody)
        except FuncReturn as ret:
            return ret.val
    # return
    if node[0] == 'return' and len(node) == 1:
        raise FuncReturn(None)
    if node[0] == 'return' and len(node) == 2:
        _, val = node
        raise FuncReturn(pl_eval(env, val))
Special care: the parent scope is not the scope of the caller, but the scope in which the function was defined.
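To see why this matters, the sketch below evaluates the same program twice: once with the defining scope as the parent (what the code above does) and once with the caller's scope as the parent. It is a cut-down evaluator, not the full pl_eval. The name mini_eval and the dynamic flag are made up for the comparison, constants are assumed to be ['val', x] nodes, and the sanity checks are left out.

```python
def name_loopup(env, key):
    while env:
        current, env = env
        if key in current:
            return current
    raise ValueError('undefined name')

def mini_eval(env, node, dynamic=False):
    # hypothetical cut-down evaluator; dynamic=True deliberately uses the
    # caller's scope as the parent to show the difference
    if not isinstance(node, list):
        return name_loopup(env, node)[node]
    if node[0] == 'val':
        return node[1]
    if node[0] == 'do':
        new_env = (dict(), env)
        for val in node[1:]:
            val = mini_eval(new_env, val, dynamic)
        return val
    if node[0] == 'var':
        _, name, val = node
        env[0][name] = mini_eval(env, val, dynamic)
        return env[0][name]
    if node[0] == 'def':
        _, name, args, body = node
        env[0][(name, len(args))] = (args, body, env)
        return None
    if node[0] == 'call':
        _, name, *args = node
        key = (name, len(args))
        fargs, fbody, fenv = name_loopup(env, key)[key]
        new_env = dict()
        for arg_name, arg_val in zip(fargs, args):
            new_env[arg_name] = mini_eval(env, arg_val, dynamic)
        parent = env if dynamic else fenv
        return mini_eval((new_env, parent), fbody, dynamic)
    raise ValueError('unknown expression')

# (do (var x 1)
#     (def getx () x)
#     (def wrapper () (do (var x 2) (call getx)))
#     (call wrapper))
prog = ['do',
    ['var', 'x', ['val', 1]],
    ['def', 'getx', [], 'x'],
    ['def', 'wrapper', [], ['do', ['var', 'x', ['val', 2]], ['call', 'getx']]],
    ['call', 'wrapper']]

lexical = mini_eval(None, prog)                 # getx sees the x next to its definition
caller_scoped = mini_eval(None, prog, True)     # getx sees wrapper's local x
```

With the defining scope as the parent, getx always reads the x it was written next to; with the caller's scope, its result depends on whoever happens to call it.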
The (return) command is handled like the (break) or (continue).
class FuncReturn(Exception):
    def __init__(self, val):
        super().__init__('`return` outside a function')
        self.val = val
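The mechanism can be tried without the interpreter. In the sketch below, inner_loop and call are made-up stand-ins for a nested function body and for the call branch of pl_eval; only the exception class is taken from the text.

```python
class FuncReturn(Exception):
    def __init__(self, val):
        super().__init__('`return` outside a function')
        self.val = val

def inner_loop():
    # stands in for a body like (loop ... (return r)) nested several levels deep
    for i in range(10):
        if i == 3:
            raise FuncReturn(i * 2)

def call():
    # stands in for the `call` branch: catch the exception, unwrap the value
    try:
        inner_loop()
        return None
    except FuncReturn as ret:
        return ret.val

value = call()
message = str(FuncReturn(None))
```

The exception unwinds through any number of nested evaluations until the nearest call catches it. The message is only ever seen when nothing catches it, which is precisely the case of a return outside a function.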
05. Done
Congratulations. We have built an interpreter for a mini programming language.
def test_eval():
    def f(s):
        return pl_eval(None, pl_parse_prog(s))
    assert f('''
        (def fib (n)
            (if (le n 0)
                (then 0)
                (else (+ n (call fib (- n 1))))))
        (call fib 5)
    ''') == 5 + 4 + 3 + 2 + 1
    assert f('''
        (def fib (n) (do
            (var r 0)
            (loop (gt n 0) (do
                (set r (+ r n))
                (set n (- n 1))
            ))
            (return r)
        ))
        (call fib 5)
    ''') == 5 + 4 + 3 + 2 + 1
06. Closing Remarks
Our interpreter is not much more difficult than the calculator in the last chapter. Variables are handled with extra state, and control flows are interpreted with Python’s own control flows; it’s still pretty much simple recursion.
The interpreter still depends on an existing language for execution. How do we compile our language into machine code and run it natively on the CPU? That is the challenge of the next chapter.