If you just want to use m JavaScript interpreter, see Using
the JavaScript Debug Interpreter.
For a deep theoretical understanding of how to program interpreters and compilers, you should read my book (with James Peters), The Art of Compiler Design (Prentice Hall, ISBN 0-13-048190-4). There are additional links here. This introduction will touch on a few aspects, as relevant to the JavaScript interpreter written in JavaScript. For line numbers see the file JSdoingJS.html, where the interpreter interprets itself, with line numbers in comments every eight lines.
A compiler is a computer program that translates computer programs from a high-level programming language like Java or C++ to the ones and zeros of machine language, or else to another programming language. The output from a compiler is another computer program that does the same thing. An interpreter is like a compiler in construction, and does many of the same things, but instead of producing a program to do the same thing, it just does it. Consider this simple program, in English-like HyperCard:
put 123 into myVariableWe could compile this program snippet into JavaScript (or C) and the result would look like this:
put 7 into ink
repeat 5 times
add ink to myVariable
add 1 to ink
end repeat
divide myVariable by 9
myVariable = 123;An interpreter, on the other hand would create a variable "myVariable" somewhere in its memory, set that variable to 123, create another variable "ink" with the value 7, then repeatedly add the value of variable "ink" to variable "myVariable" each time incrementing ink, then finally divide myVariable by 9 and put the quotient back into myVariable. There is no translation going on, the interpreter knows how to do each step and does it immediately.
ink = 7;
for (i=0; i<5; i++)
myVariable += ink++;
myVariable = myVariable/9;
Both interpreters and compilers need to correctly understand what the
source program means (was intended to do, assuming there are no bugs),
but the compiler then says the same thing in another language, while the
interpreter does that action immediately. Compilers often try to produce
an efficient output program, as for example using the auto-post-increment
on the variable ink above, but interpreters often don't worry
much about efficiency. Both interpreters and compilers are able to detect
when a source program fails to conform to the rules of its language, and
issue error messages in those cases.
prog -> func*This looks a little like a programming language itself -- and in my book it is! -- but the formality helps us make fewer mistakes in defining how the compiler or interpreter must work. In this partial grqammar, a program consists of zero or more functions (the star means any number of whatever it follows; the question mark means zero or one, but not more). A function starts with the keyword "function" then a name (identifier, not specified in this grammar fragment), then a left parenthesis, some parameters (again not specified: in a full grammar, everything is spelled out), a right parenthesis, then a block of statements enclosed in braces. Every item in quotes is required exactly as spelled.
func -> "function" ident "(" params ")" blok
blok -> "{" stmt* "}"
stmt -> blok
-> "if" "(" expn ")" stmt ( "else" stmt )?
-> "while" "(" expn ")" stmt
-> expn ";"
-> ";"
expn -> (lexp "=")? cmpr
cmpr -> smpl (("==" | "!=" | ">" | "<") smpl )?
smpl -> term (("+" | "-") term )*
term -> fact (("*" | "/") fact )*
fact -> "(" expn ")"
-> "-" fact
-> lexp | num | quote
-> ident "(" args ")"
lexp -> ident ( "[" expn "]" )?
What is less obvious in this grammar is that the operator precedence is also fully spelled out. A simple expression is a sequence of one or more terms separated by plusses and minuses; the terms are each a sequence of one or more factors separated by multiply and divide operators. Thus multiplication and division takes precedence over (is done first, before) addition and subtraction.
Furthermore, because a single line in the grammar contains both the left and right parentheses in each case, a correct program cannot mismatch them, they are both required when that grammar line is activated.
With each line of this grammar, a compiler would associate the appropriate output code to generate for that construct. Conversely, with the same grammar, an interpreter would spell out what actions to take. The grammar defines the language, not what is to be done for it. In my book, I extended the grammar to specify the actions also.
The easy way to convert a grammar into a program for compiling or interpreting the language is to make each nonterminal (the left side of an arroe in a grammar rule) into a function. Stars turn into while loops; question marks, multiple arrows, and vertical alternation bars turn into if statements. Quoted items test the source code for that exact string of characters, and fail if it doesn't match. Sometimes failure is OK if there is another alternation to test; otherwise (for example, a missing right parenthesis) results in an error if it fails.
Notice that the grammar is recursive. A block can have any number of
statements inside its braces, and any of those statements might be another
block. Most programming languages allow functional recursion. JavaScript
does too, but not mutual recursion as we need here (there is a way, but
it's different from C). I solved that in my interpreter by making one giant
"expn" function for all the recursive rules -- and then ran into
the recursion limit. Recursion can be simulated using a stack, and
JavaScript has explicit stack push and pop methods, so
my interpreter now does the recursion that way.
Let's look at how a stack enforces operator precedence in expression evaluation. Suppose we have an expression to evaluate:
3+4*5-(6+1) // []Some of these numbers could be variable references or function calls, but that complication does not change the way things are done. We initially have an empty stack, represented here within the brackets. The interpreter calls expn, which calls each nonterminal in turn down to fact, which recognizes a number, and pushes it onto the stack:
3+4*5-(6+1) // [3] -- expn,cmpr,smpl,term,factThe scanner looking at the source program is now ready to process the plus (in red). Nonterminal fact has nothing else to do so it returns to term, which is hoping for an optional star or slash, but finding neither, returns to smpl, which is hoping for a plus or minus. It accepts and remembers the plus, then immediately calls term a second time which calls fact to accept and push the four:
3+4*5-(6+1) // [4,3] -- expn,cmpr,smpl,termNow term, which looks again for a star or slash, finds it and immediately turns around and calls fact for another factor:
3+4*5-(6+1) // [5,4,3] -- expn,cmpr,smpl,termBack in term, the remembered star reminds it to pop off the top two items from the stack, multiply them together, then push the product:
3+4*5-(6+1) // [20,3] -- expn,cmpr,smpl,termThere is nothing else in the input for term to do anything with, so it returns to smpl, which has a remembered plus to work on. It pops off two numbers, adds them, and pushes the sum. Then it loops back around to see and accept the minus, then calls term, which calls fact, which sees and accepts the left parenthesis and calls expn recursively:
3+4*5-(6+1) // [23] -- expn,cmpr,smpl,term,fact,expnThis continues down to fact to accept and push the 6:
3+4*5-(6+1) // [6,23] -- expn,cmpr,smpl,term,fact,expn,cmpr,smplBack out to smpl, the next plus is accepted, which sernds us -- you see where this is going -- all the way back down to fact for the one, which is added in smpl. Now smpl does not know anything about right parentheses, so it returns to cmpr, to expn, and finally back to fact, which does want that right parenthesis, and consumes it before returning back out to smpl to subtract the 7 from the 23, leaving 16 on the stack at the final exit.
Yes, that's a lot of work, but computers are fast. In a compiler, all
this running up and down the nonterminal chain happens at compile time.
Instead of pushing values onto some stack, it generates code to load them
into a register, so all we are left with is a few loads and adds and a
multiply. But we are doing an interpreter, which is slower. The names are
different -- JavaScript has a lot more precedence levels, so I numbered
them, p2 through p11. In the most detailed logging mode
(=7) you can see them grow and shrink on the InfoStak in the trace.
0 Main Green + Side RedThe default state is 0. Every second the input is examined for cars waiting on the side street, and if so (and no cars on the main street), it goes to state 1 and starts the 5-second timer. When that time runs out, it goes to state 2. When there are no cars on the side street, it goes to state 3 and starts the timer, then advances to state 0 when it runs out. The timer is often used also in states 0 and 2, but set to 30 seconds (or some such), so that even if cars continue to arrive on the side street sensor, if there are cars on the main street and the time has run out, it goes to the next state anyway, and vice-versa. Each state looks only at its own state number and the three input signals (main & side sensors, timer expired) to decide when or if to change state.
1 Main Yellow + Side Red
2 Main Red + Side Green
3 Main Red + Side Yellow
My interpreter has some eighty states, and its input tape is the source
code of the program being interpreted, which is further simplified
by identifying tokens consisting of one name or reserved word or operator
or literal. If the state is looking for that token, it accepts it (calls
internal function GeToken, line 241) and then advances to some
other state. The actions to be taken in each state are selected by a giant
switch
statement (line 674), which is embedded in a while(true) loop
(line 649 through 1781). Nonterminal calls and returns are simulated by
breaking out of the switch with a new state number in variable GoNext
(or zero to return); continuing the loop using the continue command
stays in the same nonterminal but a new state.
So instead, I have a "skipping" mode which suppresses execution, but continues to parse for correct syntax. When you execute an if-statement, if the condition is false execution is turned off, and then back on when it finishes the statement. At the else-part, the execution is toggled, unless it was already off before entering the if. Breaking out of a while-loop similarly turns off execution until the end of the loop body. Because interpretation proceeds left-to-right, we can record the front of loops in a local variable Breakable, so continue can jump directly there. Function calls work similarly. When making a function call, the current interpretation position is recorded, so the function exit or return can come back here. The processing stack and the variable names are also checkpointed, then cut back on function exit.
The switch statement is the most complicated of all, and I did not even attempt it until the rest of the interpreter was working. The switch expression is evaluated, then instead of a jump table (which is how hardware computers and other interpreters do it), I run through a loop processing statements with execution turned off, but looking for case (or default) lines. The first match turns execution back on and depends on break to turn it off at the end of the case. Because continue can refer to an outer loop, I also had to add a separate inLoop variable to retain access to both kinds of jump. Note that these variables are pushed onto the stack whenever a new loop or switch is started, or a function is called, and then restored at exit. Thus break always breaks out of the innermost loop or switch, and is disabled (reported as an error) if you try to break out of a function called from within a loop.
The interpreter is pretty complicated, and you should not expect to understand all of it the first day. Much of that complexity is there to facilitate debugging -- both for me, trying to make the interpreter work properly, and subsequently you, trying to make your interpreted scripts work. Take your time studying it: you might even find a bug or two (please tell me).
Enjoy!
Tom Pittman
rev. 2010 December 24