
How does a JavaScript parser work?

I’m trying to understand how JS is actually parsed. But my searches return either someone’s very vaguely documented “parser/generator” project (I don’t even know what that means), or instructions for parsing JS with a JS engine’s magical “parse” method. I don’t want to scan through a bunch of code and spend ages trying to understand it (I could, but it would take too long).

I want to know how an arbitrary string of JS code is actually turned into objects, functions, variables, etc. I also want to know the procedures and techniques that turn that string into things that get stored, referenced and executed.

Is there any documentation or reference material for this?

Answer

Parsers work in all sorts of ways, but fundamentally they first go through a stage of tokenisation (splitting the source text into tokens), then hand the tokens to the compiler, which checks them against the language grammar and turns them into a program if it can. For example, given:

function foo(a) {
  alert(a);
}

the parser will skip any leading whitespace up to the first character, the letter “f”. It will collect characters until it hits something that doesn’t belong, here the whitespace, which marks the end of the token. It starts again with the “f” of “foo” and continues until it gets to the “(”, so it now has the tokens “function” and “foo”. It knows “(” is a token on its own, so that’s 3 tokens. It then gets the “a” followed by “)”, which are two more tokens to make 5, and so on.
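The walk-through above can be sketched as a small hand-written tokeniser. This is only an illustration, not how any particular engine does it: the function name tokenize is made up for this example, and it assumes the input contains nothing but names and single-character punctuators (no strings, numbers, comments or multi-character operators).

```javascript
// Hypothetical helper: split source text into a flat list of token strings.
// Assumption: only names (identifiers/keywords) and one-character punctuators occur.
function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++; // whitespace only separates tokens, so skip it
    } else if (/[A-Za-z_$]/.test(ch)) {
      // collect characters until one does not belong to the name
      const start = i;
      while (i < source.length && /[A-Za-z0-9_$]/.test(source[i])) {
        i++;
      }
      tokens.push(source.slice(start, i));
    } else {
      tokens.push(ch); // anything else is treated as a token on its own
      i++;
    }
  }
  return tokens;
}

const tokens = tokenize("function foo(a) {\n  alert(a);\n}");
console.log(tokens);
```

For the example function, the first five entries of the result are “function”, “foo”, “(”, “a” and “)”, matching the count above.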

The only need for whitespace is between tokens that are otherwise ambiguous (e.g. there must be either whitespace or another token between “function” and “foo”).
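To see why whitespace only matters between otherwise ambiguous tokens, here is a rough sketch using a regular expression in place of a real tokeniser. The helper name splitTokens and the pattern are assumptions for illustration; the pattern only knows about names and single non-whitespace characters.

```javascript
// Hypothetical helper: a name, or else any single non-whitespace character.
function splitTokens(source) {
  return source.match(/[A-Za-z_$][A-Za-z0-9_$]*|\S/g) || [];
}

// Without whitespace the two names run together into one token.
console.log(splitTokens("function foo")); // two tokens
console.log(splitTokens("functionfoo"));  // one token

// Around punctuators, whitespace makes no difference to the result.
console.log(splitTokens("foo(a)"));
console.log(splitTokens("foo ( a )"));
```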

Once tokenisation is complete, the tokens go to the compiler, which sees that “function” has the form of an identifier but is a reserved word, so it treats it as the keyword “function”. It then gets “foo”, an identifier that the language grammar tells it is the function name. Then the “(” in this position indicates the start of a formal parameter list, and so on.
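As a rough sketch of that second stage, the code below walks a token list and checks it against a tiny grammar for one function declaration, building a plain object as it goes. Everything here is an assumption made for illustration: the name parseFunctionDeclaration, the shape of the returned object, and the decision to leave the body as raw tokens rather than parse it. Real engines build far richer structures.

```javascript
// Hypothetical sketch: turn a flat token list into a small tree for one
// function declaration. Reserved words are not rejected as names here.
function parseFunctionDeclaration(tokens) {
  let pos = 0;

  // Consume one specific token or fail with a syntax error.
  function expect(value) {
    if (tokens[pos] !== value) {
      throw new SyntaxError(`Expected "${value}" but found "${tokens[pos]}"`);
    }
    pos++;
  }

  // Consume a token that has the form of a name and return it.
  function readName() {
    const t = tokens[pos];
    if (t === undefined || !/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(t)) {
      throw new SyntaxError(`Expected a name but found "${t}"`);
    }
    pos++;
    return t;
  }

  expect("function");      // the keyword starts the declaration
  const name = readName(); // the grammar says the function name comes next
  expect("(");             // start of the formal parameter list
  const params = [];
  if (tokens[pos] !== ")") {
    params.push(readName());
    while (tokens[pos] === ",") {
      pos++;
      params.push(readName());
    }
  }
  expect(")");
  expect("{");

  // Keep the body as raw tokens, tracking nested braces to find its end.
  const body = [];
  let depth = 1;
  while (depth > 0) {
    const t = tokens[pos++];
    if (t === undefined) throw new SyntaxError("Unexpected end of input");
    if (t === "{") depth++;
    else if (t === "}") depth--;
    if (depth > 0) body.push(t);
  }
  return { type: "FunctionDeclaration", name, params, body };
}

const tree = parseFunctionDeclaration(
  ["function", "foo", "(", "a", ")", "{", "alert", "(", "a", ")", ";", "}"]
);
console.log(tree);
```

If the tokens do not fit the grammar (for example, the “(” is missing), the sketch throws a SyntaxError, which is the “if it can” part of turning tokens into a program.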

Compilers may deal with tokens one at a time, or may grab them in chunks, or do all sorts of weird things to make them run faster.

You can also read How do C/C++ parsers work?, which gives a few more clues. Or just use Google.

User contributions licensed under: CC BY-SA