
Why does an expression like `(!"foo" .*)` generate arrays of `[undefined, char]` values in PEG.js?

I’m still pretty new to PEG.js, and I’m guessing this is just a beginner misunderstanding.

In trying to parse something like this:

JavaScript

I can get a grammar to properly read the three sections (to be parsed further later, of course), but it produces the text in an odd format. For instance, in the above, “some text” turns into

JavaScript

I can easily enough convert this to a plain string, but I’m wondering what I’m doing to give it that awful format. This is my grammar so far:

JavaScript

I can fix it by replacing `{return defs}` with `{return combine(defs)}`, as in the other sections.

My main question is simply why does it generate that output? And is there a simpler way to fix it?


Overall, as I’m still pretty new to PEG.js, I would love to know if there is a better way to write this grammar. Expressions like `(!"nif" .*)` seem fairly sketchy.


Answer

  1. A negative lookahead, e.g. `!Rule`, consumes no input. It fails if `Rule` matches; otherwise it succeeds and always returns `undefined`.
  2. The dot `.` matches any single character and returns it.
  3. A sequence `Rule1 Rule2 ...` produces an array containing the result of each rule.
  4. A repetition `Rule+` or `Rule*` matches `Rule` as many times as possible and produces an array of the results. (`+` fails if the first attempt to match `Rule` fails.)
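To see how these four rules combine into `[undefined, char]` pairs, here is a minimal sketch that imitates them with hand-written parser functions. It is not the PEG.js API: the helper names (`literal`, `anyChar`, `not`, `sequence`, `zeroOrMore`) and the input string are made up for illustration, and the rule is assumed to have the shape `(!"nif" .)*`.

```javascript
// Hand-rolled imitation of four PEG.js operators (NOT the PEG.js API).
// A parser is a function (input, pos) => { value, pos } on success, or null on failure.

// "nif" : match a literal string
const literal = (text) => (input, pos) =>
  input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null;

// . : match any single character and return it
const anyChar = (input, pos) =>
  pos < input.length ? { value: input[pos], pos: pos + 1 } : null;

// !Rule : succeed with undefined, consuming nothing, only if Rule does NOT match
const not = (rule) => (input, pos) =>
  rule(input, pos) === null ? { value: undefined, pos } : null;

// Rule1 Rule2 ... : array holding one result per rule
const sequence = (...rules) => (input, pos) => {
  const values = [];
  for (const rule of rules) {
    const result = rule(input, pos);
    if (result === null) return null;
    values.push(result.value);
    pos = result.pos;
  }
  return { value: values, pos };
};

// Rule* : array holding one result per successful repetition
const zeroOrMore = (rule) => (input, pos) => {
  const values = [];
  for (;;) {
    const result = rule(input, pos);
    // Stop on failure, or if the rule consumed nothing (avoids looping forever).
    if (result === null || result.pos === pos) break;
    values.push(result.value);
    pos = result.pos;
  }
  return { value: values, pos };
};

// Assumed shape of the rule: (!"nif" .)*
const untilNif = zeroOrMore(sequence(not(literal("nif")), anyChar));

const parsed = untilNif("ab nif", 0);
console.log(parsed.value);
// three pairs: [undefined, "a"], [undefined, "b"], [undefined, " "]
```

Each pair comes from the sequence (rule 3): its first slot is the lookahead result (rule 1) and its second is the character matched by the dot (rule 2). The outer array comes from the repetition (rule 4).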

Your results are

JavaScript

What you seem to want is the matched text instead. You can use the `$Rule` operator for this: it returns the text that was matched rather than the result the rule produced.
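The effect of `$` can be imitated in plain JavaScript as follows. This is only a sketch of the idea, not PEG.js itself: `untilNif` is a hypothetical stand-in for a rule shaped like `(!"nif" .)*`, and `text` is a made-up helper that plays the role of `$`.

```javascript
// Hypothetical stand-in for the rule (!"nif" .)* (NOT generated by PEG.js).
// Returns the nested [undefined, char] pairs plus the position where it stopped.
function untilNif(input, pos) {
  const value = [];
  while (pos < input.length && !input.startsWith("nif", pos)) {
    value.push([undefined, input[pos]]);
    pos += 1;
  }
  return { value, pos };
}

// Imitation of $Rule: run the rule, discard its result,
// and return the slice of input it consumed instead.
const text = (rule) => (input, pos) => {
  const result = rule(input, pos);
  return result === null
    ? null
    : { value: input.slice(pos, result.pos), pos: result.pos };
};

const nested = untilNif("some text nif", 0);
const plain = text(untilNif)("some text nif", 0);

console.log(nested.value.length); // 10 pairs, one per character before "nif"
console.log(plain.value);         // "some text " (a plain string, trailing space included)
```

Because the string is taken from the input, no manual joining of the pairs is needed.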

JavaScript

This will produce

JavaScript
User contributions licensed under: CC BY-SA