I’m still pretty new to PEG.js, and I’m guessing this is just a beginner misunderstanding.
In trying to parse something like this:
definitions some text
if some additonal text to parse here
then still more text will go here
I can get a grammar to properly read the three sections (to be parsed further later, of course). But it generates that text in an odd format. For instance, in the above, “some text” turns into
[ [undefined, "s"], [undefined, "o"], [undefined, "m"], [undefined, "e"], [undefined, " "], [undefined, "t"], [undefined, "e"], [undefined, "x"], [undefined, "t"] ]
I can easily enough convert this to a plain string, but I’m wondering what I’m doing to give it that awful format. This is my grammar so far:
{
  const combine = (xs) => xs .map (x => x[1]) .join('')
}

MainObject
  = _ defs:DefSection _ condition:CondSection _ consequent: ConsequentSection
    {return {defs, condition, consequent}}

DefSection
  = _ "definitions"i _ defs:(!"\nif" .)+
    {return defs}

CondSection
  = _ "if"i _ cond:(!"\nthen" .)+
    {return combine (cond)}

ConsequentSection
  = _ "then"i _ cons:.*
    {return cons .join ('')}

_ "whitespace"
  = [ \t\n\r]*
I can fix it by replacing {return defs} with {return combine(defs)} as in the other sections.
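To make that workaround concrete, here is a standalone sketch of what combine does to this kind of result. The sample array is a hypothetical, shortened version of the output shown above, not real parser output.

```javascript
// Same helper as in the grammar's initializer block:
// keep the second element of every [undefined, char] pair and join them.
const combine = (xs) => xs.map((x) => x[1]).join('');

// Shortened stand-in for what (!"\nif" .)+ returns for the text "some"
const rawPairs = [
  [undefined, "s"],
  [undefined, "o"],
  [undefined, "m"],
  [undefined, "e"],
];

const combined = combine(rawPairs);
console.log(combined); // "some"
```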
My main question is simply why does it generate that output? And is there a simpler way to fix it?
Overall, as I’m still pretty new to PEG.js, I would love to know if there is a better way to write this grammar. Expressions like (!"\nif" .)+ seem fairly sketchy.
Answer
- A negative look-ahead, e.g. !Rule, always returns undefined, and fails if Rule matches.
- The dot . always matches a single character.
- A sequence Rule1 Rule2 ... creates a list with the results of each rule.
- A repetition Rule+ or Rule* matches Rule as many times as possible and creates a list. (+ fails if the first attempt to match Rule fails.)
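The four rules above can be imitated with a few hand-written functions. This is only an illustrative sketch, not how PEG.js is actually implemented; the names not, literal, dot, sequence and oneOrMore are made up for the example. It shows why combining a look-ahead, a dot, a sequence and a repetition yields a list of [undefined, character] pairs.

```javascript
// Illustrative sketch only; NOT PEG.js's real implementation.
// A parser takes (input, pos) and returns { value, pos } on success, null on failure.

// !p : succeeds with undefined, consuming nothing, only if p fails here
const not = (p) => (input, pos) =>
  p(input, pos) === null ? { value: undefined, pos } : null;

// "text" : matches a literal string
const literal = (text) => (input, pos) =>
  input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null;

// . : matches any single character
const dot = (input, pos) =>
  pos < input.length ? { value: input[pos], pos: pos + 1 } : null;

// p1 p2 ... : runs each parser in turn and collects the results in a list
const sequence = (...parsers) => (input, pos) => {
  const values = [];
  for (const p of parsers) {
    const result = p(input, pos);
    if (result === null) return null;
    values.push(result.value);
    pos = result.pos;
  }
  return { value: values, pos };
};

// p+ : applies p as many times as possible; fails if it never matches
const oneOrMore = (p) => (input, pos) => {
  const values = [];
  let result;
  while ((result = p(input, pos)) !== null) {
    values.push(result.value);
    pos = result.pos;
  }
  return values.length > 0 ? { value: values, pos } : null;
};

// Counterpart of the grammar expression (!"\nif" .)+
const untilIf = oneOrMore(sequence(not(literal("\nif")), dot));

const sketchResult = untilIf("some\nif more", 0);
console.log(sketchResult.value); // a list of [undefined, character] pairs
```

Each inner pair comes from the sequence (look-ahead result first, then the character), and the outer list comes from the repetition.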
Your results are
[                      // the whole list is (!"\nif" .)+, i.e. every match of (!"\nif" .)
  [                    // first (!"\nif" .)
    undefined,         // first !"\nif"
    "s"                // first .
  ],
  [undefined, "o"],    // second (!"\nif" .)
  [undefined, "m"],
  [undefined, "e"],
  [undefined, " "],
  [undefined, "t"],
  [undefined, "e"],
  [undefined, "x"],
  [undefined, "t"]
]
What you seem to want is the matched text itself. The $ operator does this: $Rule returns the input text that Rule consumed instead of the output Rule produced.
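As a rough illustration of the difference, here is a hand-written sketch (again not PEG.js's actual code; pairsUntilIf and text are hypothetical names) in which a $-like wrapper throws away the parser's value and returns the slice of input it consumed.

```javascript
// Illustrative sketch only; NOT PEG.js's real implementation.
// A parser takes (input, pos) and returns { value, pos } on success, null on failure.

// Stand-in for (!"\nif" .)+ : collects [undefined, char] pairs up to "\nif"
const pairsUntilIf = (input, pos) => {
  const pairs = [];
  while (pos < input.length && !input.startsWith("\nif", pos)) {
    pairs.push([undefined, input[pos]]);
    pos += 1;
  }
  return pairs.length > 0 ? { value: pairs, pos } : null;
};

// $p : run p, discard its value, and return the input text it consumed
const text = (p) => (input, pos) => {
  const result = p(input, pos);
  return result === null
    ? null
    : { value: input.slice(pos, result.pos), pos: result.pos };
};

const sample = "some text\nif more";
const withoutDollar = pairsUntilIf(sample, 0);
const withDollar = text(pairsUntilIf)(sample, 0);

console.log(withoutDollar.value.length); // 9 pairs, one per character
console.log(withDollar.value);           // "some text"
```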
MainObject
  = _ defs:DefSection _ condition:CondSection _ consequent: ConsequentSection
    {return {defs, condition, consequent}}

DefSection
  = _ "definitions"i _ defs:$(!"\nif" .)+
    {return defs.trim()}

CondSection
  = _ "if"i _ cond:$(!"\nthen" .)+
    {return cond.trim()}

ConsequentSection
  = _ "then"i _ cons:$(.*)
    {return cons.trim()}

_ "whitespace"
  = [ \t\n\r]*
This will produce
{
  "defs": "some text",
  "condition": "some additonal text to parse here",
  "consequent": "still more text will go here"
}