I’m currently working on an Adobe InDesign script, part of which is a function that finds measurements and picks them apart. I have a set of regexes that are run first using InDesign’s findGrep() (not really relevant here), and then using plain JavaScript exec() (because I need to do things with capture groups).
Now, I know that there are differences between these two regex engines, so I’ve been working to the capabilities of the much more limited JS engine (I think InDesign’s scripting language is based on ECMAScript 3), but I’ve recently hit a problem that I can’t seem to figure out.
Here’s the regex I’m currently testing (I’ve broken it across lines to make it a little easier to read):
((?:one|two|three|four|five|six|seven|eight|nine|ten|\d{4,}|\d{1,3}(?:,\d{3})*)(?:\.\d+)?)
(?=-|‑|\s|°|º|˚|∙|⁰)
(?:[-\s](thousand|million|billion|trillion))?
(?:[-\s](cubic|cu\.?|square|sq\.?))?
- The first line finds numbers formatted in various different ways.
- The second line is a lookahead that makes sure I’ve reached the end of the numbers.
- The third line finds any multipliers that refer to that number.
- The fourth line is supposed to find any modifiers that go before the unit of measurement.
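As a rough sketch of how those four parts fit together, the pattern can be assembled from four strings, one per line of the regex. The variable names and the findMeasurements helper are made up for illustration, and the backslashes are doubled because the pieces are string literals rather than a regex literal:

```javascript
// One string per line of the broken-up pattern (names are illustrative only).
var numberPart = "((?:one|two|three|four|five|six|seven|eight|nine|ten|\\d{4,}|\\d{1,3}(?:,\\d{3})*)(?:\\.\\d+)?)";
var endOfNumber = "(?=-|‑|\\s|°|º|˚|∙|⁰)";
var multiplierPart = "(?:[-\\s](thousand|million|billion|trillion))?";
var modifierPart = "(?:[-\\s](cubic|cu\\.?|square|sq\\.?))?";

// The "g" flag lets exec() walk through every match in turn.
var measurement = new RegExp(numberPart + endOfNumber + multiplierPart + modifierPart, "g");

// Hypothetical helper: collects the three capture groups for each match.
function findMeasurements(text) {
  var results = [];
  var m;
  measurement.lastIndex = 0; // start from the beginning on every call
  while ((m = measurement.exec(text)) !== null) {
    results.push({ number: m[1], multiplier: m[2], modifier: m[3] });
  }
  return results;
}
```

In a standards-compliant engine, findMeasurements("45-square-metres") should return a single entry with number "45", no multiplier, and modifier "square".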
This is the sample text I was testing it on.
23 sq metres
45-square-metres
16-cubic metres
96 cu metres
409 cu. metres
12 sq metres
24 sq. metres
Now when I run the regex using InDesign’s findGrep() it works as expected. When I run it using exec(), however, it does something odd. It will match the numbers and the multipliers just fine, but only “cubic” and “cu” get matched; the “square” and “sq” text is ignored.
To make things more baffling, if I reverse the order of these entries in the regex capture group (so it’s (?:[-\s](square|sq\.?|cubic|cu\.?))? instead), then it only matches “square” and “sq” and not “cubic” and “cu”.
Am I missing something really obvious here? I’m a JavaScript newbie, but I’ve been working with regular expressions in XSLT for years.
str = `23 sq metres
45-square-metres
16-cubic metres
96 cu metres
409 cu. metres
12 sq metres
24 sq. metres
`;
patt = /((?:one|two|three|four|five|six|seven|eight|nine|ten|\d{4,}|\d{1,3}(?:,\d{3})*)(?:\.\d+)?)(?=-|‑|\s|°|º|˚|∙|⁰)(?:[-\s](thousand|million|billion|trillion))?(?:[-\s](cubic|cu\.?|square|sq\.?))?/gm;
while (res = patt.exec(str)) console.log(res);
EDIT:
So, here’s the code as I’m trying to run it right now.
str = `23 sq metres
45-square-metres
16-cubic metres
96 cu metres
409 cu. metres
12 sq metres
24 sq. metres
`;
var re = '(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+)(?:(\\s?(?:-|–)\\s?)(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+))?(?:[-\\s](thousand|million|billion|trillion))?(?:[-\\s](cubic|cu\\.?|square|sq\\.?))?';
patt = new RegExp(re);
while (res = patt.exec(str)) console.log(res);
If I try to run this on my machine, using the InDesign script, it fails to find anything with “square” or “sq”, and when I run it in the code snippet view here it just freezes up. I’m guessing this is something to do with storing regexes as strings, yes?
Answer
I’m not sure I understand you correctly. If you want your second snippet to behave the same way as your first, you probably just need to pass "gm" as the second argument to the RegExp constructor:
var patt = new RegExp(re, "gm");
str = `23 sq metres
45-square-metres
16-cubic metres
96 cu metres
409 cu. metres
12 sq metres
24 sq. metres
`;
var re = '(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+)(?:(\\s?(?:-|–)\\s?)(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+))?(?:[-\\s](thousand|million|billion|trillion))?(?:[-\\s](cubic|cu\\.?|square|sq\\.?))?';
var patt = new RegExp(re, "gm");
while (res = patt.exec(str)) console.log(res[5]);
It gives me this output:
sq
square
cubic
cu
cu.
sq
sq.
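For what it’s worth, here is a small sketch of the two things that commonly bite when a regex literal is moved into new RegExp(string): the flags have to be passed separately, and backslashes have to be doubled inside the string. This is standard JavaScript behaviour rather than anything InDesign-specific, and the variable names are only illustrative:

```javascript
var text = "12 sq metres 24 sq. metres";

// Without "g", exec() always starts from the beginning and returns the
// first match, so `while (res = patt.exec(str))` never reaches null.
var once = /\d+/;
var first = once.exec(text);
var second = once.exec(text); // same match again, index 0 both times

// With "g", each exec() call resumes from lastIndex and finally returns null.
var every = /\d+/g;
var found = [];
var m;
while ((m = every.exec(text)) !== null) {
  found.push(m[0]);
}
// found is ["12", "24"]

// In a string literal, "\s" is just the letter "s"; the regex engine only
// sees a whitespace class if the backslash is doubled.
var singleSlash = new RegExp("[-\s]");  // same as /[-s]/
var doubleSlash = new RegExp("[-\\s]"); // same as /[-\s]/
var singleMatchesSpace = singleSlash.test(" "); // false
var doubleMatchesSpace = doubleSlash.test(" "); // true
```

The missing "g" flag would explain the freeze in the snippet view: the loop keeps receiving the same first match forever.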
Update
I’ve replaced (cubic|cu\.?|square|sq\.?) with (cubic|cu\.|cu|square|sq\.|sq), and it seems to work in InDesign now:
str = "23 sq metres\n45-square-metres\n16-cubic metres\n96 cu metres\n409 cu. metres\n12 sq metres\n24 sq. metres";
var re = '(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+)(?:(\\s?(?:-|–)\\s?)(one|two|three|four|five|six|seven|eight|nine|ten|(?:[0-9]|,|\\.)+))?(?:[-\\s](thousand|million|billion|trillion))?(?:[-\\s](cubic|cu\\.|cu|square|sq\\.|sq))?';
var patt = new RegExp(re, "gm");
var msg = "";
while (res = patt.exec(str)) msg += res[0] + " : " + res[5] + "\n";
alert(msg);
Probably the ? quantifiers inside an alternation like (foo|bar) are too much for InDesign’s scripting engine.
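If you have more abbreviations to add, the same workaround can be generated rather than typed by hand. The sketch below is only an illustration: withOptionalDot and modifiers are hypothetical helpers, and the pattern is cut down to a bare number plus the modifier group so the comparison is easy to follow. In a standards-compliant engine both forms should capture the same text, so the rewrite changes nothing except avoiding the ? inside the group:

```javascript
// Hypothetical helper: "cu" becomes "cu\.|cu" (dotted form first), so no "?" is needed.
function withOptionalDot(abbr) {
  return abbr + "\\.|" + abbr;
}

var alternatives = ["cubic", withOptionalDot("cu"), "square", withOptionalDot("sq")].join("|");
// alternatives is "cubic|cu\.|cu|square|sq\.|sq"

// Simplified patterns: a number followed by an optional modifier.
var expanded = new RegExp("\\d+(?:[-\\s](" + alternatives + "))?", "g");
var original = /\d+(?:[-\s](cubic|cu\.?|square|sq\.?))?/g;

// Hypothetical helper: returns the modifier captured by each match.
function modifiers(re, text) {
  var out = [];
  var m;
  re.lastIndex = 0;
  while ((m = re.exec(text)) !== null) {
    out.push(m[1]);
  }
  return out;
}

var sample = "23 sq metres\n45-square-metres\n16-cubic metres\n96 cu metres\n409 cu. metres\n12 sq metres\n24 sq. metres";
var fromExpanded = modifiers(expanded, sample);
var fromOriginal = modifiers(original, sample);
```

Putting the dotted form before the bare one matters: with cu|cu\. the engine would accept "cu" first and leave the dot unmatched.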