I am trying to match a particular pattern in a lengthy string:
NEW ZEALAND AND (data.operator1:"SHELL AND AMP" AND data.field:"NEW ZEALAND") OR (data.operator:purpose AND data.field:crank) OR (data.operator:REGULATOR AND data.field:HELICOPTOR)
- I want to select all of the values below that follow a colon, but not the AND/OR/NOT operators.
- I am trying to use regex lookahead and lookbehind assertions, but I can't get it to work.
Basically, I need a combination of /(?<!AND)(?<!OR)\s+(?!AND)(?!OR)/g and :"[a-zA-Z ]"
I want to convert the values to title case so that the AND/OR/NOT operators stand out clearly. Expected output:
New Zealand AND (data.operator1:"Shell And Amp" AND data.field:"New Zealand") OR (data.operator:purpose AND data.field:crank) OR (data.operator:Regulator AND data.field:Helicoptor)
Answer
You can express a simple lexer as a regular expression with named capture groups, for example:
const MY_LEXER = String.raw`
  (?<string> "[^"]*")
  |
  (?<operator> \b(?:and|or|not|AND|OR|NOT)\b)
  |
  (?<word> \w+)
  |
  (?<punct> [().:])
  |
  (?<ws> \s+)
`
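At each position the alternatives are tried left to right, so the order of the groups decides the token type, and exactly one named group is defined in any match. A quick sketch to check that, using a compact one-line copy of the lexer. It assumes word boundaries (\b) around the operators so that a word such as ANDROID is not split into AND + ROID; firstTokenType is a hypothetical helper, not part of the answer.

```javascript
// Compact copy of MY_LEXER. Assumption: \b around the operators.
// No g flag, so match() returns the groups of the first match.
const LEXER = /(?<string>"[^"]*")|(?<operator>\b(?:and|or|not|AND|OR|NOT)\b)|(?<word>\w+)|(?<punct>[().:])|(?<ws>\s+)/;

// Hypothetical helper: name of the group that matched the first token of text.
function firstTokenType(text) {
  const m = text.match(LEXER);
  if (m === null) return null;
  // Groups that did not take part in the match are undefined.
  return Object.keys(m.groups).find((name) => m.groups[name] !== undefined);
}

console.log(firstTokenType('AND'));             // operator
console.log(firstTokenType('ANDROID'));         // word
console.log(firstTokenType('"SHELL AND AMP"')); // string
console.log(firstTokenType(':crank'));          // punct
```

Because the string alternative comes first, the AND inside "SHELL AND AMP" is never seen as an operator.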
The next function takes a string and a lexer and returns a list of [token-type, token-value] pairs:
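let tokenize = (str, lexer) =>
  [
    // strip the layout whitespace from the lexer, then collect every match
    ...str.matchAll(new RegExp(lexer.replace(/\s+/g, ''), 'g'))
  ]
  // each match has exactly one named group set; keep that [type, value] pair
  .flatMap(m =>
    Object
      .entries(m.groups)
      .filter(p => p[1]))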
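One property worth checking: matchAll does not report text that none of the alternatives match, it simply skips it. Joining the token values back together is a cheap way to confirm the lexer covered the whole input. A sketch, using a compact single-line copy of the lexer (with \b around the operators, an assumption); coversInput is a hypothetical helper.

```javascript
// Compact copy of MY_LEXER with the layout whitespace already removed.
const LEXER = String.raw`(?<string>"[^"]*")|(?<operator>\b(?:and|or|not|AND|OR|NOT)\b)|(?<word>\w+)|(?<punct>[().:])|(?<ws>\s+)`;

// Same idea as tokenize above: [type, value] pairs for every match.
const tokenize = (str, lexer) =>
  [...str.matchAll(new RegExp(lexer, 'g'))]
    .flatMap((m) => Object.entries(m.groups).filter(([, value]) => value));

// Hypothetical helper: true when the tokens reproduce the input exactly,
// i.e. no character was silently skipped by matchAll.
const coversInput = (str, lexer) =>
  tokenize(str, lexer).map(([, value]) => value).join('') === str;

console.log(coversInput('data.field:"NEW ZEALAND"', LEXER)); // true
console.log(coversInput('data.field:“NEW ZEALAND”', LEXER)); // false
```

Typographic (curly) quotes are an example: none of the alternatives match them, so they disappear from the output and the string alternative never fires.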
The result looks like this:
[ 'word', 'NEW' ],
[ 'ws', ' ' ],
[ 'word', 'ZEALAND' ],
[ 'ws', ' ' ],
[ 'operator', 'AND' ],
[ 'ws', ' ' ],
[ 'punct', '(' ],
etc. Now you can iterate over the tokens, transform the values as needed, and join them back together:
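let output = '';
// myString holds the query from the question
for (let [type, val] of tokenize(myString, MY_LEXER)) {
  // change only strings and words; operators, punctuation and whitespace pass through
  if (type === 'string' || type === 'word')
    val = val.toLowerCase();
  output += val;
}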
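Applied to the question's actual goal, the same loop can produce the title-cased output. This is a sketch, not the answerer's code: it assumes, from the expected output, that quoted strings and ALL-CAPS words are the values to convert, while lowercase words such as data, field and crank are left alone. titleCase and titleCaseValues are hypothetical helpers, and the \b around the operators is an assumption.

```javascript
// Lexer from the answer. Assumption: \b around the operators so a word
// like ORDER is not split into OR + DER.
const MY_LEXER = String.raw`
  (?<string> "[^"]*")
  |
  (?<operator> \b(?:and|or|not|AND|OR|NOT)\b)
  |
  (?<word> \w+)
  |
  (?<punct> [().:])
  |
  (?<ws> \s+)
`;

// [type, value] pairs; exactly one named group is defined per match.
const tokenize = (str, lexer) =>
  [...str.matchAll(new RegExp(lexer.replace(/\s+/g, ''), 'g'))]
    .flatMap((m) => Object.entries(m.groups).filter(([, value]) => value));

// Hypothetical helper: "SHELL AND AMP" -> "Shell And Amp".
const titleCase = (text) =>
  text.replace(/[A-Za-z]+/g, (w) => w[0].toUpperCase() + w.slice(1).toLowerCase());

// Hypothetical helper that rebuilds the query with title-cased values.
function titleCaseValues(query) {
  const tokens = tokenize(query, MY_LEXER);
  // matchAll skips characters no alternative matches (e.g. curly quotes),
  // so refuse to rebuild the string if anything was dropped.
  if (tokens.map(([, value]) => value).join('') !== query) {
    throw new Error('lexer did not cover the whole input');
  }
  let output = '';
  for (let [type, val] of tokens) {
    // Assumption: quoted strings and ALL-CAPS words are the values to change;
    // operators, punctuation, whitespace and lowercase words pass through.
    if (type === 'string' || (type === 'word' && /^[A-Z]+$/.test(val))) {
      val = titleCase(val);
    }
    output += val;
  }
  return output;
}

const query = 'NEW ZEALAND AND (data.operator1:"SHELL AND AMP" AND data.field:"NEW ZEALAND") OR (data.operator:purpose AND data.field:crank) OR (data.operator:REGULATOR AND data.field:HELICOPTOR)';
console.log(titleCaseValues(query));
```

With the query from the question this prints the expected line (with straight quotes). The AND inside "SHELL AND AMP" becomes And because it is part of a string token, while the standalone operators stay upper-case.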