Find words from array in string, whole words only (with Hebrew characters)

Question

I have to build a RegExp object that searches for words from an array and matches whole words only. E.g., I have a words array ('יל', 'ילד'), and I want the RegExp to find 'a' or 'יל' or 'ילד', but not 'ילדד'. This is my code: What…

Accepted Answer

The word boundary \b is not Unicode-aware. Use XRegExp to build a Unicode word boundary:

    var text = 'ילד ילדדד יל';
    var matchWords = ['יל', 'ילד'];
    // Backslashes are doubled because the pattern is built from a string literal
    var re = XRegExp('(^|[^_0-9\\pL])(' + matchWords.join('|') + ')(?![_0-9\\pL])', 'ig');
    text = XRegExp.replace(text.replace(/\n$/g, '\n\n'), re, '$1$2');
    console.log(text);

Here, (^|[^_0-9\pL]) is capturing group 1, which matches either the start of the string or any character other than a Unicode letter, an ASCII digit, or _ (a leading word boundary), and (?![_0-9\pL]) fails the match if the word is followed by _, an ASCII digit, or a Unicode letter. Because group 1 consumes the character before the word, the replacement has to put it back with $1; $2 is the matched word itself.

With modern ECMAScript 2018+ support, you can use a native RegExp with the u flag:

    let text = 'ילד ילדדד יל';
    const matchWords = ['יל', 'ילד'];
    // \p{L} requires the u flag; the backslash is doubled inside the string literal
    const re = new RegExp('(^|[^_0-9\\p{L}])(' + matchWords.join('|') + ')(?![_0-9\\p{L}])', 'igu');
    text = text.replace(re, '$1$2');
    console.log(text);

Another ECMAScript 2018+ compliant solution that fully emulates a Unicode-aware \b construct is explained at "Replace certain arabic words in text string using Javascript".
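As a rough sketch of the lookaround-based idea (not the code from the linked answer), the leading capturing group can be swapped for a lookbehind so that nothing before the word is consumed. This assumes an engine with ES2018 lookbehind and Unicode property escape support. The helper names escapeRegExp and buildWholeWordRegex are hypothetical, and escaping the words is an extra precaution that the answer's code does not take.

```javascript
// Hypothetical helper: escape regex metacharacters so each word is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build a regex that matches any of `words` as a whole word.
// "Word character" here means a Unicode letter, an ASCII digit, or underscore,
// mirroring the character class used in the answer above.
function buildWholeWordRegex(words) {
  const alternation = words.map(escapeRegExp).join('|');
  return new RegExp(
    '(?<![_0-9\\p{L}])(?:' + alternation + ')(?![_0-9\\p{L}])',
    'giu'
  );
}

const sampleText = 'ילד ילדדד יל';
const wholeWordRe = buildWholeWordRegex(['יל', 'ילד']);

// The first and last words match; 'ילדדד' is skipped because a letter follows the candidate.
const found = sampleText.match(wholeWordRe);
console.log(found);
```

Since the lookbehind consumes nothing, a replacement only needs the matched word ($&) and no $1 to restore the preceding character.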

Answer