How to detect regex pattern for strings with underscore

I am trying to create a regex that counts the number of exact occurrences of a word in another string.

function countOccurences(string, word) {
  var regex = new RegExp("\\b" + word + "\\b", "gi");
  return (string.match(regex) || []).length;
}
var str =
  "TEST Testing TeSt case-test case@test <h1>Test</h1> www.test.com TEST_UF_3780_nix_inputs r_test regex-test_";

var asset = "test";
console.log(countOccurences(str, asset));

This gives me exact matches for the “test” string and nothing else, but it ignores every “test” that has an underscore attached to it at the front or the back (like TEST_UF…, r_test or regex-test_): in those cases “test” is not detected. I need help detecting those strings as well.

Answer

\b matches a word boundary, which is the position where a word character (i.e. one matched by \w) meets a non-word character or the start or end of the string. Matching word boundaries like this is useful in many contexts because it does not consume a character, but you’re running into the issue that '_' is a word character, so if you’re looking for word boundaries you’re not going to find 'test' inside '_test'.

Word characters in JavaScript regular expressions are [A-Za-z0-9_]. As long as you treat digits the same way as letters, the underscore is the only unusual character you need to care about. However, since you don’t want the underscore to be part of the match, you’ll want to use a lookahead and a lookbehind.
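A quick way to see this in a console: \w accepts the underscore, so a \b assertion never fires between '_' and a letter, while it does fire between '-' and a letter. This is a minimal illustration using regex literals; the sample strings are arbitrary.

```javascript
// \w is [A-Za-z0-9_], so the underscore counts as a word character
const underscoreIsWordChar = /\w/.test("_");

// No boundary between "_" and "t": both sides are word characters
const matchesAfterUnderscore = /\btest\b/i.test("r_test");

// A hyphen is a non-word character, so there is a boundary before "t"
const matchesAfterHyphen = /\btest\b/i.test("r-test");

console.log(underscoreIsWordChar);   // true
console.log(matchesAfterUnderscore); // false
console.log(matchesAfterHyphen);     // true
```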

Try this:

function countOccurences(string, word) {
  var regex = new RegExp("(\\b|(?<=_))" + word + "(\\b|(?=_))", "gi");
  return (string.match(regex) || []).length;
}
var str =
  "TEST Testing TeSt case-test case@test <h1>Test</h1> www.test.com TEST_UF_3780_nix_inputs r_test regex-test_";

var asset = "test";
console.log(countOccurences(str, asset));

That example finds 9 instances of 'test' in your test string where it isn’t part of another word (e.g. 'Testing'), which I believe is what you’re expecting?
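To check which substrings are actually being counted, a sketch like the following lists each match and its position using String.prototype.matchAll (added in ES2020, so this assumes a modern engine). The helper name listOccurrences is hypothetical. Because the underscore checks are lookarounds, every match is just the four letters of 'test', with no underscore attached.

```javascript
// Hypothetical helper: same pattern as countOccurences, but returns the
// matched text and index of each occurrence instead of only a count.
function listOccurrences(string, word) {
  var regex = new RegExp("(\\b|(?<=_))" + word + "(\\b|(?=_))", "gi");
  // matchAll requires the "g" flag and yields one match object per hit
  return Array.from(string.matchAll(regex), function (m) {
    return { text: m[0], index: m.index };
  });
}

var str =
  "TEST Testing TeSt case-test case@test <h1>Test</h1> www.test.com TEST_UF_3780_nix_inputs r_test regex-test_";

var found = listOccurrences(str, "test");
found.forEach(function (m) {
  console.log(m.index, m.text);
});
console.log(found.length); // 9
```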

However, be aware that lookbehind syntax was only added in ES2018. If you need to support older browsers such as IE11, or Safari versions before 16.4, this approach won’t work for you.
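If the same code has to run in engines with and without lookbehind support, one option is to feature-detect at runtime and fall back to a pattern that matches the underscore itself. This is a sketch under that assumption; supportsLookbehind and buildRegex are hypothetical names. Building the test pattern from a string matters here: a lookbehind written as a regex literal would be a parse error in an older engine and break the whole script, whereas new RegExp(...) throws a SyntaxError that can be caught.

```javascript
// Hypothetical helper: returns true if the engine accepts lookbehind.
// The pattern is passed as a string so that an older engine throws a
// catchable SyntaxError here instead of failing to parse the script.
function supportsLookbehind() {
  try {
    new RegExp("(?<=_)x");
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: prefer the lookaround pattern; otherwise fall back
// to a pattern that consumes the underscore (acceptable when only counting).
function buildRegex(word) {
  return supportsLookbehind()
    ? new RegExp("(\\b|(?<=_))" + word + "(\\b|(?=_))", "gi")
    : new RegExp("(\\b|_)" + word + "(\\b|_)", "gi");
}

var sampleCount = ("r_test TEST_UF".match(buildRegex("test")) || []).length;
console.log(sampleCount); // 2
```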

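If you only care about counting occurrences, though, it doesn’t matter whether the underscore is part of the match, so you could do away with the lookahead and lookbehind and just match '_' directly. One caveat: because the underscore is consumed, two occurrences that share a single underscore (as in 'test_test') are counted only once.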

function countOccurences(string, word) {
  var regex = new RegExp("(\\b|_)" + word + "(\\b|_)", "gi");
  return (string.match(regex) || []).length;
}
var str =
  "TEST Testing TeSt case-test case@test <h1>Test</h1> www.test.com TEST_UF_3780_nix_inputs r_test regex-test_";

var asset = "test";
console.log(countOccurences(str, asset));
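The two patterns in this answer only disagree when two occurrences share a single underscore. A minimal comparison, with the hypothetical names countWithLookaround and countConsuming standing in for the two versions above, makes that concrete: both count 9 on the question’s string, but they give different results for 'test_test'.

```javascript
// Lookaround version: the underscore is checked but not consumed
function countWithLookaround(string, word) {
  var regex = new RegExp("(\\b|(?<=_))" + word + "(\\b|(?=_))", "gi");
  return (string.match(regex) || []).length;
}

// Consuming version: the underscore becomes part of the match
function countConsuming(string, word) {
  var regex = new RegExp("(\\b|_)" + word + "(\\b|_)", "gi");
  return (string.match(regex) || []).length;
}

var str =
  "TEST Testing TeSt case-test case@test <h1>Test</h1> www.test.com TEST_UF_3780_nix_inputs r_test regex-test_";

console.log(countWithLookaround(str, "test"), countConsuming(str, "test")); // 9 9

// The consuming pattern's first match is "test_", which uses up the shared
// underscore, so the second "test" has no boundary or "_" left before it.
console.log(countWithLookaround("test_test", "test")); // 2
console.log(countConsuming("test_test", "test"));      // 1
```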
User contributions licensed under: CC BY-SA