Regex: How do I match all non-letter characters no matter where they are in the string?

I am not sure if there is an elegant solution to this. The regex should only consider letters in a search string and additionally match all other characters no matter where they appear in between the characters of the search string, e.g.:

The search string My Moms house should match the segments marked with -> <-:

text 123 ->My Mom's house<- jidjio

bla bla ->My8Mo2ms231#43house<- bla bla

Test string ->My Mom's' house<- further text

etc.

Answer

Matching

So, from your question I believe that you are looking either for this

M.*?y.*?M.*?o.*?m.*?s.*?h.*?o.*?u.*?s.*?e

or

M[^a-zA-Z]*?y[^a-zA-Z]*?M[^a-zA-Z]*?o[^a-zA-Z]*?m[^a-zA-Z]*?s[^a-zA-Z]*?h[^a-zA-Z]*?o[^a-zA-Z]*?u[^a-zA-Z]*?s[^a-zA-Z]*?e

The first one matches the search string plus any characters in between the characters of the search string (as stated in your question body, see regex101), the second one does the same for non-alphabetic characters (as your question title suggests, see regex101).

Each of these is just built from the characters of the search string with a pattern to lazily match either any character (case 1) or any non-alphabetic character (case 2).

Note: If you want the second one to also exclude “special” word characters such as é, ü or ô, you need to account for them in the regex pattern that you use, e.g. by using the negated Unicode category \P{L}.

M\P{L}*?y\P{L}*?M\P{L}*?o\P{L}*?m\P{L}*?s\P{L}*?h\P{L}*?o\P{L}*?u\P{L}*?s\P{L}*?e

\p{L} matches a single code point in the category “letter”, and \P{L} matches the opposite (see regex101).
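To make the “letter” category concrete, here is a minimal sketch that uses Python's standard unicodedata module to print the general category of a few characters. The helper name is_letter is made up for illustration; it only mirrors what the letter category covers (categories starting with L) and is not part of any regex engine.

```python
import unicodedata

def is_letter(ch: str) -> bool:
    """Hypothetical helper: True if ch is in a Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")

samples = [
    "M",        # uppercase ASCII letter
    "\u00e9",   # é, a non-ASCII letter
    "\u00f4",   # ô, a non-ASCII letter
    "8",        # decimal digit
    "#",        # punctuation
    " ",        # space
]

for ch in samples:
    # Print the character, its two-letter general category, and whether it counts as a letter
    print(repr(ch), unicodedata.category(ch), is_letter(ch))
```

Characters for which is_letter returns True are the ones a letter class would match, and the remaining ones are what the negated class can skip over.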

Building the expression

Whatever your exact expression, you can easily build your final regex string by joining each character of your search string with the expression you choose to match content in between.

Python example

Here is a Python example (since your question was not tagged with a programming language):

import regex

text = ["text 123 ->My Mom's house<- jidjio", 
        "bla bla ->My8Mo2ms231#43house<- bla bla", 
        "Test string ->My Mom's' house<- further text", 
        "wkashhasMdykMomLsfheoousssswQseBswenksd", 
        "textMy?M?om*s?*hou?*seorsomethingelse",
        "thisIs3MôyMäoméshouseEFSAcasw!"]

search_string = "MyMomshouse"

regex_string = r'.*?'.join(str(c) for c in search_string)
regex_string2 = r'[^a-zA-Z]*?'.join(str(c) for c in search_string)
regex_string3 = r'\P{L}*?'.join(str(c) for c in search_string)

print('\n--- regex 1 ---')
for t in text:
    print(regex.search(regex_string, t))

print('\n--- regex 2 ---')
for t in text:
    print(regex.search(regex_string2, t))

print('\n--- regex 3 ---')
for t in text:
    print(regex.search(regex_string3, t))

Output:

--- regex 1 ---
<regex.Match object; span=(11, 25), match="My Mom's house">
<regex.Match object; span=(10, 29), match='My8Mo2ms231#43house'>
<regex.Match object; span=(14, 29), match="My Mom's' house">
<regex.Match object; span=(8, 31), match='MdykMomLsfheoousssswQse'>
<regex.Match object; span=(4, 22), match='My?M?om*s?*hou?*se'>
<regex.Match object; span=(7, 21), match='MôyMäoméshouse'>

--- regex 2 ---
<regex.Match object; span=(11, 25), match="My Mom's house">
<regex.Match object; span=(10, 29), match='My8Mo2ms231#43house'>
<regex.Match object; span=(14, 29), match="My Mom's' house">
None
<regex.Match object; span=(4, 22), match='My?M?om*s?*hou?*se'>
<regex.Match object; span=(7, 21), match='MôyMäoméshouse'>

--- regex 3 ---
<regex.Match object; span=(11, 25), match="My Mom's house">
<regex.Match object; span=(10, 29), match='My8Mo2ms231#43house'>
<regex.Match object; span=(14, 29), match="My Mom's' house">
None
<regex.Match object; span=(4, 22), match='My?M?om*s?*hou?*se'>
None

Note:

  • I used the Python regex module instead of the re module because it supports the \p{L} pattern.
  • If your search string includes characters that have a special meaning in regex, you need to escape them when building the pattern, e.g. '.*?'.join(regex.escape(str(c)) for c in search_string)
  • I used the search string MyMomshouse (no spaces) instead of the one you specified, since yours contains spaces and would therefore not match the second of your example strings.
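The last two notes can be combined into a small helper. The following is only a sketch under a few assumptions: the function name build_pattern and its filler parameter are made up for illustration, it uses the standard re module (so the filler here is the ASCII-only [^a-zA-Z]*? variant), and it drops every non-letter from the search string so that "My Mom's house" can be passed as written.

```python
import re

def build_pattern(search: str, filler: str = r"[^a-zA-Z]*?") -> str:
    """Hypothetical helper: join the letters of `search` with `filler`."""
    # Keep only the letters, so spaces or apostrophes in the search string
    # do not have to appear literally in the text being searched.
    letters = [c for c in search if c.isalpha()]
    # Escaping is harmless for letters; it matters if you decide to keep
    # other characters (such as '.' or '*') in the search string.
    return filler.join(re.escape(c) for c in letters)

pattern = build_pattern("My Mom's house")

texts = [
    "text 123 ->My Mom's house<- jidjio",
    "bla bla ->My8Mo2ms231#43house<- bla bla",
    "wkashhasMdykMomLsfheoousssswQseBswenksd",
]

for t in texts:
    m = re.search(pattern, t)
    # Print the matched segment, or None when extra letters sit in between
    print(m.group(0) if m else None)
```

Stripping the non-letters up front is a design choice: it makes the search string and the text symmetric, since non-letters are ignored on both sides.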

JavaScript example

The same is possible in JavaScript, or in principle, any language. See also this JS fiddle:

const text = ["text 123 ->My Mom's house<- jidjio", 
        "bla bla ->My8Mo2ms231#43house<- bla bla", 
        "Test string ->My Mom's' house<- further text", 
        "wkashhasMdykMomLsfheoousssswQseBswenksd", 
        "textMy?M?om*s?*hou?*seorsomethingelse",
        "thisIs3MôyMäoméshouseEFSAcasw!"];
      
const search_string = "MyMomshouse";

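// Join the characters of the search string with a lazy "any character" filler
const regex_string = Array.from(search_string).join('.*?');

console.log(regex_string);

// String.prototype.search converts the string to a RegExp and
// returns the index of the first match, or -1 if there is none
text.forEach((entry) => {
    console.log(entry.search(regex_string));
});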

However, support for Unicode categories is not available in every regex engine (in JavaScript, for example, \p{L} requires the u flag), see this SO question and its answers for possible solutions.
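For Python's standard re module, which has no Unicode category escapes, one possible workaround is the character class [\W\d_]. This is a sketch of an approximation, not an exact equivalent of the negated letter category: it treats everything that is not a word character, plus decimal digits and the underscore, as a non-letter, and some numeric characters outside the decimal digits may be classified differently.

```python
import re

# [\W\d_] = "not a word character, or a decimal digit, or an underscore",
# which for str patterns is roughly "anything that is not a letter".
NON_LETTER = r"[\W\d_]*?"

# Joining a plain string inserts the filler between every character
pattern = NON_LETTER.join("MyMomshouse")

# Digits and punctuation between the letters are skipped
print(re.search(pattern, "bla bla ->My8Mo2ms231#43house<- bla bla"))

# Accented letters are word characters, so they block the match
print(re.search(pattern, "thisIs3M\u00f4yM\u00e4om\u00e9shouseEFSAcasw!"))
```

On these two inputs the behaviour should line up with the third regex of the Python example: the first string matches and the second does not.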

User contributions licensed under: CC BY-SA