Regex: How do I match all non letter characters no matter where they are in the string?

Question

I am not sure if there is an elegant solution to this. The regex should only consider the letters of a search string, and additionally match all other characters no matter where they appear between the characters of the search string. For example, the search string My Moms house should match with the -> <- marke…

Accepted Answer

Matching

From your question, I believe you are looking for either this:

```
M.*?y.*?M.*?o.*?m.*?s.*?h.*?o.*?u.*?s.*?e
```

or this:

```
M[^a-zA-Z]*?y[^a-zA-Z]*?M[^a-zA-Z]*?o[^a-zA-Z]*?m[^a-zA-Z]*?s[^a-zA-Z]*?h[^a-zA-Z]*?o[^a-zA-Z]*?u[^a-zA-Z]*?s[^a-zA-Z]*?e
```

The first one matches the search string plus any characters between the characters of the search string (as stated in your question body, see regex101). The second one does the same, but only allows non-alphabetic characters in between (as your question title suggests, see regex101). Each pattern is simply the characters of the search string joined by a lazy sub-pattern that matches either any character (case 1) or any non-alphabetic character (case 2).

Note: [^a-zA-Z] only excludes ASCII letters, so "special" letters such as é, ü or ô are still allowed between the characters in the second pattern. If you want to exclude them too, you need to handle them in the pattern you use, e.g. with the Unicode category \P{L}:

```
M\P{L}*?y\P{L}*?M\P{L}*?o\P{L}*?m\P{L}*?s\P{L}*?h\P{L}*?o\P{L}*?u\P{L}*?s\P{L}*?e
```

\p{L} matches a single code point in the category “letter”, and \P{L} matches any code point that is not a letter (see regex101).

Building the expression

Whatever your exact expression, you can build the final regex string by joining the characters of your search string with the sub-pattern you chose for the content in between.

Python example

Here is a Python example (since your question was not tagged with a programming language):

```python
import regex

text = [
    "text 123 ->My Mom's house<- jidjio",
    "bla bla ->My8Mo2ms231#43house<- bla bla",
    "Test string ->My Mom's' house<- further text",
    "wkashhasMdykMomLsfheoousssswQseBswenksd",
    "textMy?M?om*s?*hou?*seorsomethingelse",
    "thisIs3MôyMäoméshouseEFSAcasw!",
]
search_string = "MyMomshouse"

# Join the characters of the search string with the "in between" sub-pattern
regex_string = r'.*?'.join(str(c) for c in search_string)           # any character
regex_string2 = r'[^a-zA-Z]*?'.join(str(c) for c in search_string)  # any non-ASCII-letter
regex_string3 = r'\P{L}*?'.join(str(c) for c in search_string)      # any non-letter (Unicode)

print('\n--- regex 1 ---')
for t in text:
    print(regex.search(regex_string, t))

print('\n--- regex 2 ---')
for t in text:
    print(regex.search(regex_string2, t))

print('\n--- regex 3 ---')
for t in text:
    print(regex.search(regex_string3, t))
```

Output: regex 1 finds a match in all six strings. Regex 2 prints None only for wkashhasMdykMomLsfheoousssswQseBswenksd. Regex 3 prints None for that string and for thisIs3MôyMäoméshouseEFSAcasw!, because ô and ä are letters under \p{L}.

Note: I used the Python regex module instead of the re module because it supports the \p{L} pattern.

If your search string includes characters that have a special meaning in regex, you need to escape them when building the pattern, e.g. '.*?'.join(regex.escape(str(c)) for c in search_string).

I used the search string MyMomshouse (no spaces) instead of the one you specified, since yours would not match the second of your example strings.

JavaScript example:

The same is possible in JavaScript, or in principle in any language. See also this JS fiddle:

```javascript
const text = [
  "text 123 ->My Mom's house<- jidjio",
  "bla bla ->My8Mo2ms231#43house<- bla bla",
  "Test string ->My Mom's' house<- further text",
  "wkashhasMdykMomLsfheoousssswQseBswenksd",
  "textMy?M?om*s?*hou?*seorsomethingelse",
  "thisIs3MôyMäoméshouseEFSAcasw!",
];
const search_string = "MyMomshouse";
const regex_string = Array.from(search_string).join('.*?');
console.log(regex_string);

text.forEach((entry) => {
  // search() converts the string to a RegExp and returns the index of the first match, or -1
  console.log(entry.search(regex_string));
});
```

However, Unicode property escapes such as \p{L} are not available in every JavaScript environment (they were added in ES2018 and require the u flag); see this SO question and its answers for possible solutions.
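The escaping note and the Unicode-letter note above can be combined in a small helper that uses only Python's standard re module, for cases where installing regex is not an option. This is a sketch under assumptions: the function name build_pattern and its letters_only flag are hypothetical, and because re has no \p{L}, the "non-letter" filler is approximated with [\W\d_] (non-word characters, digits and underscore). That class is close to \P{L}, but it is not identical for every code point (e.g. numeric characters such as ½ are treated as word characters by re).

```python
import re


def build_pattern(search_string: str, letters_only: bool = True):
    """Hypothetical helper: join the escaped characters of search_string
    with a lazy filler that matches the content in between."""
    # [\W\d_] approximates "any non-letter" for str patterns in re,
    # which are Unicode-aware by default (so ô, ä, é count as letters).
    filler = r'[\W\d_]*?' if letters_only else r'.*?'
    # Escape each character so that e.g. '.' or '+' in the search string is literal.
    return re.compile(filler.join(re.escape(c) for c in search_string))


texts = [
    "text 123 ->My Mom's house<- jidjio",
    "bla bla ->My8Mo2ms231#43house<- bla bla",
    "wkashhasMdykMomLsfheoousssswQseBswenksd",
    "thisIs3MôyMäoméshouseEFSAcasw!",
]

pattern = build_pattern("MyMomshouse")
for t in texts:
    m = pattern.search(t)
    # Print the matched substring, or None if the search string is not found
    print(m.group(0) if m else None)
```

With letters_only=True this behaves like the \P{L} variant for these inputs: the first two strings match, while the last two return None because letters (d, or ô) sit between characters of the search string. Passing letters_only=False gives the "any character" behaviour of the first pattern.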
