I am not sure if there is an elegant solution to this. The regex should only consider letters in a search string and additionally match all other characters no matter where they appear in between the characters of the search string, e.g.:
The search string My Moms house
should match with the -> <- marked segments:
text 123 ->My Mom's house<- jidjio bla bla ->My8Mo2ms231#43house<- bla bla Test string ->My Mom's' house<- further text
etc.
Answer
Matching
So, from your question I believe that you are looking either for this
M.*?y.*?M.*?o.*?m.*?s.*?h.*?o.*?u.*?s.*?e
or
M[^a-zA-Z]*?y[^a-zA-Z]*?M[^a-zA-Z]*?o[^a-zA-Z]*?m[^a-zA-Z]*?s[^a-zA-Z]*?h[^a-zA-Z]*?o[^a-zA-Z]*?u[^a-zA-Z]*?s[^a-zA-Z]*?e
The first one matches the search string plus any characters in between the characters of the search string (as stated in your question body, see regex101), the second one does the same for non-alphabetic characters (as your question title suggests, see regex101).
Each of these is just built from the characters of the search string with a pattern to lazily match either any character (case 1) or any non-alphabetic character (case 2).
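To illustrate why the gaps are matched lazily, here is a minimal sketch using Python's built-in re module. The sample text and the shortened search string are made up for this illustration and are not taken from the question.

```python
import re

# Illustrative input (an assumption, not from the question).
text = "My house, My house"
letters = "Myhouse"

# Lazy gaps: each gap consumes as few characters as possible.
lazy = re.search(".*?".join(letters), text)
# Greedy gaps: each gap consumes as many characters as possible.
greedy = re.search(".*".join(letters), text)

print(lazy.group())    # My house
print(greedy.group())  # My house, My house
```

With greedy gaps, the match stretches from the first M to the last e in the text, which is usually not what you want when the phrase occurs more than once.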
Note: If you want the second one to also exclude “special” word characters, like e.g. é, ü or ô, you need to take care of them accordingly in the regex pattern that you use, e.g. by using the Unicode category \P{L}.
M\P{L}*?y\P{L}*?M\P{L}*?o\P{L}*?m\P{L}*?s\P{L}*?h\P{L}*?o\P{L}*?u\P{L}*?s\P{L}*?e
\p{L} matches a single code point in the category “letter”, and \P{L} matches the opposite (see regex101).
Building the expression
Whatever your exact expression, you can easily build your final regex string by joining each character of your search string with the expression you choose to match content in between.
Python example
Here is a python example (since your question was not tagged with a programming language):
import regex  # third-party module (pip install regex); supports \p{L} and \P{L}

text = ["text 123 ->My Mom's house<- jidjio",
        "bla bla ->My8Mo2ms231#43house<- bla bla",
        "Test string ->My Mom's' house<- further text",
        "wkashhasMdykMomLsfheoousssswQseBswenksd",
        "textMy?M?om*s?*hou?*seorsomethingelse",
        "thisIs3MôyMäoméshouseEFSAcasw!"]

search_string = "MyMomshouse"

# Join the characters of the search string with the chosen gap pattern.
regex_string = r'.*?'.join(str(c) for c in search_string)           # any character
regex_string2 = r'[^a-zA-Z]*?'.join(str(c) for c in search_string)  # anything but a-z, A-Z
regex_string3 = r'\P{L}*?'.join(str(c) for c in search_string)      # any non-letter code point

print('\n--- regex 1 ---')
for t in text:
    print(regex.search(regex_string, t))

print('\n--- regex 2 ---')
for t in text:
    print(regex.search(regex_string2, t))

print('\n--- regex 3 ---')
for t in text:
    print(regex.search(regex_string3, t))
Output:
--- regex 1 ---
<regex.Match object; span=(11, 25), match="My Mom's house">
<regex.Match object; span=(10, 29), match='My8Mo2ms231#43house'>
<regex.Match object; span=(14, 29), match="My Mom's' house">
<regex.Match object; span=(8, 31), match='MdykMomLsfheoousssswQse'>
<regex.Match object; span=(4, 22), match='My?M?om*s?*hou?*se'>
<regex.Match object; span=(7, 21), match='MôyMäoméshouse'>

--- regex 2 ---
<regex.Match object; span=(11, 25), match="My Mom's house">
<regex.Match object; span=(10, 29), match='My8Mo2ms231#43house'>
<regex.Match object; span=(14, 29), match="My Mom's' house">
None
<regex.Match object; span=(4, 22), match='My?M?om*s?*hou?*se'>
<regex.Match object; span=(7, 21), match='MôyMäoméshouse'>

--- regex 3 ---
<regex.Match object; span=(11, 25), match="My Mom's house">
<regex.Match object; span=(10, 29), match='My8Mo2ms231#43house'>
<regex.Match object; span=(14, 29), match="My Mom's' house">
None
<regex.Match object; span=(4, 22), match='My?M?om*s?*hou?*se'>
None
Note:
- I used the Python regex module instead of the re module because it supports the \p{L} pattern.
- If your search string includes characters that have a special meaning in regex, you need to escape them when building the pattern, e.g. '.*?'.join(regex.escape(str(c)) for c in search_string)
- I used the search string MyMomshouse (no spaces) instead of the one you specified, since yours would not match in the second of your example strings.
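If you would rather keep the original search string with its spaces, one option is to drop everything that is not a letter before joining. The following is only a sketch using the built-in re module; the helper name build_pattern and its default gap are made up for illustration.

```python
import re

def build_pattern(search_string, gap=r"[^a-zA-Z]*?"):
    """Join the letters of search_string with the chosen gap pattern.

    Hypothetical helper: the name and the default gap are illustrative.
    """
    # Keep only letters, so spaces or punctuation in the search string are ignored.
    letters = [c for c in search_string if c.isalpha()]
    # Escaping is a no-op for letters but keeps the helper safe if the filter changes.
    return gap.join(re.escape(c) for c in letters)

pattern = re.compile(build_pattern("My Moms house"))

samples = [
    "text 123 ->My Mom's house<- jidjio",
    "bla bla ->My8Mo2ms231#43house<- bla bla",
    "Test string ->My Mom's' house<- further text",
]
matches = [pattern.search(s).group() for s in samples]
for m in matches:
    print(m)
```

This way the search string from the question, My Moms house, finds all three marked segments, because its spaces never become part of the pattern.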
JavaScript example:
The same is possible in JavaScript, or in principle, any language. See also this JS fiddle:
const text = ["text 123 ->My Mom's house<- jidjio",
              "bla bla ->My8Mo2ms231#43house<- bla bla",
              "Test string ->My Mom's' house<- further text",
              "wkashhasMdykMomLsfheoousssswQseBswenksd",
              "textMy?M?om*s?*hou?*seorsomethingelse",
              "thisIs3MôyMäoméshouseEFSAcasw!"];

const search_string = "MyMomshouse";

// Join the characters of the search string with the lazy "any character" pattern.
const regex_string = Array.from(search_string).join('.*?');
console.log(regex_string);

// search() converts the string to a RegExp and returns the index of the first match, or -1.
text.forEach((entry) => {
  console.log(entry.search(regex_string));
});
However, support for Unicode character categories is not always available; see this SO question and its answers for possible solutions.
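Python's built-in re module has the same limitation, since it does not understand \p{L} or \P{L}. One possible workaround, sketched here as an approximation rather than an exact equivalent, is to describe non-letters as "non-word characters, digits, or underscore". The constant name NON_LETTER_GAP is made up for this example.

```python
import re

# Approximation of \P{L} for engines without Unicode property escapes:
# \W (non-word characters), \d (digits) and the underscore together cover
# most characters that are not letters. A few characters may be classified
# differently than with a real \P{L}.
NON_LETTER_GAP = r"[\W\d_]*?"

search_string = "MyMomshouse"
pattern = re.compile(NON_LETTER_GAP.join(re.escape(c) for c in search_string))

plain = pattern.search("bla bla ->My8Mo2ms231#43house<- bla bla")
accented = pattern.search("thisIs3MôyMäoméshouseEFSAcasw!")

print(plain.group())  # My8Mo2ms231#43house
print(accented)       # None
```

For these two inputs the result agrees with the \P{L} variant: digits and punctuation are allowed in the gaps, while accented letters such as ô are not.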