I’m trying to filter bad characters out of a string so that only alphanumeric ones remain, but I need to keep Chinese, Japanese and other non-Latin scripts as well. After some hours of reading about RegEx, I’m more confused than informed. Currently I have:
let string = 'Test=😕查看 ' + '';
string = string.replace(/[^A-Za-z\d\p{Han}]+$/ug,' ');
console.log(string);
Without the \p{Han} everything works well, but then no Chinese chars are kept. Any idea? I want to keep it simple, but this seems to be impossible.
Answer
I suggest removing all chars other than letters and digits:
let string = 'Test=😕查看 ';
string = string.replace(/[^\p{L}\p{N}]+/ug,' ').trim();
console.log(string);
If you need to allow diacritics, add \p{M} there:
string.replace(/[^\p{L}\p{N}\p{M}]+/ug,' ').trim();
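For reference, here is a minimal sketch that wraps both variants in one function. The name keepLettersAndDigits and the allowMarks option are made up for illustration and are not part of any library. It also shows a Han-only variant closer to the pattern in the question. As far as I know, JavaScript does not accept the short form \p{Han} and needs the script property spelled out as \p{Script=Han}. The +$ in the question additionally restricts the match to a run at the very end of the string, so it is dropped here.

```javascript
// Hypothetical helper (name and option are illustrative only).
// Replaces each run of characters that are not letters (\p{L}),
// numbers (\p{N}) or, optionally, combining marks (\p{M}) with one space.
function keepLettersAndDigits(input, { allowMarks = false } = {}) {
  const pattern = allowMarks
    ? /[^\p{L}\p{N}\p{M}]+/gu
    : /[^\p{L}\p{N}]+/gu;
  return input.replace(pattern, ' ').trim();
}

// Han-only variant of the question's pattern: ASCII letters, digits and
// Han characters are kept; everything else becomes a space.
function keepAsciiAndHan(input) {
  return input.replace(/[^A-Za-z\d\p{Script=Han}]+/gu, ' ').trim();
}

console.log(keepLettersAndDigits('Test=😕查看 ')); // Test 查看
console.log(keepAsciiAndHan('Test=😕查看 '));      // Test 查看
```

The u flag is required for \p{...} to be recognised at all. It also makes the emoji match as a single code point instead of two surrogate halves.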