I’m currently trying to filter out any bad characters from a string so that only alphanumeric ones remain, but I need to keep Chinese, Japanese and other non-Latin scripts as well. After some hours of reading about RegEx, I’m more confused than informed. Currently I have:
```javascript
let string = 'Test=😕查看 ' +
'';

string = string.replace(/[^A-Za-z\d\p{Han}]+$/ug,' ');

console.log(string);
```
Without the {Han} part everything works, but then no Chinese characters are kept. Any ideas? I want to keep it simple, but that seems to be impossible.
Answer
I suggest removing all characters other than letters and digits, using Unicode property escapes:
```javascript
let string = 'Test=😕查看 ';
string = string.replace(/[^\p{L}\p{N}]+/ug,' ').trim();
console.log(string);
```
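To see what this character class keeps and drops across scripts, here is a small sketch. The helper name `keepLettersAndDigits` and the extra sample strings are made up for illustration and are not part of the question.

```javascript
// Hypothetical helper: replaces every run of characters that are neither
// letters (\p{L}) nor numbers (\p{N}) with a single space, then trims.
// The `u` flag is required, otherwise \p{...} is not a property escape.
function keepLettersAndDigits(input) {
  return input.replace(/[^\p{L}\p{N}]+/gu, ' ').trim();
}

console.log(keepLettersAndDigits('Test=😕查看 '));      // Test 查看
console.log(keepLettersAndDigits('日本語_テスト-123')); // 日本語 テスト 123
console.log(keepLettersAndDigits('Привет, мир!'));      // Привет мир
```

Note that `\p{N}` covers every numeric character, including ones such as `²` or `½`. If only decimal digits should survive, `\p{Nd}` is the narrower category.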
If you need to allow diacritics, add \p{M} there:
```javascript
string.replace(/[^\p{L}\p{N}\p{M}]+/ug,' ').trim();
```
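The difference only shows up when an accent is stored as a separate combining mark (decomposed, NFD form) rather than as part of a precomposed letter. A rough sketch comparing the two character classes on the same word; the helper names and the sample word are assumptions for illustration:

```javascript
// Hypothetical helpers for comparison; only the character class differs.
const stripWithoutMarks = (s) => s.replace(/[^\p{L}\p{N}]+/gu, ' ').trim();
const stripWithMarks = (s) => s.replace(/[^\p{L}\p{N}\p{M}]+/gu, ' ').trim();

const composed = 'café'.normalize('NFC');   // "é" is one code point, U+00E9
const decomposed = 'café'.normalize('NFD'); // "e" followed by U+0301 (combining acute)

// U+00E9 is itself a letter, so the composed form is untouched either way.
console.log(stripWithoutMarks(composed) === composed);     // true

// Without \p{M}, the combining accent is treated as a bad character and lost.
console.log(stripWithoutMarks(decomposed) === decomposed); // false

// With \p{M}, the combining accent is kept.
console.log(stripWithMarks(decomposed) === decomposed);    // true
```

If the input can be normalized up front, calling `normalize('NFC')` before filtering is another option, although not every base-plus-mark sequence has a precomposed form.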