
How can I allow only alphanumeric characters, including Chinese, Japanese and other non-Latin scripts?

I'm trying to strip every unwanted character from a string so that only alphanumeric ones remain, but I also need to keep Chinese, Japanese and other non-Latin characters. After some hours of reading about regular expressions, I'm more confused than informed. Currently I have:

let string = 'Test=😕查看         ' +
    '';

string = string.replace(/[^A-Za-z\d\p{Han}]+$/ug,' ');

console.log(string);

Without the \p{Han} part everything works, but then Chinese characters are stripped as well. Any ideas? I want to keep it simple, but that seems to be impossible.


Answer

I suggest removing every character that is not a Unicode letter (\p{L}) or number (\p{N}):

let string = 'Test=😕查看         ';
string = string.replace(/[^\p{L}\p{N}]+/ug,' ').trim();
console.log(string);
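
If you would rather allow only specific scripts than every letter, here is a minimal sketch. The function names are made up for illustration and are not part of the answer above. As far as I know, JavaScript accepts General_Category values such as L and N on their own inside \p{...}, but a script name needs the Script= (or sc=) prefix, so a bare \p{Han} is rejected as an invalid property name when the u flag is set.

```javascript
// Hypothetical helper: keep any Unicode letter or number, collapse the rest to one space.
function keepLettersAndDigits(input) {
  // \p{L} = any letter, \p{N} = any number; the u flag is required for \p{...}.
  return input.replace(/[^\p{L}\p{N}]+/gu, ' ').trim();
}

// Hypothetical helper: keep only ASCII letters, ASCII digits and Han characters.
function keepLatinDigitsAndHan(input) {
  // Script values must be written as \p{Script=Han} (or \p{sc=Han}) in JavaScript.
  return input.replace(/[^A-Za-z\d\p{Script=Han}]+/gu, ' ').trim();
}

console.log(keepLettersAndDigits('Test=😕查看         ')); // Test 查看
console.log(keepLatinDigitsAndHan('Test=😕查看 日本語'));   // Test 查看 日本語
```

Note that \p{Script=Han} covers only Han ideographs. Japanese kana would need \p{Script=Hiragana} and \p{Script=Katakana} added to the class, which is one reason \p{L} is the simpler choice when every language should be allowed.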

If you also need to keep diacritics (combining marks), add \p{M} to the character class:

string.replace(/[^\p{L}\p{N}\p{M}]+/ug,' ').trim();
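
To see when the mark class matters, here is a small sketch with made-up variable names and sample text. A precomposed "é" (U+00E9) is a letter by itself, but the decomposed form "e" + U+0301 contains a combining mark, which the letters-and-numbers pattern removes.

```javascript
// Illustrative patterns: the second one also keeps combining marks (\p{M}).
const lettersAndNumbers = /[^\p{L}\p{N}]+/gu;
const lettersNumbersMarks = /[^\p{L}\p{N}\p{M}]+/gu;

// "café au lait" written with a decomposed accent: 'e' followed by U+0301.
const decomposed = 'cafe\u0301 au lait';

// Without \p{M} the combining accent is stripped along with the space after it.
const stripped = decomposed.replace(lettersAndNumbers, ' ').trim();
console.log(stripped === 'cafe au lait'); // true

// With \p{M} the accent survives, so the text is unchanged.
const kept = decomposed.replace(lettersNumbersMarks, ' ').trim();
console.log(kept === decomposed); // true

// Alternative: normalize to NFC first, so the accent becomes part of a single letter.
const composed = decomposed.normalize('NFC').replace(lettersAndNumbers, ' ').trim();
console.log(composed === 'caf\u00e9 au lait'); // true
```

Normalizing with NFC helps for Latin accents, but scripts such as Devanagari or Thai use marks that have no precomposed form, so keeping \p{M} is the safer option for arbitrary input.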