How to count the correct length of a string with emojis in JavaScript?




I have a little problem.

I’m using Node.js as the backend. A user has a “biography” field, where they can write something about themselves.

Suppose this field has a maxlength of 220, and suppose this is the input:

👶🏻👦🏻👧🏻👨🏻👩🏻👱🏻‍♀️👱🏻👴🏻👵🏻👲🏻👳🏻‍♀️👳🏻👮🏻‍♀️👮🏻👷🏻‍♀️👷🏻💂🏻‍♀️💂🏻🕵🏻‍♀️👩🏻‍⚕️👨🏻‍⚕️👩🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾👨🏻‍🌾

As you can see, there aren’t 220 emojis (there are only 37), but if I run this on my Node.js server:

console.log(bio.length)

where bio is the input text, I get 221. How can I “parse” the input string to get the correct length? Is this a Unicode problem?

SOLVED

I used this library: https://github.com/orling/grapheme-splitter

I tried this:

var Grapheme = require('grapheme-splitter');
var splitter = new Grapheme();
console.log(splitter.splitGraphemes(bio).length);

and the length is 37. It works very well!
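To enforce the 220 limit on the server, the same idea can be wrapped in a small check. This is only a minimal sketch: it assumes Node.js 16 or later and uses the built-in Intl.Segmenter as a stand-in for grapheme-splitter, and countGraphemes, isBioWithinLimit and MAX_BIO_LENGTH are hypothetical names, not part of any library.

```javascript
// Hypothetical helper: counts user-perceived characters (grapheme clusters).
// Intl.Segmenter is swapped in here for grapheme-splitter (assumes Node.js 16+).
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let count = 0;
  for (const _segment of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

// Hypothetical limit for the biography field, measured in graphemes.
const MAX_BIO_LENGTH = 220;

function isBioWithinLimit(bio) {
  return countGraphemes(bio) <= MAX_BIO_LENGTH;
}

// Man farmer, light skin tone: U+1F468 U+1F3FB U+200D U+1F33E, repeated 37 times.
const bio = '\u{1F468}\u{1F3FB}\u200D\u{1F33E}'.repeat(37);

console.log(bio.length);            // 259 UTF-16 code units
console.log(countGraphemes(bio));   // 37 graphemes
console.log(isBioWithinLimit(bio)); // true
```

A check on bio.length alone would reject this input, even though the user typed only 37 visible emojis.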

Answer

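  1. str.length gives the count of UTF-16 code units.

  2. A Unicode-aware way to get the string length in code points is [...str].length, because the string iterator splits the string into code points rather than UTF-16 code units. Note that a code point is still not always a user-perceived character.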

  3. If we need the length in graphemes (grapheme clusters), we have these native ways:

    a. Unicode property escapes in RegExp. See for example: Unicode-aware version of \w or Matching emoji.

    b. Intl.Segmenter: coming soon at the time of writing, probably in ES2021. It can be tested behind a flag in recent V8 versions (the implementation was synced with the latest spec in V8 8.6).
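The three ways of measuring length listed above can be compared on a single emoji. This is a rough sketch, assuming a runtime where Intl.Segmenter is available without a flag (for example Node.js 16 or later); the variable names are only illustrative.

```javascript
// Woman health worker, light skin tone:
// U+1F469 (woman) U+1F3FB (skin tone) U+200D (ZWJ) U+2695 (medical symbol) U+FE0F (variation selector)
const emoji = '\u{1F469}\u{1F3FB}\u200D\u2695\uFE0F';

// 1. UTF-16 code units: the two astral code points take two units each.
const codeUnits = emoji.length;

// 2. Code points: the string iterator yields one item per code point.
const codePoints = [...emoji].length;

// 3. Grapheme clusters: what the user sees as one character.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = [...segmenter.segment(emoji)].length;

console.log(codeUnits);  // 7
console.log(codePoints); // 5
console.log(graphemes);  // 1
```

Only the grapheme count matches what a user would call "one emoji", which is why it is the right measure for a visible-character limit.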

See also:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

What every JavaScript developer should know about Unicode

JavaScript has a Unicode problem

Unicode-aware regular expressions in ES2015

ES6 Strings (and Unicode, ❤) in Depth

JavaScript for impatient programmers. Unicode – a brief introduction



Source: stackoverflow