
JavaScript: iterating over Unicode when an emoji has a skin color

The problem I am facing right now is that when a string contains emojis with a skin color other than yellow, JavaScript splits each of them into several characters instead of one.

When I have emojis like this, there is no problem and I get the results I want:

let strs = [..."😂😄🤩🙄😏😣🤩"]

console.log(strs)
console.log(strs.length)

But if I have emojis like these, there is a problem, because JavaScript doesn't split them correctly when I use the [...] operator:

let strs = [..."🧑🏾👨🏻👧🏼👦🏽🧒🏿"]

console.log(strs)
console.log(strs.length)

How can I tell JavaScript that this is only one emoji with a length of one, and not two or more emojis, as in this example:

let strs = [..."👩‍❤️‍💋‍👩"]

console.log(strs)
console.log(strs.length)


Answer

The string iterator (invoked via the spread syntax ...) iterates over the code points of the string. Some emojis are made up of multiple code points, for example a base emoji followed by a skin-tone modifier, or several emojis joined by zero-width joiner (ZWJ) characters, so they are split unintentionally, as you have seen. In more recent versions of lodash, you can use _.split(), which is able to handle emojis and ZWJ characters:

const r1 = _.split("👩‍❤️‍💋‍👩", '');
const r2 = _.split("🧑🏾👨🏻👧🏼👦🏽🧒🏿", '');

// See browser console for output: 
console.log(r1, r1.length);
console.log(r2, r2.length);
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.21/lodash.min.js" integrity="sha512-WFN04846sdKMIP5LKNphMaWzU7YpMyCU245etK3g/2ARYbPK9Ub18eG+ljU96qKRCWh+quCY7yefSmlkQw1ANQ==" crossorigin="anonymous" referrerpolicy="no-referrer"></script>

Note that you don't need to include the entire lodash library to use this method. Instead, you can include just that method, for example by importing lodash/split.
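To see why the spread syntax splits these emojis, it can help to print the code points behind them. The snippet below is only an illustrative sketch: `codePoints` is a hypothetical helper that is not part of the original question or of lodash, and the emojis are written as escape sequences so the individual code points are visible in the source.

```javascript
// Hypothetical helper (not from the original post): list each code point as "U+XXXX".
const codePoints = (str) =>
  [...str].map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase());

// Person + medium-dark skin tone modifier (the first emoji of the second example).
const person = "\u{1F9D1}\u{1F3FE}";

// Woman, ZWJ, heavy heart, variation selector-16, ZWJ, kiss mark, ZWJ, woman.
const kiss = "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F48B}\u200D\u{1F469}";

console.log(codePoints(person));      // ["U+1F9D1", "U+1F3FE"]
console.log(codePoints(kiss).length); // 8 code points
console.log(kiss.length);             // 11 UTF-16 code units
```

So a single visible emoji can be two code points (skin tone) or eight (the ZWJ sequence), which is exactly what [...str] returns.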


There is also Intl.Segmenter (a stage 4 proposal), an API that lets you split/segment a string by specifying a granularity. You create a segmenter that splits strings into graphemes (i.e. the visual emoji characters as the user sees them). When you call segment() on your string, you get an iterable of segment objects, which you can then convert into an array of characters using Array.from():

const graphemeSplit = str => {
  const segmenter = new Intl.Segmenter("en", {granularity: 'grapheme'});
  const segitr = segmenter.segment(str);
  return Array.from(segitr, ({segment}) => segment);
}
// See browser console for output
console.log(graphemeSplit("👩‍❤️‍💋‍👩")); // ["👩‍❤️‍💋‍👩"]
console.log(graphemeSplit("🧑🏾👨🏻👧🏼👦🏽🧒🏿")); // ["🧑🏾", "👨🏻", "👧🏼", "👦🏽", "🧒🏿"]
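
Since Intl.Segmenter may not exist in every environment, one possible variation is to feature-detect it and fall back to code-point splitting. This is a minimal sketch under that assumption; `splitGraphemes` and its `locale` parameter are hypothetical names, not part of the answer above, and the fallback still splits skin-tone and ZWJ sequences.

```javascript
// Hypothetical helper: split by grapheme when Intl.Segmenter is available,
// otherwise fall back to code points (which will still break up skin tones and ZWJ sequences).
const splitGraphemes = (str, locale = "en") => {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
    return Array.from(segmenter.segment(str), ({ segment }) => segment);
  }
  return [...str]; // fallback: one element per code point
};

// Two emojis, each with a skin-tone modifier, written as escapes.
const twoPeople = "\u{1F9D1}\u{1F3FE}\u{1F468}\u{1F3FB}";

console.log(splitGraphemes(twoPeople).length); // 2 where Intl.Segmenter is available
console.log([...twoPeople].length);            // 4 code points
```

Either branch returns pieces that join back into the original string, so the helper can be swapped in without losing data.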
User contributions licensed under: CC BY-SA