JavaScript strings – UTF-16 vs UCS-2?

Question

I’ve read in some places that JavaScript strings are UTF-16, and in other places that they’re UCS-2. I did some searching around to try to figure out the difference and found this: Q: What is the difference between UCS-2 and UTF-16? A: UCS-2 is obsolete terminology which refers to a Unicode implementat…

Accepted Answer

JavaScript (strictly speaking, ECMAScript) pre-dates Unicode 2.0, so in some cases you may find references to UCS-2 simply because that was correct at the time the reference was written. Can you point us to specific citations of JavaScript being “UCS-2”?

Specifications for ECMAScript versions 3 and 5, at least, both explicitly declare a String to be a collection of unsigned 16-bit integers, and state that if those integer values are meant to represent textual data, then they are UTF-16 code units. See section 8.4 of the ECMAScript Language Specification in version 5.1, or section 6.1.4 in version 13.0.

EDIT: I’m no longer sure my answer is entirely correct. See the excellent article mentioned above, which in essence says that while a JavaScript engine may use UTF-16 internally, and most do, the language itself effectively exposes those characters as if they were UCS-2.
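To make the point about 16-bit code units concrete, here is a minimal sketch of how a character outside the Basic Multilingual Plane behaves. The choice of U+1F4A9 and the variable names are mine, and `codePointAt` and `Array.from` were only added in ES2015, so this assumes an engine newer than the ES3/ES5 specifications cited above.

```javascript
// U+1F4A9 lies outside the Basic Multilingual Plane, so UTF-16
// has to encode it as a surrogate pair (two 16-bit code units).
const s = "\u{1F4A9}";

// length and charCodeAt work on 16-bit code units, which is the
// "looks like UCS-2" behaviour described in the EDIT.
const lengthInCodeUnits = s.length;
const highSurrogate = s.charCodeAt(0);
const lowSurrogate = s.charCodeAt(1);

// ES2015+ additions that are aware of surrogate pairs.
const codePoint = s.codePointAt(0);              // decodes the pair
const lengthInCodePoints = Array.from(s).length; // iterates by code point

console.log(lengthInCodeUnits);             // 2
console.log(highSurrogate.toString(16));    // d83d
console.log(lowSurrogate.toString(16));     // dca9
console.log(codePoint.toString(16));        // 1f4a9
console.log(lengthInCodePoints);            // 1
```

The older, index-based API reports two "characters" for one code point, while the newer code-point-aware methods interpret the same 16-bit values as UTF-16.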

Answer