
Tag: utf-8

Using Javascript’s atob to decode base64 doesn’t properly decode utf-8 strings

I’m using the JavaScript window.atob() function to decode a base64-encoded string (specifically the base64-encoded content from the GitHub API). The problem is that I’m getting garbled characters back (like ⢠instead of ™). How can I properly handle the incoming base64-encoded stream so that it’s decoded as UTF-8?

Answer

The Unicode Problem

Though JavaScript (ECMAScript) has matured, the fragility of Base64, ASCII, […]
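The excerpt is cut off before the answer's code, so the following is only a minimal sketch of one common approach, not the answer's own method. It assumes a runtime that provides the standard `atob` and `TextDecoder` globals (modern browsers, recent Node.js); the helper name `base64ToUtf8` is hypothetical.

```javascript
// Hypothetical helper: decode base64 whose payload is UTF-8 encoded text.
function base64ToUtf8(b64) {
  // atob returns a "binary string": one character per decoded byte (0-255).
  const binary = atob(b64);
  // Turn each character back into its byte value.
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  // Interpret the bytes as UTF-8 instead of one-character-per-byte.
  return new TextDecoder("utf-8").decode(bytes);
}

// "4oSi" is the base64 form of the three UTF-8 bytes E2 84 A2 (the ™ sign).
console.log(base64ToUtf8("4oSi"));
```

Calling `atob` alone returns three separate characters for those bytes, which is where output like ⢠comes from; the extra `TextDecoder` step is what reassembles them into a single character.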

Convert integer array to string in JavaScript

Here is the PHP code; its output is: 中文chinese. Here is the JavaScript code; its output is: 中æchinese. So how should I process the array in JavaScript?

Answer

JavaScript strings consist of UTF-16 code units, yet the numbers in your array are the bytes of a UTF-8 string. Here is one way to convert the string, which […]
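Neither the question's array nor the answer's code survives in this excerpt. As a stand-in, here is a minimal sketch of decoding a byte array as UTF-8 with the standard `TextDecoder`. The function name `bytesToString` and the sample array are assumptions made for illustration; the array is simply the UTF-8 encoding of the expected output 中文chinese.

```javascript
// Hypothetical helper: treat an array of integers as UTF-8 bytes.
function bytesToString(byteArray) {
  // Uint8Array.from copies the numbers into a typed byte buffer.
  const bytes = Uint8Array.from(byteArray);
  // Decode multi-byte sequences into proper characters.
  return new TextDecoder("utf-8").decode(bytes);
}

// Illustrative input (assumed, not from the original post):
// E4 B8 AD = 中, E6 96 87 = 文, followed by the ASCII bytes of "chinese".
const sample = [228, 184, 173, 230, 150, 135, 99, 104, 105, 110, 101, 115, 101];
console.log(bytesToString(sample));
```

Mapping each number through `String.fromCharCode` instead would treat every byte as its own UTF-16 code unit, which is the kind of per-byte conversion that yields garbled output such as 中æchinese.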

Extract substring by utf-8 byte positions

I have a string, plus a start and a length with which to extract a substring. Both values (start and length) are byte offsets into the original UTF-8 string. However, there is a problem: because the start and length are in bytes, I cannot use “substring”, and the UTF-8 string contains several multi-byte characters. Is there a hyper-efficient way of […]
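The question is truncated here and no answer is shown, so the following is only a simple sketch of the idea, with no claim to being "hyper-efficient". It assumes the standard `TextEncoder` and `TextDecoder` globals are available; the function name `substringByBytes` is hypothetical.

```javascript
// Hypothetical helper: extract a substring using UTF-8 byte offsets.
function substringByBytes(str, startByte, lengthBytes) {
  // Encode the whole string to UTF-8 so positions are measured in bytes.
  const bytes = new TextEncoder().encode(str);
  // subarray creates a view on the same buffer without copying.
  const slice = bytes.subarray(startByte, startByte + lengthBytes);
  // Decode just the selected bytes back into a JavaScript string.
  return new TextDecoder("utf-8").decode(slice);
}

// In "a中文b", "a" is byte 0, "中" is bytes 1-3, "文" is bytes 4-6, "b" is byte 7.
console.log(substringByBytes("a中文b", 1, 3));
```

This re-encodes the full string on every call, so for repeated extractions from the same string it would likely be better to encode once and reuse the byte array. If an offset falls in the middle of a multi-byte character, the decoder substitutes U+FFFD replacement characters rather than throwing.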