
Javascript hexadecimal to ASCII with latin extended symbols

I am getting a hexadecimal value of my string that looks like this:

String has letters with diacritics: č,š,ř, ...

Hexadecimal value of this string is:

0053007400720069006E006700200068006100730020006C0065007400740065007200730020007700690074006800200064006900610063007200690074006900630073003A0020010D002C00200161002C00200159002C0020002E002E002E

The problem is that when I try to convert this value back to text, the characters č, š, ř, ... are decoded incorrectly and show up as a little box with a question mark in it instead of the actual letters.

My code for converting hex to ascii:

function convertHexadecimal(hexx){

  let index = hexx.indexOf("~");
  let strInfo = hexx.substring(0, index+1);
  let strMessage = hexx.substring(index+1); 
  var hex  = strMessage.toString();
  var str = '';     
  for (var i = 0; i < hex.length; i += 2){     
      str += String.fromCharCode(parseInt(hex.substr(i, 2), 16));     
  }
  console.log("Zpráva: " + str);
  var strFinal = strInfo + str;
  return strFinal; 
}

Can somebody help me with this?


Answer

First an example solution:

let demoHex = `0053007400720069006E006700200068006100730020006C0065007400740065007200730020007700690074006800200064006900610063007200690074006900630073003A0020010D002C00200161002C00200159002C0020002E002E002E`;

function hexToString(hex) {
    let str="";
    for( var i = 0; i < hex.length; i +=4) {
       str += String.fromCharCode( Number("0x" + hex.substr(i,4)));
    }
    return str;
}
console.log("Decoded string: %s", hexToString(demoHex) );

What it’s doing:

It treats the hex string as a sequence of 4-digit groups, each giving the UTF-16 code unit (character code) of one character.

  • It gets each group of 4 digits in a loop using String.prototype.substr. Note that MDN marks .substr as deprecated, and the ECMAScript standard only defines it in Annex B (legacy features kept for web browser compatibility), so you may prefer to rewrite it with substring or slice.

  • The hex digits are prefixed with “0x” to form a valid hexadecimal number literal in JavaScript and converted to a number using Number. The number is then converted to a one-character string using the static String.fromCharCode method.
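Following the suggestion to replace substr, here is a sketch of the question's convertHexadecimal() reworked to read four hex digits per character with slice(). It assumes, as the original does, that everything up to and including the first "~" is a plain-text prefix to keep unchanged; the length check is an added assumption about what counts as valid input.

```javascript
// Sketch: the question's convertHexadecimal() adapted to read 4 hex digits
// (one UTF-16 code unit) per character, using slice() instead of substr().
// Assumption: everything up to and including the first "~" is a plain-text
// prefix that is kept as-is, like in the original function.
function convertHexadecimal(hexx) {
  const index = hexx.indexOf("~");
  const strInfo = hexx.substring(0, index + 1); // "" when there is no "~"
  const hex = hexx.substring(index + 1);

  // Added assumption: a well-formed message is a whole number of 4-digit groups.
  if (hex.length % 4 !== 0) {
    throw new Error("Hex message length must be a multiple of 4");
  }

  let str = "";
  for (let i = 0; i < hex.length; i += 4) {
    // "010D" -> 0x010D -> "č". Reading two digits at a time would instead
    // produce "\u0001" and "\r", two control characters.
    str += String.fromCharCode(parseInt(hex.slice(i, i + 4), 16));
  }
  return strInfo + str;
}

console.log(convertHexadecimal("Info~010D002C0161")); // Info~č,š
```

Reading two digits at a time, as the original loop did, splits 010D into 0x01 and 0x0D, neither of which is a printable character, which is why the letters were rendered as boxes.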

I inferred the format of the hex string by looking at it. Assuming that format, a general-purpose routine to encode a string's UTF-16 characters (code units, not code points) as hex could look like:

const hexEncodeUTF16 =
   str=>str.split('')
  .map( char => char.charCodeAt(0).toString(16).padStart(4,'0'))
  .join('');

console.log( hexEncodeUTF16( "String has letters with diacritics: č, š, ř, ..."));
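A quick way to check that the encoder and the decoder agree is a round trip. The sketch below repeats both functions so it runs on its own, with the decoder using slice() in place of substr(). Because the encoder works on UTF-16 code units, a character outside the Basic Multilingual Plane such as 😀 becomes two 4-digit groups (d83dde00), and decoding them one code unit at a time still rebuilds the original string.

```javascript
// Encoder from the answer: one 4-digit hex group per UTF-16 code unit.
const hexEncodeUTF16 = str =>
  str.split("")
    .map(char => char.charCodeAt(0).toString(16).padStart(4, "0"))
    .join("");

// Decoder: the answer's hexToString(), written with slice() instead of substr().
function hexToString(hex) {
  let str = "";
  for (let i = 0; i < hex.length; i += 4) {
    str += String.fromCharCode(parseInt(hex.slice(i, i + 4), 16));
  }
  return str;
}

// Each line should print the hex form followed by true.
const samples = ["č, š, ř", "String has letters with diacritics: č, š, ř, ...", "😀"];
for (const s of samples) {
  const hex = hexEncodeUTF16(s);
  console.log(hex, hexToString(hex) === s);
}
```

The encoder emits lowercase digits; parseInt accepts either case, and hexEncodeUTF16(s).toUpperCase() reproduces the hex string from the question for the sample sentence.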

I hope these examples show what needs doing – there are any number of ways to implement it in code.

User contributions licensed under: CC BY-SA