Skip to content
Advertisement

Generating strings of length l with hamming distance d

I’m working on a variation of the Motif Search Problem and have hit a block on the logic of one of the subparts.

The problem is:

You have the alphabet — ['A', 'C', 'G', 'T']

Now, you are given an array of strings made of these alphabets with length L and are given a hamming distance D.

For example, if we take L=3, d=2 and are given the array [“ACT”,”TGC”,”GTA”].

For each string in the array, we must generate strings of the same length (3 in this case) using the alphabet which have a hamming distance of d. The hamming distance d means that the final string will only have d edits (two if d = 2) as compared to the original string.

So if we take the example of ACT from the array, we must generate GTT, AAG, CGT and so on. And I want to store these generated strings in a list.

A recursive function that breaks down the string seems like the most efficient way to do it but I’m having trouble implementing it. Any ideas for the logic explained with code/pseudocode? I need to be able to extend this problem to L = 15 and d = 5.

Example of Hamming distance d = 1.

enter image description here

Advertisement

Answer

This can get you started. It doesn’t handle creating strings of different length than the source. Perhaps you could add that.

JavaScript code:

function f(alphabet, str, prefix, i, d){
  if (d == 0)
    return [prefix + str.substr(i)];
    
  let words = [];
  
  for (let j=0; j<alphabet.length; j++){
    if (alphabet[j] != str[i]) 
      words = words.concat(
        f(alphabet, str, prefix + alphabet[j], i+1, d-1)
      );
  }

  if (str.length - i > d)
    words = words.concat(
      f(alphabet, str, prefix + str[i], i+1, d)
    )
  
  return words;
}

const alphabet = ['A', 'C', 'G', 'T'];

const strs = ["ACT", "TGC", "GTA"];

for (let str of strs){
  console.log(str);
  console.log(JSON.stringify(f(alphabet, str, '', 0, 2)));
  console.log('');
}
Advertisement