I’m working on a variation of the Motif Search Problem and have hit a block on the logic of one of the subparts.
The problem is:
You have the alphabet — ['A', 'C', 'G', 'T']
Now, you are given an array of strings made of these alphabets with length L and are given a hamming distance D.
For example, if we take L=3
, d=2
and are given the array [“ACT”,”TGC”,”GTA”].
For each string in the array, we must generate strings of the same length (3 in this case) using the alphabet which have a hamming distance of d
. The hamming distance d
means that the final string will only have d edits (two if d = 2) as compared to the original string.
So if we take the example of ACT from the array, we must generate GTT, AAG, CGT and so on. And I want to store these generated strings in a list.
A recursive function that breaks down the string seems like the most efficient way to do it but I’m having trouble implementing it. Any ideas for the logic explained with code/pseudocode? I need to be able to extend this problem to L = 15 and d = 5.
Example of Hamming distance d = 1.
Advertisement
Answer
This can get you started. It doesn’t handle creating strings of different length than the source. Perhaps you could add that.
JavaScript code:
function f(alphabet, str, prefix, i, d){ if (d == 0) return [prefix + str.substr(i)]; let words = []; for (let j=0; j<alphabet.length; j++){ if (alphabet[j] != str[i]) words = words.concat( f(alphabet, str, prefix + alphabet[j], i+1, d-1) ); } if (str.length - i > d) words = words.concat( f(alphabet, str, prefix + str[i], i+1, d) ) return words; } const alphabet = ['A', 'C', 'G', 'T']; const strs = ["ACT", "TGC", "GTA"]; for (let str of strs){ console.log(str); console.log(JSON.stringify(f(alphabet, str, '', 0, 2))); console.log(''); }