I’m working on a variation of the Motif Search Problem and have hit a block on the logic of one of the subparts.
The problem is:
You have the alphabet — ['A', 'C', 'G', 'T']
Now, you are given an array of strings made of these alphabets with length L and are given a hamming distance D.
For example, if we take L=3
, d=2
and are given the array [“ACT”,”TGC”,”GTA”].
For each string in the array, we must generate strings of the same length (3 in this case) using the alphabet which have a hamming distance of d
. The hamming distance d
means that the final string will only have d edits (two if d = 2) as compared to the original string.
So if we take the example of ACT from the array, we must generate GTT, AAG, CGT and so on. And I want to store these generated strings in a list.
A recursive function that breaks down the string seems like the most efficient way to do it but I’m having trouble implementing it. Any ideas for the logic explained with code/pseudocode? I need to be able to extend this problem to L = 15 and d = 5.
Example of Hamming distance d = 1.
Advertisement
Answer
This can get you started. It doesn’t handle creating strings of different length than the source. Perhaps you could add that.
JavaScript code:
function f(alphabet, str, prefix, i, d){
if (d == 0)
return [prefix + str.substr(i)];
let words = [];
for (let j=0; j<alphabet.length; j++){
if (alphabet[j] != str[i])
words = words.concat(
f(alphabet, str, prefix + alphabet[j], i+1, d-1)
);
}
if (str.length - i > d)
words = words.concat(
f(alphabet, str, prefix + str[i], i+1, d)
)
return words;
}
const alphabet = ['A', 'C', 'G', 'T'];
const strs = ["ACT", "TGC", "GTA"];
for (let str of strs){
console.log(str);
console.log(JSON.stringify(f(alphabet, str, '', 0, 2)));
console.log('');
}