Ambiguous Substring With Mismatches
I'm trying to use regular expressions to find a substring in a string of DNA. The substring contains ambiguous bases, as in ATCGR, where R could be A or G. Also, the script must al
Solution 1:
If I understand correctly, you are looking for all three-letter substrings that match the pattern T[GA]T, allowing at most one error. I think the only kind of error you want is a character substitution, since you never mentioned two-letter results. To obtain the expected result, change {e<=1} to {s<=1} (or {s<2}): {e<=1} also permits insertions and deletions, whereas {s<=1} permits substitutions only. Then apply the constraint to the whole pattern, not only the last letter, by enclosing the pattern in a group (capturing or non-capturing, as you prefer). Otherwise {s<=1} is attached only to the last letter:
regex.findall(r'(T[AG]T){s<=1}', s, overlapped=True)
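To sanity-check what the fuzzy pattern returns, the same idea can be sketched in plain Python without the regex module. This is not how the regex module works internally; it is a simple sliding-window comparison that counts substitutions only. The function name find_with_substitutions, the IUPAC table, and the sample sequence are illustrative assumptions (only R = A or G comes from the question; the other codes are the standard IUPAC nucleotide codes).

```python
# Illustrative sketch: names and sample data here are assumptions,
# not part of the original question or answer.

# Standard IUPAC nucleotide ambiguity codes (R = A or G, as in the question).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_with_substitutions(pattern, sequence, max_subs=1):
    """Return (start, window) pairs for every overlapping window of
    len(pattern) that differs from pattern by at most max_subs
    substitutions. Insertions and deletions are never considered."""
    hits = []
    width = len(pattern)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # A position mismatches when the base is not allowed by the code.
        mismatches = sum(
            base not in IUPAC[code] for code, base in zip(pattern, window)
        )
        if mismatches <= max_subs:
            hits.append((start, window))
    return hits


# T[AG]T written with the ambiguity code R, at most one substitution.
hits = find_with_substitutions("TRT", "ATGTCTAT")
print(hits)
```

Because every window has the same length as the pattern, two-letter results can never appear, which is the behaviour the substitution-only constraint {s<=1} is meant to give.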