Skip to content Skip to sidebar Skip to footer

Ambiguous Substring With Mismatches

I'm trying to use regular expressions to find a substring in a string of DNA. This substring has ambiguous bases, that like ATCGR, where R could be A or G. Also, the script must al

Solution 1:

If I understand well, you are looking for all three letters substrings that match the pattern T[GA]T and you allow at worst one error, but I think the error you are looking for is only a character substitution since you never spoke about 2 letters results.

To obtain the expected result, you have to change {e<=1} to {s<=1}(or {s<2}) and to apply it to the whole pattern (and not only the last letter) enclosing it in a group (capturing or not capturing, like you want), otherwise the predicate {s<=1} is only linked to the last letter:

regex.findall(r'(T[AG]T){s<=1}', s, overlapped=True)

Post a Comment for "Ambiguous Substring With Mismatches"