Python Regular Expressions, How To Extract Longest Of Overlapping Groups
Solution 1:
No, that's just how it works, at least in Perl-derived regex flavors like Python, JavaScript, .NET, etc.
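A minimal sketch of that behavior, using a made-up sample string: alternatives are tried left to right at each position, and the first one that matches wins, even if a later alternative would match more text.

```python
import re

text = "csi miami"  # sample input, chosen only for illustration

# Same two alternatives, different order.
short_first = re.search(r"cs|csi", text)
long_first = re.search(r"csi|cs", text)

print(short_first.group())  # cs
print(long_first.group())   # csi
```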
Solution 2:
I'd be interested to know the right way of doing this too. If it helps, you can always build the regex up from every prefix of the string you want to match, longest first:
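import re
string_to_look_in = "AUHDASOHDCSIAAOSLINDASOI"
string_to_match = "CSIABC"
# Every prefix of string_to_match, longest first: CSIABC|CSIAB|CSIA|CSI|CS|C
prefixes = [re.escape(string_to_match[:i]) for i in range(len(string_to_match), 0, -1)]
re_to_use = "(" + "|".join(prefixes) + ")"
re_result = re.search(re_to_use, string_to_look_in)
if re_result:  # re.search returns None when nothing matches
    print(re_result.group())  # prints CSIA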
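One caveat with this approach: re.search stops at the leftmost position where any prefix matches, so a short match early in the string hides a longer one later on. A possible workaround, sketched here with a hypothetical helper name (longest_prefix_match) that is not part of the original answer, is to wrap the alternation in a lookahead so every start position is examined, then keep the longest hit:

```python
import re

def longest_prefix_match(target, text):
    """Return the longest prefix of `target` found anywhere in `text`, or None."""
    prefixes = [re.escape(target[:i]) for i in range(len(target), 0, -1)]
    # The lookahead consumes nothing, so every start position is examined.
    scanner = re.compile("(?=(" + "|".join(prefixes) + "))")
    found = [m.group(1) for m in scanner.finditer(text)]
    return max(found, key=len) if found else None

print(longest_prefix_match("CSIABC", "AUHDASOHDCSIAAOSLINDASOI"))  # CSIA
print(longest_prefix_match("aab", "aaab"))  # aab
```

In the second call a plain re.search would return "aa" (the match starting at index 0), while the lookahead scan also sees "aab" starting at index 1.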
Solution 3:
Similar functionality is present in the Vim editor ("sequence of optionally matched atoms"), where e.g. col\%[umn] matches col in color, colum in columbus, and the full column.
I am not aware of similar functionality in Python's re, but you can use nested non-capturing groups, each one followed by the ? quantifier, for that:
>>> import re
>>> words = ['color', 'columbus', 'column']
>>> rex = re.compile(r'col(?:u(?:m(?:n)?)?)?')
>>> for w in words: print(rex.findall(w))
...
['col']
['colum']
['column']
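Writing the nested groups by hand gets tedious for longer words. As a rough sketch, the pattern can be generated; the helper name optional_tail_pattern and its two-argument split into a required part and an optional tail are assumptions made for illustration, not something from the re module:

```python
import re

def optional_tail_pattern(required, optional):
    """Build `required` followed by nested optional groups for `optional`."""
    pattern = ""
    # Wrap from the last optional character outward.
    for ch in reversed(optional):
        pattern = "(?:" + re.escape(ch) + pattern + ")?"
    return re.escape(required) + pattern

rex = re.compile(optional_tail_pattern("col", "umn"))
print(rex.pattern)              # col(?:u(?:m(?:n)?)?)?
print(rex.findall("columbus"))  # ['colum']
```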
Solution 4:
As Alan says, the patterns will be matched in the order you specified them.
If you want to match on the longest of overlapping literal strings, you need the longest one to appear first. But you can organize your strings longest-to-shortest automatically, if you like:
>>> '|'.join(sorted('cs csi miami vice'.split(), key=len, reverse=True))
'miami|vice|csi|cs'
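Putting that together, here is a small sketch that compiles the sorted alternation and runs it on a sample string. The sample text is made up, and re.escape is added on the assumption that the keywords are meant as literal strings:

```python
import re

keywords = "cs csi miami vice".split()

# Longest first, so a longer literal is tried before any shorter prefix of it.
ordered = sorted(keywords, key=len, reverse=True)
rex = re.compile("|".join(re.escape(k) for k in ordered))

matches = rex.findall("csi miami vice cs")
print(matches)  # ['csi', 'miami', 'vice', 'cs']
```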