Removing Version Numbers With Regular Expression
Solution 1:
The trick is that you can have things in the pattern that aren't returned in a match group (i.e., they will be part of group(0), but not any other group). Here is what I worked out:
import re

# put the lines to clean in a string
s='''Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0
Windows UPnP Browser 0.1.01
CamStudio
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0'''

# use findall to return the parts we want
print(re.findall(r'(.+?)(?: (?:[\d\.]+))*(?:\n|\Z)', s))
Explanation of the regex:
(.+?) is a non-greedy capture of one or more characters.
(?: [\d\.]+)* is a non-capturing group, repeated zero or more times, that starts with a space and is followed only by digits or '.' (in each repeat).
(?:\n|\Z) matches a newline or the end of the string. If your string might have carriage returns, you could use \r?(?:\n|\Z) instead.
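As a rough sketch of the carriage-return variant, the pattern can be wrapped in a small helper. The name strip_versions and the sample text are made up for illustration and are not part of the original answer:

```python
import re

# Same pattern as in the answer, with an optional \r before the line ending
# so that Windows-style \r\n input is handled too.
PATTERN = re.compile(r'(.+?)(?: [\d.]+)*\r?(?:\n|\Z)')

def strip_versions(text):
    """Return each line's name with trailing version numbers removed.

    Hypothetical helper: it only wraps re.findall on the pattern above.
    """
    return PATTERN.findall(text)

# Illustrative input with \r\n line endings.
sample = "Windows UPnP Browser 0.1.01\r\nMicrosoft_VC80_DebugCRT_x86 1.0.0\r\nCamStudio"
names = strip_versions(sample)
print(names)
# ['Windows UPnP Browser', 'Microsoft_VC80_DebugCRT_x86', 'CamStudio']
```

Without the \r? the lazy capture would have to swallow the version numbers and the trailing \r to reach the newline, so the versions would not be stripped from \r\n input.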
For a regex that has only one capturing group, re.findall returns group(1) of each match in the string, which is exactly what you want. The other parts of the regex must be matched, but since they are not captured, they will not be returned.
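To see how the number of capturing groups changes what re.findall returns, here is a small comparison on a two-line sample. The variable names and the sample text are illustrative only:

```python
import re

text = "Windows UPnP Browser 0.1.01\nCamStudio"

# One capturing group: findall returns that group for every match.
one_group = re.findall(r'(.+?)(?: [\d.]+)*(?:\n|\Z)', text)
print(one_group)   # ['Windows UPnP Browser', 'CamStudio']

# No capturing groups: findall returns the whole match, version and newline included.
no_group = re.findall(r'.+?(?: [\d.]+)*(?:\n|\Z)', text)
print(no_group)    # ['Windows UPnP Browser 0.1.01\n', 'CamStudio']

# Two capturing groups: findall returns a tuple per match.
two_groups = re.findall(r'(.+?)((?: [\d.]+)*)(?:\n|\Z)', text)
print(two_groups)  # [('Windows UPnP Browser', ' 0.1.01'), ('CamStudio', '')]
```

Keeping the version part in a non-capturing group is what lets findall hand back plain strings rather than tuples.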