
Removing Version Numbers With Regular Expression

I want to replace the version number in a string, e.g., Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148 Microsoft Visual C++ 2008 Redistributable - x8

Solution 1:

The trick is that a pattern can contain parts that must match but are not returned in a capturing group (i.e., they are part of group(0), the whole match, but not of any numbered group). Here is what I worked out:

import re

# put the lines to clean in a string
s = '''Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0
Windows UPnP Browser 0.1.01
CamStudio
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.4148 9.0.30729.4148
Microsoft Visual C++ 2008 Redistributable - x86 9.0.30729.6161 9.0.30729.6161
Microsoft_VC80_DebugCRT_x86_x64 1.0.0
Microsoft_VC80_DebugCRT_x86 1.0.0'''

# use findall to return the parts we want
print(re.findall(r'(.+?)(?: [\d\.]+)*(?:\n|\Z)', s))

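Explanation of the regex: (.+?) is a non-greedy capture of one or more characters (anything except a newline), so it takes as little of the line as it can. (?: [\d\.]+)* is a non-capturing group, repeated zero or more times; each repetition starts with a space and is followed by one or more digits or '.' characters. (?:\n|\Z) matches a newline or the end of the string. Because only version numbers may sit between the capture and the line end, the capture stops just before the trailing run of version numbers. If your string might have carriage returns, you could use \r?(?:\n|\Z) instead.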
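As a quick check of the carriage-return variant mentioned above, here is a minimal sketch. The sample string with \r\n line endings is made up for illustration and is not from the original question.

```python
import re

# Hypothetical sample with Windows-style (\r\n) line endings.
s = "Windows UPnP Browser 0.1.01\r\nCamStudio\r\nMicrosoft_VC80_DebugCRT_x86 1.0.0"

# Same pattern as in the answer, with an optional \r before the line terminator.
pattern = r'(.+?)(?: [\d\.]+)*\r?(?:\n|\Z)'

names = re.findall(pattern, s)
print(names)
# ['Windows UPnP Browser', 'CamStudio', 'Microsoft_VC80_DebugCRT_x86']
```

Without the \r?, the '.' in the capture would be forced to swallow the \r (and the version number before it), since '.' matches every character except a newline.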

For a regex that has only one capturing group, re.findall returns group(1) of each match in the string, which is exactly what you want. The other parts of the regex must be matched, but since they are not captured, they will not be returned.
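To see the difference between group(0) and group(1) directly, the following sketch uses re.finditer on two of the sample lines. It is only an illustration of the point above; the shortened input string is my own choice.

```python
import re

# Two of the sample lines from the answer.
s = "Windows UPnP Browser 0.1.01\nCamStudio"
pattern = r'(.+?)(?: [\d\.]+)*(?:\n|\Z)'

# finditer yields match objects, so both the whole match and the capture are visible.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(pattern, s)]
for whole, name in pairs:
    print(repr(whole), '->', repr(name))
# 'Windows UPnP Browser 0.1.01\n' -> 'Windows UPnP Browser'
# 'CamStudio' -> 'CamStudio'

# With exactly one capturing group, findall returns only group(1) of each match.
names = re.findall(pattern, s)
print(names)
# ['Windows UPnP Browser', 'CamStudio']
```

The version number and the newline are consumed by the match, which is why they do not turn up again in the next result, but they never reach the returned list.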
