
How Do I Use Format() In Re.compile

I want to write a regex that makes Python return the items in a list that contain a sequence of vowels whose length is given by len=2. >>> chars = 'aeiou' >>> len = 2 >>>

Solution 1:

Your misuse of formatting has nothing to do with regular expressions. On top of everything else, you are incorrectly trying to combine an f-string with the format method. Among other things, an f-string needs the f prefix, and methods are invoked with a period, not a comma.

The two formatting operations can be combined, and their evaluation order is well defined: the f-string is evaluated first, then the format method is called on the result. However, it is generally better to use one or the other, not both. Things get unnecessarily complicated otherwise.

Using f-strings:

regex = re.compile(f"[{chars}]{{{len}}}")

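Doubled braces are interpreted as literal braces in format strings. You need a third set to mark len as a formatted expression: the outer {{ and }} produce the literal { and } of the regex quantifier, and the inner {len} is replaced by the value of len.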
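To make the brace counting concrete, here is a minimal sketch of the f-string version applied to a list. The name n stands in for the question's len (which shadows the built-in function), and the word list is made-up sample data, not taken from the question.

```python
import re

chars = "aeiou"
n = 2  # stand-in for the question's `len`, renamed to avoid shadowing the built-in

# {chars} is substituted; {{ and }} become literal braces; {n} is substituted.
pattern = f"[{chars}]{{{n}}}"
regex = re.compile(pattern)

words = ["book", "cat", "queue", "tree", "sky"]  # hypothetical sample data
matches = [w for w in words if regex.search(w)]

print(pattern)  # [aeiou]{2}
print(matches)  # ['book', 'queue', 'tree']
```

Printing the pattern before compiling it is a cheap way to check that the braces came out as intended.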

Using format:

regex = re.compile("[{}]{{{}}}".format(chars, len))
regex = re.compile("[{chars}]{{{len}}}".format(chars=chars, len=len))
regex = re.compile("[{0}]{{{len}}}".format(chars, len=len))
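As a quick check, the three str.format spellings above should all produce the same pattern string. The sketch below compares them directly; n is again an assumed replacement for the name len.

```python
chars = "aeiou"
n = 2  # assumed name, used instead of `len`

positional = "[{}]{{{}}}".format(chars, n)           # automatic numbering
named = "[{chars}]{{{n}}}".format(chars=chars, n=n)  # keyword fields
mixed = "[{0}]{{{n}}}".format(chars, n=n)            # explicit index plus keyword

# All three are the same string, so the compiled regexes behave identically.
print(positional, named, mixed)
```

Which spelling to use is mostly a matter of readability; the keyword form tends to be easiest to follow once a template has more than two fields.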

Using both (for completeness):

regex = re.compile(f"[{{}}]{{{{{len}}}}}".format(chars))
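The combined form is easier to follow when the two stages are split apart. This sketch stores the intermediate string that the f-string produces before format is called on it; stage1, stage2 and n are illustrative names, not part of the original answer.

```python
chars = "aeiou"
n = 2  # assumed name, used instead of `len`

# Stage 1: the f-string halves every doubled brace and substitutes {n}.
stage1 = f"[{{}}]{{{{{n}}}}}"
print(stage1)  # [{}]{{2}}

# Stage 2: str.format fills the empty {} and halves the remaining doubled braces.
stage2 = stage1.format(chars)
print(stage2)  # [aeiou]{2}
```

Each stage strips one level of brace doubling, which is why the quantifier needs five braces on each side when both mechanisms are used together.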

In no case do you need + inside your character class. Inside square brackets, + matches a literal plus character. It does not act as some magical quantifier. Also, repeating a character inside a character class is redundant, because a class is a set and each character only needs to appear once.
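A small experiment shows both points. The test strings here are made up for illustration.

```python
import re

with_plus = re.compile("[+aeiou+]{2}")
without_plus = re.compile("[aeiou]{2}")

# With + in the class, "a+" counts as two matching characters.
print(with_plus.search("a+b"))     # a match object spanning "a+"
print(without_plus.search("a+b"))  # None

# Repeating a character in the class changes nothing.
duplicated = re.compile("[aaeiou]{2}")
print(duplicated.findall("queue") == without_plus.findall("queue"))  # True
```

So unless literal plus signs should count as part of a vowel run, the + only widens what the class accepts.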

Since your string does not have any backslashes in it, it doesn't need to be a raw string, and doesn't need the r prefix.

Solution 2:

You can use an f-string by adding an f before the quotes of the string literal, so that a single pair of curly brackets around len evaluates its value as part of the string, and use a . (rather than a ,) to invoke the format method of the string. However, the f-string is evaluated first, before the result is passed to str.format. For the empty curly brackets {} to survive the f-string parser literally, you have to escape them by doubling them. And since the value of len also needs curly brackets around it to act as a quantifier in your regex, those brackets have to be escaped twice, once for the f-string and once for str.format:

regex = re.compile(fr"[+{{}}+]{{{{{len}}}}}".format(chars))

Since curly brackets have a special meaning in f-strings, in str.format and in regex, I would suggest formatting your string with the string formatting operator % instead, so you don't have to deal with the escape hell above:

regex = re.compile(r'[+%s+]{%d}' % (chars, len))
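As a sketch of how the % version might be wrapped for reuse, the helper below builds the regex from its two inputs. The function name build_regex and the sample sentence are hypothetical, and the plus signs are left out of the class on the assumption that literal + characters should not count as vowels.

```python
import re

def build_regex(chars, n):
    """Hypothetical helper: compile a regex matching n consecutive characters from chars."""
    # %s inserts the character set, %d inserts the repeat count;
    # the braces need no escaping because % formatting ignores them.
    return re.compile("[%s]{%d}" % (chars, n))

regex = build_regex("aeiou", 2)
print(regex.pattern)                     # [aeiou]{2}
print(regex.findall("beautiful queue"))  # ['ea', 'ue', 'ue']
```

Note that chars is inserted verbatim, so characters that are special inside a class (such as ], ^ or \) would need escaping before being passed in.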
