Python Split On Multiple Delimiters Bug?
I was looking at the responses to this earlier-asked question: Split Strings with Multiple Delimiters? For my variant of this problem, I wanted to split on everything that wasn't
Solution 1:
The '-/ inside a character class created a range that includes a comma.
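A minimal sketch of the effect, assuming a shortened character class and a made-up sample string (neither is taken from the question):

```python
import re

# Hypothetical class: the intent was the three literals ', - and /,
# but '-/ is parsed as one range running from ' to /.
buggy = re.compile(r"['-/]")

sample = "a,b-c/d'e"  # made-up input
matches = buggy.findall(sample)
print(matches)  # the comma shows up among the matches
```

The comma is matched even though it was never written in the class, which is why the split happened on commas too.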
When you need to put a literal hyphen in a Python re pattern, put it:
- at the start: [-A-Z] (matches an uppercase ASCII letter or -)
- at the end: [A-Z()-] (matches an uppercase ASCII letter, (, ) or -)
- after a valid range: [A-Z-+] (matches an uppercase ASCII letter, - or +)
- or just escape it.
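The placements listed above can be checked with a short sketch. The variable names are only illustrative, and the escaped variant [A-Z\-+] is one assumed way of writing the "just escape it" option:

```python
import re

# One compiled class per placement of the hyphen.
at_start = re.compile(r"[-A-Z]")
at_end = re.compile(r"[A-Z()-]")
after_range = re.compile(r"[A-Z-+]")
escaped = re.compile(r"[A-Z\-+]")

patterns = (at_start, at_end, after_range, escaped)

# The hyphen is a literal in every one of them.
hyphen_ok = [bool(p.fullmatch("-")) for p in patterns]
print(hyphen_ok)

# No accidental range is formed, so a comma is never matched.
comma_matched = any(p.fullmatch(",") for p in patterns)
print(comma_matched)
```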
You cannot put it after a shorthand class, right before a standalone symbol: [\w-+] causes a "bad character range" error. This is valid in .NET and some other regex flavors, but is not valid in Python re.
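A small sketch of that failure and of the fix, assuming a made-up test string:

```python
import re

# A hyphen between a shorthand class and a symbol is rejected by re.
try:
    re.compile(r"[\w-+]")
    rejected = False
except re.error as exc:
    rejected = True
    print("rejected:", exc)

# Moving the hyphen to the end of the class compiles fine.
fixed = re.compile(r"[\w+-]")
found = fixed.findall("a-b+c d")  # made-up input
print(found)
```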
So put the hyphen at the end of the character class, or escape it. Use:
re.split(r"[^a-zA-Z0-9_'/-]+", b)
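In Python 2.7, you may even contract it to
re.split(r"[^\w'/-]+", b)
(In Python 3, \w in a str pattern also matches non-ASCII letters and digits unless the re.ASCII flag is set.)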
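As a rough illustration, here are both patterns applied to a made-up string. The value of b below is an assumption; the original input from the question is not shown here:

```python
import re

# Hypothetical input standing in for the question's b.
b = "don't stop, well-known and/or x_1"

# The hyphen comes last in the class, so it is a literal and no range is formed.
tokens = re.split(r"[^a-zA-Z0-9_'/-]+", b)
print(tokens)

# re.ASCII keeps \w equal to [a-zA-Z0-9_] on Python 3.
short_tokens = re.split(r"[^\w'/-]+", b, flags=re.ASCII)
print(tokens == short_tokens)
```

The apostrophe, slash and hyphen stay inside the tokens, while the comma and spaces act as delimiters.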
Solution 2:
The '-/ is interpreted as a range of ASCII values from 39 (') to 47 (/), which includes the comma (ASCII value 44).
You will have to put the - either at the beginning or at the end of the character class.
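The code points involved can be checked directly with ord() and chr(), with no regex needed. This is only a sketch to make the numbers concrete:

```python
# Endpoints of the accidental range and the code point of the comma.
lo, hi = ord("'"), ord("/")
comma = ord(",")
print(lo, hi, comma)

# Every character covered by the range '-/.
in_range = "".join(chr(c) for c in range(lo, hi + 1))
print(in_range)
```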