
Returning The Lowest Index For The First Non Whitespace Character In A String In Python

What's the shortest way to do this in Python? string = '   xyz' must return index = 3

Solution 1:

>>> s = "   xyz"
>>> len(s) - len(s.lstrip())
3
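For reuse, the same subtraction can be wrapped in a small function. This is only a sketch: the name indent_width and the optional chars parameter are illustrative assumptions, not part of the answer above. Note that an all-whitespace string yields len(s), and that lstrip() builds a stripped copy of the string.

```python
def indent_width(s, chars=None):
    """Number of leading characters removed by str.lstrip(chars)."""
    # lstrip(None) strips any leading whitespace; pass e.g. " \t" to restrict it.
    return len(s) - len(s.lstrip(chars))


print(indent_width("   xyz"))      # 3
print(indent_width("    "))        # 4 (no non-whitespace character at all)
print(indent_width("\t  x", " "))  # 0 (only spaces counted, and the string starts with a tab)
```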

Solution 2:

>>> next(i for i, j in enumerate('   xyz') if j.strip())
3

or

>>> import string
>>> next(i for i, j in enumerate('   xyz') if j not in string.whitespace)
3

In versions of Python before 2.6, which lack the built-in next(), you'll have to do:

(...).next()
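One case the generator form leaves open is an empty or all-whitespace string, where next() raises StopIteration. A minimal sketch, assuming you would rather get a sentinel back; the function name first_non_ws and the -1 default are illustrative choices, not part of the answer above:

```python
def first_non_ws(s, default=-1):
    """Index of the first non-whitespace character, or `default` if there is none."""
    # The second argument to next() is returned when the generator is exhausted.
    return next((i for i, ch in enumerate(s) if not ch.isspace()), default)


print(first_non_ws("   xyz"))  # 3
print(first_non_ws("    "))    # -1
```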

Solution 3:

Looks like the "regexes can do anything" brigade have taken the day off, so I'll fill in:

>>> import re
>>> tests = [u'foo', u' foo', u'\xA0foo']
>>> for test in tests:
...     print(len(re.match(r"\s*", test, re.UNICODE).group(0)))
...
0
1
1
>>>

FWIW: time taken is O(the_answer), not O(len(input_string))
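A variation on the same idea, sketched under the assumption of Python 3 (where str patterns are Unicode-aware by default): precompile the pattern and read Match.end() instead of measuring the matched substring. The name leading_ws_len is hypothetical. Because \s* can match zero characters, match() never returns None here.

```python
import re

_LEADING_WS = re.compile(r"\s*")


def leading_ws_len(s):
    """Length of the leading whitespace run, via the end offset of the match."""
    # match() anchors at position 0; \s* always succeeds, possibly with an empty match.
    return _LEADING_WS.match(s).end()


for test in ["foo", " foo", "\xa0foo", "   "]:
    print(leading_ws_len(test))  # prints 0, 1, 1, 3 on successive lines
```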

Solution 4:

Many of the previous solutions iterate over the string more than once, and some make copies of the data (the string). re.match(), strip(), enumerate(), and isspace() all duplicate work behind the scenes. The

next(idx for idx, ch in enumerate(s) if not ch.isspace())
next(idx for idx, ch in enumerate(s) if ch not in string.whitespace)

are good choices for testing strings against various kinds of leading whitespace, such as vertical tabs, but that adds cost too.
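The isspace() test and the string.whitespace test are not interchangeable, which matters when choosing between them. The sketch below assumes Python 3, where str.isspace() recognises Unicode whitespace while string.whitespace lists only the six ASCII whitespace characters; the two function names are made up for illustration.

```python
import string


def index_isspace(s):
    # Treats any Unicode whitespace (e.g. a no-break space) as indentation.
    return next(i for i, ch in enumerate(s) if not ch.isspace())


def index_ascii(s):
    # Treats only ' ', '\t', '\n', '\r', '\x0b' and '\x0c' as indentation.
    return next(i for i, ch in enumerate(s) if ch not in string.whitespace)


sample = "\xa0 xyz"           # no-break space, then a regular space
print(index_isspace(sample))  # 2
print(index_ascii(sample))    # 0
```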

However, if your string uses only space or tab characters, then the following more basic solution is clear and fast, and also uses less memory.

def get_indent(astr):
    """Return index of first non-space character of a sequence else False."""
    try:
        iter(astr)
    except TypeError:
        raise
    # OR for not raising exceptions at all
    # if not hasattr(astr, '__getitem__'): return False

    idx = 0
    while idx < len(astr) and astr[idx] == ' ':
        idx += 1
    # Guard the empty case: indexing an empty sequence would raise IndexError.
    if not astr or astr[0] != ' ':
        return False
    return idx

Although this may not be the absolute fastest or the simplest visually, it has some benefits: you can easily port it to other languages and other versions of Python, and it is likely the easiest to debug, as there is little magic behavior. If you put the body of the function inline in your code instead of in a function, you remove the function-call overhead, which would make this solution similar in bytecode to the other solutions.

Additionally, this solution allows for more variations, such as adding a test for tabs:

or astr[idx] == '\t':

Or you can test the entire data as iterable once instead of checking if each line is iterable. Remember things like ""[0] raises an exception whereas ""[0:] does not.
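Putting the tab variation and the indexing caveat together, a sketch might look like this. The name get_indent_tabs is hypothetical, and unlike get_indent it returns 0 rather than False when there is no indentation; that is an assumption made here to keep the return type uniform.

```python
def get_indent_tabs(astr):
    """Return the index of the first character that is not a space or a tab."""
    idx = 0
    # The bounds check comes first, so an empty string never gets indexed.
    while idx < len(astr) and (astr[idx] == ' ' or astr[idx] == '\t'):
        idx += 1
    return idx


# Indexing an empty string raises; slicing it does not.
try:
    ""[0]
except IndexError:
    print("IndexError")
print(repr(""[0:]))                 # ''

print(get_indent_tabs(" \t xyz"))   # 3
```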

If you wanted to push the solution to inline you could go the non-Pythonic route:

i = 0
while i < len(s) and s[i] == ' ': i += 1
print(i)
3


Solution 5:

import re
def prefix_length(s):
   # Raw string so that \s reaches the regex engine unchanged.
   m = re.match(r'(\s+)', s)
   if m:
      return len(m.group(0))
   return 0
