Skip to content Skip to sidebar Skip to footer

Python Collections.counter Efficiency

I am using the following code to implement a function which finds all anagrams of string p in a string s. class Solution(object): def findAnagrams(self, s, p): '''

Solution 1:

To see why some code runs faster than other code, you should profile it. In Python, the easiest way to get started with profiling is to run:

python -m cProfile <script.py>

In my case, I wrote a simple script that calls either the slow solution or the fast solution:

# Pasted code from original question.# Also renamed the slow version to `SlowSolution` and the fast version to `FastSolution`.
...

# solution = FastSolution()
solution = SlowSolution()

print(solution.findAnagrams('abcdefg' + 'a' * 10000, 'gfedcba' + 'a' * 10000))

Then I just ran the script using SlowSolution and FastSolution. Here's the output of my profiler results using SlowSolution:

$ python -m cProfile counter.py
[0]
         100204functioncalls (100192 primitive calls) in 2.557 secondsOrderedby: standardnamencallstottimepercallcumtimepercallfilename:lineno(function)
    10008    0.015    0.000    2.538    0.000 __init__.py:516(__init__)
    10008    0.009    0.000    2.522    0.000 __init__.py:585(update)
        7    0.000    0.000    0.000    0.000 _collections_abc.py:392(__subclasshook__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:16(__init__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:20(__enter__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:26(__exit__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:36(__init__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:52(_commit_removals)
        9    0.000    0.000    0.000    0.000 _weakrefset.py:58(__iter__)
    20022    0.007    0.000    0.007    0.000 _weakrefset.py:70(__contains__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:81(add)
    10008    0.010    0.000    0.017    0.000 abc.py:178(__instancecheck__)
      7/1    0.000    0.000    0.000    0.000 abc.py:194(__subclasscheck__)
        1    0.000    0.000    2.557    2.557 counter.py:1(<module>)
        1    0.000    0.000    0.000    0.000 counter.py:17(FastSolution)
        1    0.000    0.000    0.000    0.000 counter.py:3(SlowSolution)
        1    0.017    0.017    2.556    2.556 counter.py:4(findAnagrams)
    10008    2.490    0.000    2.490    0.000 {built-in method _collections._count_elements}
        20.0000.0000.0000.000 {built-in method builtins.__build_class__}
        10.0000.0002.5572.557 {built-in method builtins.exec}
        70.0000.0000.0000.000 {built-in method builtins.getattr}
    100080.0050.0000.0220.000 {built-in method builtins.isinstance}
      8/20.0000.0000.0000.000 {built-in method builtins.issubclass}
    300240.0030.0000.0030.000 {built-in method builtins.len}
        10.0000.0000.0000.000 {built-in method builtins.print}
        70.0000.0000.0000.000 {method '__subclasses__' of 'type' objects}
       140.0000.0000.0000.000 {method 'add' of 'set' objects}
        10.0000.0000.0000.000 {method 'append' of 'list' objects}
        10.0000.0000.0000.000 {method 'disable' of '_lsprof.Profiler' objects}
        70.0000.0000.0000.000 {method 'remove' of 'set' objects}

and FastSolution:

$ python -m cProfile counter.py
[0]
         146functioncalls (134 primitive calls) in 0.005 secondsOrderedby: standardnamencallstottimepercallcumtimepercallfilename:lineno(function)
        2    0.000    0.000    0.001    0.000 __init__.py:516(__init__)
        7    0.000    0.000    0.000    0.000 __init__.py:536(__missing__)
        2    0.000    0.000    0.001    0.000 __init__.py:585(update)
        7    0.000    0.000    0.000    0.000 _collections_abc.py:392(__subclasshook__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:16(__init__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:20(__enter__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:26(__exit__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:36(__init__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:52(_commit_removals)
        9    0.000    0.000    0.000    0.000 _weakrefset.py:58(__iter__)
        8    0.000    0.000    0.000    0.000 _weakrefset.py:70(__contains__)
        7    0.000    0.000    0.000    0.000 _weakrefset.py:81(add)
        1    0.000    0.000    0.000    0.000 abc.py:178(__instancecheck__)
      7/1    0.000    0.000    0.000    0.000 abc.py:194(__subclasscheck__)
        1    0.000    0.000    0.005    0.005 counter.py:1(<module>)
        1    0.000    0.000    0.000    0.000 counter.py:17(FastSolution)
        1    0.004    0.004    0.005    0.005 counter.py:18(findAnagrams)
        1    0.000    0.000    0.000    0.000 counter.py:3(SlowSolution)
        1    0.001    0.001    0.001    0.001 {built-in method _collections._count_elements}
        20.0000.0000.0000.000 {built-in method builtins.__build_class__}
        10.0000.0000.0050.005 {built-in method builtins.exec}
        70.0000.0000.0000.000 {built-in method builtins.getattr}
        10.0000.0000.0000.000 {built-in method builtins.isinstance}
      8/20.0000.0000.0000.000 {built-in method builtins.issubclass}
        60.0000.0000.0000.000 {built-in method builtins.len}
        10.0000.0000.0000.000 {built-in method builtins.print}
        70.0000.0000.0000.000 {method '__subclasses__' of 'type' objects}
       140.0000.0000.0000.000 {method 'add' of 'set' objects}
        10.0000.0000.0000.000 {method 'append' of 'list' objects}
        10.0000.0000.0000.000 {method 'disable' of '_lsprof.Profiler' objects}
        70.0000.0000.0000.000 {method 'remove' of 'set' objects}

The output can be a little strange to read at first, but we're really interested in the tottime column. That tells us how much total time we spent inside of a particular function.

As you can see, the script spends almost all of its time inside of {built-in method _collections._count_elements}. That's an internal method used by a Counter which we can infer gets called every time you create a counter (like collections.Counter(p)).

To make the code faster, you should call collections.Counter(...) fewer times and/or with shorter strings. In the slow version, you're counting len(p) characters len(s) times. This has a runtime of O(sp) which is quadratic and explainswhy it's so slow on large inputs.

On the other hand, the faster solution counts each character of s exactly once which gives it a runtime of O(s + p). This is much faster and will scale with much larger inputs.

For more info on profiling in Python, see How can you profile a python script?

Solution 2:

There is more overhead in creating, counting, and comparing Counter objects than for lists. That is the essence of what you are seeing. If you want a faster method still, you can accomplish the anagram finder by building the permutations of p as a tuple, then checking the slices of s against the tuple.

classSolution(object):
    deffindAnagrams(self, s, p):
        """
        :type s: str
        :type p: str
        :rtype: List[int]
        """
        ans = list()
        pcnt = collections.Counter(p)
        for i inrange(len(s)):
            if collections.Counter(s[i:i+len(p)]) == pcnt:
                ans.append(i)
        return ans

    deffindAnagrams2(self, s, p):
        ls, lp = len(s), len(p)
        cp = collections.Counter(p)
        cs = collections.Counter()
        ans = []
        for i inrange(ls):
            cs[s[i]] += 1if i >= lp:
                cs[s[i - lp]] -= 1if cs[s[i - lp]] == 0:
                    del cs[s[i - lp]]
            if cs == cp:
                ans.append(i - lp + 1)
        return ans

    deffindAnagrams3(self, s, p):
        p_all = tuple(''.join(x) for x in permutations(p,len(p)))
        return [i for i inrange(len(s)) if s[i:i+len(p)] in p_all]

Here is a short comparison of the 3 methods using timeit in IPython:

In [33]: %%timeit
    ...: sol.findAnagrams('hello world he said eh', 'he')
    ...:
1000 loops, best of 3: 259 µs per loop

In [34]: %%timeit
    ...: sol.findAnagrams2('hello world he said eh', 'he')
    ...:
10000 loops, best of 3: 102 µs per loop

In [35]: %%timeit
    ...: sol.findAnagrams3('hello world he said eh', 'he')
    ...:
100000 loops, best of 3: 15.5 µs per loop

Post a Comment for "Python Collections.counter Efficiency"