Python Collections.counter Efficiency
Solution 1:
To see why some code runs faster than other code, you should profile it. In Python, the easiest way to get started with profiling is to run:
python -m cProfile <script.py>
In my case, I wrote a simple script that calls either the slow solution or the fast solution:
# Pasted code from original question.# Also renamed the slow version to `SlowSolution` and the fast version to `FastSolution`.
...
# solution = FastSolution()
solution = SlowSolution()
print(solution.findAnagrams('abcdefg' + 'a' * 10000, 'gfedcba' + 'a' * 10000))
Then I just ran the script using SlowSolution
and FastSolution
. Here's the output of my profiler results using SlowSolution
:
$ python -m cProfile counter.py
[0]
100204functioncalls (100192 primitive calls) in 2.557 secondsOrderedby: standardnamencallstottimepercallcumtimepercallfilename:lineno(function)
10008 0.015 0.000 2.538 0.000 __init__.py:516(__init__)
10008 0.009 0.000 2.522 0.000 __init__.py:585(update)
7 0.000 0.000 0.000 0.000 _collections_abc.py:392(__subclasshook__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:16(__init__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:20(__enter__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:26(__exit__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:36(__init__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:52(_commit_removals)
9 0.000 0.000 0.000 0.000 _weakrefset.py:58(__iter__)
20022 0.007 0.000 0.007 0.000 _weakrefset.py:70(__contains__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:81(add)
10008 0.010 0.000 0.017 0.000 abc.py:178(__instancecheck__)
7/1 0.000 0.000 0.000 0.000 abc.py:194(__subclasscheck__)
1 0.000 0.000 2.557 2.557 counter.py:1(<module>)
1 0.000 0.000 0.000 0.000 counter.py:17(FastSolution)
1 0.000 0.000 0.000 0.000 counter.py:3(SlowSolution)
1 0.017 0.017 2.556 2.556 counter.py:4(findAnagrams)
10008 2.490 0.000 2.490 0.000 {built-in method _collections._count_elements}
20.0000.0000.0000.000 {built-in method builtins.__build_class__}
10.0000.0002.5572.557 {built-in method builtins.exec}
70.0000.0000.0000.000 {built-in method builtins.getattr}
100080.0050.0000.0220.000 {built-in method builtins.isinstance}
8/20.0000.0000.0000.000 {built-in method builtins.issubclass}
300240.0030.0000.0030.000 {built-in method builtins.len}
10.0000.0000.0000.000 {built-in method builtins.print}
70.0000.0000.0000.000 {method '__subclasses__' of 'type' objects}
140.0000.0000.0000.000 {method 'add' of 'set' objects}
10.0000.0000.0000.000 {method 'append' of 'list' objects}
10.0000.0000.0000.000 {method 'disable' of '_lsprof.Profiler' objects}
70.0000.0000.0000.000 {method 'remove' of 'set' objects}
and FastSolution
:
$ python -m cProfile counter.py
[0]
146functioncalls (134 primitive calls) in 0.005 secondsOrderedby: standardnamencallstottimepercallcumtimepercallfilename:lineno(function)
2 0.000 0.000 0.001 0.000 __init__.py:516(__init__)
7 0.000 0.000 0.000 0.000 __init__.py:536(__missing__)
2 0.000 0.000 0.001 0.000 __init__.py:585(update)
7 0.000 0.000 0.000 0.000 _collections_abc.py:392(__subclasshook__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:16(__init__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:20(__enter__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:26(__exit__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:36(__init__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:52(_commit_removals)
9 0.000 0.000 0.000 0.000 _weakrefset.py:58(__iter__)
8 0.000 0.000 0.000 0.000 _weakrefset.py:70(__contains__)
7 0.000 0.000 0.000 0.000 _weakrefset.py:81(add)
1 0.000 0.000 0.000 0.000 abc.py:178(__instancecheck__)
7/1 0.000 0.000 0.000 0.000 abc.py:194(__subclasscheck__)
1 0.000 0.000 0.005 0.005 counter.py:1(<module>)
1 0.000 0.000 0.000 0.000 counter.py:17(FastSolution)
1 0.004 0.004 0.005 0.005 counter.py:18(findAnagrams)
1 0.000 0.000 0.000 0.000 counter.py:3(SlowSolution)
1 0.001 0.001 0.001 0.001 {built-in method _collections._count_elements}
20.0000.0000.0000.000 {built-in method builtins.__build_class__}
10.0000.0000.0050.005 {built-in method builtins.exec}
70.0000.0000.0000.000 {built-in method builtins.getattr}
10.0000.0000.0000.000 {built-in method builtins.isinstance}
8/20.0000.0000.0000.000 {built-in method builtins.issubclass}
60.0000.0000.0000.000 {built-in method builtins.len}
10.0000.0000.0000.000 {built-in method builtins.print}
70.0000.0000.0000.000 {method '__subclasses__' of 'type' objects}
140.0000.0000.0000.000 {method 'add' of 'set' objects}
10.0000.0000.0000.000 {method 'append' of 'list' objects}
10.0000.0000.0000.000 {method 'disable' of '_lsprof.Profiler' objects}
70.0000.0000.0000.000 {method 'remove' of 'set' objects}
The output can be a little strange to read at first, but we're really interested in the tottime
column. That tells us how much total time we spent inside of a particular function.
As you can see, the script spends almost all of its time inside of {built-in method _collections._count_elements}
. That's an internal method used by a Counter
which we can infer gets called every time you create a counter (like collections.Counter(p)
).
To make the code faster, you should call collections.Counter(...)
fewer times and/or with shorter strings. In the slow version, you're counting len(p)
characters len(s)
times. This has a runtime of O(sp)
which is quadratic and explainswhy it's so slow on large inputs.
On the other hand, the faster solution counts each character of s
exactly once which gives it a runtime of O(s + p)
. This is much faster and will scale with much larger inputs.
For more info on profiling in Python, see How can you profile a python script?
Solution 2:
There is more overhead in creating, counting, and comparing Counter
objects than for lists. That is the essence of what you are seeing. If you want a faster method still, you can accomplish the anagram finder by building the permutations of p
as a tuple, then checking the slices of s
against the tuple.
classSolution(object):
deffindAnagrams(self, s, p):
"""
:type s: str
:type p: str
:rtype: List[int]
"""
ans = list()
pcnt = collections.Counter(p)
for i inrange(len(s)):
if collections.Counter(s[i:i+len(p)]) == pcnt:
ans.append(i)
return ans
deffindAnagrams2(self, s, p):
ls, lp = len(s), len(p)
cp = collections.Counter(p)
cs = collections.Counter()
ans = []
for i inrange(ls):
cs[s[i]] += 1if i >= lp:
cs[s[i - lp]] -= 1if cs[s[i - lp]] == 0:
del cs[s[i - lp]]
if cs == cp:
ans.append(i - lp + 1)
return ans
deffindAnagrams3(self, s, p):
p_all = tuple(''.join(x) for x in permutations(p,len(p)))
return [i for i inrange(len(s)) if s[i:i+len(p)] in p_all]
Here is a short comparison of the 3 methods using timeit
in IPython:
In [33]: %%timeit
...: sol.findAnagrams('hello world he said eh', 'he')
...:
1000 loops, best of 3: 259 µs per loop
In [34]: %%timeit
...: sol.findAnagrams2('hello world he said eh', 'he')
...:
10000 loops, best of 3: 102 µs per loop
In [35]: %%timeit
...: sol.findAnagrams3('hello world he said eh', 'he')
...:
100000 loops, best of 3: 15.5 µs per loop
Post a Comment for "Python Collections.counter Efficiency"