Finding Intersection/difference Between Python Lists
I have two Python lists:
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
I need to filter out all the elements from
Solution 1:
A list comprehension will work.
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
filtered = [i for i in a if i[0] not in b]
>>> print(filtered)
[('why', 4), ('throw', 9), ('you', 1)]
Solution 2:
A list comprehension should work:
c = [item for item in a if item[0] not in b]
Or with a dictionary comprehension:
d = dict(a)
c = {key: value for key, value in d.items() if key not in b}
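As a quick check of the dictionary route, here is a minimal runnable sketch using the lists from the question. It assumes Python 3 (hence `items()` rather than `iteritems()`) and that the first element of each tuple is unique, since `dict(a)` keeps only the last value for a repeated key.

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

# Assumption: the words in a are unique; duplicates would be collapsed by dict().
d = dict(a)

# Keep only the entries whose key does not appear in b.
c = {key: value for key, value in d.items() if key not in b}

print(c)  # {'why': 4, 'throw': 9, 'you': 1}
```

The result is a dict rather than a list of tuples; `list(c.items())` converts it back if needed.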
Solution 3:
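The in operator is nice, but you should use a set, at least for b, so that each membership test is a hash lookup instead of a scan of the whole list. If you have numpy, you could also try np.in1d, but you should time it to find out whether it is actually faster.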
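import numpy as np

# Same comprehension as in the other answers, but with b converted to a set.
b = set(b)
filtered = [i for i in a if i[0] not in b]
# With numpy: if you create the structured array like this, you must give the
# maximum string length up front (here 10, as 'U10'). Otherwise use an object
# array, which is slower (likely not worth it) but safe.
a = np.array(a, dtype=[('key', 'U10'), ('val', int)])
b = np.asarray(list(b))  # convert the set back to a list before making an array
# np.in1d is deprecated in recent NumPy releases; np.isin is its replacement.
mask = ~np.in1d(a['key'], b)
filtered = a[mask]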
Sets also have methods such as difference, which are probably not too useful here, but are in general.
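To illustrate the set methods mentioned above, here is a small sketch applied to the words from the question. The names `keys`, `kept` and `shared` are just illustrative. Sets are unordered, so the original list is filtered at the end to preserve the order and the counts.

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

# Collect the words (first tuple elements) of a into a set.
keys = {word for word, _ in a}

kept = keys.difference(b)      # words in a that are not in b
shared = keys.intersection(b)  # words that appear in both a and b

# Sets carry no order or counts, so filter the original list with the result.
filtered = [item for item in a if item[0] in kept]

print(sorted(kept))    # ['throw', 'why', 'you']
print(sorted(shared))  # ['send', 'when']
print(filtered)        # [('why', 4), ('throw', 9), ('you', 1)]
```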
Solution 4:
As this is tagged with numpy, here is a numpy solution using numpy.in1d, benchmarked against the list comprehension:
In [1]: a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
In [2]: b = ['the', 'when', 'send', 'we', 'us']
In [3]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])
In [4]: b_ar = np.array(b)
In [5]: %timeit filtered = [i for i in a if not i[0] in b]
1000000 loops, best of 3: 778 ns per loop
In [6]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
10000 loops, best of 3: 31.4 us per loop
So for 5 records the list comprehension is much faster (778 ns versus 31.4 us).
However, for a larger data set (5,000 records) the numpy solution is about twice as fast as the list comprehension:
In [7]: a = a * 1000
In [8]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])
In [9]: %timeit filtered = [i for i in a if not i[0] in b]
1000 loops, best of 3: 647 us per loop
In [10]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
1000 loops, best of 3: 302 us per loop
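The timings above compare numpy against a comprehension that tests membership in the list b. For anyone who wants to repeat the measurement without IPython, here is a rough sketch using the standard timeit module. It also times a set-based variant, which is not part of the benchmark above. The function names and the repeat count of 100 are arbitrary choices, and the numbers will vary by machine, so no timings are claimed here.

```python
import timeit

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)] * 1000
b = ['the', 'when', 'send', 'we', 'us']
b_set = set(b)  # set version of b for hash-based membership tests


def with_list():
    # Membership test scans the list b for every record.
    return [i for i in a if i[0] not in b]


def with_set():
    # Membership test is a hash lookup in b_set.
    return [i for i in a if i[0] not in b_set]


# Both variants must produce the same filtered list.
assert with_list() == with_set()

t_list = timeit.timeit(with_list, number=100)
t_set = timeit.timeit(with_set, number=100)
print(f"list membership: {t_list:.4f} s for 100 runs")
print(f"set membership:  {t_set:.4f} s for 100 runs")
```

With only five words in b the gap between the two may be small; the set variant matters more as b grows.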
Solution 5:
Try this:
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
c = []
for x in a:
    # keep the tuple only if its word is not in b
    if x[0] not in b:
        c.append(x)
print(c)
Demo: http://ideone.com/zW7mzY