Finding Intersection/Difference Between Python Lists

August 16, 2022

I have two Python lists:

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

I need to filter out all the elements from

Solution 1:

A list comprehension will work.

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']
filtered = [i for i in a if i[0] not in b]

>>> print(filtered)
[('why', 4), ('throw', 9), ('you', 1)]

Solution 2:

A list comprehension should work:

c = [item for item in a if item[0] not in b]

Or with a dictionary comprehension:

d = dict(a)
c = {key: value for key, value in d.items() if key not in b}
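
As a self-contained sketch of the dictionary approach (assuming Python 3.7 or later, where dicts keep insertion order; the name excluded is introduced here only for illustration). Note that dict(a) keeps a single value per key, so this assumes each word appears only once in a:

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

excluded = set(b)  # hypothetical name; a set makes each membership test cheap
d = dict(a)        # assumes words in a are unique, duplicates would be collapsed

# Keep only the entries whose key is not one of the excluded words.
c = {key: value for key, value in d.items() if key not in excluded}
print(c)  # {'why': 4, 'throw': 9, 'you': 1}
```

If you need the result as a list of tuples again, list(c.items()) converts it back.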

Solution 3:

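The in operator is nice, but you should use a set, at least for b, so that each membership test takes constant time on average instead of scanning the whole list. If you have numpy, you could also try np.in1d, but you should time it to see whether it is actually faster.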

# Shameless copy of the list comprehension above, but with b as a set
b = set(b)
filtered = [i for i in a if i[0] not in b]

import numpy as np

# With numpy: if you create a structured array like this, you must give the
# maximum string length up front (here 10). Otherwise, use an object array,
# which is slower (likely not worth it) but safe.
a = np.array(a, dtype=[('key', 'U10'), ('val', int)])
b = np.asarray(list(b))  # list() so this also works if b is already a set

# np.isin is the newer equivalent of np.in1d
mask = ~np.in1d(a['key'], b)
filtered = a[mask]

Sets also have methods such as difference and intersection, which are probably not too useful here, but are in general.
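
To illustrate the set operations mentioned above, here is a minimal sketch (the names keys_a, keys_b, common and only_in_a are made up for this example). Because sets are unordered and drop the counts, the last step goes back to the original list to recover the tuples in order:

```python
a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]
b = ['the', 'when', 'send', 'we', 'us']

keys_a = {word for word, _ in a}  # just the words from a
keys_b = set(b)

common = keys_a & keys_b      # intersection, same as keys_a.intersection(keys_b)
only_in_a = keys_a - keys_b   # difference, same as keys_a.difference(keys_b)

print(sorted(common))     # ['send', 'when']
print(sorted(only_in_a))  # ['throw', 'why', 'you']

# Sets lose order and the counts, so filter the original list with the result.
filtered = [item for item in a if item[0] in only_in_a]
print(filtered)  # [('why', 4), ('throw', 9), ('you', 1)]
```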

Solution 4:

As this is tagged with numpy, here is a numpy solution using numpy.in1d benchmarked against the list comprehension:


In [1]: a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]

In [2]: b = ['the', 'when', 'send', 'we', 'us']

In [3]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])

In [4]: b_ar = np.array(b)

In [5]: %timeit filtered = [i for i in a if not i[0] in b]
1000000 loops, best of 3: 778 ns per loop

In [6]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
10000 loops, best of 3: 31.4 us per loop

So for 5 records the list comprehension is faster (778 ns versus 31.4 us, roughly 40 times).

However, for larger data sets (here 5,000 records) the numpy solution is about twice as fast as the list comprehension:

In [7]: a = a * 1000

In [8]: a_ar = np.array(a, dtype=[('string','|S5'), ('number',float)])

In [9]: %timeit filtered = [i for i in a if not i[0] in b]
1000 loops, best of 3: 647 us per loop

In [10]: %timeit filtered = a_ar[~np.in1d(a_ar['string'], b_ar)]
1000 loops, best of 3: 302 us per loop
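
The list comprehension timed above looks words up in the list b. As a rough sketch of how one might check whether a set changes the picture, the following uses only the standard timeit module (the helper names with_list and with_set and the repeat count n are assumptions for this example). No timings are shown here because they depend on the machine and Python version:

```python
import timeit

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)] * 1000
b = ['the', 'when', 'send', 'we', 'us']
b_set = set(b)

def with_list():
    # Membership test scans the list b for every record.
    return [i for i in a if i[0] not in b]

def with_set():
    # Membership test is a hash lookup in b_set.
    return [i for i in a if i[0] not in b_set]

# Both variants must produce the same result before comparing speed.
assert with_list() == with_set()

n = 200  # number of repetitions, chosen arbitrarily
print("list lookup:", timeit.timeit(with_list, number=n) / n, "s per loop")
print("set lookup: ", timeit.timeit(with_set, number=n) / n, "s per loop")
```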

Solution 5:

Try this:

a = [('when', 3), ('why', 4), ('throw', 9), ('send', 15), ('you', 1)]

b = ['the', 'when', 'send', 'we', 'us']

c = []

for x in a:
    if x[0] not in b:
        c.append(x)
print(c)