Split List By Tuple Separator

December 20, 2023

I have a list:

print (L)
[('I', 'WW'), ('am', 'XX'), ('newbie', 'YY'), ('.', 'ZZ'), ('You', 'WW'), ('are', 'XX'), ('cool', 'YY'), ('.', 'ZZ')]

I want to split the list into sublists with s

Solution 1:

The for-loop approach will be faster, since it requires only one pass:

>>> def juan(L, sep):
...     L2 = []
...     sub = []
...     for x in L:
...         sub.append(x)
...         if x == sep:
...             L2.append(sub)
...             sub = []
...     if sub:
...         L2.append(sub)
...     return L2
...
>>> juan(L, sep)
[[('I', 'WW'), ('am', 'XX'), ('newbie', 'YY'), ('.', 'ZZ')], [('You', 'WW'), ('are', 'XX'), ('cool', 'YY'), ('.', 'ZZ')]]
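
One detail of the loop above that the sample data does not exercise is the final `if sub:` check, which keeps any trailing items that are not followed by a separator. A minimal sketch of that behaviour (the name `split_keep_sep` and the shortened inputs are illustrative, not from the original answer):

```python
def split_keep_sep(items, sep):
    """Split items into sublists; each sublist ends with sep when one is present."""
    result = []
    sub = []
    for x in items:
        sub.append(x)
        if x == sep:
            # Close the current sublist right after the separator.
            result.append(sub)
            sub = []
    if sub:
        # Trailing items with no closing separator are kept as a final sublist.
        result.append(sub)
    return result


sep = ('.', 'ZZ')
with_tail = [('I', 'WW'), sep, ('You', 'WW')]
print(split_keep_sep(with_tail, sep))
# [[('I', 'WW'), ('.', 'ZZ')], [('You', 'WW')]]
print(split_keep_sep([], sep))
# []
```

Whether a trailing, unterminated group should be kept or dropped depends on the use case; the loop makes that choice easy to change.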

Some comparisons:

>>> import timeit
>>> from itertools import groupby
>>> def jezrael(L, sep):
...     return [list(g) + [sep] for k, g in groupby(L, lambda x: x == sep) if not k]
...
>>> def coldspeed(L, sep):
...     L2 = []
...     for i in reversed(L):
...         if i == sep:
...             L2.append([])
...         L2[-1].append(i)
...     return [x[::-1] for x in reversed(L2)]
...
>>> def pm2ring(L, sep):
...     seplist = [sep]
...     return [list(g) + seplist for k, g in groupby(L, sep.__eq__) if not k]
...
>>> setup = "from __main__ import L, sep, juan, coldspeed, pm2ring, jezrael"

Edit: more timings

>>> def buzzycoder(L, sep):
...     a = []
...     length = len(L)
...     start = 0
...     end = L.index(sep)
...     if start < length: a.append(L[start:end+1])
...     start = end + 1
...     while start < length:
...         end = L.index(sep, start) + 1
...         a.append(L[start:end])
...         start = end
...     return a
...
>>> def splitList(l, s):
...     ''' l is list, s is separator, similar to split, but keep separator'''
...     i = 0
...     for _ in range(l.count(s)): # break using slices
...         e = l.index(s,i)
...         yield l[i:e+1] # sublist generator value
...         i = e+1
...     if e+1 < len(l): yield l[e+1:] # pick up
...
>>> def bharath(x,sep):
...     n = [0] + [i+1 for i,j in enumerate(x) if j == sep]
...     m= list()
...     for first, last in zip(n, n[1:]):
...         m.append(x[first:last])
...     return m
...
>>> setup = "from __main__ import L, sep, juan, coldspeed, pm2ring, jezrael, buzzycoder, splitList, bharath"

And the results:

>>> timeit.timeit("jezrael(L, sep)", setup)
4.1499102029483765
>>> timeit.timeit("pm2ring(L, sep)", setup)
3.3499899921007454
>>> timeit.timeit("coldspeed(L, sep)", setup)
2.868469718960114
>>> timeit.timeit("juan(L, sep)", setup)
1.5428746730322018
>>> timeit.timeit("buzzycoder(L, sep)", setup)
1.5942967369919643
>>> timeit.timeit("list(splitList(L, sep))", setup)
2.7872562300181016
>>> timeit.timeit("bharath(L, sep)", setup)
2.9842335029970855

With a bigger list:

>>> L = L*100000
>>> timeit.timeit("jezrael(L, sep)", setup, number=10)
3.3555950550362468
>>> timeit.timeit("pm2ring(L, sep)", setup, number=10)
2.337177241919562
>>> timeit.timeit("coldspeed(L, sep)", setup, number=10)
2.2037084710318595
>>> timeit.timeit("juan(L, sep)", setup, number=10)
1.3625159269431606
>>> timeit.timeit("buzzycoder(L, sep)", setup, number=10)
1.4375156159512699
>>> timeit.timeit("list(splitList(L, sep))", setup, number=10)
1.6824725979240611
>>> timeit.timeit("bharath(L, sep)", setup, number=10)
1.5603888860205188

Caveat

These results do not account for the proportion of sep in L, which will affect the timings a lot for some of these solutions.
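
To get a feel for this caveat, one could time a single-pass loop against an `index`-based slicing approach on lists with different separator densities. The sketch below is only an illustration under assumptions: `make_list`, `loop_split`, `index_split`, the group sizes, and the repeat count are all hypothetical choices, and no particular timings are claimed.

```python
import timeit

SEP = ('.', 'ZZ')
WORD = ('word', 'XX')


def make_list(n_groups, words_per_group):
    """Build n_groups runs, each made of words_per_group words followed by SEP."""
    return ([WORD] * words_per_group + [SEP]) * n_groups


def loop_split(items, sep):
    """Single pass over the items, closing a sublist after each separator."""
    result, sub = [], []
    for x in items:
        sub.append(x)
        if x == sep:
            result.append(sub)
            sub = []
    if sub:
        result.append(sub)
    return result


def index_split(items, sep):
    """Find each separator with list.index and slice out the sublist."""
    result, start, n = [], 0, len(items)
    while start < n:
        try:
            end = items.index(sep, start) + 1
        except ValueError:
            # No separator left: keep the remainder as the last sublist.
            end = n
        result.append(items[start:end])
        start = end
    return result


# Roughly the same total length each time, but fewer separators as groups grow.
for words_per_group in (1, 10, 100):
    data = make_list(10000 // (words_per_group + 1), words_per_group)
    assert loop_split(data, SEP) == index_split(data, SEP)
    for func in (loop_split, index_split):
        seconds = timeit.timeit(lambda: func(data, SEP), number=20)
        print(f"{words_per_group:>3} words/group  {func.__name__:<12} {seconds:.4f} s")
```

Running something like this on the target machine, with data shaped like the real input, is likely to be more informative than any single benchmark figure.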

Solution 2:

Your code looks OK to me, but you can speed it up a little by getting rid of that lambda, e.g.

groupby(L, sep.__eq__)

Not only is the code shorter, it also avoids the overhead of creating the lambda function and of the relatively slow Python-level function call.

You could also build [sep] outside the loop, that might save a few microseconds. ;)

from itertools import groupby

L = [('I', 'WW'), ('am', 'XX'), ('newbie', 'YY'), ('.', 'ZZ'),
    ('You', 'WW'), ('are', 'XX'), ('cool', 'YY'), ('.', 'ZZ')]

sep = ('.','ZZ')
seplist = [sep]
new_L = [list(g) + seplist for k, g in groupby(L, sep.__eq__) if not k]
for row in new_L:
    print(row)

output

[('I', 'WW'), ('am', 'XX'), ('newbie', 'YY'), ('.', 'ZZ')]
[('You', 'WW'), ('are', 'XX'), ('cool', 'YY'), ('.', 'ZZ')]
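
One thing to keep in mind with `sep.__eq__` as the key function: calling the method directly bypasses the fallback logic of the `==` operator, so for an element that is not a tuple it returns `NotImplemented` rather than `False`. For a list made up entirely of tuples, as in the question, it gives the same grouping as the lambda. A small sketch to illustrate (the shortened list is an assumption made for brevity):

```python
from itertools import groupby

sep = ('.', 'ZZ')
L = [('I', 'WW'), ('.', 'ZZ'), ('You', 'WW'), ('.', 'ZZ')]

# Same comprehension, once keyed by a lambda and once by the bound method.
with_lambda = [list(g) + [sep] for k, g in groupby(L, lambda x: x == sep) if not k]
with_method = [list(g) + [sep] for k, g in groupby(L, sep.__eq__) if not k]
print(with_lambda == with_method)   # True

# The bound method only gives a plain bool when compared with another tuple.
print(sep.__eq__(('.', 'ZZ')))      # True
print(sep.__eq__(('I', 'WW')))      # False
print(sep.__eq__('.'))              # NotImplemented
```

If the list might contain non-tuple elements, the lambda (or `functools.partial(operator.eq, sep)`) is the safer choice.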

Solution 3:

A vanilla for loop should be faster than a groupby.

L2 = []
for i in L[::-1]:
     if i == ('.','ZZ'):
         L2.append([])

     L2[-1].append(i)

L2 = [x[::-1] for x in L2[::-1]]

A small tweak (which may or may not improve performance, but is more memory efficient) involves the use of reversed:

L2 = []
sep = ('.','ZZ')
for i in reversed(L):
     if i == sep:
         L2.append([])

     L2[-1].append(i)

L2 = [x[::-1] for x in reversed(L2)]

Another improvement is to avoid the repeated L2[-1] lookup by keeping a separate reference to the current sublist:

cache = []
L2 = cache
sep = ('.','ZZ')
for i in reversed(L):
     if i == sep:
         cache = []
         L2.append(cache)

     cache.append(i)

L2 = [x[::-1] for x in reversed(L2)]

Performance

Small

len(L)
8

100000 loops, best of 3: 5.11 µs per loop   # groupby

100000 loops, best of 3: 3.54 µs per loop   # loop

Large

len(L)
800000

1 loop, best of 3: 435 ms per loop    # groupby

1 loop, best of 3: 310 ms per loop    # PM 2Ring's groupby

1 loop, best of 3: 250 ms per loop    # loop

1 loop, best of 3: 235 ms per loop    # loop w/ reverse

Solution 4:

My solution is:

from  itertools import groupby

sep = ('.','ZZ')
new_L = [list(g) + [sep] for k, g in groupby(L, lambda x: x == sep) if not k]
print (new_L)
[[('I', 'WW'), ('am', 'XX'), ('newbie', 'YY'), ('.', 'ZZ')], 
 [('You', 'WW'), ('are', 'XX'), ('cool', 'YY'), ('.', 'ZZ')]]

But I believe better / faster solutions exist too.

Solution 5:

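a = list()
start = 0
while start < len(l):
    try:
        end = l.index(sep, start) + 1
    except ValueError:
        # list.index raises ValueError (it never returns -1) when sep is absent,
        # so keep whatever is left as the final sublist
        end = len(l)
    a.append(l[start:end])
    start = end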

This would be my solution. It is simple and readable.
