How to Chunk a CSV (Dict)Reader Object in Python 3.2?
I'm trying to use Pool from the multiprocessing module to speed up reading in large CSV files. For this, I adapted an example (from py2k), but it seems like the csv.DictReader object has no length, so I can't split it into chunks by calling len() on it.
Solution 1:
From the csv.DictReader documentation (and the csv.reader object it wraps), the class returns an iterator, not a sequence. The code should have thrown a TypeError when you called len() on it.
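A quick way to see this for yourself — a minimal sketch using an in-memory CSV (the data here is made up for illustration):

```python
import csv
import io

# A small in-memory CSV stands in for the real file.
data = io.StringIO("cell,seq_ei\n1,2\n")
reader = csv.DictReader(data)

try:
    len(reader)          # readers are iterators, not sequences
except TypeError as exc:
    print(exc)           # DictReader has no len()

rows = list(reader)      # materialize the rows to get a length
print(len(rows))         # one data row (the header became the keys)
```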
You can still chunk the data, but you'll have to read it entirely into memory first. If you're concerned about memory, you can switch from csv.DictReader to csv.reader and skip the overhead of the dictionaries csv.DictReader creates. To improve readability in csv2nodes(), you can assign constants to address each field's index:
CELL = 0
SEQ_EI = 1
DAT_DEB_OCCUPATION = 4
DAT_FIN_OCCUPATION = 5
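Putting those constants to work, a sketch of how the chunking could look. The field layout follows the constants above; csv2nodes() here is a hypothetical stand-in for your real worker, and the path, chunk size, and worker count are placeholders. Note that Pool's context-manager support arrived in Python 3.3, so on 3.2 you close the pool explicitly:

```python
import csv
from multiprocessing import Pool

# Field indexes instead of DictReader keys (layout assumed from the question).
CELL = 0
SEQ_EI = 1
DAT_DEB_OCCUPATION = 4
DAT_FIN_OCCUPATION = 5

def csv2nodes(rows):
    # Hypothetical stand-in for the real per-chunk worker.
    return [(row[CELL], row[SEQ_EI]) for row in rows]

def chunks(rows, size):
    """Split a fully materialized list of rows into size-row slices."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def process(path, chunk_size=10000, workers=4):
    with open(path, newline='') as f:
        rows = list(csv.reader(f))[1:]   # read everything; skip the header
    pool = Pool(workers)
    try:
        results = pool.map(csv2nodes, list(chunks(rows, chunk_size)))
    finally:
        pool.close()
        pool.join()
    # Flatten the per-chunk results back into one list.
    return [node for part in results for node in part]
```

Since pool.map() needs a sequence it can split across workers, materializing the chunks with list() up front is the simplest option here.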
I also recommend using a different variable name than id, since that shadows the built-in function.