
Python Function To Read Variable Length Blocks Of Data From File While Open

I have data files that contain data for many timesteps, with each timestep formatted in a block like this:

TIMESTEP PARTICLES
0.00500103 1262
ID GROUP VOLUME MASS PX PY PZ

Solution 1:

You can use the max_rows argument of numpy.genfromtxt:

import numpy as np

with open("timesteps.dat", "rb") as f:
    while True:
        line = f.readline()
        if len(line) == 0:
            # End of file
            break
        if len(line.strip()) == 0:
            # Skip blank lines (EOF is re-checked on the next pass)
            continue
        # `line` is now the "TIMESTEP  PARTICLES" label; the values follow.
        line2_fields = f.readline().split()
        timestep = float(line2_fields[0])
        particles = int(line2_fields[1])
        # names=True reads the column header; max_rows stops after this block.
        data = np.genfromtxt(f, names=True, dtype=None, max_rows=particles)

        print("Timestep:", timestep)
        print("Particles:", particles)
        print("Data:")
        print(data)
        print()

Here's a sample file:

TIMESTEP  PARTICLES
0.00500103  4
ID  GROUP  VOLUME  MASS  PX  PY  PZ  VX  VY  VZ
651  0  5.23599e-07  0.000397935  -0.084626  -0.0347849  0.00188164  0  0  -1.04903
430  0  5.23599e-07  0.000397935  -0.0837742  -0.0442293  0.0121046  0  0  -1.04903
384  0  5.23599e-07  0.000397935  -0.0749234  -0.0395652  0.0143401  0  0  -1.04903
971  0  5.23599e-07  0.000397935  -0.0954931  -0.0159607  0.0100155  0  0  -1.04903
TIMESTEP  PARTICLES
0.00500103  5
ID  GROUP  VOLUME  MASS  PX  PY  PZ  VX  VY  VZ
971  0  5.23599e-07  0.000397935  -0.0954931  -0.0159607  0.0100155  0  0  -1.04903
652  0  5.23599e-07  0.000397935  -0.084626  -0.0347849  0.00188164  0  0  -1.04903
431  0  5.23599e-07  0.000397935  -0.0837742  -0.0442293  0.0121046  0  0  -1.04903
385  0  5.23599e-07  0.000397935  -0.0749234  -0.0395652  0.0143401  0  0  -1.04903
972  0  5.23599e-07  0.000397935  -0.0954931  -0.0159607  0.0100155  0  0  -1.04903

TIMESTEP  PARTICLES
0.00500103  3
ID  GROUP  VOLUME  MASS  PX  PY  PZ  VX  VY  VZ
222  0  5.23599e-07  0.000397935  -0.0837742  -0.0442293  0.0121046  0  0  -1.04903
333  0  5.23599e-07  0.000397935  -0.0749234  -0.0395652  0.0143401  0  0  -1.04903
444  0  5.23599e-07  0.000397935  -0.0954931  -0.0159607  0.0100155  0  0  -1.04903

And here is the output:

Timestep: 0.00500103
Particles: 4
Data:
[ (651, 0, 5.23599e-07, 0.000397935, -0.084626, -0.0347849, 0.00188164, 0, 0, -1.04903)
 (430, 0, 5.23599e-07, 0.000397935, -0.0837742, -0.0442293, 0.0121046, 0, 0, -1.04903)
 (384, 0, 5.23599e-07, 0.000397935, -0.0749234, -0.0395652, 0.0143401, 0, 0, -1.04903)
 (971, 0, 5.23599e-07, 0.000397935, -0.0954931, -0.0159607, 0.0100155, 0, 0, -1.04903)]

Timestep: 0.00500103
Particles: 5
Data:
[ (971, 0, 5.23599e-07, 0.000397935, -0.0954931, -0.0159607, 0.0100155, 0, 0, -1.04903)
 (652, 0, 5.23599e-07, 0.000397935, -0.084626, -0.0347849, 0.00188164, 0, 0, -1.04903)
 (431, 0, 5.23599e-07, 0.000397935, -0.0837742, -0.0442293, 0.0121046, 0, 0, -1.04903)
 (385, 0, 5.23599e-07, 0.000397935, -0.0749234, -0.0395652, 0.0143401, 0, 0, -1.04903)
 (972, 0, 5.23599e-07, 0.000397935, -0.0954931, -0.0159607, 0.0100155, 0, 0, -1.04903)]

Timestep: 0.00500103
Particles: 3
Data:
[ (222, 0, 5.23599e-07, 0.000397935, -0.0837742, -0.0442293, 0.0121046, 0, 0, -1.04903)
 (333, 0, 5.23599e-07, 0.000397935, -0.0749234, -0.0395652, 0.0143401, 0, 0, -1.04903)
 (444, 0, 5.23599e-07, 0.000397935, -0.0954931, -0.0159607, 0.0100155, 0, 0, -1.04903)]
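
The same loop can also be wrapped in a generator, so the caller handles one block at a time while the file stays open. This is only a sketch, not part of the answer above: the name iter_blocks is hypothetical, the in-memory sample uses a reduced set of columns, and it assumes every block has at least two data rows (with a single row, genfromtxt returns a 0-d array rather than a 1-d one).

```python
import io
import numpy as np

def iter_blocks(f):
    """Yield (timestep, data) for each block of an open binary file.

    iter_blocks is a hypothetical helper name, not a NumPy function.
    """
    while True:
        line = f.readline()
        if not line:
            return              # end of file
        if not line.strip():
            continue            # blank line between blocks
        # `line` is the 'TIMESTEP  PARTICLES' label; the values follow it.
        fields = f.readline().split()
        timestep, particles = float(fields[0]), int(fields[1])
        # names=True consumes the column-header line, and max_rows stops
        # genfromtxt after exactly `particles` data rows.
        data = np.genfromtxt(f, names=True, dtype=None, max_rows=particles)
        yield timestep, data

# Made-up sample with fewer columns than the real files.
sample = io.BytesIO(
    b"TIMESTEP  PARTICLES\n"
    b"0.00500103  2\n"
    b"ID  GROUP  PX\n"
    b"651  0  -0.084626\n"
    b"430  0  -0.0837742\n"
    b"\n"
    b"TIMESTEP  PARTICLES\n"
    b"0.00500103  3\n"
    b"ID  GROUP  PX\n"
    b"222  0  -0.0837742\n"
    b"333  0  -0.0749234\n"
    b"444  0  -0.0954931\n"
)

blocks = list(iter_blocks(sample))
for timestep, data in blocks:
    print(timestep, data.shape, data["ID"])
```

With a real file, the call would look like `for timestep, data in iter_blocks(f):` inside the `with open(..., "rb") as f:` block.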

Solution 2:

The with statement does not loop; it only makes sure the file is properly closed afterwards.

To loop you'll need to add a while just after the with statement (see the code below). But before you can do that you'll need to check in the readBlock(f) function for an end of file (EOF). Replace line = f.readline().strip() with this code:

line = f.readline()
if not line:
    # EOF: returning None's.
    return None, None, None
# We do the strip after the check.
# Otherwise a blank line "\n" might be interpreted as EOF.
line = line.strip()

Then add the while loop inside the with block and check whether readBlock returned None, which signals EOF, so we can break out of the loop:

import time

with open('file1') as file_handle:
    while True:
        # time.clock() was removed in Python 3.8; perf_counter() replaces it.
        startWallTime = time.perf_counter()

        Timestep, numParticles, particleData = readBlock(file_handle)
        if Timestep is None:
            # readBlock signalled EOF
            break

        print(Timestep)

        ## Do processing stuff here
        print("Timestep Processed")

        endWallTime = time.perf_counter()
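
The question's readBlock is not reproduced here, so the following is only a guess at its shape, meant to show where the EOF check sits and how the three None values reach the loop. The parsing body, the reduced column set, and the in-memory sample are all assumptions; the sketch also assumes there are no blank lines between blocks.

```python
import io

def readBlock(f):
    """Hypothetical stand-in for the question's readBlock(f).

    Returns (timestep, numParticles, rows), or three Nones at end of file.
    """
    line = f.readline()
    if not line:
        # EOF: readline() returns an empty string only at end of file.
        return None, None, None
    # Strip after the check, so a blank "\n" line is not mistaken for EOF.
    line = line.strip()                      # the 'TIMESTEP  PARTICLES' label
    fields = f.readline().split()            # e.g. ['0.00500103', '2']
    timestep, numParticles = float(fields[0]), int(fields[1])
    names = f.readline().split()             # column names
    rows = []
    for _ in range(numParticles):
        values = [float(v) for v in f.readline().split()]
        rows.append(dict(zip(names, values)))
    return timestep, numParticles, rows

# Made-up two-block input with a reduced set of columns.
file_handle = io.StringIO(
    "TIMESTEP  PARTICLES\n"
    "0.00500103  2\n"
    "ID  GROUP  PX\n"
    "651  0  -0.084626\n"
    "430  0  -0.0837742\n"
    "TIMESTEP  PARTICLES\n"
    "0.00500103  1\n"
    "ID  GROUP  PX\n"
    "384  0  -0.0749234\n"
)

summary = []
while True:
    Timestep, numParticles, particleData = readBlock(file_handle)
    if Timestep is None:
        break                                # EOF reached
    summary.append((Timestep, numParticles, particleData[0]["ID"]))

print(summary)
```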

Solution 3:

Here's a quick-n-dirty test (it worked on the 2nd try!):

import numpy as np

with open('stack41091659.txt','rb') as f:
    while f.readline():    # read the 'TIMESTEP  PARTICLES' line
        time, n = f.readline().strip().split()
        n = int(n)
        print(time, n)
        ablock = [f.readline()]  # block header line
        for i in range(n):
            ablock.append(f.readline())
        print(len(ablock))
        data = np.genfromtxt(ablock, dtype=None, names=True)
        print(data.shape, data.dtype)

test run:

1458:~/mypy$ python3 stack41091659.py 
b'0.00500103' 4
5
(4,) [('ID', '<i4'), ('GROUP', '<i4'), ('VOLUME', '<f8'), ('MASS', '<f8'), ('PX', '<f8'), ('PY', '<f8'), ('PZ', '<f8'), ('VX', '<i4'), ('VY', '<i4'), ('VZ', '<f8')]
b'0.00500103' 3
4
(3,) [('ID', '<i4'), ('GROUP', '<i4'), ('VOLUME', '<f8'), ('MASS', '<f8'), ('PX', '<f8'), ('PY', '<f8'), ('PZ', '<f8'), ('VX', '<i4'), ('VY', '<i4'), ('VZ', '<f8')]
b'0.00500103' 2
3
(2,) [('ID', '<i4'), ('GROUP', '<i4'), ('VOLUME', '<f8'), ('MASS', '<f8'), ('PX', '<f8'), ('PY', '<f8'), ('PZ', '<f8'), ('VX', '<i4'), ('VY', '<i4'), ('VZ', '<f8')]
b'0.00500103' 4
5
(4,) [('ID', '<i4'), ('GROUP', '<i4'), ('VOLUME', '<f8'), ('MASS', '<f8'), ('PX', '<f8'), ('PY', '<f8'), ('PZ', '<f8'), ('VX', '<i4'), ('VY', '<i4'), ('VZ', '<f8')]

Sample file:

TIMESTEP  PARTICLES
0.00500103  4
ID  GROUP  VOLUME  MASS  PX  PY  PZ  VX  VY  VZ
651  0  5.23599e-07  0.000397935  -0.084626  -0.0347849  0.00188164  0  0  -1.04903
430  0  5.23599e-07  0.000397935  -0.0837742  -0.0442293  0.0121046  0  0  -1.04903
384  0  5.23599e-07  0.000397935  -0.0749234  -0.0395652  0.0143401  0  0  -1.04903
971  0  5.23599e-07  0.000397935  -0.0954931  -0.0159607  0.0100155  0  0  -1.04903
TIMESTEP  PARTICLES
0.00500103  3
ID  GROUP  VOLUME  MASS  PX  PY  PZ  VX  VY  VZ
651  0  5.23599e-07  0.000397935  -0.084626  -0.0347849  0.00188164  0  0  -1.04903
430  0  5.23599e-07  0.000397935  -0.0837742  -0.0442293  0.0121046  0  0  -1.04903
384  0  5.23599e-07  0.000397935  -0.0749234  -0.0395652  0.0143401  0  0  -1.04903
TIMESTEP  PARTICLES
0.00500103  2
ID  GROUP  VOLUME  MASS  PX  PY  PZ  VX  VY  VZ
384  0  5.23599e-07  0.000397935  -0.0749234  -0.0395652  0.0143401  0  0  -1.04903
971  0  5.23599e-07  0.000397935  -0.0954931  -0.0159607  0.0100155  0  0  -1.04903
TIMESTEP  PARTICLES
0.00500103  4
ID  GROUP  VOLUME  MASS  PX  PY  PZ  VX  VY  VZ
651  0  5.23599e-07  0.000397935  -0.084626  -0.0347849  0.00188164  0  0  -1.04903
430  0  5.23599e-07  0.000397935  -0.0837742  -0.0442293  0.0121046  0  0  -1.04903
384  0  5.23599e-07  0.000397935  -0.0749234  -0.0395652  0.0143401  0  0  -1.04903
971  0  5.23599e-07  0.000397935  -0.0954931  -0.0159607  0.0100155  0  0  -1.04903

I'm using the fact that genfromtxt is happy with anything that feeds it a block of lines. Here I collect the next block in a list, and pass it to genfromtxt.
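
As a small illustration of that point, here is a sketch that hands genfromtxt a plain list of strings instead of a file. The values are copied from the sample rows, but cutting the block down to three columns is an assumption made for brevity.

```python
import numpy as np

# One header line followed by the data rows, exactly as collected in `ablock`.
lines = [
    "ID  GROUP  PX",
    "651  0  -0.084626",
    "430  0  -0.0837742",
]

# names=True takes the field names from the first line;
# dtype=None lets genfromtxt pick a type for each column.
data = np.genfromtxt(lines, dtype=None, names=True)

print(data.dtype.names)
print(data["PX"])
```

Any iterable of lines should work the same way, which is why a list built from successive f.readline() calls is enough.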

And using the max_rows parameter of genfromtxt, I can tell it to read the next n rows directly:

with open('stack41091659.txt','rb') as f:
    while f.readline():
        time, n = f.readline().strip().split()
        n = int(n)
        print(time, n)
        data = np.genfromtxt(f, dtype=None, names=True, max_rows=n)
        print(data.shape, len(data.dtype.names))

I'm not taking that optional blank line into account. It could probably be squeezed in at the start of the block read, i.e. read lines until I get one with a valid float/int pair of strings.
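
One way that idea might look is sketched below. The helper name read_counts is hypothetical, and the sketch assumes that only blank lines and the 'TIMESTEP  PARTICLES' label can appear before the float/int line; it simply discards anything that does not parse as that pair.

```python
import io

def read_counts(f):
    """Read lines until one holds a float/int pair.

    Returns (timestep, n), or None at end of file.
    read_counts is a hypothetical helper name.
    """
    while True:
        line = f.readline()
        if not line:
            return None                 # end of file
        fields = line.split()
        if len(fields) != 2:
            continue                    # blank line or something else
        try:
            return float(fields[0]), int(fields[1])
        except ValueError:
            continue                    # e.g. the 'TIMESTEP  PARTICLES' label

# Made-up input: two blank lines, the label, the counts, then the header.
f = io.BytesIO(b"\n\nTIMESTEP  PARTICLES\n0.00500103  4\nID  GROUP\n")
result = read_counts(f)
print(result)
```

In the loop above, `while f.readline():` and the split that follows it would be replaced by a call to this helper, breaking out when it returns None and otherwise passing n to genfromtxt as max_rows.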
