Read Files In A Directory Starting By A Specific Date Using Python

I have an automated process and need to perform some operations with files. Another process creates these files and stores them in a directory, I only need to work with recent fil

Solution 1:

There are a lot of redundant operations going on in your code.

For example, the use of fromtimestamp() to calculate total_days_last inside the loop can simply be done once outside of the loop. In fact, the use of datetime functions and mucking about with epoch seems unnecessary because you can simply compare the file ctime values directly.
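To illustrate the direct comparison, here is a minimal sketch. The helper name files_newer_than and its get_time parameter are hypothetical and do not appear in the question's code; the parameter is only there so that the timestamp lookup can be swapped out when experimenting.

```python
import os


def files_newer_than(paths, cutoff, get_time=os.path.getctime):
    """Return the paths whose timestamp is strictly later than cutoff.

    os.path.getctime() returns seconds since the epoch as a float, so two
    values can be compared with > directly; no datetime conversion is needed.
    """
    return [path for path in paths if get_time(path) > cutoff]
```

In the real process, cutoff would presumably be os.path.getctime(check_last_successful_file()), computed once before the loop.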

os.path.getctime() is called twice on every file: once for the sort and a second time to calculate total_days_file.

These repetitive calculations over a large number of files would be part of the performance problem.

Another issue is that, if there are a large number of files, the list files could become very large and require a lot of memory.
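One way to avoid holding every match in a list is glob.iglob(), which returns an iterator and is documented as yielding the same values as glob.glob() without storing them all simultaneously. The sketch below uses a hypothetical helper, first_match, purely to show that the iterator can be consumed one item at a time.

```python
import glob


def first_match(pattern):
    """Return the first path matching pattern, or None if nothing matches."""
    # iglob() returns an iterator, so next() can stop after one result
    # instead of asking for the complete list that glob() would return.
    return next(glob.iglob(pattern), None)
```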

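if check_empty != "" and check_empty is not None: can simply be written as if check_empty: (assuming check_empty is only ever a string or None, since other falsy values such as 0 would also fail the shorter test).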
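A quick way to convince yourself of this is to compare the two tests side by side. The function names below are made up for the illustration, and the sketch assumes the value is a string or None, which is the only case where the two forms are guaranteed to agree.

```python
def explicit_check(value):
    """The long form from the original code."""
    return value != "" and value is not None


def truthiness_check(value):
    """The short form: empty strings and None are both falsy."""
    return bool(value)


# The two checks agree for None, the empty string and a non-empty string.
for value in (None, "", "user@example.com"):
    print(repr(value), explicit_check(value), truthiness_check(value))
```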

Here is a simplified version:

import glob
import os

def get_ordered_files():
    # ctime of the last successfully processed file is the cut-off
    last_ctime = os.path.getctime(check_last_successful_file())
    files = glob.glob(files_location + file_extension)
    files.sort(key=os.path.getctime)
    return [f for f in files
                if os.path.getctime(f) > last_ctime and get_email_account(f)]

This eliminates most of the redundant code but still calls os.path.getctime() twice for each file. To avoid that we can store the ctime for each file on the first occasion it is obtained.

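import glob
import os
from operator import itemgetter

pattern = os.path.join(files_location, file_extension)

def get_ordered_files():
    last_ctime = os.path.getctime(check_last_successful_file())
    # The := operator (Python 3.8+) keeps each ctime so it is read only once
    files = ((filename, ctime) for filename in glob.iglob(pattern)
                if (ctime := os.path.getctime(filename)) > last_ctime and
                    get_email_account(filename))
    # sorted() builds a list, but only of the files that passed the filter
    return (filename for filename, _ in sorted(files, key=itemgetter(1)))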

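Here a generator expression is assigned to files. It uses glob.iglob(), which is an iterator version of glob.glob() that does not store all the files at once. Each file name is paired with its ctime value in a tuple, and the := assignment expression (Python 3.8 or later) captures the ctime so that os.path.getctime() is called only once per file. The generator expression filters out files that are too old and files that don't have an associated email account. Finally, sorted() orders the surviving tuples by ctime (this step does build a list, but only of the files that passed the filter), and a second generator that yields just the file names is returned. The calling code can then iterate over the generator, or call list() on it to realise it as a list.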
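Since check_last_successful_file(), get_email_account(), files_location and file_extension are defined elsewhere in the question, here is a self-contained variant of the same idea that can be run on its own. The function name newer_files_sorted and the keep and get_time parameters are assumptions made for this sketch: keep stands in for get_email_account(), and get_time defaults to os.path.getctime. It needs Python 3.8 or later.

```python
import glob
import os
from operator import itemgetter


def newer_files_sorted(pattern, last_ctime, keep=lambda filename: True,
                       get_time=os.path.getctime):
    """Return a generator of matching files newer than last_ctime, oldest first."""
    # Each timestamp is read exactly once and carried along with the name.
    candidates = ((filename, ctime) for filename in glob.iglob(pattern)
                  if (ctime := get_time(filename)) > last_ctime
                  and keep(filename))
    # sorted() consumes the generator; only files that passed both
    # filters end up in the list that gets sorted.
    return (filename for filename, _ in sorted(candidates, key=itemgetter(1)))
```

A caller could then write for path in newer_files_sorted(pattern, last_ctime): to process the files in order, or wrap the call in list() if a list is needed.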
