Iterating Across Multiple Columns In Pandas Df And Slicing Dynamically
TL;DR: How can I iterate over every combination of values across multiple columns in a pandas DataFrame without specifying the columns or their values explicitly? Long version: I have a pandas dataframe
Solution 1:
You can use itertools.product to generate all possible dosage combinations, and DataFrame.query to do the selection:
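from itertools import product

# Each dosage_comb holds one value per column, in the order of the dict's keys
for dosage_comb in product(*dict_of_dose_ranges.values()):
    dosage_items = zip(dict_of_dose_ranges.keys(), dosage_comb)
    # Builds a string of the form "<column> == <value> & ...";
    # this assumes numeric values and column names that are valid identifiers
    query_str = ' & '.join('{} == {}'.format(*x) for x in dosage_items)
    sub_df = dosage_df.query(query_str)
    # Do Stuff...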
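To make the approach above concrete, here is a minimal, self-contained sketch. The sample data, the column names (dose_a, dose_b, response) and the helper iter_dose_subsets are hypothetical and not from the original question; the sketch also assumes numeric values and joins the conditions with "and" rather than "&".

```python
from itertools import product

import pandas as pd

# Hypothetical sample data; the column names and values are made up for illustration.
dosage_df = pd.DataFrame({
    "dose_a": [1, 1, 2, 2, 1],
    "dose_b": [10, 20, 10, 20, 10],
    "response": [0.1, 0.2, 0.3, 0.4, 0.5],
})
dict_of_dose_ranges = {"dose_a": [1, 2], "dose_b": [10, 20]}


def iter_dose_subsets(df, dose_ranges):
    """Yield (combination, sub-DataFrame) for every combination of column values."""
    columns = list(dose_ranges)
    for comb in product(*dose_ranges.values()):
        # One "column == value" condition per column, all of which must hold
        query_str = " and ".join(
            f"{col} == {val}" for col, val in zip(columns, comb)
        )
        yield comb, df.query(query_str)


# Count the rows selected for each combination
subset_sizes = {
    comb: len(sub_df)
    for comb, sub_df in iter_dose_subsets(dosage_df, dict_of_dose_ranges)
}
print(subset_sizes)
```

Wrapping the loop in a generator keeps the "Do Stuff" part separate from the query construction, so neither the columns nor their values have to be spelled out by the caller.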
Solution 2:
What about using the underlying numpy arrays and some boolean logic to build a mask that selects only the rows you want?
import numpy as np
import pandas as pd

# np.int and np.bool were removed from NumPy; the built-in int and bool are used instead
dosage_df = pd.DataFrame((np.random.rand(40000, 10) * 100).astype(int))
dict_of_dose_ranges = {3: [10, 11, 12, 13, 15, 20], 4: [20, 22, 23, 24]}
# combined_doses is a bool array that selects all the rows that match the wanted combinations of doses
combined_doses = np.ones(dosage_df.shape[0], dtype=bool)
for item in dict_of_dose_ranges.items():
    # item[0] is the kind of dose (the column)
    # item[1] are the wanted values of that kind of dose
    next_dose = np.zeros(dosage_df.shape[0], dtype=bool)
    # we then iterate over the wanted values
    for value in item[1]:
        # we select and "logical or" all rows matching the values
        next_dose |= (dosage_df[item[0]] == value).to_numpy()
    # we "logical and" all the kinds of dose
    combined_doses &= next_dose
print(dosage_df[combined_doses])
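The two nested loops above can probably be collapsed with DataFrame.isin, which accepts a dict mapping column labels to allowed values. This is an alternative sketch rather than the answer's own method; the seeded random data and the smaller row count are assumptions made so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Assumed test data: 1000 rows of integers in [0, 100), seeded for reproducibility
rng = np.random.default_rng(0)
dosage_df = pd.DataFrame(rng.integers(0, 100, size=(1000, 10)))
dict_of_dose_ranges = {3: [10, 11, 12, 13, 15, 20], 4: [20, 22, 23, 24]}

# Restrict to the columns named in the dict, test membership per column,
# then require every column to match (the "logical and" across kinds of dose)
dose_columns = list(dict_of_dose_ranges)
combined_doses = dosage_df[dose_columns].isin(dict_of_dose_ranges).all(axis=1)

selected = dosage_df[combined_doses]
print(selected)
```

Selecting the dose columns before calling isin matters here: columns missing from the dict would otherwise come back as all False and make the all(axis=1) mask empty.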