
Scan Subset Of Pd Dataframe To Obtain Indices Matching Certain Values

I have a dataframe. Some of the columns should contain only 0s or 1s. I need to find the columns that have a number other than 0 or 1 and remove that entire row from the original data.

Solution 1:

Sample:

import pandas as pd

data = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'D': [1, 0, 1, 0, 1, 0],
    'E': [1, 0, 0, 1, 2, 4],
})

print (data)
   A  B  D  E
0  a  4  1  1
1  b  5  0  0
2  c  4  1  0
3  d  5  0  1
4  e  5  1  2
5  f  4  0  4

If you need only the rows whose values are all 0 or 1, use DataFrame.isin with DataFrame.all to test whether every value in a row is True:

subset = data.iloc[:,2:]
data3 = data[subset.isin([0,1]).all(axis=1)]
print (data3)

   A  B  D  E
0  a  4  1  1
1  b  5  0  0
2  c  4  1  0
3  d  5  0  1

Details:

print (subset.isin([0,1]))
      D      E
0  True   True
1  True   True
2  True   True
3  True   True
4  True  False
5  True  False

print (subset.isin([0,1]).all(axis=1))
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool
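If the index labels of the offending rows are needed explicitly, the same mask can be inverted. This is a minimal sketch that reuses the sample data above; the names valid, bad_index and cleaned are illustrative and not part of the original answer:

```python
import pandas as pd

data = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'D': [1, 0, 1, 0, 1, 0],
    'E': [1, 0, 0, 1, 2, 4],
})

# Columns that are supposed to hold only 0s and 1s
subset = data.iloc[:, 2:]

# True for rows where every checked column is 0 or 1
valid = subset.isin([0, 1]).all(axis=1)

# Index labels of the rows that contain some other value
bad_index = data.index[~valid]

# Dropping those labels gives the same rows as data[valid]
cleaned = data.drop(bad_index)
```

Here bad_index holds the labels 4 and 5, so either data.drop(bad_index) or data[valid] can be used, depending on whether the indices themselves are of interest.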

Solution 2:

Your subset is a pd.DataFrame, not a pd.Series. The conditional testing you are doing for index would work if subset were a Series (i.e. if you were only checking the condition on a single column, not multiple columns).

Having subset as a DataFrame is fine, but it changes how the conditional selection works. My testing shows that the frame you take the index from holds NaN in place of the 0s and 1s (rather than leaving them out, as a slice of a Series would). Adding dropna(how='all') as below should fix your code; with the default how='any', a row that mixes valid and invalid values would be missed:

#find indices (rows with at least one value other than 0 or 1):
index = subset[ (subset!= 0) & (subset!= 1)].dropna(how='all').index

#remove rows from orig data set:
data = data.drop(index)
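The difference between indexing a Series and indexing a DataFrame with a boolean condition can be seen in a small sketch. The sample values are assumed here (the question's own data is not available), and the names col, series_hits and masked are illustrative:

```python
import pandas as pd

subset = pd.DataFrame({
    'D': [1, 0, 1, 0, 1, 0],
    'E': [1, 0, 0, 1, 2, 4],
})

# Series: a boolean Series filters rows, so only the non-0/1 entries remain
col = subset['E']
series_hits = col[(col != 0) & (col != 1)]

# DataFrame: a boolean DataFrame masks cells, so the shape is unchanged
# and every 0 or 1 is replaced by NaN
masked = subset[(subset != 0) & (subset != 1)]

# how='all' drops only the rows that are entirely NaN,
# i.e. the rows in which every checked value was 0 or 1
index = masked.dropna(how='all').index
```

Note that with the default how='any', dropna() keeps a row only if none of its checked columns hold a 0 or 1, so for this sample masked.dropna().index would be empty.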

Solution 3:

From your code, I made an educated guess that you want to check more than one column.

This should do the trick

import numpy as np

# Mark the elements that are 0 or 1
val = np.isin(subset, np.array([0, 1]))

# Generate the row index: True only if every element in the row is 0 or 1
index = np.prod(val, axis=1) > 0

# Select only the desired rows
data = data[index]

Example

# Data
   a  b  c
0  1  1  1
1  2  2  2
2  3  1  3
3  4  3  3
4  5  3  1

# Removing rows that have elements other than 1 or 2
   a  b  c
0  1  1  1
1  2  2  2
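The example can be reproduced with a short script. This is a sketch only: the data values are read off the example above, the allowed values are 1 and 2 as in that example, and val.all(axis=1) is swapped in as an equivalent of np.prod(val, axis=1) > 0:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [1, 2, 1, 3, 3],
    'c': [1, 2, 3, 3, 1],
})

# Boolean array: True where the element is 1 or 2
val = np.isin(data.to_numpy(), [1, 2])

# True only for rows in which every element is allowed
index = val.all(axis=1)

# Keep the rows that passed the check
result = data[index]
```

Only rows 0 and 1 remain in result, since every other row contains at least one value outside the allowed set.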

Solution 4:

Without your data from DataSet.csv, I tried to make a guess.

subset[ (subset!= 0) & (subset!= 1)] returns the subset dataframe with the same shape: values where (subset!= 0) & (subset!= 1) is False turn into NaN, while values where it is True are kept. In other words, it behaves like an element-wise map (a mask). It is not a row filter.

Therefore, subset[ (subset!= 0) & (subset!= 1)].index is the whole index of your data dataframe.

You drop that index, so the result is an empty dataframe.
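This behaviour can be checked on a small frame. The values below are assumed sample data, since DataSet.csv is not available, and the names masked, same_index and emptied are illustrative:

```python
import pandas as pd

data = pd.DataFrame({
    'D': [1, 0, 1, 0, 1, 0],
    'E': [1, 0, 0, 1, 2, 4],
})
subset = data[['D', 'E']]

# Element-wise mask: 0s and 1s become NaN, but no row is removed
masked = subset[(subset != 0) & (subset != 1)]

# The masked frame still carries every row label of the original
same_index = masked.index.equals(data.index)

# Dropping all of those labels leaves nothing behind
emptied = data.drop(masked.index)
```

Here same_index is True and emptied has no rows, which matches the empty result described above.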
