Scan Subset Of Pd Dataframe To Obtain Indices Matching Certain Values

December 12, 2023

I have a dataframe. Some of the columns should contain only 0s or 1s. I need to find the rows in which any of those columns holds a number other than 0 or 1, and remove each such row entirely from the original data.

Solution 1:

Sample:

import pandas as pd

data = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'D': [1, 0, 1, 0, 1, 0],
    'E': [1, 0, 0, 1, 2, 4],
})

print (data)
   A  B  D  E
0  a  4  1  1
1  b  5  0  0
2  c  4  1  0
3  d  5  0  1
4  e  5  1  2
5  f  4  0  4

If you need to keep only the rows containing 1 and 0 values, use DataFrame.isin with DataFrame.all to test whether all values are True in each row:

subset = data.iloc[:,2:]
data3 = data[subset.isin([0,1]).all(axis=1)]
print (data3)

   A  B  D  E
0  a  4  1  1
1  b  5  0  0
2  c  4  1  0
3  d  5  0  1

Details:

print (subset.isin([0,1]))
      D      E
0  True   True
1  True   True
2  True   True
3  True   True
4  True  False
5  True  False

print (subset.isin([0,1]).all(axis=1))
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool
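Since the question also asks for the indices of the offending rows, the same mask can be inverted to obtain them. The following is a minimal sketch on the sample data above; the names flag_cols, valid, bad_index and cleaned are introduced here for illustration only, and selecting the columns by name rather than by position is an assumption.

```python
import pandas as pd

data = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'D': [1, 0, 1, 0, 1, 0],
    'E': [1, 0, 0, 1, 2, 4],
})

# Columns expected to hold only 0 or 1 (chosen by name here; an assumption)
flag_cols = ['D', 'E']

# True for rows in which every flag column is 0 or 1
valid = data[flag_cols].isin([0, 1]).all(axis=1)

# Index labels of the rows that contain some other value
bad_index = data.index[~valid]
print(bad_index.tolist())

# Dropping those labels gives the same frame as data[valid]
cleaned = data.drop(bad_index)
print(cleaned)
```

Keeping bad_index around can be useful if the removed rows need to be logged or inspected before they are discarded.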

Solution 2:

Your subset is a pd.DataFrame, not a pd.Series. The conditional testing you are doing for index would work if subset were a Series (i.e. if you were only checking the condition on a single column, not multiple columns).

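So having subset as a DataFrame is fine, but it changes how the conditional slice works. My testing shows that your index variable holds NaN in place of the 0s and 1s (rather than leaving them out, as a slice of a Series would). Adding dropna(how='all') as below should fix your code; a plain dropna() would also discard rows in which only some of the columns hold an invalid value, so those rows would never be removed:

# find indices of rows with at least one value other than 0 or 1:
index = subset[(subset != 0) & (subset != 1)].dropna(how='all').index

# remove rows from the original data set:
data = data.drop(index)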
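The difference between indexing a Series and a DataFrame with a boolean condition can be seen in a short sketch. The data below is hypothetical and simply mirrors columns D and E of the sample in Solution 1; the names col, filtered and masked are illustrative.

```python
import pandas as pd

# Hypothetical data, mirroring columns D and E of the Solution 1 sample
subset = pd.DataFrame({'D': [1, 0, 1, 0, 1, 0],
                       'E': [1, 0, 0, 1, 2, 4]})

# Boolean indexing on a Series filters: only the matching rows remain
col = subset['E']
filtered = col[(col != 0) & (col != 1)]
print(filtered.index.tolist())

# Boolean indexing on a DataFrame masks: the shape is preserved and
# every cell that fails the condition becomes NaN
masked = subset[(subset != 0) & (subset != 1)]
print(masked.shape)
print(masked)
```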

Solution 3:

From your code, I made an educated guess that you want to check more than one column.

This should do the trick:


import numpy as np

# Mark the elements that are 0 or 1
val = np.isin(subset, np.array([0, 1]))

# Generate the row index: True only if every element in the row passed
index = np.prod(val, axis=1) > 0

# Select only the desired rows
data = data[index]

Example

# Data
   a  b  c
0  1  1  1
1  2  2  2
2  3  1  3
3  4  3  3
4  5  3  1

# Removing rows that have elements other than 1 or 2
   a  b  c
0  1  1  1
1  2  2  2
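The example above can be reproduced with a self-contained sketch. The frame is rebuilt from the values shown in the example; keep_all uses ndarray.all(axis=1), which is offered here as an equivalent alternative to the product test and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data from the example above
data = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                     'b': [1, 2, 1, 3, 3],
                     'c': [1, 2, 3, 3, 1]})

# Boolean array marking the elements that are 1 or 2
val = np.isin(data.to_numpy(), [1, 2])

# Product of booleans along a row is nonzero only if all of them are True
keep_prod = np.prod(val, axis=1) > 0

# Equivalent and arguably more direct: require all elements per row
keep_all = val.all(axis=1)

result = data[keep_all]
print(result)
```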

Solution 4:

Without your data from DataSet.csv, I tried to make a guess.

subset[ (subset!= 0) & (subset!= 1)] returns a dataframe of the same shape as subset: values for which (subset!= 0) & (subset!= 1) is False turn into NaN, while values for which it is True are kept unchanged. In other words, it behaves like an element-wise map, not a filter.

Therefore, subset[ (subset!= 0) & (subset!= 1)].index is the whole index of your data dataframe.

You then drop that whole index, so the result is an empty dataframe.
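This behaviour is easy to confirm on a small frame. The sketch below uses the sample data from Solution 1 as a stand-in, since DataSet.csv is not available; the names masked and emptied are illustrative.

```python
import pandas as pd

# Stand-in data (DataSet.csv is not available)
data = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'D': [1, 0, 1, 0, 1, 0],
    'E': [1, 0, 0, 1, 2, 4],
})
subset = data.iloc[:, 2:]

# Masking keeps every row, so the index is unchanged
masked = subset[(subset != 0) & (subset != 1)]
print(masked.index.equals(data.index))

# Dropping that index therefore removes every row
emptied = data.drop(masked.index)
print(len(emptied))
```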
