Select Rows From A DataFrame Based On String Values In A Column In Pandas

April 26, 2023

How do you select rows from a DataFrame based on string values in a column in pandas? I want to display only the states, which are in all caps. The states have the total numbe

Solution 1:

You can write a function to be applied to each value in the States/cities column. Have the function return either True or False, and the result of applying the function can act as a Boolean filter on your DataFrame.

This is a common pattern when working with pandas. In your particular case, you could check for each value in States/cities whether it's made of only uppercase letters.

So for example:

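def is_state_abbrev(string):
    # True when every cased character in the value is uppercase
    return string.isupper()

# Named mask rather than filter to avoid shadowing the built-in filter()
mask = d['States/cities'].apply(is_state_abbrev)
filtered_df = d[mask]

Here mask is a pandas Series of True and False values.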

You can also achieve the same result by using a lambda expression, as in:

filtered_df = d[d['States/cities'].apply(lambda x: x.isupper())]

This does essentially the same thing.
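
As a self-contained sketch of this pattern, the snippet below builds a small made-up frame (the data is illustrative, not the asker's actual table) and adds a guard for non-string values such as missing entries, on which calling .isupper() directly would raise an AttributeError. The isinstance check is an assumption about how missing values should be treated, namely as "not a state".

```python
import pandas as pd

# Hypothetical sample data; None stands in for a missing value.
d = pd.DataFrame({
    "States/cities": ["FL", "Orlando", None, "CA", "San diego"],
    "B": [3, 1, 1, 8, 3],
})

def is_state_abbrev(value):
    # Non-strings (e.g. None or NaN) are treated as "not a state".
    return isinstance(value, str) and value.isupper()

mask = d["States/cities"].apply(is_state_abbrev)  # Boolean Series
filtered_df = d[mask]
print(filtered_df)
```

Keeping the check in a named function makes it easy to extend later, for example with a length test.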

Solution 2:

Consider pandas.Series.str.match, passing a regex that accepts only the characters [A-Z] from start to end:

states[states['States/cities'].str.match('^[A-Z]+$')]

#   States/cities  B  C  D
# 0            FL  3  5  6
# 4            CA  8  3  2
# 7            WA  4  2  1
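
How the pattern is anchored matters. A pattern that only constrains the last character, such as ^.*[A-Z]$, would also accept mixed-case values that merely end in a capital letter. The sketch below compares it with Series.str.fullmatch (available in pandas 1.1 and later), using made-up values chosen to expose the difference.

```python
import pandas as pd

# Hypothetical values chosen to show where the patterns differ.
s = pd.Series(["FL", "Orlando", "SEATTLE", "San DiegO", "CA"])

ends_upper = s.str.match(r"^.*[A-Z]$")    # only checks the last character
all_upper = s.str.fullmatch(r"[A-Z]+")    # every character must be A-Z
two_upper = s.str.fullmatch(r"[A-Z]{2}")  # exactly two uppercase letters

print(s[ends_upper].tolist())
print(s[all_upper].tolist())
print(s[two_upper].tolist())
```

If state abbreviations are always two letters, the last pattern is the strictest of the three.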

Data

from io import StringIO
import pandas as pd

txt = '''"States/cities"           B  C   D
0  FL                   3  5   6
1  Orlando              1  2   3
2  Miami                1  1   3
3  Jacksonville         1  2   0
4  CA                   8  3   2
5  "San diego"            3  1   0
6  "San Francisco"        5  2   2
7  WA                   4  2   1
8  Seattle              3  1   0 
9  Tacoma               1  1   1'''

# Raw string so that \s is passed to the regex engine, not treated as a string escape
states = pd.read_table(StringIO(txt), sep=r"\s+")


Solution 3:

You can get the rows with all uppercase values in the column States/cities like this:

df.loc[df['States/cities'].str.isupper()]

  States/cities  B  C  D
0            FL  3  5  6
4            CA  8  3  2
7            WA  4  2  1

Just to be safe, you can add a condition so that it only returns the rows where 'States/cities' is uppercase and only 2 characters long (in case you had a value that was SEATTLE or something like that):

df.loc[(df['States/cities'].str.isupper()) & (df['States/cities'].str.len() == 2)]
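
To see what the extra length condition buys, here is a minimal sketch on made-up data that includes an all-caps city name. The frame and the intermediate names (col, is_upper, is_two_chars) are illustrative only.

```python
import pandas as pd

# Hypothetical data with an all-caps city to exercise the length check.
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "SEATTLE", "WA"],
    "B": [3, 1, 3, 4],
})

col = df["States/cities"]
is_upper = col.str.isupper()
is_two_chars = col.str.len() == 2

upper_only = df.loc[is_upper]                   # still includes SEATTLE
upper_and_short = df.loc[is_upper & is_two_chars]  # only FL and WA
print(upper_only)
print(upper_and_short)
```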

Solution 4:

You can use str.contains, negated with ~, to filter out any row that contains lowercase letters:

df[~df['States/cities'].str.contains('[a-z]')]

    States/cities   B   C   D
0   FL              3   5   6
4   CA              8   3   2
7   WA              4   2   1
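
Note that "contains no lowercase letter" is a slightly weaker test than "is uppercase". The sketch below, on made-up values, suggests where the two diverge: a string with no letters at all passes the negated str.contains check but not str.isupper.

```python
import pandas as pd

# Hypothetical values, including one with no letters at all.
s = pd.Series(["FL", "Orlando", "12345", "N/A", "CA"])

no_lowercase = ~s.str.contains("[a-z]")  # True unless a lowercase letter is present
is_upper = s.str.isupper()               # needs at least one cased character

print(s[no_lowercase].tolist())
print(s[is_upper].tolist())
```

Which behavior is preferable depends on whether the column can hold values such as numeric codes.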

Solution 5:

If we assume the order is always a state followed by the cities from that state, we can use where and dropna:

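# Replace every value that is not a known state with NaN
df['States/cities'] = df['States/cities'].where(df['States/cities'].isin(['FL', 'CA', 'WA']))

# dropna returns a new DataFrame, so assign the result back
df = df.dropna()
df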
  States/cities  B  C  D
0            FL  3  5  6
4            CA  8  3  2
7            WA  4  2  1
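
If the list of states is known in advance, a variation on the same idea is to index with the isin mask directly, which leaves the original frame untouched. This is a sketch on made-up data; known_states is a hypothetical name for that list.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "States/cities": ["FL", "Orlando", "CA", "San diego", "WA"],
    "B": [3, 1, 8, 3, 4],
})

known_states = ["FL", "CA", "WA"]  # assumed to be known in advance
result = df[df["States/cities"].isin(known_states)]
print(result)
```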

Or we can filter on str.len:

df[df['States/cities'].str.len()==2]
  States/cities  B  C  D
0            FL  3  5  6
4            CA  8  3  2
7            WA  4  2  1
