Discard Points With X,Y Coordinates Close To Each Other In Dataframe

November 21, 2023

I have the following dataframe (it is actually several hundred MB long):

    X   Y  Size
0  10  20     5
1  11  21     2
2   9  35     1
3   8   7     7
4   9  19     2

I want discard any

Solution 1:

As the question is stated, it is not clear how the desired algorithm should deal with chained distances (A is close to B and B is close to C, but A is not close to C).
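To make the chaining issue concrete, here is a minimal sketch using three hypothetical points that are not from the question, with an assumed threshold `delta = 3`. It only lists which pairs fall within the threshold.

```python
from itertools import combinations
from math import dist

# Hypothetical points on a line, 2 units apart; delta is an assumed threshold.
points = {"A": (0, 0), "B": (2, 0), "C": (4, 0)}
delta = 3

# Pairs whose Euclidean distance is below delta.
close_pairs = [
    (p, q)
    for p, q in combinations(points, 2)
    if dist(points[p], points[q]) < delta
]
print(close_pairs)  # [('A', 'B'), ('B', 'C')]
```

A and C are 4 units apart, so they are not close to each other, yet both are close to B. Whether all three should count as one group is exactly what the question leaves open.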

If chaining is allowed, one solution is to cluster the dataset using a density-based clustering algorithm such as DBSCAN.

You just need to set the neighborhood radius eps to delta and the min_samples parameter to 1, so that isolated points form their own clusters. Then, within each cluster, you can find the point with the maximum size.

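from sklearn.cluster import DBSCAN

# cluster on the coordinates only; eps is the distance threshold (delta)
X = df[['X', 'Y']]
db = DBSCAN(eps=3, min_samples=1).fit(X)
df['grp'] = db.labels_
# keep the row with the largest Size in each cluster
df_new = df.loc[df.groupby('grp')['Size'].idxmax()]
print(df_new)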

>>>
    X   Y  Size  grp
0  10  20     5    0
2   9  35     1    1
3   8   7     7    2
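If chaining is not wanted, one possible alternative is a greedy filter. This is not part of the original answer and is only a sketch: visit points from largest to smallest Size and keep a point only if no already-kept point lies within delta. The function name `greedy_filter` and its `delta` parameter are illustrative, and the loop compares each candidate against every kept point, so it would likely need a spatial index to scale to a dataframe of several hundred MB.

```python
import numpy as np
import pandas as pd


def greedy_filter(df, delta):
    """Keep points in descending Size order, skipping any within delta of a kept point."""
    coords = df[["X", "Y"]].to_numpy(dtype=float)
    # Positional order of rows, largest Size first (stable for ties).
    order = np.argsort(-df["Size"].to_numpy(), kind="stable")
    kept = []  # positional indices of the points kept so far
    for pos in order:
        if kept:
            # Distances from the candidate to every point already kept.
            d = np.linalg.norm(coords[kept] - coords[pos], axis=1)
            if (d < delta).any():
                continue  # too close to a larger point: discard
        kept.append(pos)
    return df.iloc[sorted(kept)]


# Sample data from the question.
df = pd.DataFrame(
    {"X": [10, 11, 9, 8, 9], "Y": [20, 21, 35, 7, 19], "Size": [5, 2, 1, 7, 2]}
)
result = greedy_filter(df, delta=3)
print(result)
```

On the sample data this keeps rows 0, 2 and 3, the same rows as the DBSCAN version. The two approaches can differ on chained points, where the greedy filter may keep more than one point per chain.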

Solution 2:

You can use the script below and try improving it.


# get all euclidean distances using sklearn;
# it will create an array of euc distances;
# then get index pairs from df whose euclidean distance is less than 3
from sklearn.metrics.pairwise import euclidean_distances
Z = df[['X', 'Y']]
euc = euclidean_distances(Z, Z)
idx = [(i, j) for i in range(len(euc)-1) for j in range(i+1, len(euc)) if euc[i, j] < 3]

# collect all index of df that has euc dist < 3 and get the max value
# then collect all index in df NOT in euc and add the row with max size
# create a new dataframe called df_new by combining the rest of df and the row with max size
import pandas as pd
from itertools import chain
df_idx = list(set(chain(*idx)))
df2 = df.iloc[df_idx]
# idx_max holds index labels, so select with .loc rather than .iloc
idx_max = df2[df2['Size'] == df2['Size'].max()].index.tolist()
df_new = pd.concat([df.loc[~df.index.isin(df2.index)], df2.loc[idx_max]])
df_new

Result:

    X   Y  Size
2   9  35     1
3   8   7     7
0  10  20     5
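The question mentions that the real dataframe is several hundred MB, and `euclidean_distances(Z, Z)` builds a full n × n matrix, which may not fit in memory at that size. One possible alternative, assuming SciPy is available, is `scipy.spatial.cKDTree.query_pairs`, which returns only the close pairs. Note that it matches distances less than or equal to `r`, not the strict `< 3` used above. The helper name `close_pairs` is illustrative.

```python
import pandas as pd
from scipy.spatial import cKDTree


def close_pairs(df, delta):
    """Return sorted positional pairs (i, j), i < j, at distance <= delta."""
    tree = cKDTree(df[["X", "Y"]].to_numpy(dtype=float))
    # query_pairs returns a set of (i, j) tuples with i < j.
    return sorted(tree.query_pairs(r=delta))


# Sample data from the question.
df = pd.DataFrame(
    {"X": [10, 11, 9, 8, 9], "Y": [20, 21, 35, 7, 19], "Size": [5, 2, 1, 7, 2]}
)
idx = close_pairs(df, delta=3)
print(idx)  # [(0, 1), (0, 4), (1, 4)]
```

The resulting `idx` has the same shape as the list comprehension in the script, so the remaining steps can be applied to it unchanged.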
