
Sum In Spark Gone Bad

Based on Unbalanced factor of KMeans?, I am trying to compute the unbalanced factor, but I fail. Every element of the RDD r2_10 is a pair, where the key is the cluster and the value is
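Judging from the solution below, which takes len(x[1]) for every element, the value appears to be a collection of points assigned to that cluster. A minimal, hypothetical sketch of what r2_10 might look like (the data and shape are illustrative, not from the original question):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Hypothetical stand-in for r2_10: (cluster_id, list_of_points) pairs
r2_10 = sc.parallelize([
    (0, [[1.0, 2.0], [1.1, 2.1], [0.9, 1.8]]),  # 3 points in cluster 0
    (1, [[5.0, 5.0]]),                          # 1 point in cluster 1
    (2, [[9.0, 0.5], [8.8, 0.7]]),              # 2 points in cluster 2
])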

Solution 1:

The problem is that you never counted the number of points grouped in each cluster, so you have to change how pdd is created.

pdd = r2_10.map(lambda x: (x[0], len(x[1]))).reduceByKey(lambda a, b: a + b)  # (cluster, point count) pairs
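With the hypothetical RDD sketched above, pdd.collectAsMap() would return {0: 3, 1: 1, 2: 2}, i.e. one count per cluster.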

However, you could obtain the same result in a single pass (without computing pdd) by mapping each element of the RDD to the length of its value and summing the results.

total = r2_10.map(lambda x: len(x[1])).sum()  # total number of points across all clusters
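Applied to the same hypothetical data, total evaluates to 6. The trade-off is that sum() produces only the grand total in one action, whereas the pdd version keeps the per-cluster counts, which you would still need if the balance metric is computed per cluster.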
