
'KMeansModel' Object Has No Attribute 'computeCost' In Apache Pyspark

I'm experimenting with a clustering model in PySpark. I'm trying to get the mean squared cost of the cluster fit for different values of K:

def meanScore(k,df): inputCol = df.colu

Solution 1:

computeCost is deprecated as of Spark 3.0.0, and the docs suggest using the evaluator instead:

Note: Deprecated in 3.0.0. It will be removed in future versions. Use ClusteringEvaluator instead. You can also get the cost on the training dataset in the summary.
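
To illustrate the last sentence of that note, here is a minimal sketch of reading the cost from the training summary. The helper name training_cost is hypothetical; hasSummary and summary.trainingCost are the attributes a fitted KMeansModel exposes in Spark 2.4 and later. The guard is there because a model may have no summary attached (typically one loaded from disk).

```python
def training_cost(model):
    """Return the k-means cost on the training data, or None if unavailable.

    `model` is assumed to be a fitted pyspark.ml.clustering.KMeansModel.
    """
    # A model without a training summary (e.g. one loaded from disk)
    # cannot report the cost this way.
    if not model.hasSummary:
        return None
    # Sum of squared distances from each training point to its nearest center.
    return model.summary.trainingCost
```

For a model produced by KMeans(k=k).fit(df), training_cost(model) gives the value that computeCost(df) used to return for the training data. It does not cover scoring a different dataset.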

Solution 2:

As Erkan sirin mentioned, computeCost is deprecated in recent versions. This may help you solve your problem:

from pyspark.ml.evaluation import ClusteringEvaluator

# Make predictions
predictions = model.transform(dataset)

# Evaluate clustering by computing the Silhouette score
evaluator = ClusteringEvaluator()
silhouette = evaluator.evaluate(predictions)
print("Silhouette with squared euclidean distance = " + str(silhouette))

I hope this helps. You can check the official docs for more information.


Solution 3:

As of the current version, 3.1.2:

Using KMeans as an example, after importing:

from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator

After loading the data and training the model, just call 'ClusteringEvaluator()':

# Make predictions
predictions = model.transform(dataset)

# Evaluate clustering by computing Silhouette score
evaluator = ClusteringEvaluator()

silhouette = evaluator.evaluate(predictions)
print("Silhouette with squared euclidean distance = " + str(silhouette))
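
Since the question asks about comparing different values of K, the same evaluation can be wrapped in a loop. This is only a sketch: silhouette_by_k and its parameters are hypothetical names, and the estimator factory and evaluator are passed in rather than hard-coded, so the function only relies on the fit / transform / evaluate calls shown above.

```python
def silhouette_by_k(df, ks, make_estimator, evaluator):
    """Fit one model per k and return {k: evaluator score}.

    make_estimator(k) must return an object with .fit(df); the fitted
    model must have .transform(df); evaluator must have .evaluate(predictions).
    """
    scores = {}
    for k in ks:
        model = make_estimator(k).fit(df)
        predictions = model.transform(df)
        scores[k] = evaluator.evaluate(predictions)
    return scores

# Usage with PySpark (assumes `df` already has a "features" vector column):
# from pyspark.ml.clustering import KMeans
# from pyspark.ml.evaluation import ClusteringEvaluator
# scores = silhouette_by_k(df, range(2, 7),
#                          lambda k: KMeans(k=k, seed=1),
#                          ClusteringEvaluator())
# best_k = max(scores, key=scores.get)  # a higher silhouette is better
```

Note that the silhouette score is a different metric from the cost computeCost returned, so the numbers are not comparable with older results.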

Solution 4:

Evaluate clustering by computing Silhouette score:

In Spark 3.0.1 and above:

from pyspark.ml.evaluation import ClusteringEvaluator

# Make predictions, then score them with the default Silhouette metric
pdt = model.transform(final_data)
evaluator = ClusteringEvaluator()
silhouette = evaluator.evaluate(pdt)
print('Silhouette with squared euclidean distance:')
print(silhouette)

Evaluate clustering with the Within Set Sum of Squared Errors (WSSSE):

In Spark 2.2 to 3.0.0:

cost = model.computeCost(dataset)
print("Within Set Sum of Squared Errors = " + str(cost))
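
If you still need the WSSSE for an arbitrary dataset on a version where computeCost is unavailable, it can be computed by hand from the predictions and the cluster centers. The sketch below is plain Python; the function name wssse and the (features, cluster index) row layout are assumptions made for illustration. It works on rows collected to the driver, so it is only suitable for small datasets.

```python
def wssse(rows, centers):
    """Within Set Sum of Squared Errors.

    rows    -- iterable of (features, cluster_index) pairs
    centers -- sequence of cluster centers, indexed by cluster_index
    """
    total = 0.0
    for features, cluster in rows:
        center = centers[cluster]
        # Squared Euclidean distance from the point to its assigned center
        total += sum((float(x) - float(c)) ** 2 for x, c in zip(features, center))
    return total

# Usage with PySpark (assumes the default "features" / "prediction" columns):
# centers = model.clusterCenters()
# rows = [(r["features"].toArray(), r["prediction"])
#         for r in model.transform(dataset).select("features", "prediction").collect()]
# print("Within Set Sum of Squared Errors = " + str(wssse(rows, centers)))
```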
