'KMeansModel' Object Has No Attribute 'computeCost' In Apache Pyspark
I'm experimenting with a clustering model in PySpark and trying to get the mean squared cost of the cluster fit for different values of K:
def meanScore(k,df): inputCol = df.colu
Solution 1:
computeCost is deprecated since Spark 3.0.0; the docs suggest using the evaluator instead. The note in the documentation reads:
Note: Deprecated in 3.0.0. It will be removed in future versions. Use ClusteringEvaluator instead. You can also get the cost on the training dataset in the summary.
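For example, a minimal sketch of reading the cost from the training summary, assuming model is a KMeansModel already fitted on Spark 3.x (the variable names are placeholders, not from the original question):
# Within Set Sum of Squared Errors, read from the training summary
# instead of the removed computeCost method
wssse = model.summary.trainingCost
print("Within Set Sum of Squared Errors = " + str(wssse))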
Solution 2:
As Erkan Sirin mentioned, computeCost is deprecated in recent versions. The following may help you solve your problem:
from pyspark.ml.evaluation import ClusteringEvaluator
# Make predictions
predictions = model.transform(dataset)
# Evaluate clustering by computing the Silhouette score
evaluator = ClusteringEvaluator()
silhouette = evaluator.evaluate(predictions)
print("Silhouette with squared euclidean distance = " + str(silhouette))
I hope this helps; you can check the official docs for more information.
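If your column names differ from the defaults, the evaluator can also be configured explicitly. A short sketch, assuming the prediction and feature columns are named prediction and features (adjust to your own schema; the values shown are the defaults):
# Optional: configure the evaluator explicitly
evaluator = ClusteringEvaluator(featuresCol="features",
                                predictionCol="prediction",
                                metricName="silhouette",
                                distanceMeasure="squaredEuclidean")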
Solution 3:
As of the current version (3.1.2), using KMeans as an example, first import:
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator
After loading the data and training the model, it is just a matter of calling ClusteringEvaluator():
# Make predictions
predictions = model.transform(dataset)
# Evaluate clustering by computing Silhouette score
evaluator = ClusteringEvaluator()
silhouette = evaluator.evaluate(predictions)
print("Silhouette with squared euclidean distance = " + str(silhouette))
Solution 4:
Evaluate clustering by computing the Silhouette score (Spark 3.0.1 and above):
from pyspark.ml.evaluation import ClusteringEvaluator

# Make predictions and compute the Silhouette score
print('Silhouette with squared euclidean distance:')
pdt = model.transform(final_data)
evaluator = ClusteringEvaluator()
silhouette = evaluator.evaluate(pdt)
print(silhouette)
Evaluate clustering with the Within Set Sum of Squared Errors (WSSSE), Spark 2.2 to 3.0.0:
cost = model.computeCost(dataset)
print("Within Set Sum of Squared Errors = " + str(cost))