UnsupportedOperationException: Cannot Evaluate Expression: .. When Adding New Column With withColumn() And udf()
Solution 1:
All right, I think I understand the problem. The cause seems to be that my DataFrame had a lot of data loaded in memory, which made the show() action fail.
I realized this because the exception:
Py4JJavaError: An error occurred while calling o2108.showString.
: java.lang.UnsupportedOperationException: Cannot evaluate expression:
is actually raised by the df.show() action.
I confirmed that by running the code snippet from "Convert pyspark string to date format":
from datetime import datetime
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DateType
# Create a dummy DataFrame:
df1 = sqlContext.createDataFrame([("11/25/1991", "11/24/1991", "11/30/1991"),
    ("11/25/1391", "11/24/1992", "11/30/1992")], schema=['first', 'second', 'third'])
# Define a user-defined function that converts the string cell into a date.
# Note: %m is the month; %M would be parsed as minutes.
func = udf(lambda x: datetime.strptime(x, '%m/%d/%Y'), DateType())
df = df1.withColumn('test', func(col('first')))
df.show()
df.printSchema()
That snippet worked! But it still did not work with my DataFrame df_src.
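As a side note on the date-parsing snippet: in strptime format strings, %m means month and %M means minute, so the choice of directive changes the resulting date. The short check below is a plain-Python sketch (no Spark required) that illustrates the difference on one of the sample strings.

```python
from datetime import datetime

sample = "11/25/1991"

# %m reads the first field as the month.
as_month = datetime.strptime(sample, "%m/%d/%Y")

# %M reads the first field as minutes; the month falls back to its default (January).
as_minute = datetime.strptime(sample, "%M/%d/%Y")

print(as_month)
print(as_minute)
```

With %M the parse does not fail, which makes the mistake easy to miss: the value is silently stored as a January date.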
The cause seems to be that I am loading a lot of data in memory from my database server (over 8 to 9 million rows). Spark appears unable to execute the UDF when .show() (which displays 20 rows by default) is called on the results loaded in the DataFrame.
Even if show(n=1) is called, the same exception is thrown.
But if printSchema() is called, you will see that the new column has indeed been added.
One way to check that the new column is there is simply to call the action print(dataFrame.take(10)) instead.
Finally, one way to make it work is to assign the result to a new DataFrame, without calling .show(), when calling the UDF in a select():
df_to_string = df_src.select('*',
u_parse_df_to_string(df_src['year'], df_src['month'], df_src['day'], df_src['hour'], df_src['minute'])
)
Then cache it:
df_to_string.cache()
Now .show() can be called with no issues:
df_to_string.show()
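The definition of u_parse_df_to_string is not shown in this answer. As a minimal sketch, assuming it joins the five date-part columns into a single timestamp string, the underlying Python function might look like the following. The name parse_df_to_string, the output format, and the choice to return None for invalid input are all assumptions made for illustration, not the original code.

```python
from datetime import datetime


def parse_df_to_string(year, month, day, hour, minute):
    """Hypothetical helper: join date parts into 'YYYY-MM-DD HH:MM'.

    Returns None when a part is missing or out of range, so that a bad
    row yields a null cell instead of raising inside the UDF.
    """
    try:
        stamp = datetime(int(year), int(month), int(day), int(hour), int(minute))
    except (TypeError, ValueError):
        return None
    return stamp.strftime("%Y-%m-%d %H:%M")


# Wrapping it for Spark (requires pyspark; shown as comments so the
# function above can be tried without a Spark installation):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   u_parse_df_to_string = udf(parse_df_to_string, StringType())

print(parse_df_to_string(1991, 11, 25, 8, 5))
```

Testing the plain function on a few values before wrapping it with udf() makes it easier to tell a bug in the function apart from a failure on the Spark side.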