
UnsupportedOperationException: Cannot Evaluate Expression: .. When Adding a New Column with withColumn() and udf()

What I am trying to do is simply convert the fields year, month, day, hour, and minute (which are of type integer, as seen below) into a string type. So I have a dataframe df_src of

Solution 1:

All right, I think I understand the problem. The cause is that my DataFrame had a lot of data loaded in memory, which made the show() action fail.

I realized this because what actually raises the exception:

Py4JJavaError: An error occurred while calling o2108.showString.
: java.lang.UnsupportedOperationException: Cannot evaluate expression: 

is the df.show() action itself (note the showString call in the trace), not the withColumn() call.

I could confirm that by running the code snippet from "Convert pyspark string to date format":

from datetime import datetime
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DateType

# Create a dummy DataFrame (sqlContext is the one provided by the PySpark shell):
df1 = sqlContext.createDataFrame([("11/25/1991", "11/24/1991", "11/30/1991"),
                                  ("11/25/1391", "11/24/1992", "11/30/1992")],
                                 schema=['first', 'second', 'third'])

# Define a user-defined function that converts a string cell into a date.
# Note: %m is the month; %M would be parsed as minutes.
func = udf(lambda x: datetime.strptime(x, '%m/%d/%Y'), DateType())

# Add the converted column:
df = df1.withColumn('test', func(col('first')))

df.show()

df.printSchema()

which worked! But it still did not work with my dataFrame df_src.
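A side note on the format string used with strptime: %m means month and %M means minute, and the two are easy to mix up because both parse without raising an error. A minimal plain-Python check (no Spark needed; the variable names are only for illustration):

```python
from datetime import datetime

s = "11/25/1991"

# %M reads "11" as minutes, so the month silently defaults to January.
as_minute = datetime.strptime(s, "%M/%d/%Y")

# %m reads "11" as the month, which is what a date like 11/25/1991 needs.
as_month = datetime.strptime(s, "%m/%d/%Y")

print(as_minute)  # 1991-01-25 00:11:00
print(as_month)   # 1991-11-25 00:00:00
```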

The cause, as far as I can tell, is that I am loading a lot of data into memory from my database server (over 8-9 million rows). It seems that Spark is unable to run the udf when .show() (which displays 20 rows by default) is called on the results loaded in the DataFrame.

Even if show(n=1) is called, the same exception is thrown.

But if printSchema() is called, you will see that the new column has in fact been added.

Another way to check that the new column was added is simply to call print dataFrame.take(10) instead.

Finally, one way to make it work is to assign the result to a new DataFrame, calling the udf inside a select() and not calling .show() right away:

df_to_string = df_src.select('*', 
          u_parse_df_to_string(df_src['year'], df_src['month'], df_src['day'], df_src['hour'], df_src['minute'])
         )

Then cache it (note the parentheses; cache is a method and does nothing unless it is called):

df_to_string.cache()

Now .show() can be called with no issues:

df_to_string.show()
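The definition of u_parse_df_to_string is not shown above. As a rough sketch only, the plain Python function behind such a udf might look like the following. The name parse_to_string, the output format, and the None handling are all assumptions made for illustration, not the original code:

```python
def parse_to_string(year, month, day, hour, minute):
    """Format integer date parts as 'YYYY-MM-DD HH:MM' (assumed format).

    Returns None if any part is missing, since Spark passes SQL NULLs
    to a Python udf as None.
    """
    parts = (year, month, day, hour, minute)
    if any(p is None for p in parts):
        return None
    return "%04d-%02d-%02d %02d:%02d" % parts


# Hypothetical wrapping as a Spark udf (requires pyspark):
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   u_parse_df_to_string = udf(parse_to_string, StringType())

print(parse_to_string(2016, 3, 7, 9, 5))  # 2016-03-07 09:05
```

Keeping the logic in an ordinary function makes it possible to test it without a Spark session before wrapping it with udf().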
