How To Explode An Array Without Duplicate Records

April 16, 2024

This is a continuation of the question here: pyspark sql Add different Qtr start_date, End_date for exploded rows. I have the following dataframe, which has an array list as

Solution 1:

Instead of exploding the array, you can pick each value from the array by its position.

This position can be generated dynamically with row_number over a window partitioned by customer_number and ordered by new_sdt, as shown below.

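from pyspark.sql import Window
from pyspark.sql.functions import expr, row_number

# Number each customer's exploded rows in quarter order (1, 2, 3, ...)
window = Window.partitionBy('customer_number').orderBy('new_sdt')

(df.withColumn('row_num', row_number().over(window))
   # Spark SQL array indexing is 0-based, so shift the 1-based row number down by one
   .withColumn('cf_new', expr("cf_values[row_num - 1]"))
   .drop('row_num')
   .show())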

Output:

+---------------+------------+----------+----------+---+------------+----------+----------+------+
|customer_number|sales_target|start_date|  end_date|noq|   cf_values|   new_sdt| new_edate|cf_new|
+---------------+------------+----------+----------+---+------------+----------+----------+------+
|        A011021|          15|2020-01-01|2020-12-31|  4|[4, 4, 4, 3]|2020-01-01|2020-03-31|     4|
|        A011021|          15|2020-01-01|2020-12-31|  4|[4, 4, 4, 3]|2020-04-01|2020-06-30|     4|
|        A011021|          15|2020-01-01|2020-12-31|  4|[4, 4, 4, 3]|2020-07-01|2020-09-30|     4|
|        A011021|          15|2020-01-01|2020-12-31|  4|[4, 4, 4, 3]|2020-10-01|2020-12-31|     3|
+---------------+------------+----------+----------+---+------------+----------+----------+------+
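To see why positional lookup avoids the duplicates, here is a minimal plain-Python sketch (not Spark code) of both approaches, using the values from the output table above. The helper names naive_explode and pick_by_position are hypothetical, and the sketch assumes each customer has no more quarter rows than elements in cf_values (otherwise the Python index lookup raises IndexError).

```python
from collections import defaultdict

# Rows as they look after the quarters were exploded (values taken from the output table above)
rows = [
    {"customer_number": "A011021", "cf_values": [4, 4, 4, 3], "new_sdt": "2020-01-01"},
    {"customer_number": "A011021", "cf_values": [4, 4, 4, 3], "new_sdt": "2020-04-01"},
    {"customer_number": "A011021", "cf_values": [4, 4, 4, 3], "new_sdt": "2020-07-01"},
    {"customer_number": "A011021", "cf_values": [4, 4, 4, 3], "new_sdt": "2020-10-01"},
]


def naive_explode(rows):
    """Explode cf_values on every row: each row is repeated once per array element."""
    return [dict(row, cf_new=value) for row in rows for value in row["cf_values"]]


def pick_by_position(rows):
    """Mimic row_number() over (partition by customer_number order by new_sdt)
    and take cf_values[row_num - 1] instead of exploding."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["customer_number"]].append(row)
    result = []
    for customer_rows in groups.values():
        # ISO-formatted dates sort correctly as plain strings
        customer_rows.sort(key=lambda r: r["new_sdt"])
        for row_num, row in enumerate(customer_rows, start=1):
            result.append(dict(row, cf_new=row["cf_values"][row_num - 1]))
    return result


print(len(naive_explode(rows)))                       # 16: 4 quarters x 4 array values
print([r["cf_new"] for r in pick_by_position(rows)])  # [4, 4, 4, 3]
```

Exploding on top of the already-exploded quarter rows produces a cross product (16 rows), while the positional lookup keeps one row per quarter and reproduces the cf_new column shown above.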
