pyspark.sql.functions.array_remove
- pyspark.sql.functions.array_remove(col, element)
Array function: removes all elements that are equal to element from the given array.
New in version 2.4.0.
Changed in version 3.4.0: Supports Spark Connect.
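As a rough mental model, the per-row behaviour can be sketched in plain Python. This is not Spark's implementation. array_remove_model is a hypothetical helper, and the sketch assumes that None stands in for SQL NULL. Spark's SQL function reference gives array_remove(array(1, 2, 3, null, 3), 3) as returning [1,2,null], so null entries inside the array are kept. The rule that a NULL array or a NULL element yields NULL is an assumption based on Spark's usual null propagation.

```python
from typing import Any, List, Optional


def array_remove_model(arr: Optional[List[Any]], element: Any) -> Optional[List[Any]]:
    """Hypothetical plain-Python model of array_remove for a single row.

    Assumption: None models SQL NULL, and a NULL array or NULL element
    produces NULL (Spark's usual null-in, null-out behaviour).
    """
    if arr is None or element is None:
        return None
    # Keep every value that is not equal to `element`, preserving order.
    # None entries never equal a non-null element, so they are kept.
    return [v for v in arr if v is None or v != element]


print(array_remove_model([1, 2, 3, 1, 1], 1))     # [2, 3]
print(array_remove_model([1, 2, 3, None, 3], 3))  # [1, 2, None]
```

Every occurrence is removed, not just the first. Removing the only value present gives an empty list rather than None.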
- Parameters
- col
Column or str
name of column containing array
- element
element to be removed from the array
- Returns
Column
A new column that is an array excluding the given value from the input column.
Examples
Example 1: Removing a specific value from a simple array
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, 3, 1, 1],)], ['data'])
>>> df.select(sf.array_remove(df.data, 1)).show()
+---------------------+
|array_remove(data, 1)|
+---------------------+
|               [2, 3]|
+---------------------+
Example 2: Removing a specific value from multiple arrays
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, 3, 1, 1],), ([4, 5, 5, 4],)], ['data'])
>>> df.select(sf.array_remove(df.data, 5)).show()
+---------------------+
|array_remove(data, 5)|
+---------------------+
|      [1, 2, 3, 1, 1]|
|               [4, 4]|
+---------------------+
Example 3: Removing a value that does not exist in the array
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 2, 3],)], ['data'])
>>> df.select(sf.array_remove(df.data, 4)).show()
+---------------------+
|array_remove(data, 4)|
+---------------------+
|            [1, 2, 3]|
+---------------------+
Example 4: Removing a value from an array with all identical values
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([1, 1, 1],)], ['data'])
>>> df.select(sf.array_remove(df.data, 1)).show()
+---------------------+
|array_remove(data, 1)|
+---------------------+
|                   []|
+---------------------+
Example 5: Removing a value from an empty array
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import ArrayType, IntegerType, StructType, StructField
>>> schema = StructType([
...   StructField("data", ArrayType(IntegerType()), True)
... ])
>>> df = spark.createDataFrame([([],)], schema)
>>> df.select(sf.array_remove(df.data, 1)).show()
+---------------------+
|array_remove(data, 1)|
+---------------------+
|                   []|
+---------------------+
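The examples above cover empty arrays. For comparison, the following is a hypothetical plain-Python sketch of how a whole column is processed row by row. It is a model, not Spark code. It assumes that None stands in for a NULL array and that a NULL row yields NULL rather than an empty array, following Spark's usual null propagation.

```python
# Hypothetical rows of an array column; None models a NULL array (assumption).
rows = [[1, 2, 3, 1, 1], [4, 5, 5, 4], [], None]
to_remove = 5

# Each row is handled independently: a NULL row stays NULL, an empty
# array stays empty, and every occurrence of `to_remove` is dropped.
result = [
    None if row is None else [v for v in row if v != to_remove]
    for row in rows
]
print(result)  # [[1, 2, 3, 1, 1], [4, 4], [], None]
```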