pyspark.sql.DataFrame.drop¶
-
DataFrame.
drop
(*cols: ColumnOrName) → DataFrame[source]¶ Returns a new
DataFrame
without specified columns. This is a no-op if the schema doesn’t contain the given column name(s).New in version 1.4.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- cols: str orclass:Column
a name of the column, or the
Column
to drop
- Returns
DataFrame
DataFrame without given columns.
Notes
When an input is a column name, it is treated literally without further interpretation. Otherwise, will try to match the equivalent expression. So that dropping column by its name drop(colName) has different semantic with directly dropping the column drop(col(colName)).
Examples
>>> from pyspark.sql import Row >>> from pyspark.sql.functions import col, lit >>> df = spark.createDataFrame( ... [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"]) >>> df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
>>> df.drop('age').show() +-----+ | name| +-----+ | Tom| |Alice| | Bob| +-----+ >>> df.drop(df.age).show() +-----+ | name| +-----+ | Tom| |Alice| | Bob| +-----+
Drop the column that joined both DataFrames on.
>>> df.join(df2, df.name == df2.name, 'inner').drop('name').sort('age').show() +---+------+ |age|height| +---+------+ | 14| 80| | 16| 85| +---+------+
>>> df3 = df.join(df2) >>> df3.show() +---+-----+------+----+ |age| name|height|name| +---+-----+------+----+ | 14| Tom| 80| Tom| | 14| Tom| 85| Bob| | 23|Alice| 80| Tom| | 23|Alice| 85| Bob| | 16| Bob| 80| Tom| | 16| Bob| 85| Bob| +---+-----+------+----+
Drop two column by the same name.
>>> df3.drop("name").show() +---+------+ |age|height| +---+------+ | 14| 80| | 14| 85| | 23| 80| | 23| 85| | 16| 80| | 16| 85| +---+------+
Can not drop col(‘name’) due to ambiguous reference.
>>> df3.drop(col("name")).show() Traceback (most recent call last): ... pyspark.errors.exceptions.captured.AnalysisException: [AMBIGUOUS_REFERENCE] Reference...
>>> df4 = df.withColumn("a.b.c", lit(1)) >>> df4.show() +---+-----+-----+ |age| name|a.b.c| +---+-----+-----+ | 14| Tom| 1| | 23|Alice| 1| | 16| Bob| 1| +---+-----+-----+
>>> df4.drop("a.b.c").show() +---+-----+ |age| name| +---+-----+ | 14| Tom| | 23|Alice| | 16| Bob| +---+-----+
Can not find a column matching the expression “a.b.c”.
>>> df4.drop(col("a.b.c")).show() +---+-----+-----+ |age| name|a.b.c| +---+-----+-----+ | 14| Tom| 1| | 23|Alice| 1| | 16| Bob| 1| +---+-----+-----+