pyspark.pandas.DataFrame.drop

DataFrame.drop(labels: Union[Any, Tuple[Any, …], List[Union[Any, Tuple[Any, …]]], None] = None, axis: Union[int, str] = 1, columns: Union[Any, Tuple[Any, …], List[Union[Any, Tuple[Any, …]]]] = None) → pyspark.pandas.frame.DataFrame

Drop specified labels from columns.

Remove columns either by passing label names together with axis=1, or by passing them as columns. If both labels and columns are given, only labels is used and columns is ignored. Removing rows is not yet implemented.

Parameters
labels : single label or list-like

Column labels to drop.

axis : {1 or ‘columns’}, default 1
columns : single label or list-like

Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).

Returns
dropped : DataFrame

See also

Series.dropna

Notes

Currently only axis=1 is supported by this function; axis=0 is not yet implemented.
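
The argument rules described on this page (labels takes precedence over columns, and only axis=1 is accepted) can be illustrated with a small pure-Python sketch. The helper resolve_drop_columns below is hypothetical and is not part of pyspark.pandas; it only models which column labels a call would target. The ValueError raised when neither argument is given, and the fact that axis is checked only when labels is passed, are assumptions rather than behavior documented here.

```python
from typing import Any, List, Optional, Tuple, Union

Label = Union[Any, Tuple[Any, ...]]


def resolve_drop_columns(
    labels: Optional[Union[Label, List[Label]]] = None,
    axis: Union[int, str] = 1,
    columns: Optional[Union[Label, List[Label]]] = None,
) -> List[Label]:
    """Hypothetical helper: return the column labels a drop() call would target."""
    if labels is not None:
        # labels takes precedence: columns is ignored when both are given.
        if axis not in (1, "columns"):
            # Row removal (axis=0) is documented as not yet implemented.
            raise NotImplementedError("only axis=1 is supported")
        targets = labels
    elif columns is not None:
        targets = columns
    else:
        # Assumption: calling with neither argument is treated as an error.
        raise ValueError("specify at least one of 'labels' or 'columns'")
    # A single label (including a tuple label) becomes a one-element list.
    return list(targets) if isinstance(targets, list) else [targets]
```

For instance, resolve_drop_columns(labels='x', columns=['y', 'z']) returns ['x'], matching the statement that only labels is dropped when both arguments are supplied.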

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'x': [1, 2], 'y': [3, 4], 'z': [5, 6], 'w': [7, 8]},
...                   columns=['x', 'y', 'z', 'w'])
>>> df
   x  y  z  w
0  1  3  5  7
1  2  4  6  8
>>> df.drop('x', axis=1)
   y  z  w
0  3  5  7
1  4  6  8
>>> df.drop(['y', 'z'], axis=1)
   x  w
0  1  7
1  2  8
>>> df.drop(columns=['y', 'z'])
   x  w
0  1  7
1  2  8

MultiIndex columns are also supported.

>>> df = ps.DataFrame({'x': [1, 2], 'y': [3, 4], 'z': [5, 6], 'w': [7, 8]},
...                   columns=['x', 'y', 'z', 'w'])
>>> columns = [('a', 'x'), ('a', 'y'), ('b', 'z'), ('b', 'w')]
>>> df.columns = pd.MultiIndex.from_tuples(columns)
>>> df  
   a     b
   x  y  z  w
0  1  3  5  7
1  2  4  6  8
>>> df.drop('a')  
   b
   z  w
0  5  7
1  6  8