pyspark.pandas.window.Rolling.mean

Rolling.mean()

Calculate the rolling mean of the values.

Note

The current implementation of this API uses Spark’s Window without a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method on very large datasets.
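To illustrate why the missing partition specification matters: every result depends on the rows that precede it in one global ordering, so the rows cannot be split up and processed independently. When the data has a natural grouping key and per-group results are acceptable, each group's windows can be evaluated on their own. The pure-Python sketch below only illustrates that idea and is not how Spark executes the computation; `rolling_mean_by_key` and the sample keys are hypothetical. pandas-on-Spark also offers a `groupby(...).rolling(...)` form, but whether it avoids the single-partition behaviour for a given workload is an assumption to verify, not something this page states.

```python
import math
from collections import defaultdict

def rolling_mean_by_key(rows, window):
    """Rolling mean over (key, value) pairs, using only rows with the same key.

    Hypothetical helper for illustration. A result is NaN until `window`
    values have been seen for that key.
    """
    history = defaultdict(list)  # values seen so far, per key
    out = []
    for key, value in rows:
        seen = history[key]
        seen.append(value)
        if len(seen) < window:
            out.append((key, math.nan))
        else:
            out.append((key, sum(seen[-window:]) / window))
    return out

# Hypothetical sample data: two independent groups, "a" and "b".
rows = [("a", 4), ("b", 3), ("a", 5), ("b", 2), ("a", 6)]
grouped = rolling_mean_by_key(rows, 2)
for key, mean in grouped:
    print(key, mean)
```

Note that the per-group results differ from a rolling mean over all rows together, so this is only an option when the grouped semantics are what the analysis needs.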

Returns

Series or DataFrame: Returned object type is determined by the caller of the rolling calculation.

See also

pyspark.pandas.Series.rolling: Calling object with Series data.
pyspark.pandas.DataFrame.rolling: Calling object with DataFrames.
pyspark.pandas.Series.mean: Equivalent method for Series.
pyspark.pandas.DataFrame.mean: Equivalent method for DataFrame.

Examples

>>> s = ps.Series([4, 3, 5, 2, 6])
>>> s
0    4
1    3
2    5
3    2
4    6
dtype: int64

>>> s.rolling(2).mean()
0    NaN
1    3.5
2    4.0
3    3.5
4    4.0
dtype: float64

>>> s.rolling(3).mean()
0         NaN
1         NaN
2    4.000000
3    3.333333
4    4.333333
dtype: float64
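The Series results above can be checked by hand. The following is a minimal pure-Python sketch, not the pandas-on-Spark implementation; `rolling_mean` is a hypothetical helper. It assumes an integer window and that a full window of observations is required before a value is produced, which matches the NaN entries at the start of each output.

```python
import math

def rolling_mean(values, window):
    """Mean of each trailing window; NaN until a full window is available."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            # Fewer than `window` observations so far.
            result.append(math.nan)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result

data = [4, 3, 5, 2, 6]
means_2 = rolling_mean(data, 2)
means_3 = rolling_mean(data, 3)
print(means_2)
print(means_3)
```

For example, with a window of 3 the entry at index 3 is (3 + 5 + 2) / 3, which rounds to 3.333333.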

For DataFrame, each rolling mean is computed column-wise.

>>> df = ps.DataFrame({"A": s.to_numpy(), "B": s.to_numpy() ** 2})
>>> df
   A   B
0  4  16
1  3   9
2  5  25
3  2   4
4  6  36

>>> df.rolling(2).mean()
     A     B
0  NaN   NaN
1  3.5  12.5
2  4.0  17.0
3  3.5  14.5
4  4.0  20.0

>>> df.rolling(3).mean()
          A          B
0       NaN        NaN
1       NaN        NaN
2  4.000000  16.666667
3  3.333333  12.666667
4  4.333333  21.666667
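As a rough illustration of the column-wise behaviour, the sketch below treats a table as a plain dict of columns and computes each column's rolling mean independently with a running sum. It is a hand-written approximation under the same full-window assumption as the outputs above, not the library's code; `rolling_mean_columns` is a hypothetical name.

```python
import math

def rolling_mean_columns(columns, window):
    """Apply a trailing-window mean to each column independently."""
    out = {}
    for name, values in columns.items():
        means = []
        running = 0.0  # sum of the values currently inside the window
        for i, v in enumerate(values):
            running += v
            if i >= window:
                # Drop the value that just left the window.
                running -= values[i - window]
            means.append(running / window if i + 1 >= window else math.nan)
        out[name] = means
    return out

a = [4, 3, 5, 2, 6]
table = {"A": a, "B": [x ** 2 for x in a]}
col_means = rolling_mean_columns(table, 2)
for name, means in col_means.items():
    print(name, means)
```

No value in column A influences column B or vice versa, which is what "column-wise" means here.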