pyspark.pandas.window.Rolling.mean

Rolling.mean()

Calculate the rolling mean of the values.

Note

The current implementation of this API uses Spark’s Window without a partition specification. As a result, all data is moved into a single partition on a single machine, which can cause serious performance degradation. Avoid using this method on very large datasets.

Returns
Series or DataFrame

Returned object type is determined by the caller of the rolling calculation.

See also

pyspark.pandas.Series.rolling

Calling object with Series data.

pyspark.pandas.DataFrame.rolling

Calling object with DataFrame data.

pyspark.pandas.Series.mean

Equivalent method for Series.

pyspark.pandas.DataFrame.mean

Equivalent method for DataFrame.

Examples

>>> s = ps.Series([4, 3, 5, 2, 6])
>>> s
0    4
1    3
2    5
3    2
4    6
dtype: int64
>>> s.rolling(2).mean()
0    NaN
1    3.5
2    4.0
3    3.5
4    4.0
dtype: float64
>>> s.rolling(3).mean()
0         NaN
1         NaN
2    4.000000
3    3.333333
4    4.333333
dtype: float64
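
The windowing rule behind these results can be sketched in plain Python. This is only an illustration, not the pyspark.pandas implementation: the helper name `rolling_mean` is hypothetical, and it assumes the behaviour visible in the output above, where a position yields NaN until a full window of values is available.

```python
import math


def rolling_mean(values, window):
    """Hypothetical helper: trailing-window mean over a list of numbers."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            # Fewer than `window` observations so far: no mean is defined.
            result.append(math.nan)
        else:
            # Average the current value and the (window - 1) values before it.
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


data = [4, 3, 5, 2, 6]
print(rolling_mean(data, 2))
print(rolling_mean(data, 3))
```

Each output position depends only on the current value and the values immediately preceding it, which is why the first `window - 1` positions have no result.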

For a DataFrame, the rolling mean is computed column-wise.

>>> df = ps.DataFrame({"A": s.to_numpy(), "B": s.to_numpy() ** 2})
>>> df
   A   B
0  4  16
1  3   9
2  5  25
3  2   4
4  6  36
>>> df.rolling(2).mean()
     A     B
0  NaN   NaN
1  3.5  12.5
2  4.0  17.0
3  3.5  14.5
4  4.0  20.0
>>> df.rolling(3).mean()
          A          B
0       NaN        NaN
1       NaN        NaN
2  4.000000  16.666667
3  3.333333  12.666667
4  4.333333  21.666667
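
The column-wise behaviour shown for the DataFrame can be sketched in plain Python as well. This is an illustrative sketch under stated assumptions, not how pyspark.pandas computes the result: the helper name `rolling_mean_columns` is hypothetical, columns are modelled as a dict of lists, and a running sum is used so that each step only adds the newest value and drops the one that left the window.

```python
import math


def rolling_mean_columns(columns, window):
    """Hypothetical helper: trailing-window mean applied to each column independently."""
    out = {}
    for name, values in columns.items():
        means = []
        running = 0  # sum of the values currently inside the window
        for i, v in enumerate(values):
            running += v
            if i >= window:
                # Drop the value that just slid out of the window.
                running -= values[i - window]
            # A mean is only defined once a full window has been seen.
            means.append(running / window if i + 1 >= window else math.nan)
        out[name] = means
    return out


a = [4, 3, 5, 2, 6]
frame = {"A": a, "B": [x ** 2 for x in a]}
print(rolling_mean_columns(frame, 2))
```

No column's result depends on any other column, which matches the per-column output in the examples above.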