Rolling Average Functions - Part 2

Introduction

Wait, didn’t you already have one of these tutorial posts?

The short answer is yes: in a previous post I walked through a much more manual way to apply a rolling average function to a set of time series data. I will leave that post as is, since there are still valuable techniques to gain from it (notably the floor_date function), but I intend for this post to expand on and simplify those initial concepts. The previous post did not touch on zoo or on slider, which has been released since then, and those two packages are the focus of this post.

The shorter answer is… well, we all learn better ways of doing things and are always trying to improve, right?

shrug

Throughout this post I will use a single randomly generated dataset of daily timestamps and numeric values, time.df. If you wish to follow along with the same data, use the same set.seed value shown below.

library(dplyr)
library(lubridate)
library(zoo)

# Create two vectors of time and value series data and combine in a tibble
set.seed(seed = 123) # Set seed for same randomized data
time.series <- seq(from = ymd_hms("2020-01-01 00:00:00"), to = ymd_hms("2020-12-31 00:00:00"), by = "day")
val.series <- rnorm(n = length(time.series), mean = 100, sd = 5)
time.df <- tibble(time.series, val.series)

Note that I have chosen to store the data in a tibble. For the most part, tibbles and data frames can be used interchangeably. Recently I have gravitated towards tibbles because they play nicely with the [tidyverse](https://www.tidyverse.org/) and are much easier to inspect: printing a tibble automatically shows its dimensions, column headers, and data types, and only the first few rows rather than the entire dataset. Check out R for Data Science for more information about tibbles!
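To see that difference for yourself, here is a minimal sketch that rebuilds the same time.df and prints it both as a tibble and as a plain data frame. The name time.base is a hypothetical helper introduced only for this comparison. Since 2020 is a leap year, the daily sequence should contain 366 rows.

```r
library(dplyr)
library(lubridate)

# Rebuild the same daily series used throughout this post
set.seed(seed = 123)
time.series <- seq(from = ymd_hms("2020-01-01 00:00:00"), to = ymd_hms("2020-12-31 00:00:00"), by = "day")
val.series <- rnorm(n = length(time.series), mean = 100, sd = 5)
time.df <- tibble(time.series, val.series)

# A tibble prints its dimensions and column types; n limits the rows shown
print(time.df, n = 5)

# The same data as a base data.frame (hypothetical name time.base);
# a data.frame prints every row unless you limit it yourself, e.g. with head()
time.base <- as.data.frame(time.df)
print(head(time.base, 5))

# 2020 is a leap year, so the daily sequence has 366 rows
print(nrow(time.df))
```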

Going to the zoo for Rolling Averages


In my last post I described using dplyr and manually adjusting dates and times to set an artificial floor and essentially “shift” data within a defined window. zoo does all of that work for you in a single function. With rollmean you supply a vector of data, a window width, k, to roll through, and an alignment that determines where each window sits relative to its output position (left, right, or center, with “center” as the default). There are also similar functions such as rollmedian, rollmax, and rollsum.
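A small toy vector makes the alignment options easier to see than 366 rows of random data. The sketch below (the vector x and the result names are just illustrative) computes the same 3-value window with each alignment, shows two of the sibling helpers, and gets a rolling minimum by passing min to rollapply.

```r
library(zoo)

x <- c(4, 8, 6, 10, 2, 7)  # toy vector so window positions are easy to trace

# Same 3-value window, three alignments; fill = NA pads slots with no full window
left   <- rollmean(x, k = 3, align = "left",   fill = NA)  # current + next 2
center <- rollmean(x, k = 3, align = "center", fill = NA)  # previous + current + next
right  <- rollmean(x, k = 3, align = "right",  fill = NA)  # previous 2 + current

# Sibling helpers take the same k / align / fill arguments
roll_max <- rollmax(x, k = 3, align = "left", fill = NA)
roll_sum <- rollsum(x, k = 3, align = "left", fill = NA)

# A rolling minimum via rollapply with FUN = min
roll_min <- rollapply(x, width = 3, FUN = min, align = "left", fill = NA)

print(rbind(left, center, right))
```

The first left-aligned value is mean(4, 8, 6) = 6. The center and right alignments produce the same numbers shifted one and two positions later, with the NA padding moving to match.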

In my opinion, the more useful function is rollapply, where you specify which function you want to apply to your data. While the dedicated functions are streamlined for simple operations, rollapply lets you apply more complex ones should the need arise. Both are shown using the time.df data below:

# Using rollmean on a vector
roll.vec <- rollmean(x = time.df$val.series, k = 10, align = "left", fill = NA)

# Using rollapply, specify mean, and assign to a new column in time.df
time.df$roll.mean <- rollapply(data = time.df$val.series, FUN = mean, width = 10, fill = NA, align = "left")

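Here I have specified a window of 10 (in this daily dataset, 10 days), the mean function, and “left” alignment, so each output value is the mean of the current row and the 9 rows after it. Additionally, fill = NA pads the positions where a full window cannot be formed (here the last 9 rows) with NA. This keeps the output the same length as the input so it can be assigned back as a column. Note that fill does not deal with missing values inside the data: this dataset is complete, but most real datasets are not, and any window containing an NA will return NA unless you pass na.rm = TRUE through to mean.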
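To make the difference between padding and missing data concrete, here is a minimal sketch on a hypothetical 6-value vector with one NA. It relies on rollapply forwarding any extra arguments after FUN (here na.rm = TRUE) to the function being applied.

```r
library(zoo)

x <- c(5, 3, NA, 7, 9, 1)  # hypothetical data with one missing value

# Without fill, positions lacking a full window are dropped,
# so the result has length(x) - width + 1 = 4 values
short <- rollapply(x, width = 3, FUN = mean, align = "left")

# With fill = NA the output keeps the input length; the last 2 slots are padding
padded <- rollapply(x, width = 3, FUN = mean, align = "left", fill = NA)

# fill does not touch the NA in the data: every window containing it returns NA.
# Arguments after FUN are passed on to mean(), so na.rm = TRUE skips it instead
skip_na <- rollapply(x, width = 3, FUN = mean, na.rm = TRUE, align = "left", fill = NA)

print(short)
print(padded)
print(skip_na)
```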

To verify the output, compare the first value, time.df$roll.mean[1], with mean(time.df$val.series[1:10]). If you used the same set.seed(123) value as me, you’ll find that both return the same number. You can do an additional sanity check with tail(time.df, n = 10): the final 9 entries of roll.mean are all NA, while the top entry matches mean(tail(time.df$val.series, 10)), or equivalently mean(time.df$val.series[(nrow(time.df)-9):nrow(time.df)]).
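The same checks can be written as code rather than inspected by eye. This sketch rebuilds time.df and roll.mean exactly as above. The names n, first_check, last_check, and na_rows are illustrative only.

```r
library(dplyr)
library(lubridate)
library(zoo)

# Recreate the data and the rolling mean from earlier in the post
set.seed(seed = 123)
time.series <- seq(from = ymd_hms("2020-01-01 00:00:00"), to = ymd_hms("2020-12-31 00:00:00"), by = "day")
val.series <- rnorm(n = length(time.series), mean = 100, sd = 5)
time.df <- tibble(time.series, val.series)
time.df$roll.mean <- rollapply(data = time.df$val.series, FUN = mean, width = 10, fill = NA, align = "left")

n <- nrow(time.df)

# First rolling value = mean of raw values 1 through 10
first_check <- all.equal(time.df$roll.mean[1], mean(time.df$val.series[1:10]))

# Last complete window covers rows (n - 9) through n, exactly 10 rows
last_check <- all.equal(time.df$roll.mean[n - 9], mean(tail(time.df$val.series, 10)))

# Only the final 9 rows lack a full 10-day window
na_rows <- which(is.na(time.df$roll.mean))

print(c(first = isTRUE(first_check), last = isTRUE(last_check)))
print(na_rows)
```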

Alternative Approach with slider


The slider package is a relatively recent release from Davis Vaughan that draws on zoo for inspiration while adopting purrr-style syntax. For detailed documentation, check out Vaughan’s GitHub repository.

As in purrr, slide takes the function to apply as its .f argument, either a function such as mean or a formula such as ~mean(.x). And as with zoo’s rollapply, I will be specifying a function and a window. The key differences are in how alignment, completeness, and step size are set. The .before and .after arguments control left/right alignment, and using both together produces a centered window. For example, .after = 9 makes each window the current value plus the 9 values after it, making it left-aligned. Similarly, .before = 9 covers the current value and the 9 values before it, making it right-aligned. To center the window, give both sides the same value. Keep in mind that the window holds .before + .after + 1 values, so .before = 9, .after = 9 spans 19 observations, while .before = 4, .after = 4 spans 9.
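A toy example makes the .before/.after mapping easy to check by hand. This sketch uses slide_dbl, the variant of slide that returns a numeric vector instead of a list, on the hypothetical vector 1:5 with a rolling sum. .complete = TRUE is set so incomplete windows come back as NA.

```r
library(slider)

x <- 1:5  # toy vector

# .after = 1: current value plus the next one (left-aligned, 2 values)
left_sum <- slide_dbl(x, sum, .after = 1, .complete = TRUE)

# .before = 1: previous value plus the current one (right-aligned, 2 values)
right_sum <- slide_dbl(x, sum, .before = 1, .complete = TRUE)

# .before = 1 and .after = 1: centered window of 1 + 1 + 1 = 3 values
center_sum <- slide_dbl(x, sum, .before = 1, .after = 1, .complete = TRUE)

print(rbind(left_sum, right_sum, center_sum))
```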

library(slider)

# Using slide, apply the same 10-value, left-aligned mean as zoo and assign to a new column in time.df
time.df$slide.mean <- slide(.x = time.df$val.series, .f = ~mean(.x), .step = 1, .after = 9, .complete = TRUE)

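Other arguments include .step, which sets how many elements the window shifts between function calls (the default of 1 evaluates every position), and .complete, which determines whether partial windows are still evaluated or returned as missing. In the code above, .complete is set to TRUE to mimic the output of rollapply, so the final 9 observations at the bottom of the tibble have no result. Because slide() always returns a list, slide.mean is a list-column and those entries appear as NULL. The type-stable variant slide_dbl() returns a plain numeric vector with NA instead, which is easier to compare against roll.mean. Otherwise, I encourage looking at the roll.mean and slide.mean columns side by side to see that their values are identical.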
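To confirm the two packages agree, here is a minimal sketch on a shorter hypothetical series of 30 values. It compares rollapply against both slide() and slide_dbl() using the same 10-value, left-aligned window. The names vals, roll, slide_list, and slide_num are illustrative.

```r
library(zoo)
library(slider)

set.seed(seed = 123)
vals <- rnorm(n = 30, mean = 100, sd = 5)  # hypothetical 30-value series

# zoo: 10-value left-aligned mean, padded with NA
roll <- rollapply(data = vals, FUN = mean, width = 10, fill = NA, align = "left")

# slide() returns a list; incomplete windows become NULL
slide_list <- slide(.x = vals, .f = ~mean(.x), .after = 9, .complete = TRUE)

# slide_dbl() returns a double vector; incomplete windows become NA
slide_num <- slide_dbl(.x = vals, .f = ~mean(.x), .after = 9, .complete = TRUE)

# The numeric results line up element by element, NA positions included
print(all.equal(roll, slide_num))
```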

I also encourage exploring other parts of the slider ecosystem, including slide_index and slide_period, which let you define windows in terms of an index such as dates or date-times rather than a count of rows.
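As a starting point, here is a minimal sketch of why an index-based window matters. It uses a hypothetical 30-day series with a Date index, where .after = 9 means 9 days, and deletes three days to simulate gaps. On complete daily data, a 10-row window and a 10-day window give the same result. Once days are missing, slide_dbl still counts 10 rows, while slide_index_dbl only includes observations within 9 calendar days after the current date.

```r
library(slider)

set.seed(seed = 123)
dates <- seq(from = as.Date("2020-01-01"), by = "day", length.out = 30)  # hypothetical index
vals <- rnorm(n = 30, mean = 100, sd = 5)

# No gaps: 10 rows and 10 calendar days describe the same window
by_row <- slide_dbl(vals, mean, .after = 9)
by_day <- slide_index_dbl(vals, dates, mean, .after = 9)

# Remove January 3-5 to simulate missing observations
keep <- -c(3, 4, 5)
gap_vals <- vals[keep]
gap_dates <- dates[keep]

# slide_dbl: always the next 10 rows, which now spans 13 calendar days at the start
gap_by_row <- slide_dbl(gap_vals, mean, .after = 9)

# slide_index_dbl: only values dated within 9 days after the current date,
# so the first window (Jan 1 to Jan 10) holds just 7 observations
gap_by_day <- slide_index_dbl(gap_vals, gap_dates, mean, .after = 9)

print(c(by_row = gap_by_row[1], by_day = gap_by_day[1]))
```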