Wait, didn’t you already have one of these tutorial posts?
The short answer is yes: in a previous post I reviewed a much more manual way to apply a rolling average function to a set of time series data. I will be leaving that post as is, since there are still valuable techniques to gain from it (notably the `floor_date` function), but I intend for this post to expand on and simplify those initial concepts. In the previous post I did not touch on `zoo` or the since-released `slider` functions, which will be the focus of this post.
The shorter answer is… well, we all learn better ways of doing things and are trying to improve, right?
I will be evaluating a single dataset, `time.df`, randomly generated with time and nominal values. If you wish to follow along with the same data, specify the same `set.seed` value as shown below.
    library(dplyr)
    library(lubridate)
    library(zoo)

    # Create two vectors of time and value series data and combine in a tibble
    set.seed(seed = 123) # Set seed for same randomized data
    time.series <- seq(from = ymd_hms("2020-01-01 00:00:00"),
                       to = ymd_hms("2020-12-31 00:00:00"),
                       by = "day")
    val.series <- rnorm(n = length(time.series), mean = 100, sd = 5)
    time.df <- tibble(time.series, val.series)
To note, I have chosen to store the dataset as a tibble. For the most part, tibbles and data frames can be used interchangeably. Recently I have gravitated towards using tibbles because they play nicely with the [tidyverse](https://www.tidyverse.org/) and are much easier to inspect. Printing a tibble automatically displays column headers and data types, and provides more information at a glance than printing a data frame. Check out R for Data Science for more information about tibbles!
Going to the zoo for Rolling Averages
In my last post I described using `dplyr` and manually adjusting dates and times to set an artificial floor and essentially “shift” data within a defined window. `zoo` does all that work for you in a single function. Using `rollmean`, a user can define a vector of data, supply a window width, `k`, to roll through, and choose how the mean should be aligned (left, right, or center, with “center” as the default). There are also similar functions for other summary statistics, such as `rollmedian`, `rollmax`, and `rollsum`.
In my opinion, the more useful approach is simply to use `rollapply` and specify the function you want to apply to your dataset. While the former functions are streamlined for simple operations, `rollapply` gives the option to perform more complex ones should the need arise. Both are shown using the `time.df` data below:
    # Using rollmean on a vector
    roll.vec <- rollmean(x = time.df$val.series, k = 10, align = "left", fill = NA)

    # Using rollapply, specify mean, and assign to a new column in time.df
    time.df$roll.mean <- rollapply(data = time.df$val.series, FUN = mean,
                                   width = 10, fill = NA, align = "left")
Here I have specified that I want a window of 10 (in this daily dataset, representative of 10 days), to apply the `mean` function, and to align the window to the “left”, so each result summarizes the current value and the 9 values after it. Additionally, I specified `fill = NA` so that positions where a full window is not available are filled with `NA`. While this dataset is complete, obviously most datasets are not.
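To make the alignment and fill behaviour easier to see, here is a minimal sketch on a toy five-value vector (the vector `x` and the range function are my own illustrative choices, not part of the dataset above). It runs the same 3-wide window with each alignment, then uses `rollapply` with a custom function of the window.

```r
library(zoo)

# Toy vector (hypothetical data, chosen so the means are easy to check by hand)
x <- c(1, 2, 3, 4, 5)

# Same 3-wide window, three alignments; fill = NA pads the incomplete positions
left   <- rollmean(x, k = 3, align = "left",   fill = NA)
center <- rollmean(x, k = 3, align = "center", fill = NA)
right  <- rollmean(x, k = 3, align = "right",  fill = NA)

print(rbind(left, center, right))
# left:    2  3  4 NA NA
# center: NA  2  3  4 NA
# right:  NA NA  2  3  4

# rollapply accepts any function of the window, e.g. the spread within each window
window.range <- rollapply(data = x, width = 3,
                          FUN = function(w) max(w) - min(w),
                          fill = NA, align = "left")
print(window.range)
# 2 2 2 NA NA
```

The only thing that changes between the three `rollmean` calls is where the `NA` padding lands, which is a quick way to confirm which alignment you actually asked for.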
To verify the output, look at the first value of `time.df$roll.mean` and compare it to `mean(time.df$val.series[1:10])`. If you used the same `set.seed(123)` value as me, you’ll find that both values come out to 97.6. You can do an additional sanity check by calling `tail(time.df, n = 10)` and seeing that the final 9 entries are all `NA`, while the top entry is 99.8 (the same as executing `mean(tail(time.df$val.series, 10))`).
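These checks can also be scripted rather than eyeballed. The sketch below is one way to do it, under a couple of assumptions: it works on a plain vector instead of the tibble column, and it hard-codes 366 values because 2020 is a leap year. The names `first.ok`, `last.ok`, and `tail.na` are hypothetical helpers of mine.

```r
library(zoo)

# Recreate the value series (same seed; 366 daily values assumed for 2020)
set.seed(123)
val.series <- rnorm(n = 366, mean = 100, sd = 5)

roll.mean <- rollapply(data = val.series, FUN = mean, width = 10,
                       fill = NA, align = "left")

# The first rolling value should equal the mean of the first 10 raw values
first.ok <- isTRUE(all.equal(roll.mean[1], mean(val.series[1:10])))

# The last complete window starts 10 rows from the end...
n <- length(val.series)
last.ok <- isTRUE(all.equal(roll.mean[n - 9], mean(tail(val.series, 10))))

# ...and the 9 rows after it cannot fill a window, so they are NA
tail.na <- all(is.na(tail(roll.mean, 9)))

print(c(first.ok = first.ok, last.ok = last.ok, tail.na = tail.na))
```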
Alternative Approach with slider
The `slider` package is a relatively recent release from Davis Vaughan, advertised as drawing on influence from `zoo` but incorporating elements of `purrr`-style grammar. For detailed documentation, check out Vaughan’s GitHub repository.
As with `purrr`, functional assignments come into play with the arguments for `slide`. And similar to the `rollapply` function, I will be specifying a function and a window. The key differences here are in how “alignment”, completeness, and steps are assigned.
`.before` and `.after` are how left/right alignment is specified, and a combination of both implies center alignment of the rolling window. For example, specifying `.after = 9` indicates that the rolling window applies to the current value plus the 9 values after it, making it left-aligned. Similarly, specifying `.before = 9` would apply to the current value and the 9 values prior, making it right-aligned. To make a centrally aligned window, specify `.before = 9, .after = 9` (note that this covers 19 values in total), or basically just make both sides equivalent.
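As a small sketch of how `.before` and `.after` map onto left, right, and center alignment, here is the same idea on a toy vector. I am assuming `slide_dbl`, the variant of `slide` that returns a numeric vector, purely so the output prints compactly; the vector `x` is hypothetical.

```r
library(slider)

# Toy vector (hypothetical data)
x <- c(1, 2, 3, 4, 5)

# Current value plus the 2 after it: left-aligned 3-wide window
left <- slide_dbl(x, mean, .after = 2, .complete = TRUE)

# Current value plus the 2 before it: right-aligned 3-wide window
right <- slide_dbl(x, mean, .before = 2, .complete = TRUE)

# One value on each side of the current one: centered 3-wide window
center <- slide_dbl(x, mean, .before = 1, .after = 1, .complete = TRUE)

print(rbind(left, center, right))
# left:    2  3  4 NA NA
# center: NA  2  3  4 NA
# right:  NA NA  2  3  4
```

Note that the window width is always `.before + .after + 1`, since the current value is counted as well.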
    library(slider)

    # Using slide, apply similar method to zoo and assign to new column in time.df
    time.df$slide.mean <- slide(.x = time.df$val.series, .f = ~mean(.x),
                                .step = 1, .after = 9, .complete = TRUE)
Other arguments include `.step`, which sets how many elements the window moves forward between evaluations, and `.complete`, which controls whether the function is evaluated on partial windows or only on complete ones. In the code above, `.complete` is set to `TRUE` to mimic the output of `rollapply`. Check the bottom of the tibble to see how the final 9 observations are all `NULL` (`slide` returns a list, so windows that cannot be completed come back as `NULL` rather than `NA`). Otherwise, I encourage looking at the `roll.mean` and `slide.mean` columns to see their identical values.
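If you would rather confirm the match programmatically, the sketch below compares the two approaches on a short random vector. It is an illustration under my own assumptions: a 20-value series instead of `time.df`, and `slide_dbl` (the numeric-vector variant of `slide`) added alongside `slide` so the result can be compared directly with `rollapply`.

```r
library(zoo)
library(slider)

# Short random series (hypothetical stand-in for time.df$val.series)
set.seed(123)
x <- rnorm(n = 20, mean = 100, sd = 5)

# zoo: left-aligned 10-wide rolling mean, NA where the window is incomplete
zoo.mean <- rollapply(data = x, FUN = mean, width = 10, fill = NA, align = "left")

# slider: slide() returns a list, so incomplete windows come back as NULL
slide.list <- slide(.x = x, .f = ~mean(.x), .after = 9, .complete = TRUE)

# slider: slide_dbl() returns a double vector, so incomplete windows are NA
slide.vec <- slide_dbl(.x = x, .f = ~mean(.x), .after = 9, .complete = TRUE)

is.list(slide.list)             # TRUE
is.null(slide.list[[20]])       # TRUE
is.na(slide.vec[20])            # TRUE
all.equal(zoo.mean, slide.vec)  # TRUE
```

Using `slide_dbl` also gives a regular numeric column rather than a list-column when the result is assigned into a tibble, which tends to be easier to plot or summarize later.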
I also encourage exploring other elements of the `slider` ecosystem, including `slide_period` for working with indices and date/times.
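As a starting point for that exploration, here is a minimal sketch of period-based sliding. It assumes `slide_period_dbl`, the numeric-vector variant of `slide_period`, and uses a hypothetical 60-day series whose values are simply 1 to 60 so the monthly means are easy to check by hand.

```r
library(slider)

# 60 consecutive days starting 2020-01-01: all of January plus all of February
dates <- as.Date("2020-01-01") + 0:59
vals <- as.numeric(1:60)

# One mean per calendar month; the index .i decides which values share a window
monthly.mean <- slide_period_dbl(.x = vals, .i = dates, .period = "month", .f = mean)

print(monthly.mean)
# 16 46  (mean of 1:31 for January, mean of 32:60 for February)
```

Unlike the position-based windows earlier in this post, the result here has one element per period rather than one per observation.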