Introduction
Wait, didn’t you already have one of these tutorial posts?
The short answer is yes: in a previous post I reviewed a much more manual way to apply a rolling average function to a set of time series data. I will be leaving that as is since there are still valuable techniques to gain from it (notably using the `floor_date` function), but I intend for this post to expand and simplify those initial concepts. In the previous post I did not touch on the `zoo` or the since-released `slider` functions, which will be the focus of this post.
The shorter answer is… well, we all learn better ways of doing things and are trying to improve, right?
I will be evaluating a single randomly generated dataset of time and nominal values, `time.df`. If you wish to follow along with the same data, specify the same `set.seed` value as shown below.
library(dplyr)
library(lubridate)
library(zoo)
# Create two vectors of time and value series data and combine in a tibble
set.seed(seed = 123) # Set seed for same randomized data
time.series <- seq(from = ymd_hms("2020-01-01 00:00:00"), to = ymd_hms("2020-12-31 00:00:00"), by = "day")
val.series <- rnorm(n = length(time.series), mean = 100, sd = 5)
time.df <- tibble(time.series, val.series)
To note, I have chosen to store the dataset as a `tibble`. For the most part, “tibbles” and data frames can be used interchangeably. Recently I have gravitated towards tibbles because they play nicely with the [tidyverse](https://www.tidyverse.org/) and are much easier to inspect. Printing a tibble automatically displays the column headers and data types and only the first few rows, which gives more information at a glance than printing a data frame. Check out R for Data Science for more information about tibbles!
Going to the `zoo` for Rolling Averages
In my last post I described using `dplyr` and manually adjusting dates and times to set an artificial floor and essentially “shift” data within a defined window. `zoo` does all that work for you in a single function. With `rollmean`, a user supplies a vector of data, a window width `k` to roll through, and an alignment for where the mean is placed (left, right, or center, with “center” as the default). There are similar functions for `rollmedian`, `rollmax`, and `rollsum`. As far as I can tell `zoo` does not export a dedicated `rollmin`, so a rolling minimum goes through `rollapply` instead.
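To make the alignment options concrete, here is a small sketch on a toy vector (the vector `x` and the result names are made up for illustration and are not part of `time.df`). It shows where the `NA` padding lands for each value of `align`.

```r
library(zoo)

# Toy vector, chosen so the window means are easy to check by hand
x <- c(1, 2, 3, 4, 5, 6)

# Left-aligned: each position averages itself and the next 2 values
left.aligned <- rollmean(x, k = 3, align = "left", fill = NA)

# Center-aligned (the default): each position averages its neighbours on both sides
center.aligned <- rollmean(x, k = 3, align = "center", fill = NA)

# Right-aligned: each position averages itself and the previous 2 values
right.aligned <- rollmean(x, k = 3, align = "right", fill = NA)

left.aligned    # 2 3 4 5 NA NA
center.aligned  # NA 2 3 4 5 NA
right.aligned   # NA NA 2 3 4 5
```

The window means are the same in all three cases; only the position they are attached to changes.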
In my opinion the more useful approach is simply to use `rollapply` and specify which function you want to apply to your dataset. While the former functions are streamlined for simple operations, `rollapply` gives the option to apply more complex ones should the need arise. Both are shown using the `time.df` data below:
# Using rollmean on a vector
roll.vec <- rollmean(x = time.df$val.series, k = 10, align = "left", fill = NA)
# Using rollapply, specify mean, and assign to a new column in time.df
time.df$roll.mean <- rollapply(data = time.df$val.series, FUN = mean, width = 10, fill = NA, align = "left")
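Here I have specified a window of 10 (in this daily dataset, 10 days), the `mean` function, and “left” alignment, so each row holds the average of that day and the 9 days after it. I also set `fill = NA`, which pads the positions where a full window is not available so the result stays the same length as the input. While this dataset is complete, most real datasets are not, and missing values in the input will propagate through `mean` unless you handle them.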
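As a sketch of what "more complex" can look like, the example below applies a custom function and passes an extra argument through to `FUN`. The vector `x` and the helper `window.range` are hypothetical names introduced only for this illustration.

```r
library(zoo)

# Toy vector with one missing value
x <- c(10, 12, NA, 14, 16, 18)

# Hypothetical helper: spread (max minus min) of a window, ignoring NAs
window.range <- function(v) diff(range(v, na.rm = TRUE))

# Any function that takes a vector can be rolled
roll.range <- rollapply(data = x, width = 3, FUN = window.range,
                        align = "left", fill = NA)

# Extra arguments (here na.rm = TRUE) are passed along to FUN
roll.mean.narm <- rollapply(data = x, width = 3, FUN = mean, na.rm = TRUE,
                            align = "left", fill = NA)

roll.range      # 2 2 2 4 NA NA
roll.mean.narm  # 11 13 15 16 NA NA
```

Note that the trailing `NA` values come from `fill = NA` (incomplete windows), not from the missing value in the input, which `na.rm = TRUE` skips.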
To verify the output, look at the first value, `time.df$roll.mean[1]`, and compare it to `mean(time.df$val.series[1:10])`. If you used the same `set.seed(123)` value as me, both should match (97.6 in my run). As an additional sanity check, run `tail(time.df, n = 10)`: the last 9 entries of `roll.mean` are all `NA`, while the top entry (99.8 in my run) is the same as `mean(tail(time.df$val.series, 10))` or `mean(time.df$val.series[(nrow(time.df)-9):nrow(time.df)])`.
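The checks described above can also be run programmatically. This is a minimal sketch that rebuilds the same data and compares values with `all.equal` rather than by eye; the variable `n` is introduced here for convenience.

```r
library(dplyr)
library(lubridate)
library(zoo)

# Rebuild the same dataset used in this post
set.seed(seed = 123)
time.series <- seq(from = ymd_hms("2020-01-01 00:00:00"),
                   to = ymd_hms("2020-12-31 00:00:00"), by = "day")
val.series <- rnorm(n = length(time.series), mean = 100, sd = 5)
time.df <- tibble(time.series, val.series)
time.df$roll.mean <- rollapply(data = time.df$val.series, FUN = mean,
                               width = 10, fill = NA, align = "left")

n <- nrow(time.df)

# First rolling value equals the mean of the first 10 raw values
all.equal(time.df$roll.mean[1], mean(time.df$val.series[1:10]))

# Last complete window starts 9 rows before the end
all.equal(time.df$roll.mean[n - 9], mean(tail(time.df$val.series, 10)))

# The final 9 rows cannot hold a full window, so they are NA
all(is.na(tail(time.df$roll.mean, 9)))
```

All three expressions should return `TRUE`.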
Alternative Approach with slider
The slider
package is a relatively recent release from Davis Vaughan advertised as drawing on influence from zoo
but incorporating elements of purrr
-style grammar. To check out the documentation for this package in detail, check out Vaughan’s Github Repository.
Similar to `purrr`, functions are passed as arguments to `slide`. And similar to `zoo`’s `rollapply` function, I will be specifying a function and a window. The key differences are in how alignment, completeness, and steps are assigned. `.before` and `.after` are how left/right alignment is specified, and a combination of both gives a centered window. For example, `.after = 9` applies the rolling window to the current value plus the 9 values after it, making it left-aligned. Similarly, `.before = 9` applies to the current value and the 9 values prior, making it right-aligned. For a centered window, make both sides equal, e.g. `.before = 9, .after = 9`; note that this window spans 19 values (9 + 1 + 9), not 10.
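A small sketch of how `.before` and `.after` map onto alignment, using a toy vector and `slide_dbl` (the variant of `slide` that returns a numeric vector). The names `x`, `left.aligned`, `right.aligned`, and `centered` are made up for this illustration.

```r
library(slider)

# Toy vector, chosen so the window means are easy to check by hand
x <- c(1, 2, 3, 4, 5, 6)

# Current value plus the 2 after it: left-aligned window of width 3
left.aligned <- slide_dbl(x, mean, .after = 2, .complete = TRUE)

# Current value plus the 2 before it: right-aligned window of width 3
right.aligned <- slide_dbl(x, mean, .before = 2, .complete = TRUE)

# One value on each side: centered window of width 3 (1 + 1 + 1)
centered <- slide_dbl(x, mean, .before = 1, .after = 1, .complete = TRUE)

left.aligned   # 2 3 4 5 NA NA
right.aligned  # NA NA 2 3 4 5
centered       # NA 2 3 4 5 NA
```

The window width is always `.before + .after + 1`, which is worth keeping in mind when translating a `width` from `rollapply`.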
library(slider)
# Using slide, apply similar method to zoo and assign to new column in time.df
time.df$slide.mean <- slide(.x = time.df$val.series, .f = ~mean(.x), .step = 1, .after = 9, .complete = TRUE)
Other arguments include `.step`, the number of elements the window shifts forward between function calls, and `.complete`, which determines whether the function is still evaluated when a full window is not available. In the code above, `.complete` is set to `TRUE` to mimic the output of `rollapply`. Check the bottom of the tibble to see that the final 9 observations are all `NULL` (`slide` returns a list; the typed variant `slide_dbl` returns `NA` instead). Otherwise I encourage looking at the `roll.mean` and `slide.mean` columns to see their identical values.
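To compare the two approaches directly, here is a sketch that swaps `slide` for `slide_dbl` so both results are plain numeric vectors, and then shows the effect of a larger `.step`. It regenerates the values with a hard-coded length of 366 (one per day of 2020) rather than reusing `time.df`; the name `stepped` is introduced only for this example.

```r
library(zoo)
library(slider)

# Same values as val.series in this post (366 daily observations)
set.seed(seed = 123)
val.series <- rnorm(n = 366, mean = 100, sd = 5)

# zoo: left-aligned 10-value rolling mean
roll.mean <- rollapply(data = val.series, FUN = mean, width = 10,
                       fill = NA, align = "left")

# slider: the same window, returned as a numeric vector instead of a list
slide.mean <- slide_dbl(.x = val.series, .f = mean, .after = 9, .complete = TRUE)

# Should be TRUE: same values, same NA padding at the end
all.equal(roll.mean, slide.mean)

# With .step = 5 the function is only evaluated at every 5th position;
# the skipped positions are filled with NA
stepped <- slide_dbl(.x = val.series, .f = mean, .after = 9,
                     .step = 5, .complete = TRUE)
head(stepped, 11)
```

Using `slide_dbl` also keeps the new column a regular numeric column rather than a list column when assigned into a tibble.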
I also encourage exploring other elements of the `slider` ecosystem, including `slide_index` and `slide_period`, for working with indices and date/times.
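As a brief sketch of why the index-aware functions matter, the example below uses made-up dates with gaps in them. `slide_dbl` counts positions, while `slide_index_dbl` builds each window from the dates themselves, so a `.before` of 2 means "up to 2 days back" rather than "2 rows back". The vectors `dates` and `vals` are hypothetical.

```r
library(slider)

# Irregularly spaced daily observations (note the gaps)
dates <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-05",
                   "2020-01-06", "2020-01-10"))
vals <- c(10, 20, 30, 40, 50)

# Position-based: current row plus the 2 rows before it
by.position <- slide_dbl(vals, mean, .before = 2)

# Index-based: current date plus any rows within the 2 days before it
by.date <- slide_index_dbl(vals, dates, mean, .before = 2)

by.position  # 10 15 20 30 40
by.date      # 10 15 30 35 50
```

On a complete daily series like `time.df` the two approaches agree; the difference only shows up once observations are missing or unevenly spaced.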