Bayesian time series intervention analysis

These are examples of using Bayesian statistics to infer the impact of an intervention whose effect follows one of several patterns mentioned in 9.2 Intervention Analysis:

Intervention patterns

The calculations are done on PAWS, Wikimedia Cloud Services-hosted JupyterHub.

Update (29 Oct 2018): I've published the report/data analysis which motivated me to put together these notes.

Autoregressive–moving-average model

For any ARMA model of observed series $y_t$ with change-due-to-intervention $z_t$ and error series $\epsilon_t$:

$$ y_t = z_t + \frac{\Theta(B)}{\Phi(B)} \epsilon_t $$

where $\Theta(B)$ is the moving average (MA) polynomial and $\Phi(B)$ is the autoregressive (AR) polynomial.

In all of these, $z_t$ will be some function of an indicator variable $I_t$ and intervention effect $\delta_0$.

To keep these toy examples simple and easy to follow (at least with respect to simulation of data and performing the inference), we will use a stationary ARMA(1,1) model:

$$ y_t = z_t + \phi_1 y_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_t $$

If you're seeing this time series stuff for the first time and are interested in learning more, I recommend the following resources:

Inference

We are primarily interested in inference on the effect of the intervention $\delta_0$. We will use the the Stan probabilistic programming language to accomplish this, but others like JAGS, PyMC3, Edward2 / TensorFlow Probability, greta, Pyro, and Turing.jl can be used as well.

If you're seeing this Bayesian stuff for the first time and are interested in learning more, I recommend the following resources:

Setup

We're using RStan to interface with Stan in R but all of this can also be done in Python with PyStan or in Julia with Stan.jl.

We will simulate $N = 200$ days of observations with intervention $T = 100$. The intervention effect is $\delta_0 = 10$ in all cases. In cases where there is a gradual increase/decrease, $\omega_1 = 0.9$.

The errors are i.i.d. from a Normal distribution ($\mu = 0, \sigma = 2$), and $\theta_1 = 0.6$ and $\phi_1 = -0.7$ as the MA(1) and AR(1) coefficients, respectively:

Patterns

Pattern 1: constant permanent change

If $T$ is the time of intervention, then $I_t = 0$ when $t < T$ and $I_t = 1$ when $t \geq T$.

$$ z_t = \delta_0 I_t $$

The Stan manual authors recommend constraining the coefficients to enforce the estimation of a stationary ARMA process so that's what we're going to do here. Although they don't recommended that in practice in case the data is not stationary, in which case they suggest encouraging stationary parameter estimates with a prior favoring coefficient values near zero.

Once we have a model we can do Bayesian inference in one of two ways:

Let's see how the results differ and how long they take to get:

Pretty good results in way less time!

The original draft of this notebook just had MCMC and it wasn't until right before I was ready to post it that I was like "let's just try VI and see what happens". Since it's not expensive to do it in addition to the full inference, I've included in the other patterns too to show its similarities, differences, advantages, and shortcomings. I feel like approximate inference with (AD)VI isn't as well-known among Bayesian statisticians and data scientists as MCMC-powered full inference is, so hopefully this could be a cool introduction to that.

Pattern 2: temporary, constant change

When the change from the intervention lasts $d$ days, $I_t = 1$ for $T \leq t \leq T + d$ and $I_t = 0$ for all other $t$.

$$ z_t = \delta_0 I_t $$

Pattern 3: gradually leveling-off change

If $T$ is the time of intervention, then $I_t = 0$ when $t < T$ and $I_t = 1$ when $t \geq T$.

$$ z_t = \frac{\delta_0}{1 - \omega_1 B} I_t $$

assuming $\omega_1 < 1$. The above is equivalent to $z_t = \omega_1 z_{t-1} + \delta_0 I_t$. Furthermore, since $|\omega_1| < 1$, $z_t$ can be expressed just in terms of $\omega_1$ and $\delta_0$:

$$ z_t = \frac{\delta_0(1 - \omega_1^{t - T + 1})}{1 - \omega_1} $$

Knowing how the value of $\omega_1$ affects the effect can help us come up with an informative prior for it:

Looking at the observed $y_t$, it appears that it takes about 50 days before the effect stops increasing. We don't need to know $\delta_0$ to see how $z_t$ behaves under different values of $\omega_1$. Looking at the figure above, we could be confident in saying that $\omega_1$ is definitely bigger than 0.1 and probably bigger than 0.5. Instead of a Uniform(0, 1) prior, it looks like we would be wise to use a Beta(8, 2) prior instead, which gives values 0.7-0.9 much higher probability:

Visualization of Beta(8, 2) distribution made with tinydensR RStudio addin

(Made with my in-development RStudio addin tinydensR)

Pattern 4: immediate, gradually decreasing change

The formula and model specification are the same as the pattern above except $I_t = 1$ when $t = T$ and $I_t = 0$ otherwise.

Pattern 5: Gompertz (sigmoidal)

If there's a slow ramp-up and then the change slowly levels-off in an "S" shape, we can represent it with the Gompertz function:

$$ f(t) = a\mathrm{e}^{-b\mathrm{e}^{-ct}} $$

where

Extra credit: pattern 2 with unknown duration

In pattern 2 we had an intervention whose effect lasted $d=18$ days. What if we didn't know that? What if we just knew when the intervention happened but were unsure about how long the effect lasted? Our suspicion is that the effect of the intervention lasted between 1 and 3 weeks and is probably around a fortnight, although we're not particularly sure. We can specify $d$ as a parameter of the model to infer from the data, and we can formalize our prior beliefs as:

$$ d~\sim~\text{Normal}(14, 2) $$

With the Normal distribution there's something called the 68-95-99.7 rule, so one way to think about this is that we're:

By the way, even though we talk about $d$ as a discrete number of days, we can still represent it in the model as a continuous, real-valued random variable.

The probability density looks like this:

In this particular case where we have this additional uncertainty from having $d$ as an unknown, we're going to help the model by setting the lower bound on $\delta_0$ to 0 since it's clear that the intervention had a positive impact.