Performing (moment) model predictions

We demonstate how to use the predict method to obtain k-step ahead predictions / forecasts of a model object, and the observations.

A prediction in this context is considered to be a prior estimate of the state mean and covariance i.e. an estimate obtained without incorporating information from future observations. We denote these

$$ \hat{x}_{i|j} = \mathrm{E}(x_{t_{i}} | x_{t_{j}}, y_{t_{j}}) \\ \hat{P}_{i|j} = \mathrm{V}(x_{t_{i}} | x_{t_{j}}, y_{t_{j}}) $$ where i ≥ j.

The predict method computes the predictions depending on its argument method in the following way:

  1. The Extended Kalman FIlter (method=ekf): The posterior values i|i, i|i are obtained from the filter. In each iteration the moment ODEs $$ \begin{align} \frac{d\hat{x}_{i|i}}{dt} &= f(\hat{x}_{i|i},\cdots) \\ \frac{d\hat{P}_{i|i}}{dt} &= \left[\frac{df}{dx}(\hat{x}_{i|i},\cdots)\right] \hat{P}_{i|i} + \hat{P}_{i|i} \left[\frac{df}{dx}(\hat{x}_{i|i},\cdots)\right]^{T} + G(\hat{x}_{i|i},\cdots)G^{T}(\hat{x}_{i|i},\cdots) \end{align} $$ are solved from the initial time ti until tj where the number of steps is determined by the argument k.ahead to predict, i.e. j=i+k.ahead. The solution at each time point ti + 1, ti + 2, …, ti + k.ahead is the desired predictions. We note that these predictions assume that the underlying probability density of the SDE remains approximately Gaussian. If the prediction horizon is large then this assumption probably doesn’t hold, and one should instead use the simulate method. The stochastic predictions are generated by simulating the SDE forward in time for a user-defined number of simulations.

  2. The Laplace Approximation (method=laplace): The predict method is not yet implemented.

Simulate from the Ornstein-Uhlenbeck process


We use the Ornstein-Uhlenbeck process again.

dXt = θ(μ − Xt) dt  + σX dBt

Ytk = Xtk + etk,   etk ∼ N(0, σY2) We first create data by simulating the process

# Simulate data using Euler Maruyama
set.seed(10)
theta=10; mu=1; sigma_x=1; sigma_y=1e-1
# 
dt.sim = 1e-3
t.sim = seq(0,1,by=dt.sim)
dw = rnorm(length(t.sim)-1,sd=sqrt(dt.sim))
x = 3
for(i in 1:(length(t.sim)-1)) {
  x[i+1] = x[i] + theta*(mu-x[i])*dt.sim + sigma_x*dw[i]
}

# Extract observations and add noise
dt.obs = 1e-2
t.obs = seq(0,1,by=dt.obs)
y = x[t.sim %in% t.obs] + sigma_y * rnorm(length(t.obs))

# Create data
.data = data.frame(
  t = t.obs,
  y = y
)

Construct model object


We now construct the ctsmTMB model object

# Create model object
obj = ctsmTMB$new()

# Set name of model (and the created .cpp file)
obj$setModelname("ornstein_uhlenbeck")

# Add system equations
obj$addSystem(
  dx ~ theta * (mu-x) * dt + sigma_x*dw
)

# Add observation equations
obj$addObs(
  y ~ x
)

# Set observation equation variances
obj$setVariance(
  y ~ sigma_y^2
)

# Specify algebraic relations
obj$setAlgebraics(
  theta ~ exp(logtheta),
  sigma_x ~ exp(logsigma_x),
  sigma_y ~ exp(logsigma_y)
)

# Specify parameter initial values and lower/upper bounds in estimation
obj$setParameter(
  logtheta   = log(c(initial = 5,    lower = 0,    upper = 20)),
  mu         = c(    initial = 0,    lower = -10,  upper = 10),
  logsigma_x = log(c(initial = 1e-1, lower = 1e-5, upper = 5)),
  logsigma_y = log(c(initial = 1e-1, lower = 1e-5, upper = 5))
)

# Set initial state mean and covariance
obj$setInitialState(list(x[1], 1e-1*diag(1)))

Predict


We can use the predict function without having called estimate first, in which case the initial value of the parameters provided during add_parameters are used.

pred = obj$predict(.data)

The output of predict is a list of two data.frames, one for states and one for observations. The two data.frames share the same five initial columns that contain indices i and j, associated time-points t.i and t.j, and a k.ahead

They both contain indices, time-points and associated k-step ahead , state/observation values and associated standard deviations, and observations provided in the data, and predicted observation values based from the predicted state values.

head(pred$states) 
##   i. j.  t.i  t.j k.ahead        x       var.x
## 1  0  0 0.00 0.00       0 3.095456 0.009090909
## 2  0  1 0.00 0.01       1 2.944489 0.008320958
## 3  1  1 0.01 0.01       0 2.827817 0.004541770
## 4  1  2 0.01 0.02       1 2.689903 0.004204726
## 5  2  2 0.02 0.02       0 2.673529 0.002960090
## 6  2  3 0.02 0.03       1 2.543140 0.002773563
head(pred$observations)
##     i. j.  t.i  t.j k.ahead        y   y.data
## 1    0  0 0.00 0.00       0 3.095456 3.105001
## 2    0  1 0.00 0.01       1 2.944489 2.687603
## 2.1  1  1 0.01 0.01       0 2.827817 2.687603
## 3    1  2 0.01 0.02       1 2.689903 2.634588
## 3.1  2  2 0.02 0.02       0 2.673529 2.634588
## 4    2  3 0.02 0.03       1 2.543140 2.267096

The indices i. and j. in the table refer to expectation and variance of the state vector $$ \mathrm{E}(x_{t_{i}} | x_{t_{j}}, y_{t_{j}}) \\ \mathrm{V}(x_{t_{i}} | x_{t_{j}}, y_{t_{j}}) $$ where the moments are given at indices i (time ti) conditioned on the information available up until indices j (time tj). In this sense the state/covariance predictions with i = j are posterior estimates, and all others are prior estimates.

By default only the standard deviations of the states are returned. The entire covariance matrix elements can be returned via

obj$predict(.data, return.covariance = TRUE)

Set number of prediction steps


The number of time steps ahead that predictions are desired can be changed with the k.ahead argument. The default behaviour is to save all prediction steps from 1 to k.ahead but if only a selected few are desired, then the return.k.ahead argument can be used to indicate which should be kept in the output.

pred1 = obj$predict(.data, k.ahead=2)
pred2 = obj$predict(.data, k.ahead=10, return.k.ahead=c(2,5,8))
head(pred1$states)
##   i. j.  t.i  t.j k.ahead        x       var.x
## 1  0  0 0.00 0.00       0 3.095456 0.009090909
## 2  0  1 0.00 0.01       1 2.944489 0.008320958
## 3  0  2 0.00 0.02       2 2.800884 0.007624277
## 4  1  1 0.01 0.01       0 2.827817 0.004541770
## 5  1  2 0.01 0.02       1 2.689903 0.004204726
## 6  1  3 0.01 0.03       2 2.558715 0.003899757
head(pred2$states)
##    i. j.  t.i  t.j k.ahead        x       var.x
## 3   0  2 0.00 0.02       2 2.800884 0.007624277
## 6   0  5 0.00 0.05       5 2.410743 0.005907387
## 9   0  8 0.00 0.08       8 2.074946 0.004635482
## 14  1  3 0.01 0.03       2 2.558715 0.003899757
## 17  1  6 0.01 0.06       5 2.202306 0.003148193
## 20  1  9 0.01 0.09       8 1.895543 0.002591421

Set the model parameters used


The default of predict is to use the initial parameters supplied during setParameter, unless the estimate function has been succesfully run, then predict will use the parameter values at the found minimizer. You can provide other parameters using the pars argument of the method i.e.

some.other.random.pars = rnorm(length(fit$par.fixed))
pred = obj$predict(.data, pars=some.other.random.pars)

Set the initial state and covariance


The default behaviour of predict is to use the initial state and covariance suppled when calling the set_initial_state method. This can be important to change if one wishes to correctly predict on the first few observations in a provided prediction data series. You can supply the x0 and p0 arguments with updated state and covariance estimates

new.initial.x0 = rnorm(1)
new.initial.p0 = rnorm(1)*diag(1)
pred = obj$predict(data=.data,
                   initial.state = list(
                     new.state.value,
                     new.covariance.value
                   )
)

The solver options


You can choose which solver options to use via the arguments ode.solver and ode.timestep.

The ode.solver decides which ode solver algorithm to used when solving the mean and variance ODEs of the SDE.

The ode.timestep determines the time step-size using when solving the ODEs, with default value being the minimum observed time difference in the provided data time vector (min(diff(.data$t))). The time step-size can’t be smaller than this. If the chosen time step-size Δstep does not “almost” divide the observation time differences in the data Δobs i.e. if $$ \frac{\Delta_{obs}}{\Delta_{step}} - \bigg\lfloor \frac{\Delta_{obs}}{\Delta_{step}} \bigg\rfloor \geq 0.02 $$ then the time step-size is reduced such that it does divide evenly by setting

$$ \frac{\Delta_{obs}}{\Delta_{step}} = \bigg\lceil \frac{\Delta_{obs}}{\Delta_{step}} \bigg\rceil $$ In other words: if there it takes 3.561 time-steps to get from ti to ti + 1 then we take 4 time-steps instead, and calculate a reduced time-step such that this is true. If there were only 3.01 time-steps then we retain the original time-step, ignoring the small temporal discrepancy.

We provide these arguments via

obj$predict(.data, ode.timestep = min(diff(.data$t)))

Use-case


We could use our model predictions against observation to compute a model performance score e.g. RMSE (root-mean square error). We first estimate the model parameters, such that they are automatically used when calling predict. We then predict up to 10-steps ahead, and return only these values.

fit = obj$estimate(.data)
pred = obj$predict(.data, k.ahead=10)
pred.states = pred$states
pred.obs = pred$observations
# ggplot2 theme
library(ggplot2)
mytheme =
  theme_minimal() + 
  theme(
    text             = element_text("Avenir Next Condensed",size=15),
    legend.text      = element_text(size=15),
    axis.text        = element_text(size=15),
    strip.text       = element_text(face="bold",size=15),
    legend.box       = "vertical",
    legend.position  = "top",
    plot.title       = element_text(hjust=0.5)
  )

Let’s plot the 10-step predictions against the observations.

pred10 = pred.states[pred.states$k.ahead==10,]
data = .data
ggplot() +
  geom_line(aes(x=pred10$t.j,y=pred10$x,color="10-Step Predictions")) +
    geom_ribbon(aes(x=pred10$t.j,ymin=pred10$x-2*sqrt(pred10$var.x),ymax=pred10$x+2*sqrt(pred10$var.x)),fill="grey",alpha=0.5) +
  geom_point(aes(x=data$t,data$y,color="Observations")) +
  labs(color="",x="Time",y="") +
  # coord_cartesian(xlim=c(0,0.1)) +
  mytheme

We notice that because these are 10-step predictions the state uncertainty (95%) is much larger than for the filtered posterior state estimates from fit i.e.

t         = fit$states$mean$posterior$t
xpost     = fit$states$mean$posterior$x
xpost_sd  = fit$states$sd$posterior$x

ggplot() +
  geom_line(aes(x=t,y=xpost,color="0-Step Predictions (Posterior State Estimates)"),lwd=1) +
  geom_ribbon(aes(x=t,ymin=xpost-2*xpost_sd,ymax=xpost+2*xpost_sd),fill="grey",alpha=0.5) +
  geom_point(aes(x=data$t,y=data$y,color="Observations")) +
  labs(x = "Time", y = "", color="") +
  mytheme

We can calculate the RMSE prediction score for each prediction step as follows:

rmse = c()
k.ahead = 1:10
for(i in k.ahead){
  xy = data.frame(
    x = pred.states[pred.states$k.ahead==i,"x"],
    y = pred.obs[pred.obs$k.ahead==i,"y.data"]
  )
  rmse[i] = sqrt(mean((xy[["x"]] - xy[["y"]])^2))
}

ggplot() +
  geom_line(aes(k.ahead, rmse), color="steelblue") + 
  geom_point(aes(k.ahead, rmse), color="red") +
  labs(
    title = "Root-Mean Square Errors for Different Prediction Horizons",
    x = "Prediction Steps",
    y = "Root-Mean-Square Errors"
    ) +
  mytheme