Kalman Filter


Model of the Kalman Filter

We assume the state transition can be modeled as a Gaussian distribution

\[P(z_t|z_{t-1}) \sim \mathcal{N}(Az_{t-1},Q)\]

We assume the neural observations can also be modeled as a Gaussian distribution

\[P(x_t|z_{t}) \sim \mathcal{N}(Cz_{t},R)\]

We also assume a Gaussian base case

\[P(z_{1}) \sim \mathcal{N}(\Pi,V)\]

Thus the model parameters are: \(\Theta=\{A,Q,\Pi,V,C,R\}\)
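
To make the generative model concrete, here is a minimal sketch that samples one trial from it. The `sample_trajectory` helper, the parameter shapes, and the use of NumPy are my own assumptions, not from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectory(A, Q, Pi, V, C, R, T):
    """Sample latent states z_1..z_T and observations x_1..x_T from the model."""
    dz, dx = A.shape[0], C.shape[0]
    z = np.empty((T, dz))
    x = np.empty((T, dx))
    z[0] = rng.multivariate_normal(Pi, V)                # z_1 ~ N(Pi, V)
    for t in range(1, T):
        z[t] = rng.multivariate_normal(A @ z[t - 1], Q)  # z_t ~ N(A z_{t-1}, Q)
    for t in range(T):
        x[t] = rng.multivariate_normal(C @ z[t], R)      # x_t ~ N(C z_t, R)
    return z, x
```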

Model Training

We aim to maximize the joint likelihood of the states and observations over a training set of \(N\) trials

\[\mathcal{D}=\{\{x\}^n, \{z\}^n\}_{n=1}^N=\{\{x_1^n, \dots, x_T^n\}, \{z_1^n, \dots, z_T^n\}\}_{n=1}^N\]

\[\Theta^* = \arg\max_{\Theta} P(\mathcal{D}|\Theta)\\ = \arg\max_{\Theta} \prod_{n=1}^N P(z^n_1)\left( \prod_{t=2}^T P(z^n_t|z^n_{t-1}) \right)\left( \prod_{t=1}^T P(x^n_t|z^n_{t}) \right)\\ = \arg\max_{\Theta}\sum_{n=1}^N \log P(z^n_1) + \sum_{t=2}^T \log P(z^n_t|z^n_{t-1}) + \sum_{t=1}^T \log P(x^n_t|z^n_{t}) \\ = \arg\max_{\Theta} \sum_{n=1}^N -\frac{1}{2} \log|V|-\frac{1}{2}(z^n_1-\Pi)^{\top}V^{-1}(z^n_1-\Pi) + \sum_{t=2}^T \left(-\frac{1}{2} \log|Q|-\frac{1}{2}(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}(z^n_{t}-Az^n_{t-1})\right) + \sum_{t=1}^T \left(-\frac{1}{2} \log|R|-\frac{1}{2}(x^n_{t}-Cz^n_{t})^{\top}R^{-1}(x^n_{t}-Cz^n_{t})\right) \\ = \arg\min_{\Theta} \sum_{n=1}^N \log|V|+(z^n_1-\Pi)^{\top}V^{-1}(z^n_1-\Pi) + \sum_{t=2}^T \left( \log|Q|+(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}(z^n_{t}-Az^n_{t-1})\right) + \sum_{t=1}^T \left(\log|R|+(x^n_{t}-Cz^n_{t})^{\top}R^{-1}(x^n_{t}-Cz^n_{t})\right)\]

where additive constants independent of \(\Theta\) have been dropped.

Let \(\mathcal{L}\) denote the objective being minimized,

\[\mathcal{L}=\sum_{n=1}^N \log|V|+(z^n_1-\Pi)^{\top}V^{-1}(z^n_1-\Pi) + \sum_{t=2}^T \left( \log|Q|+(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}(z^n_{t}-Az^n_{t-1})\right) + \sum_{t=1}^T \left(\log|R|+(x^n_{t}-Cz^n_{t})^{\top}R^{-1}(x^n_{t}-Cz^n_{t})\right)\]

The minimum is achieved when each gradient vanishes (using the symmetry of \(V\), \(Q\), and \(R\)):

\[\nabla_{\Pi} \mathcal{L} = \sum_{n=1}^N -2V^{-1}(z^n_1-\Pi) = 0 \to \Pi^* = \frac{1}{N} \sum_{n=1}^N z^n_1\\ \nabla_{V} \mathcal{L} = \sum_{n=1}^N \left(V^{-1} - V^{-1}(z^n_1-\Pi)(z^n_1-\Pi)^{\top}V^{-1}\right) = 0 \to V^* = \frac{1}{N}\sum_{n=1}^N (z^n_1-\Pi^*)(z^n_1-\Pi^*)^{\top} \\ \nabla_{A} \mathcal{L} = \sum_{n=1}^N \sum_{t=2}^T -2Q^{-1}(z^n_{t}-Az^n_{t-1})(z^n_{t-1})^{\top} = 0 \to A^* = \left(\sum_{n=1}^N \sum_{t=2}^T z^n_{t}(z^n_{t-1})^{\top} \right)\left(\sum_{n=1}^N \sum_{t=2}^T z^n_{t-1}(z^n_{t-1})^{\top} \right)^{-1}\\ \nabla_{Q} \mathcal{L} = \sum_{n=1}^N \sum_{t=2}^T \left(Q^{-1} - Q^{-1}(z^n_{t}-Az^n_{t-1})(z^n_{t}-Az^n_{t-1})^{\top}Q^{-1}\right) = 0 \to Q^* = \frac{1}{N(T-1)} \sum_{n=1}^N \sum_{t=2}^T (z^n_{t}-A^*z^n_{t-1})(z^n_{t}-A^*z^n_{t-1})^{\top}\\ \nabla_{C} \mathcal{L} = \sum_{n=1}^N \sum_{t=1}^T -2R^{-1}(x^n_{t}-Cz^n_{t})(z^n_{t})^{\top} = 0 \to C^* =\left(\sum_{n=1}^N \sum_{t=1}^T x^n_{t}(z^n_{t})^{\top} \right)\left(\sum_{n=1}^N \sum_{t=1}^T z^n_{t}(z^n_{t})^{\top} \right)^{-1}\\ \nabla_{R} \mathcal{L} = \sum_{n=1}^N \sum_{t=1}^T \left(R^{-1} - R^{-1}(x^n_{t}-Cz^n_{t})(x^n_{t}-Cz^n_{t})^{\top}R^{-1}\right) = 0 \to R^* = \frac{1}{NT} \sum_{n=1}^N \sum_{t=1}^T (x^n_{t}-C^*z^n_{t})(x^n_{t}-C^*z^n_{t})^{\top}\]
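
These closed forms translate directly into code. Below is a sketch assuming the training set is a list of fully observed trials, each a pair `(z, x)` of arrays with shapes `(T, dz)` and `(T, dx)`; the `fit_kalman` name and interface are my own:

```python
import numpy as np

def fit_kalman(trials):
    """Closed-form ML estimates of Theta = {A, Q, Pi, V, C, R} from observed trials."""
    z1 = np.stack([z[0] for z, _ in trials])          # first states across trials
    Pi = z1.mean(axis=0)
    V = (z1 - Pi).T @ (z1 - Pi) / len(trials)

    # Stack transition pairs (z_{t-1}, z_t) and observation pairs (z_t, x_t).
    Zp = np.concatenate([z[:-1] for z, _ in trials])  # z_{t-1}, t = 2..T
    Zn = np.concatenate([z[1:] for z, _ in trials])   # z_t,     t = 2..T
    Z = np.concatenate([z for z, _ in trials])
    X = np.concatenate([x for _, x in trials])

    A = (Zn.T @ Zp) @ np.linalg.inv(Zp.T @ Zp)
    Ez = Zn - Zp @ A.T                                # rows are (z_t - A z_{t-1})^T
    Q = Ez.T @ Ez / len(Zn)                           # average over N(T-1) pairs
    C = (X.T @ Z) @ np.linalg.inv(Z.T @ Z)
    Ex = X - Z @ C.T                                  # rows are (x_t - C z_t)^T
    R = Ex.T @ Ex / len(X)                            # average over NT pairs
    return A, Q, Pi, V, C, R
```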

Testing the Model

In the testing phase we aim to compute $P(z_t|x_1,\dots,x_t)$. At each time step we apply two sub-steps: a one-step prediction and then a measurement update.

  • One-step prediction \(P(z_t|x_1,\dots,x_{t-1}) = \int P(z_t|z_{t-1}) P(z_{t-1}|x_1,\dots,x_{t-1}) dz_{t-1}\)
  • Measurement update \(P(z_t|x_1,\dots,x_{t}) = \frac{P(x_t|z_t)P(z_t|x_1,\dots,x_{t-1})}{P(x_t|x_1,\dots,x_{t-1})}\) We’ll use the following notation for the conditional mean and covariance: \(\mu_t^t = E[z_t|x_1,\dots,x_{t}]\\ \Sigma_t^t = cov[z_t|x_1,\dots,x_{t}]\)

    One-step Prediction

    We assume that we have the mean $\mu_{t-1}^{t-1}$ and covariance $\Sigma_{t-1}^{t-1}$ from the previous iteration, and we need to compute $\mu_{t}^{t-1}$ and $\Sigma_{t}^{t-1}$. Because $z_t = Az_{t-1} + \gamma$ with $\gamma \sim \mathcal{N}(0, Q)$ independent of the past observations, \(\mu_{t}^{t-1} = E[z_t|x_1,\dots,x_{t-1}] \\ = E[ Az_{t-1} + \gamma|x_1,\dots,x_{t-1}]\\ = AE[ z_{t-1}|x_1,\dots,x_{t-1}] + E[\gamma|x_1,\dots,x_{t-1}]\\ = A\mu_{t-1}^{t-1}\)

\(\Sigma_{t}^{t-1} = cov[z_t|x_1,\dots,x_{t-1}] \\ = cov[ Az_{t-1} + \gamma|x_1,\dots,x_{t-1}]\\ = E[(Az_{t-1} + \gamma - \mu_{t}^{t-1})(Az_{t-1} + \gamma-\mu_{t}^{t-1})^{\top}|x_1,\dots,x_{t-1}] \\ = E[(Az_{t-1} - \mu_{t}^{t-1})(Az_{t-1} -\mu_{t}^{t-1})^{\top}|x_1,\dots,x_{t-1}] + E[\gamma \gamma^{\top}|x_1,\dots,x_{t-1}]\\ = E[(Az_{t-1} - A\mu_{t-1}^{t-1})(Az_{t-1} -A\mu_{t-1}^{t-1})^{\top}|x_1,\dots,x_{t-1}] + Q\\ = A E[(z_{t-1} - \mu_{t-1}^{t-1})(z_{t-1} -\mu_{t-1}^{t-1})^{\top}|x_1,\dots,x_{t-1}] A^\top + Q\\ = A \Sigma_{t-1}^{t-1} A^\top + Q\)
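
In code, the one-step prediction is just these two equations (a sketch; the `predict` helper and its NumPy-array interface are assumptions of mine):

```python
def predict(mu, Sigma, A, Q):
    """Map N(mu_{t-1}^{t-1}, Sigma_{t-1}^{t-1}) to N(mu_t^{t-1}, Sigma_t^{t-1})."""
    mu_pred = A @ mu                   # mu_t^{t-1} = A mu_{t-1}^{t-1}
    Sigma_pred = A @ Sigma @ A.T + Q   # Sigma_t^{t-1} = A Sigma A^T + Q
    return mu_pred, Sigma_pred
```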

Measurement update

We take advantage of the following property of Gaussian distributions: if $x=\begin{bmatrix}x_a\\ x_b\end{bmatrix}\sim \mathcal{N}\left(\begin{bmatrix}\mu_a\\ \mu_b\end{bmatrix}, \begin{bmatrix}\Sigma_{aa}&\Sigma_{ab}\\ \Sigma_{ba}&\Sigma_{bb}\end{bmatrix}\right)$, then $P(x_a|x_b)$ is Gaussian with \(E(x_a|x_b) = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b-\mu_b)\\ cov(x_a|x_b) = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}\) Because $x_t = Cz_{t} + \sigma$ with $\sigma \sim \mathcal{N}(0, R)$, we have $E[x_t|x_1,\dots,x_{t-1}]=E[Cz_{t} + \sigma|x_1,\dots,x_{t-1}]=C \mu_{t}^{t-1}$. Similarly $cov[x_t|x_1,\dots,x_{t-1}]=C \Sigma_{t}^{t-1}C^{\top} + R$
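
Here is a sketch of this conditioning rule with the blocks passed in explicitly (the `condition` helper is my own; it assumes the joint covariance is symmetric, so $\Sigma_{ba} = \Sigma_{ab}^{\top}$):

```python
import numpy as np

def condition(mu_a, mu_b, S_aa, S_ab, S_bb, x_b):
    """Mean and covariance of x_a given an observed value of x_b."""
    G = S_ab @ np.linalg.inv(S_bb)     # gain Sigma_ab Sigma_bb^{-1}
    mean = mu_a + G @ (x_b - mu_b)
    cov = S_aa - G @ S_ab.T            # uses Sigma_ba = Sigma_ab^T
    return mean, cov
```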

\(cov[x_t, z_t|x_1,\dots,x_{t-1}] = C\Sigma_{t}^{t-1}\) Now we have the joint distribution $P\left(\begin{bmatrix}x_t\\ z_t\end{bmatrix}\middle|x_1,\dots,x_{t-1}\right)\sim \mathcal{N}\left(\begin{bmatrix}C \mu_t^{t-1}\\ \mu_t^{t-1}\end{bmatrix}, \begin{bmatrix}C \Sigma_{t}^{t-1}C^{\top} + R &C\Sigma_{t}^{t-1}\\ \Sigma_{t}^{t-1}C^{\top}&\Sigma_{t}^{t-1}\end{bmatrix}\right)$ and applying the conditioning property with $x_a = z_t$ and $x_b = x_t$ gives the measurement update.

We can write the measurement updates with the Kalman gain \(\mu_t^t = \mu_{t}^{t-1} + K_t (x_t-C \mu_t^{t-1})\\ \Sigma_t^t = \Sigma_{t}^{t-1} - K_t C \Sigma_{t}^{t-1}\)

where $K_t = \Sigma_t^{t-1}C^{\top}(C\Sigma_{t}^{t-1}C^{\top}+R)^{-1}$
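
Putting both sub-steps together gives a full filtering pass. This is a sketch reusing `np` and `predict` from the earlier code; the `update` and `filter_trial` helpers and their interfaces are my own:

```python
def update(mu_pred, Sigma_pred, x, C, R):
    """Measurement update: condition N(mu_t^{t-1}, Sigma_t^{t-1}) on x_t."""
    S = C @ Sigma_pred @ C.T + R             # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)  # Kalman gain K_t
    mu = mu_pred + K @ (x - C @ mu_pred)     # mu_t^t
    Sigma = Sigma_pred - K @ C @ Sigma_pred  # Sigma_t^t
    return mu, Sigma

def filter_trial(X, A, Q, Pi, V, C, R):
    """Run the filter over observations X of shape (T, dx); return filtered means."""
    mu, Sigma = update(Pi, V, X[0], C, R)    # t = 1: condition the base case on x_1
    means = [mu]
    for x in X[1:]:
        mu, Sigma = predict(mu, Sigma, A, Q) # one-step prediction
        mu, Sigma = update(mu, Sigma, x, C, R)
        means.append(mu)
    return np.array(means)
```

Combined with the earlier sketches, a full decoder would be `filter_trial(x_test, *fit_kalman(trials))`, though again both helpers are illustrative rather than from the post.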

References

  • The Matrix Cookbook: https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
  • Maximum likelihood for the multivariate Gaussian: https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/other-readings/chapter13.pdf