ABC Overview

Approximate Bayesian Computation (ABC) is a collection of methods for likelihood free model parameter inference.

Simulation based Rejection ABC

The most basic variant of ABC is referred to as Rejection ABC. The user-defined inputs to this algorithm include:

  • The prior distribution $\pi$, defined over model parameter space $\Theta$
  • The model simulation function $f$
  • Reference data $\mathcal{D}$
  • Acceptance threshold $\varepsilon$
  • Summary statistic $S$ and distance function $d$
  • Desired size of the posterior sample and maximum number of simulations to perform

The pseudocode for simulation-based Rejection ABC in GpABC looks as follows:

  • While the posterior sample is not full, and maximum number of simulations has not been reached:
    • Sample parameter vector (particle) $\theta$ from $\pi$
    • Simulate data $x = f(\theta)$
    • Compute the distance between the summary statistic of the simulated data and that of the reference data $y = d(S(x), S(\mathcal{D}))$
    • If $y \leq \varepsilon$, then accept $\theta$ in the posterior sample

This algorithm is implemented by Julia function SimulatedABCRejection.

Emulation based Rejection ABC

Some models are computationally expensive to simulate. Simulation based ABC for such models would take unreasonably long time to accept enough posterior particles.

To work around this issue, GpABC provides emulation based Rejection ABC. Rather than simulating the model for each sampled particle, this algorithm runs a small number of simulations in the beginning, and uses their results to train the emulator.

User-defined inputs for this algorithm are very similar to those for Simulation based Rejection ABC:

  • The prior distribution $\pi$, defined over model parameter space $\Theta$
  • The model simulation function $f$
  • Reference data $\mathcal{D}$
  • Acceptance threshold $\varepsilon$
  • Summary statistic $S$ and distance function $d$
  • Number of design particles to sample: $n$
  • Batch size to use for regression: $m$
  • Desired size of the posterior sample and maximum number of regressions to perform

The pseudocode for emulation-based Rejection ABC in GpABC looks as follows:

  • Sample $n$ design particles from $\pi$: $\theta_1, \ldots, \theta_n$
  • Simulate the model for the design particles: $x_1, \ldots, x_n = f(\theta_1), \ldots, f(\theta_n)$
  • Compute distances to the reference data: $y_1, \ldots, y_n = d(S(x_1), S(\mathcal{D})), \ldots, d(S(x_n), S(\mathcal{D}))$
  • Use $\theta_1, \ldots, \theta_n$ and $y_1, \ldots, y_n$ to train the emulator $\textbf{gpr}$
  • While the posterior sample is not full, and maximum number of regressions has not been reached:
    • Sample $m$ particles from $\pi$: $\theta_1, \ldots, \theta_m$
    • Compute the approximate distances by running the emulator regression: $y_1, \ldots, y_m = \textbf{gpr}(\theta_1), \ldots, \textbf{gpr}(\theta_m)$
    • For all $j = 1 \ldots m$, if $y_j \leq \varepsilon$, then accept $\theta_j$ in the posterior sample

This algorithm is implemented by Julia function EmulatedABCRejection.

Simulation based ABC - SMC

This sophisticated version of ABC allows to specify a schedule of thresholds, as opposed to just a single value. A number of simulation based ABC iterations are then executed, one iteration per threshold. The posterior of the preceding iteration serves as a prior to the next one.

The user-defined inputs to this algorithm are similar to those of Simulation based Rejection ABC:

  • The prior distribution $\pi$, defined over model parameter space $\Theta$
  • The model simulation function $f$
  • Reference data $\mathcal{D}$
  • A schedule of thresholds $\varepsilon_1, \ldots, \varepsilon_T$
  • Summary statistic $S$ and distance function $d$
  • Desired size of the posterior sample and maximum number of simulations to perform

The pseudocode for simulation-based ABC-SMC in GpABC looks as follows:

  • For $t$ in $1 \ldots T$
    • While the posterior sample is not full, and maximum number of simulations has not been reached:
      • if $t = 1$
        • Sample the particle $\theta$ from $\pi$
        • Use pdf of $\pi$ at each sampled particle as its weight $w$
      • else
        • Sample the particle $\theta$ from the posterior distribution of iteration $t-1$ with weights $w$
        • Perturb $\theta$ using a perturbation kernel
        • Recompute the weights $w$ (see (Toni et al, 2009) for details)
      • Simulate data $x = f(\theta)$
      • Compute the distance between the summary statistic of the simulated data and that of the reference data $y = d(S(x), S(\mathcal{D}))$
      • If $y \leq \varepsilon$, then accept $\theta$ in the posterior sample

This algorithm is implemented by Julia function SimulatedABCSMC.

Emulation based ABC - SMC

Similarly to Simulation based ABC - SMC, Emulation based Rejection ABC has an SMC counterpart. A threshold schedule must be supplied for this algorithm. A number of emulation based ABC iterations are then executed, one iteration per threshold. The posterior of the preceding iteration serves as a prior to the next one. Depending on user-defined settings, either the same emulator can be re-used for all iterations, or the emulator could be re-trained for each iteration.

The user-defined inputs to this algorithm are similar to those of Emulation based Rejection ABC:

  • The prior distribution $\pi$, defined over model parameter space $\Theta$
  • The model simulation function $f$
  • Reference data $\mathcal{D}$
  • A schedule of thresholds $\varepsilon_1, \ldots, \varepsilon_T$
  • Summary statistic $S$ and distance function $d$
  • Number of design particles to sample: $n$
  • Batch size to use for regression: $m$
  • Desired size of the posterior sample and maximum number of regressions to perform

The pseudocode for emulation-based ABC-SMC in GpABC looks as follows:

  • Sample $n$ design particles from $\pi$: $\theta_1, \ldots, \theta_n$
  • Simulate the model for the design particles: $x_1, \ldots, x_n = f(\theta_1), \ldots, f(\theta_n)$
  • Compute distances to the reference data: $y_1, \ldots, y_n = d(S(x_1), S(\mathcal{D})), \ldots, d(S(x_n), S(\mathcal{D}))$
  • Use $\theta_1, \ldots, \theta_n$ and $y_1, \ldots, y_n$ to train the emulator $\textbf{gpr}$
  • For $t$ in $1 \ldots T$
    • Advanced: optionally, if $t > 1$, re-traing the emulator. See GpABC.abc_retrain_emulator.
    • While the posterior sample is not full, and maximum number of regressions has not been reached:
      • if $t = 1$
        • Sample $m$ particles from $\pi$: $\theta_1, \ldots, \theta_m$
        • Use pdf of $\pi$ at each sampled particle as its weight $w$
      • else
        • Sample $m$ particles $\theta_1, \ldots, \theta_m$ from the posterior distribution of iteration $t-1$ with weights $w$
        • Perturb $\theta$ using a perturbation kernel
        • Recompute the weights $w$ (see (Toni et al, 2009) for details)
      • Sample $m$ particles from $\pi$: $\theta_1, \ldots, \theta_m$
      • Compute the approximate distances by running the emulator regression: $y_1, \ldots, y_m = \textbf{gpr}(\theta_1), \ldots, \textbf{gpr}(\theta_m)$
      • For all $j = 1 \ldots m$, if $y_j \leq \varepsilon$, then accept $\theta_j$ in the posterior sample

This algorithm is implemented by Julia function EmulatedABCSMC.

References

  • Toni, T., Welch, D., Strelkowa, N., Ipsen, A., & Stumpf, M. P. H. (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Interface, (July 2008), 187–202. https://doi.org/10.1098/rsif.2008.0172