# Markov-Chain Monte Carlo

## Monte Carlo Methods

Monte Carlo methods are a class of computational algorithms. They’re powerful for exploring the behavior of complex systems for which a complete analytical description does not exist, for nonlinear systems with coupled parameters, or for systems with significant uncertainty in their parameters.

Though the term Monte Carlo covers a broad scope of algorithms, there is a basic structure. Monte Carlo methods generally follow these steps:

1. Define the domain of possible inputs, usually in the form of probability distributions.
2. Generate a set of inputs, randomly sampled from the input distributions.
3. Calculate the deterministic quantities of interest for that set of inputs.
4. Repeat steps 2 and 3 until your parameters are well-sampled, then aggregate the results.
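A minimal sketch of this generic loop in Python (the `monte_carlo` helper and its arguments are illustrative, not part of the original notes):

```python
import numpy as np

def monte_carlo(sample_inputs, f, n_trials=100_000, seed=0):
    """Generic Monte Carlo loop: draw random inputs (steps 1-2),
    evaluate the deterministic function f on each draw (step 3),
    and aggregate the results by averaging (step 4)."""
    rng = np.random.default_rng(seed)
    results = [f(*sample_inputs(rng)) for _ in range(n_trials)]
    return np.mean(results)

# Toy usage: estimate E[x^2] for x ~ Uniform(0, 1), which is 1/3
estimate = monte_carlo(lambda rng: (rng.uniform(0.0, 1.0),),
                       lambda x: x**2)
```

The only problem-specific pieces are the input distributions and the deterministic function, which is what makes the recipe above so broadly applicable.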

One of the classic examples is to calculate the value of ${\displaystyle \pi }$ through a Monte Carlo simulation. We know, by the definition of ${\displaystyle \pi }$, that the area enclosed by a quarter-circle with radius ${\displaystyle r}$ is ${\displaystyle A_{c}=\pi r^{2}/4}$, while the area of a square with side ${\displaystyle r}$ is ${\displaystyle A_{s}=r^{2}}$. If we can measure these two areas, we can calculate a value for ${\displaystyle \pi }$:

${\displaystyle \pi =4\times {\frac {A_{c}}{A_{s}}}\,\!}$

To measure these areas, we construct our inputs and our function of interest:

${\displaystyle x\sim {\text{Uniform}}(0,1)\,\!}$
${\displaystyle y\sim {\text{Uniform}}(0,1)\,\!}$
${\displaystyle f(x,y)=\left\{{\begin{array}{lr}1&:{\sqrt {x^{2}+y^{2}}}\leq 1\\0&:{\sqrt {x^{2}+y^{2}}}>1\end{array}}\right.\,\!}$

Then we sample our ${\displaystyle x,y}$ over ${\displaystyle N}$ trials, accumulating the value of ${\displaystyle f}$, thereby getting an estimate for ${\displaystyle \pi }$:

${\displaystyle {\frac {\sum f}{N}}\rightarrow {\frac {A_{c}}{A_{s}}}\,\!}$
${\displaystyle {\frac {4\sum f}{N}}\rightarrow \pi \,\!}$

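As a concrete sketch (using numpy; not part of the original notes), the entire ${\displaystyle \pi }$ estimate takes only a few lines:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1_000_000

# Step 2: sample x, y uniformly over the unit square
x = rng.uniform(0.0, 1.0, N)
y = rng.uniform(0.0, 1.0, N)

# Step 3: f(x, y) = 1 inside the quarter-circle, 0 outside
f = (np.sqrt(x**2 + y**2) <= 1.0).astype(float)

# Step 4: aggregate; sum(f)/N -> A_c/A_s = pi/4
pi_estimate = 4.0 * f.sum() / N
```

The statistical error of the estimate shrinks as ${\displaystyle 1/{\sqrt {N}}}$, so with ${\displaystyle N=10^{6}}$ samples the result is good to roughly two or three decimal places.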

## Why Markov Chains?

Markov chains are systems that undergo discrete transitions amongst countable states. They are random, memoryless processes — the next state depends stochastically on the current state, and not on the previous states. In MCMC, they’re used to construct random walks through the parameter space, preferentially sampling volumes that contribute significantly to the result and ignoring those parts of the parameter space that do not.

This is powerful because, for most Monte Carlo problems, only a small volume of the multi-dimensional parameter space contributes significantly to the final result. In this case, a full Monte Carlo simulation is incredibly inefficient, since most of the computation is spent mapping out volumes of low significance. We can, instead, construct a Markov chain that randomly walks through our input parameter distributions, preferentially mapping the high-significance volumes. Of course, we must be careful that the random walks of our Markov chains faithfully converge to the true distribution we’re attempting to sample; thankfully, there are several well-known random-walk algorithms that do. The most common include the Metropolis–Hastings algorithm and Gibbs sampling.
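To make the random-walk idea concrete, here is a minimal random-walk Metropolis sampler for a one-dimensional target distribution (written for this sketch, not taken from the text; the function names are illustrative):

```python
import numpy as np

def metropolis(log_target, x0, n_steps=50_000, step_size=1.0, seed=0):
    """Random-walk Metropolis: propose a Gaussian step, accept it with
    probability min(1, p(x_new)/p(x)); the chain of visited states
    converges to samples from the target distribution."""
    rng = np.random.default_rng(seed)
    x, logp = x0, log_target(x0)
    chain = np.empty(n_steps)
    for i in range(n_steps):
        x_new = x + rng.normal(scale=step_size)
        logp_new = log_target(x_new)
        # Accept with probability p(x_new)/p(x), done in log space
        if np.log(rng.uniform()) < logp_new - logp:
            x, logp = x_new, logp_new
        chain[i] = x
    return chain

# Sample a standard normal: log p(x) = -x^2/2, up to a constant
chain = metropolis(lambda x: -0.5 * x**2, x0=0.0)
```

Note that only the *ratio* of target densities enters the acceptance rule, so the target need only be known up to a normalizing constant, which is exactly the situation we face in Bayesian inference below.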

## Model Fitting with MCMC

So, now that MCMC gives us an efficient way to run complex and multidimensional Monte Carlo simulations, how does that help us as physicists? The short answer is that the MCMC framework provides a powerful method for fitting models to observations and making inferences from those observations.

Let’s start by formulating the problem in Bayesian terms. Say we have some model, ${\displaystyle y}$ = ${\displaystyle f(x,\Theta )}$, where ${\displaystyle x}$ and ${\displaystyle y}$ are two physical quantities and ${\displaystyle \Theta }$ is some set of model parameters. We then observe some data ${\displaystyle D}$, consisting of measurements of ${\displaystyle x}$ and ${\displaystyle y}$, and we want to know what ${\displaystyle D}$ implies about the values of the model parameters ${\displaystyle \Theta }$. In statistical parlance, we want ${\displaystyle P(\Theta |D)}$, the posterior probability distribution of our model parameters given the data.

We can calculate it with Bayes’ theorem:

${\displaystyle P(\Theta |D)\propto P(D|\Theta )P(\Theta )\,\!}$

where ${\displaystyle P(D|\Theta )}$ is the probability of observing our dataset given a certain model (often called the ‘likelihood’), and ${\displaystyle P(\Theta )}$ is the probability of having a certain set of model parameters (called the ‘prior’).

In this formulation, all of our data points are samples drawn from random variables, usually assumed to have a Gaussian probability distribution about the measured value. This lets us explicitly account for any type of error in our observations — we can handle errors in both the ordinate and the abscissa of any plot! Most other fitting methods (e.g. least squares, ${\displaystyle \chi ^{2}}$-minimization) cannot.

This gives us a deterministic function (RHS of equation above) to calculate over a domain of possible input parameters — a great problem for MCMC. The calculated posterior distribution ${\displaystyle P(\Theta |D)}$ gives us the probability distribution for each parameter, from which we can get our best fit value (the mean) and confidence intervals (quantiles).
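A self-contained sketch of this workflow (synthetic data, a straight-line model ${\displaystyle y=mx+b}$, a Gaussian likelihood with flat priors, and a hand-rolled Metropolis sampler; all names and numbers are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "observations" from a true model y = 2x + 1 with noise sigma = 0.5
x_obs = np.linspace(0.0, 5.0, 50)
sigma = 0.5
y_obs = 2.0 * x_obs + 1.0 + rng.normal(scale=sigma, size=x_obs.size)

def log_posterior(theta):
    """log P(Theta|D) up to a constant: Gaussian likelihood P(D|Theta)
    times flat (improper) priors P(Theta)."""
    m, b = theta
    resid = y_obs - (m * x_obs + b)
    return -0.5 * np.sum(resid**2) / sigma**2

# Random-walk Metropolis over the parameter space (m, b)
theta = np.array([0.0, 0.0])
logp = log_posterior(theta)
chain = []
for _ in range(20_000):
    proposal = theta + rng.normal(scale=0.05, size=2)
    logp_new = log_posterior(proposal)
    if np.log(rng.uniform()) < logp_new - logp:
        theta, logp = proposal, logp_new
    chain.append(theta.copy())
chain = np.array(chain[5_000:])  # discard burn-in

# Best-fit value (posterior mean) and a 68% interval (quantiles) for m
m_fit = chain[:, 0].mean()
m_lo, m_hi = np.quantile(chain[:, 0], [0.16, 0.84])
```

The retained chain samples *are* draws from ${\displaystyle P(\Theta |D)}$, so summarizing any parameter is just a matter of taking means and quantiles along the corresponding column.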

### MCMC in Practice

There are several well-established codebases for performing MCMC fitting; widely used Python packages include emcee and pymc.

A complete example using Python and the pymc package to fit observations of galactic surface brightness accompanies this article as an IPython notebook.

If you don’t have IPython but would still like to follow the example above, a static version of the notebook is also available for download.