Revision as of 14:31, 13 December 2012


Bootstrap Resampling

Bootstrap resampling is a statistical technique for measuring the error in a given statistic that has been computed from a sample population. It is a simple yet powerful method that relies heavily on computational power. The basic premise is that instead of using a theoretical or mathematical model for the parent distribution from which our observed samples were drawn, we can use the distribution of the observed samples themselves as an approximation for the parent distribution.

The Algorithm

Let's say we observe $N$ data samples, denoted as $\vec{x} = (x_1, x_2, \ldots, x_N)$, and we want to compute a statistic $\hat{\theta} = s(\vec{x})$. This statistic could be the mean or median of our samples, but could also be something much more complex. In measuring $\hat{\theta}$ from our data, we want to know how close our estimator is to the true value of $\theta$, so we need to compute an error estimate for $\hat{\theta}$. This can be done using the following bootstrap resampling algorithm:

  1. Make a bootstrap sample $x^{\star}$ by sampling with replacement from the original data samples. This bootstrap sample should also be of length $N$ and may contain repetitions of the same data sample (since we sampled with replacement).
  2. Repeat this process to create $B$ bootstrap samples. Generally, $B$ is in the range of 1000 to 10000, in order to reduce the amount of random scatter in the measurement of the bootstrap error.
  3. Compute the same desired statistic for each of the bootstrap samples, $\hat{\theta}^{\star}_{b} = s(x^{\star, b})$, where $b$ ranges from $1$ to $B$.
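The steps above can be sketched in Python with NumPy. This is a minimal illustration, not code from the original text: the function name `bootstrap_error` is our own, and the final line, which takes the standard deviation of the $B$ bootstrap statistics as the error estimate on $\hat{\theta}$, is the usual way the algorithm is completed.

```python
import numpy as np

def bootstrap_error(x, statistic, B=1000, rng=None):
    """Estimate the error on statistic(x) by bootstrap resampling."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    N = len(x)
    theta_star = np.empty(B)
    for b in range(B):
        # Steps 1-2: draw a bootstrap sample of length N, with replacement,
        # so it may contain repetitions of the same data sample.
        sample = rng.choice(x, size=N, replace=True)
        # Step 3: compute the same statistic on each bootstrap sample.
        theta_star[b] = statistic(sample)
    # The scatter of the bootstrap statistics estimates the error on theta-hat.
    return theta_star.std(ddof=1)
```

For example, `bootstrap_error(x, np.mean, B=2000)` on $N = 100$ samples drawn from a unit-variance Gaussian should return a value close to the analytic standard error of the mean, $\sigma/\sqrt{N} = 0.1$.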