STA695 Spring 2002
Integration, in general, is most useful in statistics for computing probabilities and expectations. Of course, a probability is an expectation of an indicator variable. If a random variable X has density f(x) (could be multivariate), then
including the special case where
for a set A,
Of course, when you are asked to compute an expectation, the first thing to try is to compute the integral analytically. When this fails, numerical integration is another tool you can try. However, in multivariate settings numerical integration becomes quite time consuming to apply (the number of function evaluations increases exponentially in the dimension of the problem). Thus, it is useful to consider other methods of evaluating the integral.
Fortunately, there are methods of evaluating expectation that do not suffer this ``curse of dimensionality''. In particular, recall the strong law of large numbers, which states if
is a sequence of independent, identically distributed random variables, then the quantity
converges almost surely to E[h(X)]. Thus, if we are asked to compute an expectation (or probability), and if we can generate an iid sequence, then we may use this result to estimate the expectation. Of course, this Monte Carlo Estimate of the expectation has error, but typically we generate a sufficiently large number of random variables that error is small.
For n large, sample means are also normally distributed, so we can compute a
confidence interval for
to be
where
is the sample variance of the
values,
Thus, if we can generate a sequence of observations from
, then in theory we can compute as many observations as necessary to make our confidence interval as small as we want. If one wants a confidence interval to have width less than w, generate
until
This may require generating several sets of observations. For example, one might generate 100000 observations to start, compute the confidence interval, and decide it is too large. One can then generate another 100000 observations, add those to the original sequence, and compute another confidence interval, and so on until the confidence interval is sufficiently narrow.
Of course, if we are having trouble performing the integration analytically, we are typically not dealing with a ``familiar'' distribution, and thus generating random values may be difficult. This handout discusses two methods, rejection sampling and importance sampling, which allow a Monte Carlo estimate of an expectation.
For both methods, assume we have a density f(x) known only up to a normalizing constant. Thus,
for a density
and a potentially unknown constant
.
Suppose we can find a known
and density
such that
for all x, where
, and we know how to sample from the density
. Suppose we perform the following steps
The point of this procedure is that the density of X, given acceptance, is
. Thus, repeated uses of this procedure, retaining only accepted values of X, can yield the random sequence
required to compute Monte Carlo estimates of expectations.
To verify this claim, note that the joint density of X and U is
.
The event A is defined to be the event U<f(X)/g(X), Thus
The integral for
identical except the range of integration corresponds to X<a instead of all possible values of X. Thus
Note that this step is the one that requires
for all x. If f(x)>g(x) for some values of x, we would have to consider the integral with respect to u by cases, depending on whether or not f(x)/g(x) was greater than 1. Replacing
and
and canceling constants from the numerator and denominator, we find
which is the probability of observing X<a under the distribution
.
Recall we stated the value of
does not need to be known. This is because it cancels out of the calculation, and thus is not required to use rejection sampling. Typically both
and
must be known to compute g(x).
The efficiency of rejection sampling depends strongly on how well the function g(x) approximates f(x). We derived above that
If f(x)/g(x) tends to be small, indicating that g(x) is a poor approximation for f(x), then the probability of acceptance will be small, and the algorithm will spend most of its time rejecting X values. In contrast, good approximations will tend to have most of the candidate X values accepted, and will be reasonably equivalent to random sampling directly from
.
The central question when using rejection sampling is what to choose for the function g, typically called the envelope function. If
has bounded support, then a very crude method of choosing g(x) is to set g(x)=a, where a is a constant greater than or equal to
, a quantity which can be found numerically or crudely estimated from a plot. Of course, this constant function g may not be a good approximation for f overall, but it satisfies the definition. In general, it is difficult to find an appropriate envelope function, and different methods are used in different situations.
Example
Let
over the interval
. If we use the ``crude envelope'' method, we first find a maximum of f(x) by inspection. Plotting the function in R, we find that the maximum of f(x) is below 3.5, so we will use g(x)=3.5 over the range
.
The next step is to determine
and
from g(x). Since a density must integrate to 1, we know
and thus
and
is a uniform density on
. Thus, to generate a large random sample from
(the density proportional to f), we generate a large sample of X values from
(don't confuse that fact the
happens to be uniform in this example to the random variable U being uniform), generate an equivalent number of
random variables, and accept those
values for which
.
Importance sampling is related to rejection sampling, and may also be used to determine expectations. While we still need a function
for which
is a density we can generate from as before, the advantage to importance sampling is that we do not need f(x)<g(x) for all x.
For any function h(x) and density
,
Thus, if
is a sequence of independent random variables with density
, then
will converge to
This method requires knowledge of the normalizing constant to determine
. Suppose
is unknown, so we only have f(x) available to us. A similar calculation is still possible. Since
is assumed to be a density, we may still formally write
Thus
Multiplying the integrands in the numerator and denominator by
, we find
Thus, we may again use the law of large numbers argument. If
, then we may estimate
by
Of course, typically the 1/n terms are omitted in the numerator and denominator. I left them in this derivation because they make it clear that the numerator and denominator both converge to the appropriate values (and the denominator is nonzero), and thus the ratio converges as well.
Letting
(known as the "importance weight"), the above expression may be written
Thus,
is a weighted average of the
values. As stated in Tanner, under general conditions
converges almost surely to
. However, in practice one acheives the best estimates when
closely approximates the shape of f.
You should also note the denominator in equation (1),
, was eventually estimated by
in equation (2). Thus, one may use
as an estimate for
, a ``side benefit'' of the entire procedure.
The variance of
may be estimated by
Unlike rejection sampling, importance sampling uses all the generated
values. However, the importance sampling estimate is a weighted average, and thus the distribution of the weights plays a large role in determining the efficiency of the estimate. If a few of the
are much larger than the rest, then those
values will dominate the weighted average, and thus the resulting estimate will behave as if you generated a much smaller sample size. To counteract this, typically the function g(x) is chosen to have larger tails than f(x), resulting in f(x)/g(x) to be bounded above by 1 in the tails. In the center of the distribution, it is possible for g(x) to be greater than f(x). In any case, the close g(x) approximates f(x), the better the importance sampling estimator.
Rejection Sampling and Importance Sampling
This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -split 0 -show_section_numbers rejimp.tex.
The translation was initiated by Kert Viele on Mon Jan 14 12:22:41 EST 2002