## 13.8 Careful Sample Placement

A classic and effective family of techniques for variance reduction is
based on the careful placement of samples in order to better capture the
features of the integrand (or, more accurately, to be less likely to miss
important features). These techniques are used extensively in `pbrt`.
Indeed, one of the tasks of `Sampler`s in
Chapter 7 was to generate well-distributed
samples for use by the integrators for just this reason, although at the
time we offered only an intuitive sense of why this was worthwhile. Here
we will justify that extra work in the context of Monte Carlo integration.

### 13.8.1 Stratified Sampling

Stratified sampling was first introduced in
Section 7.3, and we now have the tools to
motivate its use. Stratified sampling works by subdividing the
integration domain $\Lambda$ into $n$ nonoverlapping regions
$\Lambda_1, \Lambda_2, \ldots, \Lambda_n$. Each region is called a
*stratum*, and they must completely cover the original domain:
$$\bigcup_{i=1}^{n} \Lambda_i = \Lambda.$$

To draw samples from $\Lambda$, we will draw $n_i$ samples from each $\Lambda_i$, according to densities $p_i$ inside each stratum. A simple example is supersampling a pixel. With stratified sampling, the area around a pixel is divided into a $k \times k$ grid, and a sample is drawn randomly from each grid cell. This is better than taking $k^2$ random samples, since the sample locations are less likely to clump together. Here we will show why this technique reduces variance.
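The pixel-supersampling case can be sketched in a few lines. The following is an illustrative sketch (not pbrt's implementation, which is in C++): the pixel area, treated as $[0, 1)^2$, is divided into a $k \times k$ grid, and one uniform sample is jittered inside each cell, so no two samples can fall in the same cell.

```python
# Illustrative sketch of jittered (stratified) pixel supersampling.
# The pixel area [0,1)^2 is divided into a k x k grid and one uniform
# sample is drawn inside each cell, so samples cannot clump together
# within a single cell.
import random

def stratified_pixel_samples(k, rng=random.random):
    """Return k*k jittered sample positions in [0,1)^2, one per cell."""
    samples = []
    for y in range(k):
        for x in range(k):
            # Jitter uniformly within grid cell (x, y).
            sx = (x + rng()) / k
            sy = (y + rng()) / k
            samples.append((sx, sy))
    return samples
```

Each returned sample lies inside its own grid cell, which is exactly the stratification property analyzed below.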

Within a single stratum $\Lambda_i$, the Monte Carlo estimate is
$$F_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \frac{f(X_{i,j})}{p_i(X_{i,j})},$$
where $X_{i,j}$ is the $j$th sample drawn from density $p_i$. The overall estimate is $F = \sum_{i=1}^{n} v_i F_i$, where $v_i$ is the fractional volume of stratum $i$ ($v_i \in (0, 1]$).

The true value of the integrand in stratum $i$ is
$$\mu_i = E\big[f(X_{i,j})\big] = \frac{1}{v_i} \int_{\Lambda_i} f(x)\,\mathrm{d}x,$$
and the variance in this stratum is
$$\sigma_i^2 = \frac{1}{v_i} \int_{\Lambda_i} \big(f(x) - \mu_i\big)^2\,\mathrm{d}x.$$

Thus, with $n_i$ samples in the stratum, the variance of the per-stratum estimator is $\sigma_i^2 / n_i$. This shows that the variance of the overall estimator is
$$V[F] = V\!\left[\sum_i v_i F_i\right] = \sum_i V[v_i F_i] = \sum_i v_i^2\, V[F_i] = \sum_i \frac{v_i^2 \sigma_i^2}{n_i}.$$

If we make the reasonable assumption that the number of samples $n_i$ is proportional to the volume $v_i$, then we have $n_i = v_i N$, and the variance of the overall estimator is
$$V[F] = \frac{1}{N} \sum_i v_i \sigma_i^2.$$

To compare this result to the variance without stratification, we note that
choosing an unstratified sample is equivalent to choosing a random stratum
$I$ according to the discrete probability distribution defined by the volumes
$v_i$ and then choosing a random sample $X$ in $\Lambda_I$. In this sense,
$X$ is chosen *conditionally* on $I$, so it can be shown using
conditional probability (the decomposition
$V[Y] = E_I\big[V[Y \mid I]\big] + V_I\big[E[Y \mid I]\big]$) that the
variance of this unstratified estimator is
$$\frac{1}{N}\left[\sum_i v_i \sigma_i^2 + \sum_i v_i \big(\mu_i - Q\big)^2\right],$$
where $Q$ is the mean of $f$ over the whole domain $\Lambda$. See Veach (1997) for a derivation of this result.

There are two things to notice about this expression. First, we know that the
right-hand sum must be nonnegative, since variance is always nonnegative.
Second, it demonstrates that stratified sampling can never increase variance. In
fact, stratification always reduces variance unless the right-hand sum is
exactly 0. It can only be 0 when the function $f$ has the same
mean over each stratum $\Lambda_i$. In fact, for stratified sampling to work
best, we would like to maximize the right-hand sum, so it is best to
make the strata have means that are as unequal as possible.
This explains why *compact* strata are desirable if one does not know
anything about the function $f$. If the strata are wide, they will contain
more variation and will have $\mu_i$ closer to the true mean $Q$.
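This variance reduction is easy to confirm empirically. The following sketch (an assumed example, not taken from pbrt) estimates $\int_0^1 x^2\,\mathrm{d}x$ with $N$ uniform samples versus $N$ stratified samples, one per equal-width stratum, and measures the empirical variance of each estimator over many trials.

```python
# Illustrative numerical check: empirical variance of the uniform vs.
# stratified Monte Carlo estimator for a smooth 1D integrand.
import random

def uniform_estimate(f, n, rng):
    """Standard MC estimate of the integral of f over [0,1)."""
    return sum(f(rng.random()) for _ in range(n)) / n

def stratified_estimate(f, n, rng):
    """One jittered sample in each of n equal-width strata."""
    return sum(f((i + rng.random()) / n) for i in range(n)) / n

def empirical_variance(estimator, trials, rng):
    values = [estimator(rng) for _ in range(trials)]
    mean = sum(values) / trials
    return sum((v - mean) ** 2 for v in values) / trials

rng = random.Random(1)
f = lambda x: x * x
n, trials = 16, 2000
v_uniform = empirical_variance(lambda r: uniform_estimate(f, n, r), trials, rng)
v_strat = empirical_variance(lambda r: stratified_estimate(f, n, r), trials, rng)
# For this smooth integrand, the per-stratum means differ substantially,
# so stratification reduces variance by a large factor.
```

Because $f(x) = x^2$ has very different means in each stratum, the right-hand sum in the identity above is large and the stratified estimator's variance is dramatically lower.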

Figure 13.17 shows the effect of using stratified sampling versus a uniform random distribution for sampling ray directions for glossy reflection. There is a reasonable reduction in variance at essentially no cost in running time.

The main downside of stratified sampling is that it suffers from the same “curse of dimensionality” as standard numerical quadrature. Full stratification in $D$ dimensions with $S$ strata per dimension requires $S^D$ samples, which quickly becomes prohibitive. Fortunately, it is often possible to stratify some of the dimensions independently and then randomly associate samples from different dimensions, as was done in Section 7.3. Choosing which dimensions are stratified should be done in a way that stratifies dimensions that tend to be most highly correlated in their effect on the value of the integrand (Owen 1998). For example, for the direct lighting example in Section 13.7.1, it is far more effective to stratify the $(x, y)$ pixel positions and to stratify the $(\theta, \phi)$ ray direction; stratifying $(x, \theta)$ and $(y, \phi)$ would almost certainly be ineffective.
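The "stratify pairs of dimensions, then randomly associate" idea can be sketched as follows. This is an illustrative helper (not pbrt code): two 2D domains, such as pixel position and ray direction, are each stratified with a $k \times k$ grid, and the two stratified sets are paired up in random order, giving good 2D stratification in each pair without the $k^4$ cost of full 4D stratification.

```python
# Illustrative sketch of padding stratified samples across dimensions:
# stratify (x, y) and (theta, phi) separately as k x k jittered grids,
# then shuffle one set so the association between the two 2D domains
# is random.
import random

def jittered_2d(k, rng):
    """One jittered sample per cell of a k x k grid over [0,1)^2."""
    return [((x + rng.random()) / k, (y + rng.random()) / k)
            for y in range(k) for x in range(k)]

def padded_samples(k, rng):
    pixel = jittered_2d(k, rng)      # stratified pixel positions
    direction = jittered_2d(k, rng)  # stratified direction samples
    rng.shuffle(direction)           # random association between domains
    return list(zip(pixel, direction))
```

Each 2D slice of the resulting 4D samples remains fully stratified, while only $k^2$ samples are needed in total.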

Another solution to the curse of dimensionality that has many of the same advantages of stratification is to use Latin hypercube sampling (also introduced in Section 7.3), which can generate any number of samples independent of the number of dimensions. Unfortunately, Latin hypercube sampling isn’t as effective as stratified sampling at reducing variance, especially as the number of samples taken becomes large. Nevertheless, Latin hypercube sampling is provably no worse than uniform random sampling and is often much better.
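A minimal sketch of Latin hypercube sample generation, under the same illustrative conventions as the earlier examples: any number of samples $n$ can be generated in any dimension $d$, with each dimension's $n$ strata receiving exactly one sample.

```python
# Illustrative sketch of Latin hypercube sampling: n samples in d
# dimensions, where each dimension is divided into n strata and each
# stratum holds exactly one sample.
import random

def latin_hypercube(n, d, rng):
    dims = []
    for _ in range(d):
        # One jittered sample per stratum, visited in shuffled order so
        # the strata of different dimensions are associated randomly.
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)
        dims.append(strata)
    # Transpose: sample j takes the jth entry from each dimension.
    return list(zip(*dims))
```

Unlike full stratification, the sample count is independent of $d$; the price is that only the 1D projections are guaranteed to be stratified.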

### 13.8.2 Quasi Monte Carlo

The low-discrepancy sampling techniques introduced in
Chapter 7 are the foundation of a branch of Monte Carlo
called *quasi Monte Carlo*. The key idea of quasi–Monte Carlo
techniques is that they replace the pseudo-random numbers used in standard Monte Carlo
with low-discrepancy point sets generated by carefully designed deterministic
algorithms.

The advantage of this approach is that for many integration problems, quasi–Monte Carlo techniques have asymptotically faster rates of convergence than methods based on standard Monte Carlo. Many of the techniques used in regular Monte Carlo algorithms can be shown to work equally well with quasi–random sample points, including importance sampling. Some others (e.g., rejection sampling) cannot. While the asymptotic convergence rates are not generally applicable to the discontinuous integrands in graphics because they depend on smoothness properties in the integrand, quasi Monte Carlo nevertheless generally performs better than regular Monte Carlo for these integrals in practice. The “Further Reading” section at the end of this chapter has more information about this topic.
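A standard building block of such deterministic point sets is the radical inverse, which underlies the van der Corput and Halton constructions; the sketch below is a standalone illustration (pbrt's own low-discrepancy code lives in its `Sampler` implementations). Mirroring the base-$b$ digits of an integer $i$ about the radix point yields a deterministic, well-spread point in $[0, 1)$.

```python
# Illustrative sketch of the base-b radical inverse, the building
# block of van der Corput and Halton low-discrepancy points.
def radical_inverse(i, base):
    """Mirror the base-b digits of i about the radix point."""
    inv = 0.0
    inv_base = 1.0 / base
    factor = inv_base
    while i > 0:
        inv += (i % base) * factor  # next digit, scaled by its place
        i //= base
        factor *= inv_base
    return inv
```

In base 2 the sequence visits $[0, 1)$ in a maximally spread order: $0, \tfrac12, \tfrac14, \tfrac34, \tfrac18, \tfrac58, \ldots$, which is what gives these point sets their low discrepancy.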

In `pbrt`, we have generally glossed over the differences between these two
approaches and have localized them in the `Sampler`s in
Chapter 7. This introduces the possibility of
subtle errors if a `Sampler` generates quasi–random sample points that an
`Integrator` then improperly uses as part of an implementation of an
algorithm that is not suitable for quasi Monte Carlo. As long as
`Integrator`s only use these sample points for importance sampling or other
techniques that are applicable in both approaches, this isn’t a problem.

### 13.8.3 Warping Samples and Distortion

When applying stratified sampling or low-discrepancy sampling to problems
like choosing points on light sources for integration for area lighting,
`pbrt` generates a set of samples over the domain $[0, 1)^2$ and then uses
algorithms based on the transformation methods introduced in
Sections 13.5
and 13.6 to map these samples to points on the light source. Implicit in this process is the
expectation that the transformation to points on the light source will
generally preserve the stratification properties of the samples from
$[0, 1)^2$; in other words, nearby samples should map to nearby positions
on the surface of the light, and faraway samples should map to far-apart
positions on the light. If the mapping does not preserve this
property, then the benefits of stratification are lost.

Shirley’s square-to-circle mapping (Figure 13.13) preserves stratification more effectively than the straightforward mapping (Figure 13.12), which has less compact strata away from the center. This issue also explains why low-discrepancy sequences are generally more effective than stratified patterns in practice: they are more robust with respect to preserving their good distribution properties after being transformed to other domains. Figure 13.18 shows what happens when a set of 16 well-distributed sample points is transformed to be points on a skinny quadrilateral by scaling them to cover its surface; samples from a $(0, 2)$-sequence remain well distributed, but samples from a stratified pattern fare much less well.
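Shirley's concentric mapping can be sketched as follows (written here as an illustrative Python version of the standard formulation; pbrt implements it in C++). Concentric squares in $[0, 1)^2$ map to concentric rings of the unit disk, so compact strata in the square stay compact on the disk.

```python
# Illustrative sketch of Shirley's concentric square-to-disk mapping.
# Concentric squares map to concentric rings, preserving the
# compactness of strata far better than the naive polar mapping.
import math

def concentric_square_to_disk(u1, u2):
    # Map [0,1)^2 onto [-1,1]^2 and handle the degenerate origin.
    sx, sy = 2.0 * u1 - 1.0, 2.0 * u2 - 1.0
    if sx == 0.0 and sy == 0.0:
        return 0.0, 0.0
    if abs(sx) > abs(sy):
        # First region: radius from sx, angle swept over +/- pi/4.
        r, theta = sx, (math.pi / 4.0) * (sy / sx)
    else:
        # Second region: radius from sy, angle around pi/2.
        r, theta = sy, (math.pi / 2.0) - (math.pi / 4.0) * (sx / sy)
    return r * math.cos(theta), r * math.sin(theta)
```

Because the mapping is continuous and (nearly) area preserving, nearby stratified samples in the square land at nearby points on the disk, which is exactly the property this section argues a warping should have.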