## 7.3 Stratified Sampling

The first `Sampler` implementation that we will introduce subdivides pixel areas
into rectangular regions and generates a single sample inside each region.
These regions are commonly called *strata*, and this sampler is called
the `StratifiedSampler`. The key idea behind stratification is that by
subdividing the sampling domain into nonoverlapping regions and taking a
single sample from each one, we are less likely to miss important features
of the image entirely, since the samples are guaranteed not to all be close
together. Put another way, it does us no good if many samples are taken
from nearby points in the sample space, since each new sample doesn’t add
much new information about the behavior of the image function. From a
signal processing viewpoint, we are implicitly defining an overall sampling
rate such that the smaller the strata are, the more of them we have, and thus
the higher the sampling rate.

The stratified sampler places each sample at a random point inside each
stratum by *jittering* the center point of the stratum by a random
amount up to half the stratum’s width and height. The nonuniformity that
results from this jittering helps turn aliasing into noise, as discussed in
Section 7.1. The sampler also offers an
unjittered mode, which gives uniform sampling in the strata; this mode is
mostly useful for comparisons between different sampling techniques rather
than for rendering high quality images.

Direct application of stratification to high-dimensional sampling quickly
leads to an intractable number of samples. For example, if we divided the
5D image, lens, and time sample space into four strata in
each dimension, the total number of samples per pixel would be . We could reduce this impact by taking fewer samples in some
dimensions (or not stratifying some dimensions, effectively using a single
stratum), but we would then lose the benefit of having well-stratified
samples in those dimensions. This problem with stratification is known as
the *curse of dimensionality*.

We can reap most of the benefits of stratification without paying the price
in excessive total sampling by computing lower dimensional stratified
patterns for subsets of the domain’s dimensions and then randomly
associating samples from each set of dimensions. (This process is
sometimes called *padding*.)
Figure 7.16 shows the basic idea: we might want to take
just four samples per pixel but still have the samples be stratified over
all dimensions. We independently generate four 2D stratified image
samples, four 1D stratified time samples, and four 2D stratified lens
samples. Then we randomly associate a time and lens sample value with
each image sample. The result is that each pixel has
samples that together have good coverage of the sample space.

Figure 7.17 shows the improvement in image quality from using stratified lens samples versus using unstratified random samples when rendering depth of field.

`StratifiedSampler`, which stratified both the image and, more importantly for this image, the lens samples. Stratification makes a substantial improvement for this situation.

Figure 7.18 shows a comparison of a few sampling patterns. The first is a completely random pattern: we generated a number of samples without using the strata at all. The result is terrible; some regions have few samples and other areas have clumps of many samples. The second is a uniform stratified pattern. In the last, the uniform pattern has been jittered, with a random offset added to each sample’s location, keeping it inside its cell. This gives a better overall distribution than the purely random pattern while preserving the benefits of stratification, though there are still some clumps of samples and some regions that are undersampled.

Figure 7.19 shows images rendered using the
`StratifiedSampler` and shows how jittered sample positions turn
aliasing artifacts into less objectionable noise.

As a `PixelSampler` subclass, the implementation of `StartPixel()`
must both generate 1D and 2D samples for the number of dimensions
`nSampledDimensions` passed to the `PixelSampler` constructor as
well as samples for the requested arrays.

After the initial stratified samples are generated, they are randomly shuffled; this is the padding approach described at the start of the section. If this shuffling wasn’t done, then the sample dimensions’ values would be correlated in a way that would lead to errors in images—for example, both the first 2D sample used to choose the film location as well as the first 2D lens sample would always both be in the lower left stratum adjacent to the origin.

The 1D and 2D stratified sampling routines are implemented as utility functions. Both loop over the given number of strata in the domain and place a sample point in each one.

`StratifiedSample2D()` similarly generates samples in the range .

The `Shuffle()` function randomly permutes an array of
`count` sample values, each of which has `nDimensions`
dimensions. (In other words, blocks of values of size `nDimensions`
are permuted.)

Arrays of samples present us with a quandary: for example, if an integrator asks for an array of 64 2D sample values in the sample vector for each sample in a pixel, the sampler has two different goals to try to fulfill:

- It’s desirable that the samples in the array themselves be well distributed in 2D (e.g., by using an stratified grid). Stratification here will improve the quality of the computed results for each individual sample vector.
- It’s desirable to ensure that each the samples in the array for one image sample isn’t too similar to any of the sample values for samples nearby in the image. Rather, we’d like the points to be well distributed with respect to their neighbors, so that over the region around a single pixel, there is good coverage of the entire sample space.

Rather than trying to solve both of these problems simultaneously here,
the `StratifiedSampler`
only addresses the first one. The other samplers later in this chapter
will revisit this issue with more sophisticated techniques and solve both
of them simultaneously to various degrees.

A second complication comes from the fact that the caller may have asked
for an arbitrary number of samples per image sample, so stratification may
not be easily applied. (For example, how do we generate a stratified 2D
pattern of seven samples?) We could just generate an or stratified pattern, but this only gives us the benefit of
stratification in one dimension and no guarantee of a good pattern in the
other dimension. A `StratifiedSampler::RoundSize()` method could
round requests up to the next number that’s the square of integers, but
instead we will use an approach called *Latin hypercube sampling*
(LHS), which can generate any number of samples in any number of dimensions
with a reasonably good distribution.

LHS uniformly divides each dimension’s axis into regions and generates a jittered sample in each of the regions along the diagonal, as shown on the left in Figure 7.20. These samples are then randomly shuffled in each dimension, creating a pattern with good distribution.

An advantage of LHS is that it minimizes clumping of the samples when they are projected onto any of the axes of the sampling dimensions. This property is in contrast to stratified sampling, where of the samples in a 2D pattern may project to essentially the same point on each of the axes. Figure 7.21 shows this worst-case situation for a stratified sampling pattern.

In spite of addressing the clumping problem, LHS isn’t necessarily an improvement to stratified sampling; it’s easy to construct cases where the sample positions are essentially colinear and large areas of the sampling domain have no samples near them (e.g., when the permutation of the original samples is the identity, leaving them all where they started). In particular, as increases, Latin hypercube patterns are less and less effective compared to stratified patterns.

The general-purpose `LatinHypercube()` function generates an arbitrary
number of LHS samples in an arbitrary dimension. The number of elements in
the `samples` array should thus be `nSamples*nDim`.

To do the permutation, this function loops over the samples, randomly permuting the
sample points in one dimension at a time. Note that this is a different
permutation than the earlier `Shuffle()` routine: that routine does one
permutation, keeping all `nDim` sample points in each sample together,
while here `nDim` separate permutations of a single dimension at a
time are done
(Figure 7.22).

`Shuffle()`routine moves entire blocks of

`nDims`elements around. (b) The permutation for Latin hypercube sampling permutes each dimension’s samples independently. Here, the shuffling of the second dimension’s samples from a four-element pattern of three dimensions is shown.

Given the `LatinHypercube()` function, we can now write the code to
compute sample arrays for the current pixel. 1D samples are stratified and
then randomly shuffled, while 2D samples are generated using Latin
hypercube sampling.

We’ll use the scene in Figure 7.23 to
demonstrate properties of some of the `Sampler` implementations.

Figure 7.24 shows the improvement from good
samples for the `DirectLightingIntegrator`. The first image was computed
with 1 image sample per pixel, each with 16 shadow samples. The second
was computed with 16 image samples per pixel, each with 1 shadow
sample. Because the `StratifiedSampler` could generate a good LHS
pattern for the first case, the quality of the shadow is much better, even
with the same total number of shadow samples taken.