## 7.5 (0, 2)-Sequence Sampler

Another approach for generating high-quality samples takes advantage of a
remarkable property of certain low-discrepancy sequences that allows us to
satisfy two desirable properties of samples (only one of which was
satisfied with the `StratifiedSampler`): they generate sample vectors
for a pixel’s worth of image samples such that the sample values for each
pixel sample are well distributed with respect to each other, and
simultaneously such that the aggregate collection of sample values for all
of the pixel samples in the pixel is collectively well distributed.

This sampler uses the first two dimensions of a low-discrepancy sequence derived by Sobol. This sequence is a special type of low-discrepancy sequence known as a $(0, 2)$-sequence. $(0, 2)$-sequences are stratified in a very general way. For example, the first 16 samples in a $(0, 2)$-sequence satisfy the stratification constraint from stratified sampling in Section 7.3, meaning there is just one sample in each of the boxes of extent $(\frac{1}{4}, \frac{1}{4})$. However, they also satisfy the Latin hypercube constraint, as only one of them is in each of the boxes of extent $(\frac{1}{16}, 1)$ and $(1, \frac{1}{16})$. Furthermore, there is only one sample in each of the boxes of extent $(\frac{1}{2}, \frac{1}{8})$ and $(\frac{1}{8}, \frac{1}{2})$.

Figure 7.28 shows all of the possibilities for dividing the domain $[0, 1)^2$ into regions where the first 16 samples of a $(0, 2)$-sequence satisfy the stratification properties. Each succeeding sequence of 16 samples from this pattern also satisfies these distribution properties.

In general, any sequence of length $2^{l_1 + l_2}$ (where $l_i$ is a nonnegative
integer)
from a $(0, 2)$-sequence satisfies this general stratification constraint.
The set of *elementary intervals* in two dimensions, base 2, is
defined as

$$
E = \left\{ \left[\frac{a_1}{2^{l_1}}, \frac{a_1 + 1}{2^{l_1}}\right) \times
\left[\frac{a_2}{2^{l_2}}, \frac{a_2 + 1}{2^{l_2}}\right) \right\},
$$

where the integers $a_i$ satisfy $0 \le a_i < 2^{l_i}$. One sample from each of the first $2^{l_1 + l_2}$ values in the sequence will be in each of the elementary intervals. Furthermore, the same property is true for each subsequent set of $2^{l_1 + l_2}$ values.

To understand how $(0, 2)$-sequences can be applied to generating 2D samples, consider a pixel with $2 \times 2$ image samples, each with an array of $4 \times 4$ 2D samples. The first $(2 \times 2) \times (4 \times 4) = 2^6$ values of a $(0, 2)$-sequence are well distributed with respect to each other according to the corresponding set of elementary intervals. Furthermore, the first $4 \times 4 = 2^4$ samples are themselves well distributed according to their corresponding elementary intervals, as are the next $2^4$ of them, and the subsequent ones, and so on. Therefore, we can use the first 16 $(0, 2)$-sequence samples for the samples for the $4 \times 4$ array for the first image sample for a pixel, then the next 16 for the next image sample, and so forth. The result is an extremely well-distributed set of sample points.

### 7.5.1 Sampling with Generator Matrices

The Sobol sequence is based on a different mechanism for generating
sample points than the `HaltonSampler`, which used the radical inverse
in various dimensions. Even with the integer divides in the radical
inverse function converted to multiplies and shifts, the amount of
computation needed to compute the billions of samples that can be needed
for high-quality, high-resolution renderings can still be significant.
Most of the computational expense comes from the cost of performing
non-base-2 computation on computers that natively operate in base 2.
(Consider the contrast between the <<Compute base-2 radical
inverse>> fragment and the `RadicalInverseSpecialized()` template
function.)

Given the high cost of non-base-2 operations, it’s natural to try to
develop sample generation algorithms that operate entirely in base 2. One
effective approach is to use *generator matrices*, which allow all
computation to be done in the same base. Instead
of using a different base in each dimension, as the Halton sampler did, a
different generator matrix is used in each dimension. With well-chosen
matrices for each sampled dimension, it’s possible to generate very good
low-discrepancy distributions of points. For example, $(0, 2)$-sequences can
be defined using two specific generator matrices in base 2.

To see how generator matrices are used, consider an $n$-digit number $a$ in base $b$, where the $i$th digit of $a$ is $d_i(a)$ and where we have an $n \times n$ generator matrix $C$. Then the corresponding sample point $x_a \in [0, 1)$ is defined by

$$
x_a = \begin{bmatrix} b^{-1} & b^{-2} & \cdots & b^{-n} \end{bmatrix}
\left( C \begin{bmatrix} d_1(a) \\ d_2(a) \\ \vdots \\ d_n(a) \end{bmatrix} \right),
\tag{7.9}
$$

where all arithmetic is performed in the ring $\mathbb{Z}_b$ (in other words, where all operations are performed modulo $b$). This construction gives a total of $b^n$ points as $a$ ranges from $0$ to $b^n - 1$. If the generator matrix is the identity matrix, then this definition corresponds to the regular radical inverse, base $b$. (It’s worth pausing to make sure you see this connection between Equations (7.7) and (7.9) before continuing.)
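To make the connection between Equations (7.7) and (7.9) concrete, here is a small C++ sketch (the function names are illustrative, not pbrt’s): with $b = 2$ and $C$ equal to the identity matrix, the generator-matrix construction reproduces the base-2 radical inverse.

```cpp
#include <cstdint>

// Base-2 radical inverse (Equation (7.7)): mirror the bits of a about
// the binary point, so digit d_i(a) contributes 2^{-i}.
double RadicalInverseBase2(uint32_t a) {
    double inv = 0, b = 0.5;  // b holds 2^{-i} for the current digit
    for (; a != 0; a >>= 1, b *= 0.5)
        if (a & 1) inv += b;
    return inv;
}

// Equation (7.9) with C = identity and b = 2: the matrix product leaves
// the digit vector unchanged, so x_a is again sum_i d_i(a) 2^{-i}.
double IdentityGeneratorSample(uint32_t a) {
    double x = 0, scale = 0.5;  // scale holds 2^{-i}
    for (int i = 1; i <= 32; ++i, scale *= 0.5) {
        uint32_t di = (a >> (i - 1)) & 1;  // (C d)_i = d_i(a) for identity C
        x += di * scale;
    }
    return x;
}
```

For any index, the two functions agree, which is exactly the claim made about the identity generator matrix above.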

In this section, we will exclusively use $b = 2$ and $n = 32$. While introducing a matrix to the sample generation algorithm may not seem like a step toward better performance, we’ll see that in the end the sampling code can be mapped to an implementation that uses a small number of bit operations to perform this computation in an extremely efficient manner.

The first step toward high performance comes from the fact that we’re
working in base 2; as such, all entries of $C$ are either 0 or 1,
and thus we can represent either each row or each column of the matrix
with a single unsigned 32-bit integer. We’ll choose to represent columns
of the matrix as `uint32_t`s; this choice leads to a very efficient
algorithm for multiplying the digit column vector $[d_i(a)]^T$ by $C$.

Now consider the task of computing the matrix-vector product $C [d_i(a)]^T$; using the definition of matrix-vector multiplication, we have

$$
C \begin{bmatrix} d_1(a) \\ \vdots \\ d_n(a) \end{bmatrix}
= \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{bmatrix}
\begin{bmatrix} d_1(a) \\ \vdots \\ d_n(a) \end{bmatrix}
= d_1(a)\,\mathbf{c}_1 + \cdots + d_n(a)\,\mathbf{c}_n,
\tag{7.10}
$$

where $\mathbf{c}_i$ denotes the $i$th column of $C$.

In other words, for each digit of $a$ that has a value of 1, the
corresponding column of $C$ should be summed. This addition can in
turn be performed very efficiently in $\mathbb{Z}_2$: in that setting,
addition corresponds to the exclusive OR operation. (Consider the
two possible operand values, 0 and 1: the result
of adding them modulo 2 matches the value computed by exclusive OR
with the same operands.) Thus, the multiplication is just a matter of exclusive ORing together the columns $i$
of $C$ where $d_i(a)$ is 1. This computation is implemented
in the `MultiplyGenerator()` function.
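A sketch of this computation, consistent with the description above (the packed-column layout is as described in the text; the demo matrix is illustrative):

```cpp
#include <cstdint>

// Columns of C are packed one per uint32_t; the product C [d_i(a)]^T is
// the XOR of the columns whose corresponding digit of a is 1.
uint32_t MultiplyGenerator(const uint32_t *C, uint32_t a) {
    uint32_t v = 0;
    for (int i = 0; a != 0; ++i, a >>= 1)
        if (a & 1)      // digit d_{i+1}(a) is 1 ...
            v ^= C[i];  // ... so add (XOR) column i into the result
    return v;
}

// Demo: with the identity matrix (column i has only bit i set), the
// product reproduces the digits of a unchanged.
const uint32_t IdentityDemo[4] = { 1u, 2u, 4u, 8u };
```

The loop runs only for as many iterations as there are significant bits in `a`, which is the property exploited by the performance discussion below.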

Going back to Equation (7.9) now, if we denote by $\mathbf{v} = C [d_i(a)]^T$ the column vector from the product, then consider the vector product

$$
\begin{bmatrix} 2^{-1} & 2^{-2} & \cdots & 2^{-n} \end{bmatrix} \mathbf{v}
= \sum_{i=1}^{n} 2^{-i} v_i.
\tag{7.11}
$$

Because the entries of $\mathbf{v}$ are stored in a single `uint32_t`, their
value interpreted as a `uint32_t` is

$$
v = v_1 + v_2 \cdot 2 + \cdots = \sum_{i=1}^{32} 2^{i-1} v_i.
$$

If we were to reverse the order of the bits in the `uint32_t`, then
we would have the value

$$
\sum_{i=1}^{32} 2^{32-i} v_i.
$$

This is a more useful value: if we divide it by $2^{32}$, we get Equation (7.11), which is $x_a$, the value we’re trying to compute.

Thus, if we take the result of the `MultiplyGenerator()` function,
reverse the order of the bits in the returned value (e.g., by using
`ReverseBits32()`), and then divide that integer value by $2^{32}$ to
compute a floating-point value in $[0, 1)$, we’ve computed our sample value.
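As a sketch, the bit reversal can be done with the standard swap-in-halves technique, and the division by $2^{32}$ with a multiply by the constant $2^{-32}$; this is one plausible way `ReverseBits32()` can be written, not necessarily pbrt’s exact code.

```cpp
#include <cstdint>

// Reverse the 32 bits of n: swap 16-bit halves, then bytes, nibbles,
// bit pairs, and finally adjacent bits.
uint32_t ReverseBits32(uint32_t n) {
    n = (n << 16) | (n >> 16);
    n = ((n & 0x00ff00ff) << 8) | ((n & 0xff00ff00) >> 8);
    n = ((n & 0x0f0f0f0f) << 4) | ((n & 0xf0f0f0f0) >> 4);
    n = ((n & 0x33333333) << 2) | ((n & 0xcccccccc) >> 2);
    n = ((n & 0x55555555) << 1) | ((n & 0xaaaaaaaa) >> 1);
    return n;
}

// Reversed product bits, scaled by 2^{-32}, give the sample in [0, 1).
float BitsToSample(uint32_t v) {
    return ReverseBits32(v) * 0x1p-32f;  // 0x1p-32f is the constant 2^{-32}
}
```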

To save the small cost of reversing the bits, we can equivalently reverse
the bits in all of the columns of the generator matrix `C` before
passing it to `MultiplyGenerator()`. We will use that convention in the
following.

To make $(0, 2)$-sequences useful in practice, we also need to be able to generate multiple different sets of 2D sample values for each image sample, and we would like to generate different sample values for each pixel. One approach to this problem would be to use carefully chosen nonoverlapping subsequences of the $(0, 2)$-sequence for each pixel. Another approach is to randomly scramble the $(0, 2)$-sequence, giving a new $(0, 2)$-sequence built by applying a random permutation to the base-2 digits of the values in the original sequence.

The scrambling approach we will use is due to Kollig and Keller (2002). It repeatedly partitions and shuffles the unit square $[0, 1)^2$. In each of the two dimensions, it first divides the square in half and then swaps the two halves with 50% probability. Then it splits each of the intervals $[0, 0.5)$ and $[0.5, 1)$ in half and randomly exchanges each of those two halves. This process continues recursively until all of the bits of the base-2 representation have been processed. It was carefully designed so that it preserves the low-discrepancy properties of the set of points; otherwise, the advantages of the $(0, 2)$-sequence would be lost from the scrambling. Figure 7.29 shows an unscrambled $(0, 2)$-sequence and two randomly scrambled variations of it.

Two things make the scrambling process efficient: first, because we are
scrambling two sequences that are computed in base 2, the digits of
the sequences are all 0 or 1, and scrambling a particular digit is
equivalent to exclusive-ORing it with 0 or 1. Second, we make the
simplification that, at each level of the recursive scrambling, the same
decision is made as to whether to swap each of the pairs of
subintervals or not. The result of these two design choices is that the
scrambling can be encoded as a set of bits stored in a `uint32_t` and
can be applied to the original digits via exclusive-OR operations.

The `SampleGeneratorMatrix()` function pulls these pieces
together to generate sample values.
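A minimal sketch of what `SampleGeneratorMatrix()` has to do, assuming the columns of `C` have already been bit-reversed as per the earlier convention and that the scramble bits are applied by exclusive OR (parameter names here are illustrative):

```cpp
#include <cstdint>

// XOR together the (pre-bit-reversed) columns of C selected by the
// binary digits of a.
uint32_t MultiplyGenerator(const uint32_t *C, uint32_t a) {
    uint32_t v = 0;
    for (int i = 0; a != 0; ++i, a >>= 1)
        if (a & 1) v ^= C[i];
    return v;
}

// Apply the scramble bits with XOR and scale by 2^{-32} to get a
// sample value in [0, 1).
float SampleGeneratorMatrix(const uint32_t *C, uint32_t a,
                            uint32_t scramble = 0) {
    return (MultiplyGenerator(C, a) ^ scramble) * 0x1p-32f;
}

// Demo matrix: identity with bit-reversed columns (the van der Corput
// layout), enough columns for small indices.
const uint32_t CRevIdentityDemo[4] = {
    0x80000000u, 0x40000000u, 0x20000000u, 0x10000000u
};
```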

The `SampleGeneratorMatrix()` function is already fairly efficient,
performing a handful of arithmetic operations each time through the loop
in `MultiplyGenerator()` that runs for a number of iterations
equal to the base-2 logarithm of the value `a`. Remarkably, it’s
possible to do even better by changing the order in which samples are
generated, enumerating them in *Gray code* order.

With Gray codes, successive binary values differ in only a single bit; the third column of Table 7.4 shows the first eight integers in Gray code order. Note not only that a single bit changes between any pair of adjacent values but also that, over any power-of-two-sized range of $n$ values starting from 0, the Gray code enumerates all of the integers from 0 to $n - 1$, just in a different order than usual.

Table 7.4: The first eight integers in binary and in Gray code order.

| $a$ (base 10) | $a$ (binary) | $g(a)$ (binary) | Changed Bit Index |
|---|---|---|---|
| 0 | 000 | 000 | n/a |
| 1 | 001 | 001 | 0 |
| 2 | 010 | 011 | 1 |
| 3 | 011 | 010 | 0 |
| 4 | 100 | 110 | 2 |
| 5 | 101 | 111 | 0 |
| 6 | 110 | 101 | 1 |
| 7 | 111 | 100 | 0 |

Computing the $n$th Gray code value $g(n)$ can be done very efficiently.
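Specifically, $g(n) = (n \gg 1) \oplus n$, which takes a shift and an XOR:

```cpp
#include <cstdint>

// The nth Gray code value: XOR n with itself shifted right by one bit.
inline uint32_t GrayCode(uint32_t n) {
    return (n >> 1) ^ n;
}
```

Applying this to 0 through 7 reproduces the third column of Table 7.4.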

By enumerating samples in Gray code order, we can take great advantage of
the fact that only a single bit of $g(a)$ changes between subsequent
samples. Assume that we have computed the product
$\mathbf{v} = C [d_i(a)]^T$
for some index $a$: if another value $a'$ differs by just one bit from $a$,
then we only need to add or subtract one column of $C$ from $\mathbf{v}$ to
find $\mathbf{v}' = C [d_i(a')]^T$ (recall
Equation (7.10)). Even better, *both* addition
and subtraction can be performed with exclusive OR, so it doesn’t
matter which operation is needed; we only need to know which bit changed.
As can be seen from Table 7.4, the index of the bit that
changes going from $g(i)$ to $g(i+1)$ is given by the number of trailing
0s in the binary representation of $i+1$. Most CPU instruction sets can
count trailing 0 bits in a single instruction.

Putting this all together, we can very efficiently generate a series of
samples using a generator matrix in Gray code order.
`GrayCodeSample()` takes a generator matrix `C`, a number of
samples to generate `n`, and stores the corresponding samples in
memory at the location pointed to by `p`.
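A sketch of the 1D `GrayCodeSample()` that follows from this discussion, using the GCC/Clang `__builtin_ctz()` intrinsic as a stand-in for a count-trailing-zeros helper (the exact parameter list in pbrt may differ):

```cpp
#include <cstdint>

// Generate n samples in Gray code order. Each new value of v is the
// previous one XORed with the single matrix column selected by the
// number of trailing zeros of i+1; the scramble bits seed v.
void GrayCodeSample(const uint32_t *C, uint32_t n, uint32_t scramble,
                    float *p) {
    uint32_t v = scramble;
    for (uint32_t i = 0; i < n; ++i) {
        p[i] = v * 0x1p-32f;           // v / 2^32, giving a value in [0, 1)
        v ^= C[__builtin_ctz(i + 1)];  // flip the one bit that changes
    }
}

// Hypothetical demo: the first 4 unscrambled van der Corput samples,
// using the bit-reversed identity matrix.
float DemoSample(int i) {
    const uint32_t C[4] = { 0x80000000u, 0x40000000u,
                            0x20000000u, 0x10000000u };
    float p[4];
    GrayCodeSample(C, 4, 0, p);
    return p[i];
}
```

In Gray code order the samples come out as 0, 0.5, 0.75, 0.25: the radical inverses of 0, 1, 3, 2, matching Table 7.4.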

The x86 assembly code for the heart of the inner loop (with the loop control logic elided) is wonderfully brief:

Even if one isn’t an x86 assembly language aficionado, one can appreciate that it’s an incredibly short sequence of instructions to generate each sample value.

There is a second version of `GrayCodeSample()` (not included here)
for generating 2D samples; it takes a generator matrix for each dimension
and fills in an array of `Point2f` values with the samples.

### 7.5.2 Sampler Implementation

The `ZeroTwoSequenceSampler` generates samples for positions on the
film plane, lens, and other 2D samples using scrambled
$(0, 2)$-sequences, and generates 1D samples with scrambled van
der Corput sequences.

Figure 7.30 compares the result of using a $(0, 2)$-sequence for sampling the lens for depth of field to using a stratified pattern (one image rendered with the `StratifiedSampler` and one with the `ZeroTwoSequenceSampler`). The `ZeroTwoSequenceSampler`’s results are better than the stratified image, although the difference is smaller than the difference between stratified and random sampling.

The constructor rounds the number of samples per pixel up to a power of 2 if necessary, since subsets of $(0, 2)$-sequences that are not a power of 2 in size are much less well distributed over $[0, 1)^2$ than those that are.
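Rounding up to the next power of 2 can be done with a few bit operations; a sketch (the helper name is illustrative):

```cpp
#include <cstdint>

// Round v up to the next power of 2: smear the highest set bit of v-1
// into all lower bit positions, then add 1.
int32_t RoundUpPow2(int32_t v) {
    v--;
    v |= v >> 1;  v |= v >> 2;
    v |= v >> 4;  v |= v >> 8;
    v |= v >> 16;
    return v + 1;
}
```

Values that are already powers of 2 are returned unchanged.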

Since the `ZeroTwoSequenceSampler` is a `PixelSampler`,
its `StartPixel()` method must not only
generate array sample values for the samples in the pixel but must also
generate samples for a number of dimensions of non-array samples.

Generating the samples for the non-array dimensions expected by the
`PixelSampler` is a matter of filling in the appropriate vectors with
the appropriate number of sample values.

The sample vector dimensions with array samples are similar, though with multiple sample values in each dimension.

The `VanDerCorput()` function generates a number of scrambled 1D
sample values using the Gray code-based sampling machinery. Although a
specialized implementation of this function that took advantage of the
structure of the identity matrix could be written, here we use the existing
Gray code implementation, which is more than sufficiently efficient.


The generator matrix for the 1D van der Corput sequence is just the identity matrix but with each column’s bits reversed, as per the earlier convention.
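The matrix itself is not reproduced in this excerpt; a definition consistent with that description (the identity matrix with each column’s bits reversed, packed one column per `uint32_t`) would be:

```cpp
#include <cstdint>

// Van der Corput generator matrix: identity with bit-reversed columns,
// so column i has only bit 31 - i set.
const uint32_t CVanDerCorput[32] = {
    0x80000000u, 0x40000000u, 0x20000000u, 0x10000000u,
    0x8000000u,  0x4000000u,  0x2000000u,  0x1000000u,
    0x800000u,   0x400000u,   0x200000u,   0x100000u,
    0x80000u,    0x40000u,    0x20000u,    0x10000u,
    0x8000u,     0x4000u,     0x2000u,     0x1000u,
    0x800u,      0x400u,      0x200u,      0x100u,
    0x80u,       0x40u,       0x20u,       0x10u,
    0x8u,        0x4u,        0x2u,        0x1u,
};
```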


There is a subtle implementation detail that must be accounted for when using scrambled $(0, 2)$-sequences. Often, integrators will use samples from more than one of the sampling patterns that the sampler creates in the process of computing the values of particular integrals. For example, they might use a sample from a 1D pattern to select one of the light sources in the scene to sample illumination from and then might use a sample from a 2D pattern to select a sample point on that light source, if it is an area light.

Even if these two patterns are computed with random scrambling with different random scramble values for each one, some correlation can still remain between elements of these patterns, such that the $i$th element of the 1D pattern and the $i$th element of the 2D pattern are related. As such, in the earlier area lighting example, the distribution of sample points on each light source would not in general cover the entire light due to this correlation, leading to unusual rendering errors.

This problem can be solved easily enough by randomly shuffling the various dimensions individually after they are generated. After generating a scrambled 1D low-discrepancy sampling pattern, giving a well-distributed set of samples across all of the image samples for this pixel, this function shuffles these samples in two ways. Consider, for example, a pixel with 8 image samples, each of which has 4 1D samples for the integrator (giving a total of 32 integrator samples). First, it shuffles samples within each of the 8 groups of 4 samples, putting each set of 4 into a random order. Next, it shuffles each of the 8 groups of 4 samples as a block, with respect to the other blocks of 4 samples.
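A sketch of this two-level shuffle, using `std::mt19937` in place of the sampler’s own RNG (the `Shuffle()` signature here is illustrative): blocks of `nDimensions` contiguous values are permuted with a Fisher–Yates shuffle, so one call per group handles the within-group shuffle, and one more call with `nDimensions` equal to the group size shuffles the groups as blocks.

```cpp
#include <random>
#include <utility>

// Randomly permute count blocks of nDimensions contiguous values each.
template <typename T>
void Shuffle(T *samp, int count, int nDimensions, std::mt19937 &rng) {
    for (int i = count - 1; i > 0; --i) {
        // Swap block i with a uniformly chosen block in [0, i].
        int other = std::uniform_int_distribution<int>(0, i)(rng);
        for (int j = 0; j < nDimensions; ++j)
            std::swap(samp[nDimensions * i + j],
                      samp[nDimensions * other + j]);
    }
}

// Check the 8-groups-of-4 example: the result must be a permutation of
// the 32 values, and each block of 4 must still hold values from a
// single original group.
bool TwoLevelShuffleIsValid() {
    std::mt19937 rng(1234);
    int samp[32];
    for (int i = 0; i < 32; ++i) samp[i] = i;
    for (int g = 0; g < 8; ++g)
        Shuffle(samp + 4 * g, 4, 1, rng);  // shuffle within each group
    Shuffle(samp, 8, 4, rng);              // shuffle the groups as blocks
    bool seen[32] = {};
    for (int i = 0; i < 32; ++i) seen[samp[i]] = true;
    for (int i = 0; i < 32; ++i)
        if (!seen[i]) return false;        // not a permutation
    for (int b = 0; b < 8; ++b)            // blocks stay intact
        for (int j = 1; j < 4; ++j)
            if (samp[4 * b + j] / 4 != samp[4 * b] / 4) return false;
    return true;
}
```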

The `Sobol2D()` function follows a similar structure to
`VanDerCorput()` but uses two generator matrices to generate the first
two dimensions of Sobol points. Its implementation isn’t included here.

Figure 7.31 shows the result of using the $(0, 2)$-sequence for the area lighting example scene. Note that not only does it give a visibly better image than stratified patterns, but it also does well with one light sample per image sample, unlike the stratified sampler.

Figure 7.31: When the `ZeroTwoSequenceSampler` is used for the area light sampling example, similar results are generated (1) with both 1 image sample and 16 light samples as well as (2) with 16 image samples and 1 light sample, thanks to the $(0, 2)$-sequence sampling pattern that ensures a good distribution of samples over the pixel area in both cases. Compare these images to Figure 7.24, where the stratified pattern generates a much worse set of light samples when only 1 light sample is taken for each of the 16 image samples.