## 7.5 (0, 2)-Sequence Sampler

Another approach for generating high-quality samples takes advantage of a remarkable property of certain low-discrepancy sequences that allows us to satisfy two desirable properties of samples (only one of which was satisfied with the StratifiedSampler): they generate sample vectors for a pixel’s worth of image samples such that the sample values for each pixel sample are well distributed with respect to each other, and simultaneously such that the aggregate collection of sample values for all of the pixel samples in the pixel are collectively well distributed.

This sequence uses the first two dimensions of a low-discrepancy sequence derived by Sobol. This sequence is a special type of low-discrepancy sequence known as a -sequence. -sequences are stratified in a very general way. For example, the first 16 samples in a -sequence satisfy the stratification constraint from stratified sampling in Section 7.3, meaning there is just one sample in each of the boxes of extent . However, they also satisfy the Latin hypercube constraint, as only one of them is in each of the boxes of extent and . Furthermore, there is only one sample in each of the boxes of extent and .

Figure 7.28 shows all of the possibilities for dividing the domain into regions where the first 16 samples of a -sequence satisfy the stratification properties. Each succeeding sequence of 16 samples from this pattern also satisfies these distribution properties.

In general, any sequence of length (where is a nonnegative integer) from a -sequence satisfies this general stratification constraint. The set of elementary intervals in two dimensions, base 2, is defined as

where the integer . One sample from each of the first values in the sequence will be in each of the elementary intervals. Furthermore, the same property is true for each subsequent set of values.

To understand now how -sequences can be applied to generating 2D samples, consider a pixel with image samples, each with an array of 2D samples. The first values of a -sequence are well distributed with respect to each other according to the corresponding set of elementary intervals. Furthermore, the first samples are themselves well distributed according to their corresponding elementary intervals, as are the next of them, and the subsequent ones, and so on. Therefore, we can use the first 16 -sequence samples for the samples for the array for the first image sample for a pixel, then the next 16 for the next image sample, and so forth. The result is an extremely well-distributed set of sample points.

### 7.5.1 Sampling with Generator Matrices

The Sobol sequence is based on a different mechanism for generating sample points than the HaltonSampler, which used the radical inverse in various dimensions. Even with the integer divides in the radical inverse function converted to multiplies and shifts, the amount of computation needed to compute the billions of samples that can be needed for high-quality, high-resolution renderings can still be significant. Most of the computational expense comes from the cost of performing non-base-2 computation on computers that natively operate in base 2. (Consider the contrast between the <<Compute base-2 radical inverse>> fragment and the RadicalInverseSpecialized() template function.)

Given the high cost of non-base-2 operations, it’s natural to try to develop sample generation algorithms that operate entirely in base 2. One such approach that has been effective has been to use generator matrices that allow all computation to be done in the same base. Instead of using a different base in each dimension, as the Halton sampler did, a different generator matrix is used in each dimension. With well-chosen matrices for each sampled dimension, it’s possible to generate very good low-discrepancy distributions of points. For example, -sequences can be defined using two specific generator matrices in base 2.

To see how generator matrices are used, consider an -digit number in base , where the th digit of is and where we have an generator matrix . Then the corresponding sample point is defined by

(7.9)

where all arithmetic is performed in the ring (in other words, when all operations are performed modulo ). This construction gives a total of points as ranges from to . If the generator matrix is the identity matrix, then this definition corresponds to the regular radical inverse, base . (It’s worth pausing to make sure you see this connection between Equations (7.7) and (7.9) before continuing.)

In this section, we will exclusively use and . While introducing a matrix to the sample generation algorithm may not seem like a step toward better performance, we’ll see that in the end the sampling code can be mapped to an implementation that uses a small number of bit operations to perform this computation in an extremely efficient manner.

The first step toward high performance comes from the fact that we’re working in base 2; as such, all entries of are either 0 or 1 and thus we can represent either each row or each column of the matrix with a single unsigned 32-bit integer. We’ll choose to represent columns of the matrix as uint32_ts; this choice leads to a very efficient algorithm for multiplying the column vector by .

Now consider the task of computing the matrix-vector product; using the definition of matrix-vector multiplication, we have:

In other words, for each digit of that has a value of 1, the corresponding column of should be summed. This addition can in turn be performed very efficiently in : in that setting, addition corresponds to the exclusive OR operation. (Consider the combinations of the two possible operand values—0 and 1—and the result of adding them , and compare to the values computed by exclusive OR with the same operand values.) Thus, the multiplication is just a matter of exclusive ORing together the columns of where ’s bit is 1. This computation is implemented in the MultiplyGenerator() function.

<<Low Discrepancy Inline Functions>>+=
inline uint32_t MultiplyGenerator(const uint32_t *C, uint32_t a) { uint32_t v = 0; for (int i = 0; a != 0; ++i, a >>= 1) if (a & 1) v ^= C[i]; return v; }

Going back to Equation (7.9) now, if we denote the column vector from the product , then consider the vector product

Because the entries of are stored in a single uint32_t, their value interpreted as a uint32_t is

If we were to reverse the order of the bits in the uint32_t, then we would have the value

This is a more useful value: if we divide this value by , we get Equation (7.11), which is , the value we’re trying to compute.

Thus, if we take the result of the MultiplyGenerator() function, reverse the order of the bits in the returned value (e.g., by using ReverseBits32()), and then divide that integer value by to compute a floating-point value in , we’ve computed our sample value.

To save the small cost of reversing the bits, we can equivalently reverse the bits in all of the columns of the generator matrix C before passing it to MultiplyGenerator(). We will use that convention in the following.

To make -sequences useful in practice, we also need to be able to generate multiple different sets of 2D sample values for each image sample, and we would like to generate different sample values for each pixel. One approach to this problem would be to use carefully chosen nonoverlapping subsequences of the -sequence for each pixel. Another approach is to randomly scramble the -sequence, giving a new -sequence built by applying a random permutation to the base- digits of the values in the original sequence.

The scrambling approach we will use is due to Kollig and Keller (2002). It repeatedly partitions and shuffles the unit square . In each of the two dimensions, it first divides the square in half and then swaps the two halves with 50% probability. Then it splits each of the intervals and in half and randomly exchanges each of those two halves. This process continues recursively until all of the bits of the base-2 representation have been processed. This process was carefully designed so that it preserves the low-discrepancy properties of the set of points; otherwise, the advantages of the -sequence would be lost from the scrambling. Figure 7.29 shows an unscrambled -sequence and two randomly scrambled variations of it.

Two things make the scrambling process efficient: first, because we are scrambling two sequences that are computed in base 2, the digits of the sequences are all 0 or 1, and scrambling a particular digit is equivalent to exclusive-ORing it with 0 or 1. Second, the simplification is made such that at each level  of the recursive scrambling, the same decision will be made as to whether to swap each of the pairs of subintervals or not. The result of these two design choices is that the scrambling can be encoded as a set of bits stored in a uint32_t and can be applied to the original digits via exclusive-OR operations.

The SampleGeneratorMatrix() function pulls these pieces together to generate sample values.

<<Low Discrepancy Inline Functions>>+=
inline Float SampleGeneratorMatrix(const uint32_t *C, uint32_t a, uint32_t scramble = 0) { return (MultiplyGenerator(C, a) ^ scramble) * 0x1p-32f; }

The SampleGeneratorMatrix() function is already fairly efficient, performing a handful of arithmetic operations each time through the loop in MultiplyGenerator() that runs for a number of iterations equal to the base-2 logarithm of the value a. Remarkably, it’s possible to do even better by changing the order in which samples are generated, enumerating them in Gray code order.

With Gray codes, successive binary values differ in only a single bit; the third column of Table 7.4 shows the first eight integers in Gray code order. Note that not only does only a single bit change between any pair of values but also that in any power-of-two-sized number of values starting from 0, the Gray code enumerates all of the values from 0 to , just in a different order than usual.

(base 10) (binary)Changed Bit Index
0 000 000 n/a
1 001 001 0
2 010 011 1
3 011 010 0
4 100 110 2
5 101 111 0
6 110 101 1
7 111 100 0

Computing the th Gray code value can be done very efficiently.

<<Low Discrepancy Inline Functions>>+=
inline uint32_t GrayCode(uint32_t n) { return (n >> 1) ^ n; }

By enumerating samples in Gray code order, we can take great advantage of the fact that only a single bit of changes between subsequent samples. Assume that we have computed the product for some index : if another value differs by just one bit from , then we only need to add or subtract one column of from to find (recall Equation (7.10)). Even better, both addition and subtraction can be performed with exclusive OR, so it doesn’t matter which operation is needed; we only need to know which bit changed. As can be seen from Table 7.4, the index of the bit that changes going from to is given by the number of trailing 0s in the binary representation of . Most CPU instruction sets can count trailing 0 bits in a single instruction.

Putting this all together, we can very efficiently generate a series of samples using a generator matrix in Gray code order. GrayCodeSample() takes a generator matrix C, a number of samples to generate n, and stores the corresponding samples in memory at the location pointed to by p.

<<Low Discrepancy Inline Functions>>+=
inline void GrayCodeSample(const uint32_t *C, uint32_t n, uint32_t scramble, Float *p) { uint32_t v = scramble; for (uint32_t i = 0; i < n; ++i) { p[i] = v * 0x1p-32f; /* 1/2^32 */ v ^= C[CountTrailingZeros(i + 1)]; } }

The x86 assembly code for the heart of the inner loop (with the loop control logic elided) is wonderfully brief:

xorps %xmm1, %xmm1 cvtsi2ssq %rax, %xmm1 mulss %xmm0, %xmm1 movss %xmm1, (%rcx,%rdx,4) incq %rdx bsfl %edx, %eax xorl \$31, %eax xorl (%rdi,%rax,4), %esi

Even if one isn’t an x86 assembly language aficionado, one can appreciate that it’s an incredibly short sequence of instructions to generate each sample value.

There is a second version of GrayCodeSample() (not included here) for generating 2D samples; it takes a generator matrix for each dimension and fills in an array of Point2f values with the samples.

### 7.5.2 Sampler Implementation

The ZeroTwoSequenceSampler generates samples for positions on the film plane, lens, and other 2D samples using scrambled -sequences, and generates 1D samples with scrambled van der Corput sequences.

<<ZeroTwoSequenceSampler Declarations>>=
class ZeroTwoSequenceSampler : public PixelSampler { public: <<ZeroTwoSequenceSampler Public Methods>>
ZeroTwoSequenceSampler(int64_t samplesPerPixel, int nSampledDimensions = 4); void StartPixel(const Point2i &); std::unique_ptr<Sampler> Clone(int seed); int RoundCount(int count) const { return RoundUpPow2(count); }
};

Figure 7.30 compares the result of using a -sequence for sampling the lens for the depth of field to using a stratified pattern.

The constructor rounds the number of samples per pixel up to a power of 2 if necessary, since subsets of -sequences that are not a power of 2 in size are much less well distributed over than those that are.

<<ZeroTwoSequenceSampler Method Definitions>>=
ZeroTwoSequenceSampler::ZeroTwoSequenceSampler(int64_t samplesPerPixel, int nSampledDimensions) : PixelSampler(RoundUpPow2(samplesPerPixel), nSampledDimensions) { }

<<ZeroTwoSequenceSampler Public Methods>>=
int RoundCount(int count) const { return RoundUpPow2(count); }

Since the ZeroTwoSequenceSampler is a PixelSampler, its StartPixel() method must not only generate array sample values for the samples in the pixel but must also generate samples for a number of dimensions of non-array samples.

<<ZeroTwoSequenceSampler Method Definitions>>+=
void ZeroTwoSequenceSampler::StartPixel(const Point2i &p) { <<Generate 1D and 2D pixel sample components using -sequence>>
for (size_t i = 0; i < samples1D.size(); ++i) VanDerCorput(1, samplesPerPixel, &samples1D[i], rng); for (size_t i = 0; i < samples2D.size(); ++i) Sobol2D(1, samplesPerPixel, &samples2D[i], rng);
<<Generate 1D and 2D array samples using -sequence>>
for (size_t i = 0; i < samples1DArraySizes.size(); ++i) VanDerCorput(samples1DArraySizes[i], samplesPerPixel, &sampleArray1D[i], rng); for (size_t i = 0; i < samples2DArraySizes.size(); ++i) Sobol2D(samples2DArraySizes[i], samplesPerPixel, &sampleArray2D[i], rng);
PixelSampler::StartPixel(p); }

Generating the samples for the non-array dimensions expected by the PixelSampler is a matter of filling in the appropriate vectors with the appropriate number of sample values.

<<Generate 1D and 2D pixel sample components using -sequence>>=
for (size_t i = 0; i < samples1D.size(); ++i) VanDerCorput(1, samplesPerPixel, &samples1D[i], rng); for (size_t i = 0; i < samples2D.size(); ++i) Sobol2D(1, samplesPerPixel, &samples2D[i], rng);

The sample vector dimensions with array samples are similar, though with multiple sample values in each dimension.

<<Generate 1D and 2D array samples using -sequence>>=
for (size_t i = 0; i < samples1DArraySizes.size(); ++i) VanDerCorput(samples1DArraySizes[i], samplesPerPixel, &sampleArray1D[i], rng); for (size_t i = 0; i < samples2DArraySizes.size(); ++i) Sobol2D(samples2DArraySizes[i], samplesPerPixel, &sampleArray2D[i], rng);

The VanDerCorput() function generates a number of scrambled 1D sample values using the Gray code-based sampling machinery. Although a specialized implementation of this function that took advantage of the structure of the identity matrix could be written, here we use the existing Gray code implementation, which is more than sufficiently efficient.

<<Low Discrepancy Inline Functions>>+=
inline void VanDerCorput(int nSamplesPerPixelSample, int nPixelSamples, Float *samples, RNG &rng) { uint32_t scramble = rng.UniformUInt32(); <<Define CVanDerCorput Generator Matrix>>
const uint32_t CVanDerCorput[] = { 0b10000000000000000000000000000000, 0b1000000000000000000000000000000, 0b100000000000000000000000000000, 0b10000000000000000000000000000, <<Remainder of Van Der Corput generator matrix entries>>
0b10000, 0b100000, 0b1000000, 0b10000000, 0b100000000, 0b1000000000, 0b10000000000, 0b100000000000, 0b1000000000000, 0b10000000000000, 0b100000000000000, 0b1000000000000000, 0b10000000000000000, 0b100000000000000000, 0b1000000000000000000, 0b10000000000000000000, 0b100000000000000000000, 0b1000000000000000000000, 0b10000000000000000000000, 0b100000000000000000000000, 0b1000000000000000000000000, 0b10000000000000000000000000, 0b100000000000000000000000000, 0b1000000000000000000000000000, 0b10000000000000000000000000000, 0b100000000000000000000000000000, 0b1000000000000000000000000000000, 0b10000000000000000000000000000000,
};
int totalSamples = nSamplesPerPixelSample * nPixelSamples; GrayCodeSample(CVanDerCorput, totalSamples, scramble, samples); <<Randomly shuffle 1D sample points>>
for (int i = 0; i < nPixelSamples; ++i) Shuffle(samples + i * nSamplesPerPixelSample, nSamplesPerPixelSample, 1, rng); Shuffle(samples, nPixelSamples, nSamplesPerPixelSample, rng);
}

The generator matrix for the 1D van der Corput sequence is just the identity matrix but with each column’s bits reversed, as per the earlier convention.

<<Define CVanDerCorput Generator Matrix>>=
const uint32_t CVanDerCorput[] = { 0b10000000000000000000000000000000, 0b1000000000000000000000000000000, 0b100000000000000000000000000000, 0b10000000000000000000000000000, <<Remainder of Van Der Corput generator matrix entries>>
0b10000, 0b100000, 0b1000000, 0b10000000, 0b100000000, 0b1000000000, 0b10000000000, 0b100000000000, 0b1000000000000, 0b10000000000000, 0b100000000000000, 0b1000000000000000, 0b10000000000000000, 0b100000000000000000, 0b1000000000000000000, 0b10000000000000000000, 0b100000000000000000000, 0b1000000000000000000000, 0b10000000000000000000000, 0b100000000000000000000000, 0b1000000000000000000000000, 0b10000000000000000000000000, 0b100000000000000000000000000, 0b1000000000000000000000000000, 0b10000000000000000000000000000, 0b100000000000000000000000000000, 0b1000000000000000000000000000000, 0b10000000000000000000000000000000,
};

There is a subtle implementation detail that must be accounted for when using scrambled -sequences. Often, integrators will use samples from more than one of the sampling patterns that the sampler creates in the process of computing the values of particular integrals. For example, they might use a sample from a 1D pattern to select one of the light sources in the scene to sample illumination from and then might use a sample from a 2D pattern to select a sample point on that light source, if it is an area light.

Even if these two patterns are computed with random scrambling with different random scramble values for each one, some correlation can still remain between elements of these patterns, such that the th element of the 1D pattern and the th element of the 2D pattern are related. As such, in the earlier area lighting example, the distribution of sample points on each light source would not in general cover the entire light due to this correlation, leading to unusual rendering errors.

This problem can be solved easily enough by randomly shuffling the various dimensions individually after they are generated. After generating a scrambled 1D low-discrepancy sampling pattern, giving a well-distributed set of samples across all of the image samples for this pixel, this function shuffles these samples in two ways. Consider, for example, a pixel with 8 image samples, each of which has 4 1D samples for the integrator (giving a total of 32 integrator samples). First, it shuffles samples within each of the 8 groups of 4 samples, putting each set of 4 into a random order. Next, it shuffles each of the 8 groups of 4 samples as a block, with respect to the other blocks of 4 samples.

<<Randomly shuffle 1D sample points>>=
for (int i = 0; i < nPixelSamples; ++i) Shuffle(samples + i * nSamplesPerPixelSample, nSamplesPerPixelSample, 1, rng); Shuffle(samples, nPixelSamples, nSamplesPerPixelSample, rng);

The Sobol2D() function follows a similar structure to VanDerCorput() but uses two generator matrices to generate the first two dimensions of Sobol points. Its implementation isn’t included here.

<<Low Discrepancy Declarations>>+=
inline void Sobol2D(int nSamplesPerPixelSample, int nPixelSamples, Point2f *samples, RNG &rng);

Figure 7.31 shows the result of using the -sequence for the area lighting example scene. Note that not only does it give a visibly better image than stratified patterns, but it also does well with one light sample per image sample, unlike the stratified sampler.