## 8.7 Sobol’ Samplers

While the Halton sequence is effective for Monte Carlo integration,
each radical inverse computation requires one integer division for each
digit. The integer division instruction is one of the slowest ones on most
processors, which can affect overall rendering performance, especially in
highly optimized renderers. Therefore, in this section we will describe
three `Sampler`s that are based on the Sobol’ sequence, a
low-discrepancy sequence that is defined entirely in base 2, which leads to
efficient implementations.

The base-2 radical inverse can be computed more
efficiently than the way that the base-agnostic `RadicalInverse()`
function computes it. The key is to take advantage of the fact that
numbers are already represented in base 2 on digital
computers. If $a$ is a 64-bit value, then from
Equation (8.18),
$$a = \sum_{i=1}^{64} d_i(a)\, 2^{i-1},$$
where $d_i(a)$ are its bits. First, consider reversing its bits, still represented as an integer value, which gives
$$\sum_{i=1}^{64} d_i(a)\, 2^{64-i}.$$
If we then divide this value by $2^{64}$, we have
$$\sum_{i=1}^{64} d_i(a)\, 2^{-i},$$
which equals $v_2(a)$ (recall Equation (8.19)).
Thus, the base-2 radical inverse can equivalently
be computed using a bit reverse and a power-of-two division. The division
can be replaced with multiplication by the corresponding inverse
power-of-two, which gives the same result with IEEE floating point.
Some processors provide a native instruction that directly reverses the bits in a
register; otherwise it can be done in $O(\log n)$ operations, where $n$
is the number of bits. (See the implementation of `ReverseBits32()` in
Section B.2.7.)
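As a concrete sketch of these two steps (this is not pbrt's actual listing; `ReverseBits32()` is described in Section B.2.7), the bit reversal and the multiplication by $2^{-32}$ for a 32-bit value might be written as:

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bits of a 32-bit word using O(log n) mask-and-shift steps:
// swap halves, then bytes, nibbles, bit pairs, and finally adjacent bits.
inline uint32_t ReverseBits32(uint32_t n) {
    n = (n << 16) | (n >> 16);
    n = ((n & 0x00ff00ff) << 8) | ((n & 0xff00ff00) >> 8);
    n = ((n & 0x0f0f0f0f) << 4) | ((n & 0xf0f0f0f0) >> 4);
    n = ((n & 0x33333333) << 2) | ((n & 0xcccccccc) >> 2);
    n = ((n & 0x55555555) << 1) | ((n & 0xaaaaaaaa) >> 1);
    return n;
}

// Base-2 radical inverse: reverse the bits, then multiply by 2^-32.
// 0x1p-32f is the hexadecimal floating-point constant for 2^-32, which
// gives the same result as division by 2^32 with IEEE floating point.
inline float RadicalInverseBase2(uint32_t a) {
    return ReverseBits32(a) * 0x1p-32f;
}
```

For example, `RadicalInverseBase2(1)` gives 0.5 and `RadicalInverseBase2(3)` gives 0.75, matching the van der Corput sequence.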

While the implementation of a function that generates Halton points could
be optimized by taking advantage of this for the first dimension, where
$b = 2$, performance would not improve for any of the remaining dimensions, so
the overall benefit would be low. The Sobol’ sequence uses $b = 2$ for all
dimensions, which allows it to benefit in all cases from computers’ use of
base 2 internally. So that each dimension has a different set of sample values,
it uses a different *generator matrix* for each dimension, where the
generator matrices are carefully chosen so that the resulting sequence has
low discrepancy.

To see how generator matrices are used, consider an $n$-digit number $a$ in base $b$, where the $i$th digit of $a$ is $d_i(a)$ and where we have an $n \times n$ generator matrix $\mathbf{C}$. Then the corresponding sample point $x_a \in [0, 1)$ is defined by
$$x_a = \begin{bmatrix} b^{-1} & b^{-2} & \cdots & b^{-n} \end{bmatrix} \left( \mathbf{C} \begin{bmatrix} d_1(a) \\ d_2(a) \\ \vdots \\ d_n(a) \end{bmatrix} \right), \tag{8.22}$$
where all arithmetic is performed in the ring $\mathbb{Z}_b$ (in other words, all operations are performed modulo $b$). This construction gives a total of $b^n$ points as $a$ ranges from $0$ to $b^n - 1$. If the generator matrix is the identity matrix, then this definition corresponds to the regular radical inverse, base $b$. (It is worth pausing to make sure you see this connection between Equations (8.19) and (8.22) before continuing.)

In this section, we will exclusively use $b = 2$ and $n = 32$. While introducing a matrix to each dimension of the sample generation algorithm may not seem like a step toward better performance, we will see that in the end the sampling code can be mapped to an implementation that uses a small number of bit operations to perform this computation in an extremely efficient manner.

The first step toward high performance comes from the fact that we are
working in base 2; as such, all entries of $\mathbf{C}$ are either 0 or 1
and thus we can represent either each row or each column of the matrix with
a single unsigned 32-bit integer. We will choose to represent columns of the
matrix as `uint32_t`s; this choice leads to an efficient
algorithm for multiplying the digit column vector by $\mathbf{C}$.

Now consider the task of computing the matrix–vector product; using the definition of matrix–vector multiplication, we have
$$\mathbf{C} \begin{bmatrix} d_1(a) \\ \vdots \\ d_n(a) \end{bmatrix} = d_1(a)\,\mathbf{c}_1 + d_2(a)\,\mathbf{c}_2 + \cdots + d_n(a)\,\mathbf{c}_n,$$
where $\mathbf{c}_i$ denotes the $i$th column of $\mathbf{C}$.

In other words, for each digit of $a$ that has a value of 1, the
corresponding column of $\mathbf{C}$ should be summed. This addition can in
turn be performed efficiently in $\mathbb{Z}_2$: in that setting,
addition corresponds to the bitwise exclusive or operation. (Consider the
two possible operand values, 0 and 1, and the result
of adding them modulo 2; compare to the values computed by exclusive or
with the same operand values.) Thus, the multiplication is just a matter of exclusive oring together the columns
of $\mathbf{C}$ where the corresponding digit of $a$ is 1. This computation is implemented
in the `MultiplyGenerator()` function.
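A minimal version of this function, consistent with the description above (the actual listing may differ in details), is:

```cpp
#include <cassert>
#include <cstdint>

// Multiply generator matrix C by the digit vector of a in Z_2: XOR
// together the columns of C wherever the corresponding bit of a is 1.
// Each column of the matrix is stored as a single uint32_t.
inline uint32_t MultiplyGenerator(const uint32_t *C, uint32_t a) {
    uint32_t v = 0;
    for (int i = 0; a != 0; ++i, a >>= 1)
        if (a & 1)
            v ^= C[i];
    return v;
}

// Sanity check: with the identity matrix (column i has only bit i set),
// the product reproduces a itself, as Equation (8.22) suggests.
inline bool IdentityMatrixReproduces(uint32_t a) {
    uint32_t C[32];
    for (int i = 0; i < 32; ++i) C[i] = 1u << i;
    return MultiplyGenerator(C, a) == a;
}
```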

Going back to Equation (8.22), if we denote by $\mathbf{v} = \mathbf{C}\,[d_1(a) \cdots d_n(a)]^T$ the column vector from the product, then consider the product
$$\begin{bmatrix} 2^{-1} & 2^{-2} & \cdots & 2^{-n} \end{bmatrix} \mathbf{v} = \sum_{i=1}^{n} 2^{-i}\, v_i.$$
Applying the same ideas as we did before to derive an efficient base-2
radical inverse algorithm, this value can also be computed by reversing the
bits of $\mathbf{v}$, interpreted as an integer, and dividing by $2^n$. To save the small cost of reversing
the bits, we can equivalently reverse the bits in all the columns of the
generator matrix before passing it to
`MultiplyGenerator()`. We will use that convention in what follows.

We will not discuss how the Sobol’ matrices are derived in a way that leads to a low-discrepancy sequence; the “Further Reading” section has pointers to more details. However, the first few Sobol’ generator matrices are shown in Figure 8.34. Note that the first is the identity, corresponding to the van der Corput sequence. Subsequent dimensions have various fractal-like structures to their entries.

### 8.7.1 Stratification over Elementary Intervals

The first two dimensions of the Sobol’ sequence are stratified in a very general way that makes them particularly effective in integration. For example, the first 16 samples satisfy the stratification constraint from stratified sampling in Section 8.5, meaning there is just one sample in each of the boxes of extent $(1/4, 1/4)$. However, they are also stratified over all the boxes of extent $(1/16, 1)$ and $(1, 1/16)$. Furthermore, there is only one sample in each of the boxes of extent $(1/2, 1/8)$ and $(1/8, 1/2)$. Figure 8.35 shows all the possibilities for dividing the domain into regions where the first 16 Sobol’ samples satisfy these stratification properties.

Not only are corresponding stratification constraints obeyed by any
power-of-2 set of samples starting from the beginning of the sequence, but
subsequent power-of-2-sized sets of samples fulfill them as well.
More formally, any sequence of length $2^{l_1 + l_2}$ (where $l_i$ is a
nonnegative integer) satisfies this general
stratification constraint. The set of *elementary intervals* in two
dimensions, base 2, is defined as
$$E = \left\{ \left[ \frac{a_1}{2^{l_1}}, \frac{a_1 + 1}{2^{l_1}} \right) \times \left[ \frac{a_2}{2^{l_2}}, \frac{a_2 + 1}{2^{l_2}} \right) \right\},$$
where the integers $a_i = 0, 1, \ldots, 2^{l_i} - 1$. One sample
from each of the first $2^{l_1 + l_2}$ values in the sequence will be in each
of the elementary intervals. Furthermore, the same property is true for
each subsequent set of $2^{l_1 + l_2}$ values.
Such a sequence is called a *$(0, 2)$-sequence*.
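The elementary-interval property can be checked numerically. The sketch below uses the classic pair of base-2 generator matrices popularized by Kollig and Keller for $(0, 2)$-sequences: the van der Corput matrix for the first dimension and a second matrix whose bit-reversed columns are generated on the fly. These routines are illustrative and are not pbrt's listings.

```cpp
#include <cassert>
#include <cstdint>

// First dimension: van der Corput (bit reversal of the sample index).
inline uint32_t VanDerCorput(uint32_t a) {
    a = (a << 16) | (a >> 16);
    a = ((a & 0x00ff00ff) << 8) | ((a & 0xff00ff00) >> 8);
    a = ((a & 0x0f0f0f0f) << 4) | ((a & 0xf0f0f0f0) >> 4);
    a = ((a & 0x33333333) << 2) | ((a & 0xcccccccc) >> 2);
    a = ((a & 0x55555555) << 1) | ((a & 0xaaaaaaaa) >> 1);
    return a;
}

// Second dimension of the classic base-2 (0,2)-sequence; the bit-reversed
// generator matrix columns v are produced incrementally (Kollig & Keller).
inline uint32_t Sobol2(uint32_t a) {
    uint32_t r = 0;
    for (uint32_t v = 1u << 31; a != 0; a >>= 1, v ^= v >> 1)
        if (a & 1) r ^= v;
    return r;
}

// Verify that the first 16 points place exactly one sample in every
// elementary interval of area 1/16: shapes 1x16, 2x8, 4x4, 8x2, and 16x1.
inline bool CheckElementaryIntervals16() {
    for (int l1 = 0; l1 <= 4; ++l1) {
        int l2 = 4 - l1;
        int count[16] = {0};
        for (uint32_t a = 0; a < 16; ++a) {
            double x = VanDerCorput(a) * 0x1p-32;
            double y = Sobol2(a) * 0x1p-32;
            int bx = (int)(x * (1 << l1)), by = (int)(y * (1 << l2));
            ++count[bx * (1 << l2) + by];
        }
        for (int i = 0; i < 16; ++i)
            if (count[i] != 1) return false;  // a box is empty or doubled
    }
    return true;
}
```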

### 8.7.2 Randomization and Scrambling

For the same reasons as were discussed in Section 8.6.2 in the context of the Halton sequence, it is also useful to be able to scramble the Sobol’ sequence. We will now define a few small classes that scramble a given sample value using various approaches. As with the generation of Sobol’ samples, scrambling algorithms for them can also take advantage of their base-2 representation to improve their efficiency.

All the following randomization classes take an unsigned 32-bit integer that they should interpret as a fixed-point number with 0 digits before and 32 digits after the radix point. Put another way, after randomization, this value will be divided by $2^{32}$ to yield the final sample value in $[0, 1)$.

The simplest approach is not to randomize the sample at all. In that
case, the value is returned unchanged; this is implemented by
`NoRandomizer`.

Alternatively, random permutations can be applied to the digits, such as
was done using the `DigitPermutation` class with the Halton sampler.
In base 2, however, a random permutation of the two digit values $\{0, 1\}$ can be represented
with a single bit, as there are only two unique permutations: the identity
and the one that exchanges 0 and 1. If the exchange permutation
is denoted by a bit with value 1 and the identity permutation
is denoted by 0, then a permutation can be applied by computing
the exclusive or of the permutation bit with a digit’s bit. Therefore, the
permutation for all 32 bits can be represented by a 32-bit integer and all
of the permutations can be applied in a single operation by computing the
exclusive or of the provided value with the permutation.
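A sketch consistent with this description (pbrt's `BinaryPermuteScrambler` is essentially this one-liner) is:

```cpp
#include <cassert>
#include <cstdint>

// Apply 32 independent digit permutations at once: bit i of 'permutation'
// selects which of the two permutations of {0, 1} to apply to digit i,
// and a single XOR applies all of them simultaneously.
struct BinaryPermuteScrambler {
    uint32_t permutation;
    uint32_t operator()(uint32_t v) const { return permutation ^ v; }
};
```

Note that each of these permutations is its own inverse, so applying the same scrambler twice returns the original value.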

Owen scrambling is also effective with Sobol’ points. `pbrt` provides
two implementations of it, both of which take advantage of their base-2
representation. `FastOwenScrambler` implements a highly efficient
approach, though the spectral properties of the resulting points are not
quite as good as a true Owen scramble.

Its implementation builds on the fact that in base 2, if a number is multiplied by an even value, then the value of any particular bit in it only affects the bits above it in the result. Equivalently, any bit of the result is affected only by the bits below it and the even multiplicand. One way to see why this is so is to consider long multiplication (as taught in grade school) applied to binary numbers. Given two $n$-digit binary numbers $a$ and $b$, where $b_i$ is the $i$th digit of $b$, then using Equation (8.18), their product is
$$ab = \sum_{i=1}^{n} b_i \left( a\, 2^{i-1} \right). \tag{8.25}$$
Thus, for any digit $b_i$ that is one, the value of $a$ is shifted $i - 1$ bits to the left and added to the final result. If $b$ is even, then $b_1 = 0$ and every such shift is at least one bit, so any digit of the result depends only on lower digits of $a$.
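This property is easy to verify directly. The helper below (purely illustrative) checks that flipping bit $i$ of $a$ before multiplying by an even $m$ leaves bits $0$ through $i$ of the product unchanged; the difference between the two products is $2^i m \bmod 2^{32}$, whose lowest set bit is at position $i + 1$ or above.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if flipping bit i of a and multiplying by the even value m
// changes only bits strictly above bit i of the 32-bit product.
inline bool EvenMultiplyOnlyPropagatesUp(uint32_t a, uint32_t m, int i) {
    uint32_t changed = (a * m) ^ ((a ^ (1u << i)) * m);  // differing bits
    uint32_t low = (i == 31) ? 0xffffffffu : ((1u << (i + 1)) - 1);
    return (changed & low) == 0;  // no differing bits at or below bit i
}
```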

The bits in the value provided to the randomization class must be reversed
so that its low bit corresponds to the leading digit $2^{-1}$ in the final sample value. Then,
the properties illustrated in Equation (8.25) can be
applied: the product of an even value with the sample value `v` can be
interpreted as a bitwise permutation as was done in the
`BinaryPermuteScrambler`, allowing the use of an exclusive
or to
permute all the bits. After a few rounds of this and a few operations
to mix the seed value in, the bits are reversed again before being returned.
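A sketch of such a scrambler in the style of the Laine-Karras and Burley hashes follows; the mixing constants here are illustrative rather than pbrt's exact ones. Each round either adds the seed, multiplies by an odd value, or XORs in the product with an even constant, so in the bit-reversed domain low bits only influence higher bits, which is the Owen scrambling property.

```cpp
#include <cassert>
#include <cstdint>

inline uint32_t ReverseBits32(uint32_t n) {
    n = (n << 16) | (n >> 16);
    n = ((n & 0x00ff00ff) << 8) | ((n & 0xff00ff00) >> 8);
    n = ((n & 0x0f0f0f0f) << 4) | ((n & 0xf0f0f0f0) >> 4);
    n = ((n & 0x33333333) << 2) | ((n & 0xcccccccc) >> 2);
    n = ((n & 0x55555555) << 1) | ((n & 0xaaaaaaaa) >> 1);
    return n;
}

// Hash-based Owen scramble sketch (constants illustrative).
struct FastOwenScrambler {
    uint32_t seed;
    uint32_t operator()(uint32_t v) const {
        v = ReverseBits32(v);
        v ^= v * 0x3d20adeau;    // even constant: bits propagate upward only
        v += seed;
        v *= (seed >> 16) | 1u;  // odd multiplier keeps the map invertible
        v ^= v * 0x05526c56u;    // more even-constant mixing rounds
        v ^= v * 0x53a22864u;
        return ReverseBits32(v);
    }
};

// The defining Owen property: two values that agree in their k most
// significant digits (1 <= k <= 32) must still agree in those digits
// after scrambling.
inline bool SharesPrefixAfterScramble(uint32_t a, uint32_t b, int k,
                                      uint32_t seed) {
    FastOwenScrambler s{seed};
    uint32_t mask = ~0u << (32 - k);
    if ((a & mask) != (b & mask)) return true;  // precondition not met
    return (s(a) & mask) == (s(b) & mask);
}
```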

The `OwenScrambler` class implements a full Owen scramble, operating
on each bit in turn.


The first bit (corresponding to in the final sample value) is handled specially, since there are no bits that precede it to affect its randomization. It is randomly flipped according to the seed value provided to the constructor.


For all the following bits, a bit mask is computed such that the bitwise
and of the mask with the value gives the bits above `b`—the values of
which should determine the permutation that is used for the current bit.
Those are run through `MixBits()` to get a hashed value that is then
used to determine whether or not to flip the current bit.
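A sketch of this per-bit loop follows, with Stafford's "mix13" finalizer standing in for pbrt's `MixBits()`; fragment details may differ from the actual listing.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in 64-bit finalizer (Stafford mix13); any good integer hash works.
inline uint64_t MixBits(uint64_t v) {
    v ^= v >> 31; v *= 0x7fb5d329728ea185ull;
    v ^= v >> 27; v *= 0x81dadef4bc2dd44dull;
    v ^= v >> 33;
    return v;
}

// Full Owen scramble, one bit at a time: whether bit b (counting digits
// from the most significant) is flipped is a hashed function of the seed
// and all of the higher-significance bits.
struct OwenScrambler {
    uint64_t seed;
    uint32_t operator()(uint32_t v) const {
        if (seed & 1) v ^= 1u << 31;  // first digit: only the seed decides
        for (int b = 1; b < 32; ++b) {
            uint32_t mask = ~0u << (32 - b);  // the b digits above this one
            if ((uint32_t)MixBits((v & mask) ^ seed) & (1u << b))
                v ^= 1u << (31 - b);
        }
        return v;
    }
};

// Owen property check: values agreeing in their top k digits still agree
// in those digits after scrambling (1 <= k <= 32).
inline bool OwenPrefixProperty(uint32_t a, uint32_t b, int k, uint64_t seed) {
    OwenScrambler s{seed};
    uint32_t mask = ~0u << (32 - k);
    if ((a & mask) != (b & mask)) return true;  // precondition not met
    return (s(a) & mask) == (s(b) & mask);
}
```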


### 8.7.3 Sobol’ Sample Generation

We now have the pieces needed to implement functions that generate Sobol’
samples. The `SobolSample()` function performs this task for a given
sample index `a` and dimension, applying the provided randomizer to
the sample before returning it.

Because this function is templated on the type of the
randomizer, a specialized instance of it will be compiled using the
provided randomizer, leading to the randomization algorithm being expanded
inline in the function. For `pbrt`’s purposes, there is no need for a more
general mechanism for sample randomization, so the small performance
benefit makes this implementation approach worthwhile.


Samples are computed using the Sobol’ generator matrices, following the
approach described by Equation (8.23). All the
generator matrices are stored consecutively in the `SobolMatrices32`
array. Each one has `SobolMatrixSize` columns, so scaling the
dimension by `SobolMatrixSize` brings us to
the first column of the matrix for the given dimension.


The value is then randomized with the given randomizer before being
rescaled to $[0, 1)$. (The constant `0x1p-32` is $2^{-32}$, expressed
as a hexadecimal floating-point number.)
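Putting the pieces together, a self-contained sketch of this computation follows; the names are illustrative, and the real function reads its columns from the `SobolMatrices32` array rather than taking a pointer parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <algorithm>

// Clamp away 1.0 so samples stay in [0, 1), as pbrt does.
static constexpr float FloatOneMinusEpsilon = 0x1.fffffep-1f;

// Sketch of one-dimensional Sobol' sample generation: multiply the
// bit-reversed generator matrix columns by the bits of the sample index,
// apply the caller's randomizer, then scale by 2^-32.
template <typename R>
inline float SobolSample(uint64_t a, const uint32_t *reversedCols,
                         R randomizer) {
    uint32_t v = 0;
    for (int i = 0; a != 0; ++i, a >>= 1)
        if (a & 1)
            v ^= reversedCols[i];
    v = randomizer(v);
    return std::min(v * 0x1p-32f, FloatOneMinusEpsilon);
}

// With the bit-reversed identity matrix and no randomization, this
// reduces to the base-2 radical inverse (the van der Corput sequence).
inline float VanDerCorputViaSobol(uint64_t a) {
    uint32_t cols[64];
    for (int i = 0; i < 64; ++i) cols[i] = (i < 32) ? (1u << (31 - i)) : 0;
    return SobolSample(a, cols, [](uint32_t v) { return v; });
}
```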

### 8.7.4 Global Sobol’ Sampler

The `SobolSampler` generates samples by direct evaluation of the
$d$-dimensional Sobol’ sequence. Like the `HaltonSampler`, it scales
the first two dimensions of the sequence to cover a range of image pixels.
Thus, in a similar fashion, nearby pixels have well-distributed
$d$-dimensional sample points not just individually but also with respect
to one another.


The `SobolSampler` uniformly scales the first two dimensions by the
smallest power of 2 that causes the sample domain to cover the
image area to be sampled. As with the `HaltonSampler`, this
specific scaling scheme is chosen in order to make it easier to compute the
reverse mapping from pixel coordinates to the sample indices that land
in each pixel.

All four of the randomization approaches from
Section 8.7.2 are supported by the
`SobolSampler`; `randomize` encodes which one to apply.

The sampler needs to record the current pixel for use in its
`GetPixel2D()` method and, like other samplers, tracks the current
dimension in its `dimension` member variable.

The `SobolIntervalToIndex()` function returns the index of the
`sampleIndex`th sample in the pixel `p`, if the $[0, 1)^2$
sampling domain has been scaled by $2^m$ to cover the pixel
sampling area.

The general approach used to derive the algorithm it implements is similar
to that used by the Halton sampler in its `StartPixelSample()`
method. Here, scaling by a power of two means that the base-2 logarithm of
the scale, $m$, gives the number of digits of the product $\mathbf{C}\,[d_i(a)]$
that form the scaled sample’s integer component. To find the values of $a$
that give a particular integer value after scaling, we can compute the
inverse of $\mathbf{C}$: given
$$\mathbf{v} = \mathbf{C} \begin{bmatrix} d_1(a) \\ \vdots \\ d_n(a) \end{bmatrix},$$
then equivalently
$$\mathbf{C}^{-1} \mathbf{v} = \begin{bmatrix} d_1(a) \\ \vdots \\ d_n(a) \end{bmatrix}.$$
We will not include the implementation of this function here.

Sample generation is now straightforward. There is the usual management of
the `dimension` value, again with the first two dimensions reserved
for the pixel sample, and then a call to `SampleDimension()` gives the
sample for a single Sobol’ dimension.

The `SampleDimension()` method takes care of calling
`SobolSample()` for the current sample index and specified dimension
using the appropriate randomizer.


If a randomizer is being used, a seed value must be computed for it. Note
that the hash value passed to each randomizer is based solely on the
current dimension and user-provided seed, if any. It must *not* be
based on the current pixel or the current sample index within the pixel,
since the same randomization should be used at all the pixels and all
the samples within them.


2D sample generation is easily implemented using `SampleDimension()`.
If all sample dimensions have been consumed, `Get2D()` goes back to
the start and skips the first two dimensions, as was done in the `HaltonSampler`.

Pixel samples are generated using the first two dimensions of the Sobol’
sample. `SobolIntervalToIndex()` does not account for randomization,
so the `NoRandomizer` is always used for the pixel sample, regardless
of the value of `randomize`.

The samples returned for the pixel position need to be adjusted so that
they are offsets within the current pixel. Similar to what was done in the
`HaltonSampler`, the sample value is scaled so that the pixel
coordinates are in the integer component of the result. The remaining
fractional component gives the offset within the pixel that the sampler
returns.

### 8.7.5 Padded Sobol’ Sampler

The `SobolSampler` generates sample points that have low discrepancy
over all of their dimensions. However, the distribution of samples in
two-dimensional slices of the full $d$-dimensional space is not necessarily
particularly good. Figure 8.36 shows an example.

For rendering, this state of affairs means that, for example, the samples
taken over the surface of a light source at a given pixel may not be well
distributed. It is of only slight solace to know that the full set of
$d$-dimensional samples is well distributed in return.
Figure 8.37 shows this problem in
practice with the `SobolSampler`: 2D projections of the form shown in
Figure 8.36 end up generating a characteristic
checkerboard pattern in the image at low sampling rates.

*Figure 8.37: Image rendered with the `SobolSampler` at a low sampling rate. With that sampler, these sorts of checkerboard patterns can result due to structure in the lower-dimensional projections of the form shown in Figure 8.36.*

*(Killeroo model courtesy of headus/Rezard.)*

Therefore, the `PaddedSobolSampler` generates samples from the
Sobol’ sequence in a way that focuses on returning good distributions for
the dimensions used by each 1D and 2D sample independently. It does so via
padding samples, similarly to the `StratifiedSampler`, but here using
Sobol’ samples rather than jittered samples.


The constructor, not included here, initializes the following member
variables from provided values. As with the `SobolSampler`, using a
pixel sample count that is not a power of 2 will give suboptimal results; a
warning is issued in this case.

`StartPixelSample()`,
also not included here, just records the specified pixel, sample index, and
dimension.

1D samples are generated by randomly shuffling a randomized van der Corput sequence.


Here, the permutation used for padding is based on the current pixel and dimension. It must not be based on the sample index, as the same permutation should be applied to all sample indices of a given dimension in a given pixel.

Given the permuted sample index value `index`, a separate method,
`SampleDimension()`, takes care of generating the corresponding
Sobol’ sample. The high bits of the hash value are
reused for the sample’s randomization; doing so should be safe, since
`PermutationElement()` uses the hash it is passed in an entirely
different way than any of the sample randomization schemes do.
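pbrt's `PermutationElement()` (after Kensler) computes such a permutation of sample indices without storing it. The sketch below uses the same cycle-walking idea with its own, simpler mixing rounds; it is not pbrt's implementation, but every round is a bijection on the enclosing power-of-two range, so the result is a valid seeded permutation of $[0, n)$.

```cpp
#include <cassert>
#include <cstdint>

// Return the position of element i under a seeded random permutation of
// {0, ..., n-1}, n >= 1. Hash within the smallest enclosing power of two
// using invertible steps, then "cycle-walk" until the result lands back
// inside [0, n). (Illustrative rounds, not pbrt's exact ones.)
inline uint32_t PermutationElement(uint32_t i, uint32_t n, uint32_t seed) {
    uint32_t w = n - 1;  // mask of the enclosing power-of-two range
    w |= w >> 1; w |= w >> 2; w |= w >> 4; w |= w >> 8; w |= w >> 16;
    do {
        i ^= seed;                  i &= w;  // XOR with (seed & w)
        i = (i * 0x9e3779b1u) & w;           // odd multiplier: bijective
        i ^= i >> 5;                         // xor-shift: bijective
        i = (i * 0x85ebca77u) & w;
        i ^= seed >> 16;            i &= w;
        i ^= i >> 3;
    } while (i >= n);
    return i;
}

// Check that the map really is a bijection on [0, n) for a given seed.
inline bool IsPermutation(uint32_t n, uint32_t seed) {
    bool seen[1024] = {};
    if (n > 1024) return false;
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t j = PermutationElement(i, n, seed);
        if (j >= n || seen[j]) return false;
        seen[j] = true;
    }
    return true;
}
```

Termination of the loop is guaranteed because the body permutes $[0, w]$: iterating it from a starting point below $n$ must eventually revisit that starting point, so some value below $n$ is always reached.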


`SampleDimension()` follows the same approach as the corresponding
method in `SobolSampler`, creating the appropriate randomizer before
calling `SobolSample()`.

Padded 2D samples are generated starting with a similar permutation of sample indices.

Randomization also parallels the 1D case; again, bits from `hash` are
used both for the random permutation of sample indices and for
sample randomization.

For this sampler, pixel samples are generated in the same manner as all
other 2D samples, so the sample generation request is forwarded on to
`Get2D()`.

### 8.7.6 Blue Noise Sobol’ Sampler

`ZSobolSampler` is a third sampler based on the Sobol’ sequence. It
is also based on padding 1D and 2D Sobol’ samples, but uses sample indices in a
way that leads to a blue noise distribution of sample values. This tends to
push error to higher frequencies, which in turn makes it appear more visually
pleasing. Figure 8.38 compares a scene rendered with the
`PaddedSobolSampler` and the `ZSobolSampler`; both have essentially the
same MSE, but the one rendered using `ZSobolSampler` looks better to
most human observers. This `Sampler` is the default one used by `pbrt` if no sampler is specified in the scene description.

*Figure 8.38: (a) Rendered with the `PaddedSobolSampler`. (b) Rendered with the `ZSobolSampler`. Both images are rendered using 1 sample per pixel and have the same overall error, but the second image looks much better thanks to a blue noise distribution of error.*

*(Dragon model courtesy of the Stanford Computer Graphics Laboratory.)*


This sampler generates blue noise samples by taking advantage of the properties of $(0, 2)$-sequences. To understand the idea behind its implementation, first consider rendering a two-pixel image using 16 samples per pixel, where a set of 2D samples is used for area light source sampling in each pixel. If the first 16 samples from a $(0, 2)$-sequence are used for the first pixel and the next 16 for the second, then not only will each pixel individually use well-stratified samples, but the set of all 32 samples will collectively be well stratified thanks to the stratification of $(0, 2)$-sequences over elementary intervals (Section 8.7.1). Consequently, the samples used in each pixel will generally be in different locations than in the other pixel, which is precisely the sample decorrelation exhibited by blue noise. (See Figure 8.39.)

More generally, if all the pixels in an image take different power-of-2 aligned and sized segments of samples from a single large set of Sobol’ samples in a way that nearby pixels generally take adjacent segments, then the distribution of samples across the entire image will be such that pixels generally use different sample values than their neighbors. Allocating segments of samples in scanline order would give good distributions along scanlines, but it would not do much from scanline to scanline. The Morton curve, which was introduced earlier in Section 7.3.3 in the context of linear bounding volume hierarchies, gives a better mechanism for this task: if we compute the Morton code for each pixel and then use that to determine the pixel’s starting index into the full set of Sobol’ samples, then nearby pixels—those that are nearby in both $x$ and $y$—will tend to use nearby segments of the samples. This idea is illustrated in Figure 8.40.

*(Dragon model courtesy of the Stanford Computer Graphics Laboratory.)*

Used directly in this manner, the Morton curve can lead to visible structure in the image; see Figure 8.41, where samples were allocated in that way. This issue can be addressed with a random permutation of the Morton indices interpreted as base-4 numbers, which effectively groups pairs of one bit from and one bit from in the Morton index into single base-4 digits. Randomly permuting these digits still maintains much of the spatial coherence of the Morton curve; see Figure 8.42 for an illustration of the permutation approach. Figure 8.38(b) shows the resulting improvement in a rendered image.

A second problem with the approach as described so far is that it does not randomize the order of sample indices within a pixel, as is necessary for padding samples across different dimensions. This shortcoming can be addressed by appending the bits of the sample index within a pixel to the pixel’s Morton code and then including those in the index randomization as well.

In addition to the usual configuration parameters, the `ZSobolSampler`
constructor also stores the base-2 logarithm of the number of samples per
pixel as well as the number of base-4 digits in the full extended Morton
index that includes the sample index.

The `StartPixelSample()` method’s main task is to construct the
initial unpermuted sample index by computing the pixel’s Morton code and
then appending the sample index, using a left shift to make space for it.
This value is stored in `mortonIndex`.
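A sketch of this index construction follows, using a standard 2D Morton encoding; the names here are illustrative rather than pbrt's exact fragment.

```cpp
#include <cassert>
#include <cstdint>

// Interleave the low 16 bits of x with zeros (standard bit tricks).
inline uint32_t LeftShift2(uint32_t x) {
    x &= 0xffff;
    x = (x ^ (x << 8)) & 0x00ff00ff;
    x = (x ^ (x << 4)) & 0x0f0f0f0f;
    x = (x ^ (x << 2)) & 0x33333333;
    x = (x ^ (x << 1)) & 0x55555555;
    return x;
}

// 2D Morton code: bits of x and y interleaved, with y in the odd positions.
inline uint32_t EncodeMorton2(uint32_t x, uint32_t y) {
    return (LeftShift2(y) << 1) | LeftShift2(x);
}

// The unpermuted index for a pixel's sample: the pixel's Morton code with
// the sample-within-pixel index appended in the low bits, assuming
// log2SamplesPerPixel bits are reserved for it.
inline uint64_t MortonSampleIndex(uint32_t px, uint32_t py,
                                  uint32_t sampleIndex,
                                  int log2SamplesPerPixel) {
    return ((uint64_t)EncodeMorton2(px, py) << log2SamplesPerPixel) |
           sampleIndex;
}
```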

Sample generation is similar to the `PaddedSobolSampler` with the
exception that the index of the sample is found with a call to the
`GetSampleIndex()` method (shown next), which randomizes
`mortonIndex`. The <<Generate 1D Sobol sample at
`sampleIndex`>> fragment then calls `SobolSample()` to generate the
`sampleIndex`th sample using the appropriate randomizer. It is
otherwise effectively the same as the
`PaddedSobolSampler::SampleDimension()` method, so its implementation
is not included here.


2D samples are generated in a similar manner, using the first two Sobol’
sequence dimensions and a sample index returned by
`GetSampleIndex()`. Here as well, the fragment that dispatches calls
to `SobolSample()` corresponding to the chosen randomization scheme is
not included.


Pixel samples are generated the same way as other 2D sample distributions.

The `GetSampleIndex()` method is where most of the complexity of this
sampler lies. It computes a random permutation of the digits of
`mortonIndex`, including handling the case where the number of samples
per pixel is only a power of 2 but not a power of 4; that case needs
special treatment since the total number of bits in the index is odd, which
means that only one of the two bits needed for the last base-4 digit is
available.


We will find it useful to have all $4! = 24$ permutations of four
elements explicitly enumerated; they are stored in the
`permutations` array.


The digits are randomized from most significant to least significant. In the case of the number of samples only being a power of 2, the loop terminates before the last bit, which is handled specially since it is not a full base-4 digit.


After the current digit is extracted from `mortonIndex`, it is
permuted using the selected permutation before being shifted back into
place to be added to `sampleIndex`.


Which permutation to use is determined by hashing both the higher-order digits and the current sample dimension. In this way, the index is hashed differently for different dimensions, which randomizes the association of samples in different dimensions for padding. The use of the higher-order digits in this way means that this approach bears some resemblance to Owen scrambling, though here it is applied to sample indices rather than sample values. The result is a top-down hierarchical randomization of the Morton curve.
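A sketch of the digit-permutation loop for the power-of-4 case follows. The structure mirrors the description above, but the permutation-selection hash and its constants are illustrative stand-ins rather than pbrt's exact listing.

```cpp
#include <cassert>
#include <cstdint>

// All 24 permutations of four elements; permutations[p][d] gives the
// image of digit d under permutation p.
static const uint8_t permutations[24][4] = {
    {0,1,2,3},{0,1,3,2},{0,2,1,3},{0,2,3,1},{0,3,1,2},{0,3,2,1},
    {1,0,2,3},{1,0,3,2},{1,2,0,3},{1,2,3,0},{1,3,0,2},{1,3,2,0},
    {2,0,1,3},{2,0,3,1},{2,1,0,3},{2,1,3,0},{2,3,0,1},{2,3,1,0},
    {3,0,1,2},{3,0,2,1},{3,1,0,2},{3,1,2,0},{3,2,0,1},{3,2,1,0}};

// Stand-in 64-bit finalizer for pbrt's MixBits() (Stafford mix13).
inline uint64_t MixBits(uint64_t v) {
    v ^= v >> 31; v *= 0x7fb5d329728ea185ull;
    v ^= v >> 27; v *= 0x81dadef4bc2dd44dull;
    v ^= v >> 33;
    return v;
}

// Hierarchically permute the base-4 digits of mortonIndex from most to
// least significant; each digit's permutation is chosen by hashing the
// digits above it together with the sample dimension. (Assumes the
// sample count is a power of 4, so every digit is a full base-4 digit.)
inline uint64_t RandomizeIndex(uint64_t mortonIndex, int nBase4Digits,
                               uint32_t dimension) {
    uint64_t sampleIndex = 0;
    for (int i = nBase4Digits - 1; i >= 0; --i) {
        int digitShift = 2 * i;
        int digit = (mortonIndex >> digitShift) & 3;
        // Hash the higher-order digits and the dimension to pick one of
        // the 24 permutations for this digit.
        uint64_t higherDigits = mortonIndex >> (digitShift + 2);
        int p = MixBits(higherDigits ^ (0x55555555ull * dimension)) % 24;
        digit = permutations[p][digit];
        sampleIndex |= (uint64_t)digit << digitShift;
    }
    return sampleIndex;
}

// Because each digit's permutation depends only on the digits above it,
// the mapping is invertible: check that it permutes [0, 4^nDigits).
inline bool IsIndexPermutation(int nDigits, uint32_t dim) {
    int n = 1 << (2 * nDigits);
    bool seen[256] = {};
    if (n > 256) return false;
    for (int i = 0; i < n; ++i) {
        uint64_t j = RandomizeIndex(i, nDigits, dim);
        if (j >= (uint64_t)n || seen[j]) return false;
        seen[j] = true;
    }
    return true;
}
```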


In the case of a power-of-2 sample count, the single remaining bit in
`mortonIndex` is handled specially, though following the same approach
as the other digits: the higher-order bits and dimension are hashed to
choose a permutation. In this case, there are only two possible
permutations, and as with the `BinaryPermuteScrambler`, an exclusive
or is sufficient to apply whichever of them was selected.

### 8.7.7 Evaluation

In this section we have defined three `Sampler`s, each of which
supports four randomization algorithms, giving a total of 12 different ways of
generating samples. All are effective samplers, though their
characteristics vary. In the interest of space, we will not include
evaluations of every one of them here but will focus on the most
significant differences among them.

*Figure 8.43: Power spectral density of the first two Sobol’ dimensions, (a) unscrambled, (b) randomized with digit permutations, (c) scrambled using the `FastOwenScrambler`, and (d) scrambled using hashed Owen scrambling. The unscrambled Sobol’ points have a remarkably bad power spectral density (PSD) and random digit permutations are of only slight benefit. Owen scrambling greatly improves the PSD.*

Figure 8.43(a) shows the PSD of the unscrambled 2D Sobol’ point set; it is an especially bad power spectrum. Like the Halton points, the 2D Sobol’ points have low energy along the two axes thanks to well-distributed 1D projections, but there is significant structured variation at higher off-axis frequencies, including regions with very high PSD values. As seen in Figure 8.43(b), randomizing the Sobol’ points with random digit permutations only slightly improves the power spectrum. Only with the Owen scrambling algorithms does the power spectrum start to become uniform at higher frequencies, though some structure still remains (Figures 8.43(c) and (d)).

These well-distributed 1D projections are one reason why low-discrepancy sequences are generally more effective than stratified patterns: they are more robust with respect to preserving their good distribution properties after being transformed to other domains. Figure 8.44 shows what happens when a set of 16 sample points are transformed to be points on a skinny quadrilateral by scaling them to cover its surface; samples from the Sobol’ sequence remain well distributed, but samples from a stratified pattern fare worse.

*Figure 8.45: (a) An image rendered using the `StratifiedSampler` (normalized MSE 1), (b) an image rendered using the `HaltonSampler` (normalized MSE 1.44), (c) an image rendered using the `PaddedSobolSampler` (normalized MSE 0.96), (d) an image rendered using the `SobolSampler` (normalized MSE 0.64), and (e) an image rendered using the `ZSobolSampler` (normalized MSE 0.84). All the low-discrepancy samplers use hashed Owen scrambling for randomization and 16 samples per pixel.*

Returning to the simple scene with defocus blur that was used in
Figure 8.23,
Figure 8.45 compares using the Halton sampler to the
three Sobol’ samplers for rendering that scene. We can see that the
Halton sampler has higher error than the `StratifiedSampler`, which is
due to its 2D projections (as are used for sampling the lens) not
necessarily being well distributed. The `PaddedSobolSampler` gives
little benefit over the stratified sampler, since for sampling a lens it is
the stratification of that 2D projection that matters most, and both
samplers provide it. The `SobolSampler` has remarkably low error, even though the
rendered image shows the characteristic structure of 2D projections of the
Sobol’ sequence. The `ZSobolSampler` combines reasonably low
error with the visual benefit of distributing its error with blue
noise.

Figure 8.46 shows the performance of Sobol’ sample points with the two test functions. It does well with both, but especially so with Owen scrambled points and the smooth Gaussian function, where it has an asymptotically faster rate of convergence. Figure 8.47 graphs MSE versus the sample count for the blurry dragon test scene from Figure 8.32. Both the Halton and Sobol’ samplers have roughly 10% lower MSE than independent sampling at equivalent sample counts.