7.4 The Halton Sampler
The underlying goal of the StratifiedSampler is to generate a well-distributed but non-uniform set of sample points, with no two sample points too close together and no excessively large regions of the sample space that have no samples. As Figure 7.18 showed, a jittered pattern does this much better than a random pattern does, although its quality can suffer when samples in adjacent strata happen to be close to the shared boundary of their two strata.
This section introduces the HaltonSampler, which is based on algorithms that directly generate low-discrepancy point sets. Unlike the points generated by the StratifiedSampler, the HaltonSampler not only generates points that are guaranteed to not clump too closely together, but it also generates points that are simultaneously well distributed over all of the dimensions of the sample vector—not just one or two dimensions at a time, as the StratifiedSampler did.
7.4.1 Hammersley and Halton Sequences
The Halton and Hammersley sequences are two closely related low-discrepancy point sets. Both are based on a construction called the radical inverse, which in turn is based on the fact that a positive integer value $a$ can be expressed in a base $b$ with a sequence of digits $d_m(a) \ldots d_2(a) d_1(a)$ uniquely determined by
$$a = \sum_{i=1}^{m} d_i(a)\, b^{i-1}, \qquad (7.6)$$
where all digits $d_i(a)$ are between 0 and $b-1$.
The radical inverse function $\Phi_b$ in base $b$ converts a nonnegative integer $a$ to a fractional value in $[0, 1)$ by reflecting these digits about the radix point:
$$\Phi_b(a) = 0.\,d_1(a)\, d_2(a) \ldots d_m(a). \qquad (7.7)$$
Thus, the contribution of the digit $d_i(a)$ to the radical inverse is $d_i(a)/b^i$.
One of the simplest low-discrepancy sequences is the van der Corput sequence, which is a 1D sequence given by the radical inverse function in base 2:
$$x_a = \Phi_2(a).$$
Table 7.3 shows the first few values of the van der Corput sequence. Notice how it recursively splits the intervals of the 1D line in half, generating a sample point at the center of each interval.
The discrepancy of this sequence is
$$D_N^*(P) = O\!\left(\frac{\log N}{N}\right),$$
which matches the best discrepancy that has been attained for infinite sequences in $n$ dimensions,
$$D_N^*(P) = O\!\left(\frac{(\log N)^n}{N}\right).$$
To generate points in an $n$-dimensional Halton sequence, we use the radical inverse base $b$, with a different base for each dimension of the pattern. The bases used must all be relatively prime to each other, so a natural choice is to use the first $n$ prime numbers $(p_1, \ldots, p_n)$:
$$x_a = \left(\Phi_2(a),\, \Phi_3(a),\, \Phi_5(a),\, \ldots,\, \Phi_{p_n}(a)\right).$$
One of the most useful characteristics of the Halton sequence is that it can be used even if the total number of samples needed isn’t known in advance; all prefixes of the sequence are well distributed, so as additional samples are added to the sequence low discrepancy will be maintained. (However, its distribution is best when the total number of samples is a product of powers of the bases, $\prod_i (p_i)^{k_i}$, for integer exponents $k_i$.)
The discrepancy of an $n$-dimensional Halton sequence is
$$D_N^*(x_a) = O\!\left(\frac{(\log N)^n}{N}\right),$$
which is asymptotically optimal.
If the number of samples $N$ is fixed, the Hammersley point set can be used, giving slightly lower discrepancy. Hammersley point sets are defined by
$$x_a = \left(\frac{a}{N},\, \Phi_{b_1}(a),\, \Phi_{b_2}(a),\, \ldots,\, \Phi_{b_{n-1}}(a)\right),$$
where $N$ is the total number of samples to be taken and, as before, all of the bases $b_i$ are relatively prime. Figure 7.25(a) shows a plot of the first 216 points of the 2D Halton sequence. Figure 7.25(b) shows the first 256 points of the Hammersley sequence.
The function RadicalInverse() computes the radical inverse for a given number a using the baseIndexth prime number as the base. The function is implemented using an enormous switch statement, where baseIndex is mapped to the appropriate prime number and then a separate RadicalInverseSpecialized() template function actually computes the radical inverse. (The reason for the curious switch-based structure will be explained shortly.)
For the base-2 radical inverse, we can take advantage of the fact that numbers in digital computers are already represented in base 2 to compute the radical inverse more efficiently. For a 64-bit value $a$, we have from Equation (7.6)
$$a = \sum_{i=1}^{64} d_i(a)\, 2^{i-1}.$$
First consider the result of reversing the bits of $a$, still considering it as an integer value, which gives
$$\sum_{i=1}^{64} d_i(a)\, 2^{64-i}.$$
If we then divide this value by $2^{64}$, we have
$$\sum_{i=1}^{64} \frac{d_i(a)}{2^i},$$
which is $\Phi_2(a)$. Thus, the base-2 radical inverse can equivalently be computed with a bit reverse and a power-of-two division.
The bits of an integer quantity can be efficiently reversed with a series of logical bit operations. The first line of the ReverseBits32() function, which reverses the bits of a 32-bit integer, swaps the lower 16 bits with the upper 16 bits of the value. The next line simultaneously swaps the first 8 bits of the result with the second 8 bits and the third 8 bits with the fourth. This process continues until the last line, which swaps adjacent bits. To understand this code, it’s helpful to write out the binary values of the various hexadecimal constants. For example, 0xff00ff00 is 11111111000000001111111100000000 in binary; it’s easy to see that a bitwise AND with this value selects the first and third 8-bit quantities.
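A sketch consistent with this description follows; pbrt's actual implementation may differ in small details:

```cpp
#include <cstdint>

// Reverse the bits of a 32-bit integer by swapping progressively smaller
// groups: 16-bit halves, then 8-bit groups, 4-bit groups, pairs, single bits.
inline uint32_t ReverseBits32(uint32_t n) {
    n = (n << 16) | (n >> 16);
    n = ((n & 0x00ff00ff) << 8) | ((n & 0xff00ff00) >> 8);
    n = ((n & 0x0f0f0f0f) << 4) | ((n & 0xf0f0f0f0) >> 4);
    n = ((n & 0x33333333) << 2) | ((n & 0xcccccccc) >> 2);
    n = ((n & 0x55555555) << 1) | ((n & 0xaaaaaaaa) >> 1);
    return n;
}
```

Five steps suffice because each step halves the group size, from 16 bits down to 1.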
The bits of a 64-bit value can then be reversed by reversing the two 32-bit components individually and then interchanging them.
To compute the base-2 radical inverse, then, we reverse the bits and multiply by $2^{-64}$, where the hexadecimal floating-point constant 0x1p-64 is used for the value $2^{-64}$. As explained in Section 3.9.1, implementing a power-of-two division via the corresponding power-of-two multiplication gives the same result with IEEE floating point. (And floating-point multiplication is generally more efficient than floating-point division.)
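Putting the pieces together, the base-2 radical inverse might be sketched as follows. For brevity this sketch uses a simple loop-based 64-bit bit reversal rather than the faster logical-operation approach described above, and the function names are ours:

```cpp
#include <cstdint>

// Reverse all 64 bits of n (simple loop; pbrt reverses each 32-bit half
// with logical operations and swaps the halves, which is much faster).
inline uint64_t ReverseBits64Simple(uint64_t n) {
    uint64_t r = 0;
    for (int i = 0; i < 64; ++i) {
        r = (r << 1) | (n & 1);
        n >>= 1;
    }
    return r;
}

// Base-2 radical inverse: reverse the bits, then multiply by 2^-64
// (the hexadecimal floating-point literal 0x1p-64).
inline double RadicalInverseBase2(uint64_t a) {
    return ReverseBits64Simple(a) * 0x1p-64;
}
```
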
For other bases, the RadicalInverseSpecialized() template function computes the radical inverse by computing the digits $d_i$ starting with $d_1$ and computing a series of values $v_i$, where $v_1 = d_1$ and $v_i = b\, v_{i-1} + d_i$, such that
$$v_m = d_1 b^{m-1} + d_2 b^{m-2} + \cdots + d_m.$$
(For example, in base 10, it would convert the value 1234 to 4321.) This value can be found entirely using integer arithmetic, without accumulating any round-off error.
The final value of the radical inverse is then found by converting this value to floating point and multiplying by $1/b^m$, where $m$ is the number of digits in the value, to get the value in Equation (7.7). The factor $1/b^m$ for this multiplication is built up in invBaseN as the digits are processed.
A natural question to ask would be why a template function parameterized on the base is used here (rather than, say, a regular function call that took the base as a parameter, which would avoid the generation of a separate code path for each base). The motivation is that integer division is shockingly slow on modern CPUs, and much more efficient approaches are possible for division by a compile-time constant.
For example, integer division of a 32-bit value by 3 can be computed exactly by multiplying this value by 2863311531 to get a 64-bit intermediate and then shifting the result right by 33 bits; these are both fairly efficient operations. (A similar approach can be used for dividing 64-bit values by 3, but the magic constant is much larger; see Warren (2006) for more about these techniques.) Thus, using a template function here allows the compiler to see that the division to compute the value of next in the while loop is actually a division by a constant and gives it a chance to apply this optimization. The code with this optimization runs 5.9 times faster on a 2015-era laptop than an implementation based on integer division instructions.
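The multiply-and-shift trick can be checked directly; the constant 2863311531 is $(2^{33}+1)/3$, so the shifted product floors to exactly $x/3$ for every 32-bit $x$. The helper name below is ours:

```cpp
#include <cstdint>

// Exact division of a 32-bit value by 3 via multiply and shift:
// x / 3 == (x * 2863311531) >> 33 for all 32-bit x, because
// 2863311531 == (2^33 + 1) / 3, and the error term x / (3 * 2^33)
// is always less than the distance to the next integer.
inline uint32_t DivideBy3(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 2863311531ull) >> 33);
}
```

This is the kind of transformation the compiler applies automatically when the divisor is a compile-time constant, which is exactly what the template parameter makes possible.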
Another optimization is that we avoid computing a running sum of reversed digits multiplied by the reciprocal base; instead, this multiplication is postponed until the loop terminates. The main issue here is that the floating-point and integer units on current processors operate fairly independently of each other. Referencing an integer variable within a floating-point computation in a tight loop would introduce pipeline bubbles related to the amount of time needed to convert and move the values from one unit to the other.
It will be useful to be able to compute the inverse of the radical inverse function; the InverseRadicalInverse() function takes the reversed integer digits in some base (corresponding to reversedDigits in the RadicalInverseSpecialized() template function immediately before it is multiplied by the $1/b^m$ factor to convert to a floating-point value in $[0, 1)$) and returns the original index. Note that in order to be able to compute the inverse correctly, the total number of digits in the original value must be known: for example, both 1234 and 123400 are converted to 4321 after the integer-only part of the radical inverse algorithm; trailing zeros become leading zeros, which are lost.
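A sketch of such a function, following the description above (the explicit digit count restores any trailing zeros of the original value):

```cpp
#include <cstdint>

// Recover the original index from its reversed digits in the given base.
// nDigits must equal the number of digits in the original value so that
// leading zeros of 'inverse' (trailing zeros of the original) are restored.
template <int base>
uint64_t InverseRadicalInverseSketch(uint64_t inverse, int nDigits) {
    uint64_t index = 0;
    for (int i = 0; i < nDigits; ++i) {
        uint64_t digit = inverse % base;
        inverse /= base;
        index = index * base + digit;
    }
    return index;
}
```

With four digits, 4321 maps back to 1234; with six digits, the same 4321 maps back to 123400.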
The Hammersley and Halton sequences have the shortcoming that as the base $b$ increases, sample values can exhibit surprisingly regular patterns. This issue can be addressed with scrambled Halton and Hammersley sequences, where a permutation is applied to the digits when computing the radical inverse:
$$\Psi_b(a) = 0.\,p(d_1(a))\, p(d_2(a)) \ldots p(d_m(a)),$$
where $p$ is a permutation of the digits $(0, 1, \ldots, b-1)$. Note that the same permutation is used for each digit, and the same permutation is used for generating all of the sample points in a given base $b$. Figure 7.26 shows the effect of scrambling with the Halton sequence.
In the following, we will use random permutations, though specific constructions of permutations can give slightly better results; see the “Further Reading” section for more details.
The ComputeRadicalInversePermutations() function computes these random permutation tables. It initializes a single contiguous array for all of the permutations, where the first two values are a permutation of the integers zero and one for , the next three values are a permutation of for , and so forth for successive prime bases. At entry to the for loop below, p points to the start of the permutation array to initialize for the current prime base.
The total size of the permutation array is given by the sum of the prime numbers up to the end of a precomputed table of prime numbers.
Generating each permutation is easy: we just initialize p to the identity permutation for the current prime length and then randomly shuffle its values.
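The construction might be sketched as follows; the function name and the use of std::mt19937 are ours (pbrt uses its own RNG class), but the identity-then-shuffle structure matches the description:

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Build one contiguous array holding a random permutation of {0,...,p-1}
// for each prime base p in 'primes'.
std::vector<uint16_t> ComputePermutationsSketch(const std::vector<int> &primes,
                                                std::mt19937 &rng) {
    int total = std::accumulate(primes.begin(), primes.end(), 0);
    std::vector<uint16_t> perms(total);
    uint16_t *p = perms.data();
    for (int prime : primes) {
        std::iota(p, p + prime, uint16_t{0});  // identity permutation 0,...,prime-1
        std::shuffle(p, p + prime, rng);       // randomly permute it in place
        p += prime;
    }
    return perms;
}
```
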
The ScrambledRadicalInverse() function is essentially the same as RadicalInverse() except that it puts each digit through the permutation table for the given base. See Exercise 7.9.3 for discussion of a more efficient implementation for the base-2 case, following RadicalInverse().
The implementation below also accounts for a special case that can arise when perm maps the digit 0 to a nonzero value. In this case, the iteration stops prematurely once a reaches 0, incorrectly missing an infinitely long suffix of digits with value perm[0]. Fortunately, this suffix forms a geometric series with a simple analytic solution, whose value is added in the last line.
7.4.2 Halton Sampler Implementation
The HaltonSampler generates sample vectors using the Halton sequence. Unlike the StratifiedSampler, it is fully deterministic; it uses no pseudo-random numbers in its operation. However, Halton samples can lead to aliasing if the image isn’t sufficiently well sampled. Figure 7.27 compares the results of sampling a checkerboard texture using a Halton-based sampler to using the stratified sampler from the previous section. Note the unpleasant pattern along edges in the foreground and toward the horizon.
The permutation tables for the scrambled radical inverses are shared across all HaltonSampler instances and are computed the first time the constructor runs. For pbrt’s requirements, this approach is fine: the current implementation only uses different sampler instances for different tiles of the image, where we’d like to always use the same permutations anyway. For other uses, it could be worthwhile to have more control over when different permutations are used.
The utility method PermutationForDimension() returns a pointer to the start of the permutation array for the given dimension.
To be able to quickly find the offset for a given dimension, it’s helpful to have the sums of the prime numbers preceding each prime.
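For illustration, with short tables of primes and their running prefix sums (pbrt precomputes much longer versions), the lookup is a single indexed read; the names below are ours:

```cpp
#include <cstdint>
#include <vector>

// First few primes and, for each, the sum of all preceding primes.
static const int Primes[]    = {2, 3, 5, 7, 11};
static const int PrimeSums[] = {0, 2, 5, 10, 17};

// Start of the permutation for the given dimension within the single
// contiguous permutation array.
const uint16_t *PermutationForDimensionSketch(
        const std::vector<uint16_t> &perms, int dim) {
    return &perms[PrimeSums[dim]];
}
```
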
To map the first two dimensions of samples from $[0, 1)^2$ to pixel coordinates, the HaltonSampler finds the smallest scale factor $(2^j, 3^k)$ that is larger than the lower of either the image resolution or kMaxResolution in each dimension. (We will see shortly how this specific choice of scales makes it easy to see which pixel a sample lands in.) After scaling, any samples outside the image extent will be simply ignored.
For images with resolution greater than kMaxResolution in one or both dimensions, a tile of Halton points is repeated across the image. This resolution limit helps maintain sufficient floating-point precision in the computed sample values.
For each dimension, baseScales holds the scale factor, $2^j$ or $3^k$, and baseExponents holds the exponents $j$ and $k$.
To see why the HaltonSampler uses this scheme to map samples to pixel coordinates, consider the effect of scaling a value computed with the radical inverse base $b$ by a factor $b^n$. If the digits of $a$ expressed in base $b$ are $d_i(a)$, then recall that the radical inverse is the value $0.d_1(a) d_2(a) \ldots$ in base $b$. If we multiply this value by $b^2$, for example, we have
$$d_1(a) d_2(a).d_3(a) \ldots$$
The first two digits have moved to the left of the radix point, and the fractional component of the value starts with $d_3(a)$.
This operation, scaling by $b^n$, forms the core of being able to determine which sample indices land in which pixels. Considering the first two digits in the above example, we can see that the integer component of the scaled value ranges from 0 to $b^2 - 1$ and that, as $a$ increases, its last two digits in base $b$ take on any particular value once in every $b^2$ values in this range.
Given a value $x$, $0 \le x \le b^2 - 1$, we can find the first value $a$ that gives the value $x$ in the integer components. By definition, the digits of $x$ in base $b$ are $d_2(x) d_1(x)$. Thus, if $d_1(a) = d_2(x)$ and $d_2(a) = d_1(x)$, then the scaled value of $a$'s radical inverse will have an integer component equal to $x$.
Because the bases $b = 2$ and $b = 3$ used in the HaltonSampler for pixel samples are relatively prime, it follows that if the sample values are scaled by some $(2^j, 3^k)$, then any particular pixel in the range $(0, 0)$ to $(2^j - 1, 3^k - 1)$ will be visited once every $2^j 3^k$ samples. This product is stored in sampleStride.
The sample index for the first Halton sample that lands in currentPixel is stored in offsetForCurrentPixel. After this offset has first been computed for the first sample in the current pixel, subsequent samples in the pixel are found at increments of sampleStride samples in the Halton sequence.
Computing the index of the first sample in a given pixel $(x, y)$, where the samples have been scaled by $(2^j, 3^k)$, involves computing the inverse radical inverse of the last $j$ digits of $x$ in base 2, which we'll denote by $x_r$, and of the last $k$ digits of $y$ in base 3, $y_r$. This gives us a system of equations
$$i \bmod 2^j = x_r$$
$$i \bmod 3^k = y_r,$$
where the index $i$ that satisfies these equations is the index of a sample that lies within the given pixel, after scaling. The code that solves for $i$, <<Compute Halton sample offset for currentPixel>>, isn't included here in the book; see Grünschloß et al. (2012) for details of the algorithm used to find $i$.
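To illustrate the structure of the problem (this is emphatically not pbrt's method, which uses the efficient algorithm of Grünschloß et al.), a brute-force search over one period makes the congruences concrete:

```cpp
#include <cstdint>

// Find the smallest sample index i with i mod 2^j == xr and i mod 3^k == yr.
// Because gcd(2^j, 3^k) == 1, the Chinese remainder theorem guarantees
// exactly one solution in [0, 2^j * 3^k); subsequent samples for the same
// pixel follow at increments of the stride 2^j * 3^k.
uint64_t FirstSampleIndexBruteForce(uint64_t xr, uint64_t yr,
                                    uint64_t twoToJ, uint64_t threeToK) {
    uint64_t stride = twoToJ * threeToK;
    for (uint64_t i = 0; i < stride; ++i)
        if (i % twoToJ == xr && i % threeToK == yr)
            return i;
    return stride;  // unreachable for valid xr, yr
}
```

For example, with scales $(4, 3)$ and residues $x_r = 2$, $y_r = 1$, the first matching index is 10, and every twelfth sample thereafter lands in the same pixel.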
The computation of sample offsets doesn’t account for random digit permutations, so those aren’t included in the sample values computed here. Also, because the low baseExponents[i] digits of the first two dimensions are used to select which pixel is sampled, these digits must be discarded before computing the radical inverse for the first two dimensions of the sample vector, since the SampleDimension() method is supposed to return the fractional offset within the pixel being sampled. Higher dimensions are just sampled directly, including the random permutations.