## 8.2 Sampling and Integration

The lighting integration algorithms used throughout `pbrt` are based on
Monte Carlo integration, yet the focus of Section 8.1
was on sampling and reconstruction. That topic is an important one for
understanding aliasing and the use of filters for image reconstruction, but
it is a different problem from that of minimizing Monte Carlo integration error. There
are a number of connections between Monte Carlo and both Fourier analysis
and other approaches to analyzing point-sampling algorithms, however.
For example, jittered sampling is a form of stratified sampling,
a variance reduction technique that was introduced in
Section 2.2.1. Thus, we can see that
jittered sampling is advantageous from both perspectives.

Given multiple perspectives on the problem, one might ask, what is the best
sampling approach to use for Monte Carlo integration? There is no easy
answer to this question, which is reflected by the fact that this chapter
presents a total of 6 classes that implement the upcoming `Sampler`
interface to generate sample points, though a number of them offer a few
variations of an underlying sampling approach, giving a total of 17
different techniques.

Although some of this variety is for pedagogy, it is largely due to the
fact that the question of which sampling technique is best is not easily
answered. Not only do the various mathematical approaches to analyzing
sampling techniques often disagree,
but another difficulty comes from the human
visual system: rendered images are generally for human consumption and most
mathematical approaches for evaluating sampling patterns do not account for
this fact. Later in this chapter, we will see that sampling patterns that
lead to errors in the image having blue noise characteristics are visually
preferable, yet may not have any lower numeric error than those that do
not. Thus, `pbrt` provides a variety of options, allowing the user to make
their own choice among them.

### 8.2.1 Fourier Analysis of Variance

Fourier analysis can also be applied to evaluate sampling patterns in the context of Monte Carlo integration, leading to insights about both variance and the convergence rates of various sampling algorithms. We will make three simplifications in our treatment of this topic here. There are more general forms of the theory that do not require these, though they are more complex. (As always, see the “Further Reading” section for more information.) We assume that:

- The sample points are uniformly distributed and equally weighted (i.e., importance sampling is not being used).
- The Monte Carlo estimator used is unbiased.
- The properties of the sample points are homogeneous with respect to toroidal translation over the sampling domain. (If they are not, the analysis is effectively over all possible random translations of the sample points.)

Excluding importance sampling has obvious implications, though we note that the last assumption, homogeneity, is also significant. Many of the sampling approaches later in this chapter are based on decomposing the sampling domain into strata and placing a single sample in each one. Homogenizing such algorithms causes some of those regions to wrap around the boundaries of the domain, which harms their effectiveness. Equivalently, homogenization can be seen as toroidally translating the function being integrated, which can introduce discontinuities that were not previously present. Nevertheless, we will see that there is still much useful insight to be had about the behavior of sampling patterns in spite of these simplifications.

Our first step is to introduce the *Fourier series* representation of
functions, which we will use as the basis for analysis of sampling patterns
for the remainder of this section. The Fourier transform assumes that the
function has infinite extent, while for rendering we are generally
operating over the $[0,1)^d$ domain or on mappings from there to other finite
domains such as the unit hemisphere. While it is tempting to apply the
Fourier transform as is, defining $f$ to be zero outside the domain of
interest, doing so introduces a discontinuity in the function at the
boundaries that leads to error due to the Gibbs phenomenon in the Fourier
coefficients. Fourier series are defined over a specific finite domain and
so do not suffer from this problem.

The Fourier series represents a function $f$ using an infinite set of coefficients $f_j$ for all integer-valued $j$. (We use $j$ to index coefficients in order to avoid confusion with the use of $i$ for the unit imaginary number.) For the $[0,1)$ domain, the coefficients are given by

$$f_j = \int_0^1 f(x)\, e^{-i 2 \pi j x}\, \mathrm{d}x. \tag{8.8}$$

(Domains other than $[0,1)$ can be handled using a straightforward reparameterization.)

Expressed using the Fourier series coefficients, the original function is

$$f(x) = \sum_{j=-\infty}^{\infty} f_j\, e^{i 2 \pi j x}. \tag{8.9}$$

It can be shown that the continuous Fourier transform corresponds to the limit of taking the Fourier series over a domain of infinite extent.
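
To make these definitions concrete, here is a short Python sketch (ours, for illustration; not pbrt code) that approximates the Fourier series coefficients of a simple test function with a Riemann sum and prints each coefficient alongside its PSD value. The test function and all names here are our own.

```python
import cmath
import math

def fourier_coeff(f, j, N=256):
    # Approximate the coefficient integral over [0,1) with an N-point
    # Riemann sum; this is exact here because f is band limited with
    # frequencies far below N.
    return sum(f(k / N) * cmath.exp(-1j * 2 * math.pi * j * k / N)
               for k in range(N)) / N

f = lambda x: 0.5 + 0.25 * math.cos(2 * math.pi * x)
for j in range(-2, 3):
    c = fourier_coeff(f, j)
    # Nonzero coefficients: f_0 = 0.5 and f_{+1} = f_{-1} = 0.125;
    # the PSD value is |f_j|^2.
    print(j, round(c.real, 6), round(abs(c) ** 2, 6))
```

Reconstructing $f$ from these coefficients via the series expansion recovers the original function, and the PSD values are exactly the quantities that appear in the variance analysis later in this section.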

The PSD of a function in the Fourier series basis is given by the product of each coefficient with its complex conjugate,

$$P_{f_j} = f_j\, \bar{f}_j = \lVert f_j \rVert^2.$$

In order to analyze Monte Carlo integration in frequency space, we will start by defining the sampling function for a set of sample points $x_1, x_2, \ldots, x_n$ as the averaged sum of samples, each represented by a delta distribution,

$$s(x) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - x_i).$$

Given the sampling function, it is possible to rewrite the Monte Carlo estimator as an integral:

$$\frac{1}{n} \sum_{i=1}^{n} f(x_i) = \int_0^1 f(x)\, s(x)\, \mathrm{d}x. \tag{8.10}$$

It may seem like we are moving backward: after all, the point of Monte Carlo integration is to transform integrals into sums. However, this transformation is key to being able to apply the Fourier machinery to the problem.

If we substitute the Fourier series expansion of Equation (8.9) into Equation (8.10), we can find that

$$\int_0^1 f(x)\, s(x)\, \mathrm{d}x = \sum_{j=-\infty}^{\infty} f_j\, \bar{s}_j.$$

From the definition of the Fourier series coefficients, we know that $f_0 = \int_0^1 f(x)\,\mathrm{d}x$. Furthermore, $\bar{s}_0 = 1$, from the definition of $s$ and the assumption of uniform and unweighted samples. Therefore, the error in the Monte Carlo estimate is given by

$$\int_0^1 f(x)\, s(x)\, \mathrm{d}x - \int_0^1 f(x)\, \mathrm{d}x = \sum_{j \in \mathbb{Z}^*} f_j\, \bar{s}_j, \tag{8.11}$$

where $\mathbb{Z}^*$ denotes the set of all integers except for zero.

Equation (8.11) is the key result that gives us insight about integration error. It is worth taking the time to understand and to consider the implications of it. For example, if $f$ is band limited, then $f_j = 0$ for all $|j| > j_{\max}$ for some value $j_{\max}$. In that case, if $s$'s sampling rate is at least equal to $f$'s highest frequency, then $f_j \bar{s}_j = 0$ for all $j \in \mathbb{Z}^*$ and a zero variance estimator is the result. Only half the sampling rate is necessary for perfect integration compared to what is needed for perfect reconstruction!
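
This zero-variance behavior is easy to check numerically. The following Python sketch (an illustration of the result, not pbrt code) integrates a band-limited function with equally spaced samples at a rate above its highest frequency; every random toroidal translation of the sample points produces the exact integral.

```python
import math
import random

def f(x):
    # Band-limited test function: highest frequency is 3 cycles over [0,1),
    # and its integral over [0,1) is exactly 1.
    return 1 + math.sin(2 * math.pi * 3 * x)

n = 4  # equally spaced samples; sampling rate exceeds f's highest frequency
random.seed(0)
estimates = []
for _ in range(5):
    t = random.random()  # random toroidal offset (homogenization)
    estimates.append(sum(f((i / n + t) % 1) for i in range(n)) / n)
print(estimates)  # every estimate equals 1 up to floating-point round-off
```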

Using Equation (8.11) with the definition of variance, it can be shown that the variance of the estimator is given by the sum of products of $f$'s and $s$'s PSDs:

$$V\!\left[ \int_0^1 f(x)\, s(x)\, \mathrm{d}x \right] = \sum_{j \in \mathbb{Z}^*} P_{f_j}\, P_{s_j}. \tag{8.12}$$

This gives a clear direction about how to reduce variance: it is best if the power spectrum of the sampling pattern is low where the function’s power spectrum is high. In rendering, the function is generally not available analytically, let alone in the Fourier series basis, so we follow the usual expectation that the function has most of its frequency content in lower frequencies. This assumption argues for a sampling pattern with its energy concentrated in higher frequencies and minimal energy in the lower frequencies—precisely the same blue noise criterion that we earlier saw was effective for antialiasing.

An insight that directly follows from Equation (8.12) is that with uniform random sampling (i.e., white noise), $P_{s_j}$ is the constant $1/n$, which leads to a variance of

$$\frac{1}{n} \sum_{j \in \mathbb{Z}^*} P_{f_j} = \frac{1}{n} V[f],$$

which is the same variance of the Monte Carlo estimator that was derived earlier using different means in Section 2.1.4. More generally, if the PSD of a sampling technique can be asymptotically bounded, it can be shown that the technique exhibits a higher rate of variance reduction given a suitable function being integrated. One example is that in 2D, a jittered sampling pattern can achieve $O(n^{-2})$ variance, given a smooth integrand.

Fourier analysis has also revealed that *Poisson disk* sampling
patterns have unexpectedly bad asymptotic convergence. Poisson disk point
sets are constructed such that no two points can be closer than some
minimum distance (see
Figure 8.16). For many years, they
were believed
to be superior to jittered patterns. The Poisson disk criterion is an
appealing one, as it prohibits multiple samples from clumping close together, as
is possible with adjacent jittered samples.
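
A minimal dart-throwing generator illustrates the construction (this Python sketch is ours for illustration; practical Poisson disk algorithms are far more efficient):

```python
import math
import random

def poisson_disk(r, n_candidates=4000):
    # Naive dart throwing: keep a uniformly distributed candidate point
    # only if it is at least distance r from every point kept so far.
    points = []
    for _ in range(n_candidates):
        c = (random.random(), random.random())
        if all(math.dist(c, p) >= r for p in points):
            points.append(c)
    return points

random.seed(0)
pts = poisson_disk(0.1)
d_min = min(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])
print(len(pts), d_min)  # no two points are closer than r = 0.1
```

Rejections become increasingly frequent as the domain fills, which is why production generators replace this quadratic loop with spatial acceleration structures.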

Part of the appeal of Poisson disk patterns is that initially they seem to have superior blue noise characteristics to jittered patterns, with a much larger range of frequencies around the origin where the PSD is low. Figure 8.17 shows the PSDs of 2D jittered and Poisson disk sample points. Both feature a spike at the origin, a ring of low energy around it, and then a transition to fairly equal-energy noise at higher frequencies.

Radially averaged plots of the distribution of energy in these PSDs, however, make their behavior in the low frequencies more clear; see Figure 8.18. We can see that although the Poisson disk pattern has low energy for a larger range of frequencies than the jittered pattern, its PSD retains a small amount of energy all the way until 0, while the jittered pattern does not.

Using Fourier analysis of variance, it can be shown that due to this lingering energy, the variance when using Poisson disk sampling is never any better than $O(n^{-1})$, which is worse than jittered points for some integrands. (Though remember that these are asymptotic bounds, and that for small $n$, Poisson disk–distributed points may give lower variance.) Nevertheless, the poor asymptotic convergence for what seems like it should be an effective sampling approach was a surprise, and points to the value of this form of analysis.

### 8.2.2 Low Discrepancy and Quasi Monte Carlo

Outside of Fourier analysis, another useful approach for evaluating the
quality of sample points is based on a concept called *discrepancy*.
Well-distributed sampling patterns have low discrepancy, and thus the
sample pattern generation problem can be considered to be one of finding a
suitable pattern of points with low discrepancy.

In discussing the discrepancy of sample points, we will draw a distinction
between *sample sets*, which are a specific number of points, and
*sample sequences*, which are defined by an algorithm that can
generate an arbitrary number of points. For a fixed number of samples, it
is generally possible to distribute points in a sample set slightly better
than the same number of points in a sample sequence. However, sequences
can be especially useful with adaptive sampling algorithms, thanks to their
flexibility in the number of points they generate.

The basic idea of discrepancy is that the quality of a set of points in
a $d$-dimensional space $[0,1)^d$ can be evaluated by looking at regions
of the domain, counting the number of points inside each region,
and comparing the volume of each region to the number of sample points
inside. In general, a given fraction of the volume should have roughly the
same fraction of the total number of sample points inside of it. While
it is not possible for this always to be the case, we can still try to use
patterns that minimize the maximum difference between the actual volume and
the volume estimated by the points (the *discrepancy*).
Figure 8.19 shows an example of the idea in two
dimensions.

To compute the discrepancy of a set of points, we first pick a family of shapes $B$ that are subsets of $[0,1)^d$. For example, boxes with one corner at the origin are often used. This corresponds to

$$B = \left\{ [0, v_1] \times [0, v_2] \times \cdots \times [0, v_d] \right\},$$

where $0 \le v_i \le 1$. Given a set of sample points $P = \{x_1, \ldots, x_n\}$, the discrepancy of $P$ with respect to $B$ is

$$D_n(B, P) = \sup_{b \in B} \left| \frac{\#\{x_i \in b\}}{n} - V(b) \right|, \tag{8.13}$$

where $\#\{x_i \in b\}$ is the number of points in $b$ and $V(b)$ is the volume of $b$.
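
For small point sets, the supremum in Equation (8.13) can be evaluated by brute force: for origin-anchored boxes, it suffices to check boxes whose upper corners take their coordinates from the points (or 1), counting points on the boundary as either inside or outside. A Python sketch of this idea (ours, for illustration; not pbrt code):

```python
def star_discrepancy_2d(points):
    # Brute-force star discrepancy for small 2D point sets. The supremum
    # over boxes [0,v1]x[0,v2] is attained as v1, v2 approach point
    # coordinates (or 1), so it suffices to check those corners, counting
    # boundary points as inside (closed) or outside (open).
    n = len(points)
    xs = sorted({p[0] for p in points} | {1.0})
    ys = sorted({p[1] for p in points} | {1.0})
    worst = 0.0
    for v1 in xs:
        for v2 in ys:
            vol = v1 * v2
            closed = sum(p[0] <= v1 and p[1] <= v2 for p in points)
            opened = sum(p[0] < v1 and p[1] < v2 for p in points)
            worst = max(worst, abs(closed / n - vol), abs(opened / n - vol))
    return worst

print(star_discrepancy_2d([(0.5, 0.5)]))  # prints 0.75
```

For the single point at the center of the domain, the worst box is $[0, 0.5] \times [0, 0.5]$: it contains the point (a fraction of 1 of all points) but has volume only $1/4$, giving a discrepancy of $3/4$.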

The intuition for why Equation (8.13) is a reasonable measure
of quality is that the value $\#\{x_i \in b\}/n$ is an approximation of the
volume of the box $b$ given by the particular points $P$. Therefore, the
discrepancy is the worst error over all possible boxes from this way of
approximating the volume. When the set of shapes $B$ is the set of boxes with
a corner at the origin, this value is called the *star discrepancy*,
$D_n^*(P)$. Another popular option for $B$ is the set of all axis-aligned
boxes, where the restriction that one corner be at the origin has been
removed.

For some point sets, the discrepancy can be computed analytically. For example, consider the set of points in one dimension

$$x_i = \frac{i}{n}.$$

We can see that the star discrepancy of $x_i$ is

$$D_n^*(x_1, \ldots, x_n) = \frac{1}{n}.$$

For example, take the interval $b = [0, 1/n)$. Then $V(b) = 1/n$, but $\#\{x_i \in b\} = 0$. This interval (and the intervals $[0, 2/n)$, etc.) is the interval where the largest differences between volume and fraction of points inside the volume are seen.

The star discrepancy of this point set can be improved by modifying it slightly:

$$x_i = \frac{i - \frac{1}{2}}{n}. \tag{8.14}$$

Then

$$D_n^*(x_i) = \frac{1}{2n}.$$
The bounds for the star discrepancy of a sequence of points in one dimension have been shown to be

$$D_n^*(x_1, \ldots, x_n) = \frac{1}{2n} + \max_{1 \le i \le n} \left| x_i - \frac{2i - 1}{2n} \right|.$$

Thus, the earlier set from Equation (8.14) has the lowest possible discrepancy for a sequence in 1D. In general, it is much easier to analyze and compute bounds for the discrepancy of sequences in 1D than for those in higher dimensions. When it is not possible to derive the discrepancy of a sampling technique analytically, it can be estimated numerically by constructing a large number of shapes $b$, computing their discrepancy, and reporting the maximum value found.
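
The 1D closed-form expression makes the claims above easy to verify numerically; this Python sketch (ours, for illustration) evaluates it for both of the point sets just discussed:

```python
def star_discrepancy_1d(points):
    # Closed-form 1D star discrepancy:
    # D*_n = 1/(2n) + max_i | x_i - (2i - 1)/(2n) | over the sorted x_i.
    x = sorted(points)
    n = len(x)
    return 1 / (2 * n) + max(abs(x[i] - (2 * i + 1) / (2 * n))
                             for i in range(n))

n = 8
endpoints = [(i + 1) / n for i in range(n)]    # x_i = i/n          -> D* = 1/n
midpoints = [(i + 0.5) / n for i in range(n)]  # x_i = (i - 1/2)/n  -> D* = 1/(2n)
print(star_discrepancy_1d(endpoints), star_discrepancy_1d(midpoints))
# prints 0.125 0.0625, i.e., 1/n and 1/(2n) for n = 8
```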

The astute reader will notice that according to the discrepancy measure, the uniform sequence in 1D is optimal, but Fourier analysis indicated that jittering was superior to uniform sampling. Fortunately, low-discrepancy patterns in higher dimensions are much less uniform than they are in one dimension and thus usually work reasonably well as sample patterns in practice. Nevertheless, their underlying uniformity means that low-discrepancy patterns can be more prone to visually objectionable aliasing than patterns with pseudo-random variation.

A $d$-dimensional sequence of points is said to have *low discrepancy*
if its discrepancy is of the order

$$D_n^*(P) = O\!\left( \frac{(\log n)^d}{n} \right). \tag{8.15}$$

These bounds are the best that are known for arbitrary $d$.

Low-discrepancy point sets and sequences are often generated using
deterministic algorithms; we will see a number of such algorithms in
Sections 8.6 and 8.7. Using such
points to sample functions for integration brings us to
*quasi–Monte Carlo* (QMC) methods.
Many of the techniques used in regular Monte Carlo algorithms can
be shown to work equally well with such *quasi-random* sample points.

The *Koksma–Hlawka inequality* relates the discrepancy of a set of
points $P$ used for integration to the error of an estimate of the
integral of a function $f$. It is:

$$\left| \int f(x)\, \mathrm{d}x - \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right| \le D_n^*(P)\, V_f,$$

where $V_f$ is the *total variation* of the function $f$
being integrated. It is defined as

$$V_f = \sup \sum_{i=1}^{m} \left| f(y_i) - f(y_{i-1}) \right|$$

over all partitions of the domain at points $0 = y_0 < y_1 < \cdots < y_m = 1$. In essence, the total variation represents how quickly the function’s value ever changes between points, and the discrepancy represents how effective the points used for integration are at catching the function’s variation.

Given the definition of low discrepancy from Equation (8.15), we can see from the Koksma–Hlawka inequality that the integration error with low-discrepancy points is $O\!\left((\log n)^d / n\right)$, which is asymptotically much better than the $O(n^{-1/2})$ error from Monte Carlo integration (Section 2.1.4). Note also that these error bounds are asymptotic; in practice, QMC usually has an even better rate of convergence.
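
A small experiment illustrates the difference (Python, ours for illustration; pbrt's own radical-inverse machinery appears in Section 8.6). The base-2 radical inverse yields a 1D low-discrepancy sequence; integrating a smooth function with it gives much lower error than random sampling at the same sample count:

```python
import random

def radical_inverse_2(i):
    # Reflect the base-2 digits of i about the radix point:
    # e.g., 6 = 110_2 -> 0.011_2 = 0.375.
    inv, scale = 0.0, 0.5
    while i:
        if i & 1:
            inv += scale
        i >>= 1
        scale *= 0.5
    return inv

f = lambda x: x * x  # smooth integrand; exact integral is 1/3
n = 256
qmc_err = abs(sum(f(radical_inverse_2(i)) for i in range(n)) / n - 1 / 3)

random.seed(0)
mc_err = sum(abs(sum(f(random.random()) for _ in range(n)) / n - 1 / 3)
             for _ in range(20)) / 20  # average error over 20 random trials
print(qmc_err, mc_err)  # the QMC error is well below the average MC error
```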

However, because QMC integration is deterministic, it is not possible to
use variance as a measure of an estimator’s quality, though of course one
can still compute the mean squared error. Alternatively, the sample points
can be randomized using approaches that are carefully designed not to harm
their discrepancy. We will see later in the chapter that randomization can
even lead to improved rates of convergence. Such approaches are
*randomized quasi–Monte Carlo* (RQMC) methods and again allow the use
of variance. RQMC is the foundation of most of `pbrt`’s Monte Carlo
integration algorithms.

In most of this text, we have glossed over the differences between Monte
Carlo, QMC, and RQMC, and have localized the choice among them in the
`Sampler`s in this chapter. Doing so introduces the possibility of
subtle errors if a `Sampler` generates quasi-random sample points that
an `Integrator` then improperly uses as part of an implementation of an
algorithm that is not suitable for quasi–Monte Carlo, though none of the
integrators described in the text do so.