## 8.2 Sampling and Integration

The lighting integration algorithms used throughout `pbrt` are based on
Monte Carlo integration, yet the focus of Section 8.1
was on sampling and reconstruction. That topic is an important one for
understanding aliasing and the use of filters for image reconstruction, but
it is a different problem from that of minimizing Monte Carlo integration error. There
are a number of connections between Monte Carlo and both Fourier analysis
and other approaches to analyzing point-sampling algorithms, however.
For example, jittered sampling is a form of stratified sampling,
a variance reduction technique that was introduced in
Section 2.2.1. Thus, we can see that
jittered sampling is advantageous from both perspectives.

Given multiple perspectives on the problem, one might ask, what is the best
sampling approach to use for Monte Carlo integration? There is no easy
answer to this question, which is reflected by the fact that this chapter
presents a total of 6 classes that implement the upcoming `Sampler`
interface to generate sample points, though a number of them offer a few
variations of an underlying sampling approach, giving a total of 17
different techniques.

Although some of this variety is for pedagogy, it is largely due to the
fact that the question of which sampling technique is best is not easily
answered. Not only do the various mathematical approaches to analyzing
sampling techniques often disagree,
but another difficulty comes from the human
visual system: rendered images are generally for human consumption and most
mathematical approaches for evaluating sampling patterns do not account for
this fact. Later in this chapter, we will see that sampling patterns that
lead to errors in the image having blue noise characteristics are visually
preferable, yet may not have any lower numeric error than those that do
not. Thus, `pbrt` provides a variety of options, allowing the user to make
their own choice among them.

### 8.2.1 Fourier Analysis of Variance

Fourier analysis can also be applied to evaluate sampling patterns in the context of Monte Carlo integration, leading to insights about both variance and the convergence rates of various sampling algorithms. We will make three simplifications in our treatment of this topic here. There are more general forms of the theory that do not require these, though they are more complex. (As always, see the “Further Reading” section for more information.) We assume that:

- The sample points are uniformly distributed and equally weighted (i.e., importance sampling is not being used).
- The Monte Carlo estimator used is unbiased.
- The properties of the sample points are homogeneous with respect to toroidal translation over the sampling domain. (If they are not, the analysis is effectively over all possible random translations of the sample points.)

Excluding importance sampling has obvious implications, though we note that the last assumption, homogeneity, is also significant. Many of the sampling approaches later in this chapter are based on decomposing the sampling domain into strata and placing a single sample in each one. Homogenizing such algorithms causes some of those regions to wrap around the boundaries of the domain, which harms their effectiveness. Equivalently, homogenization can be seen as toroidally translating the function being integrated, which can introduce discontinuities that were not previously present. Nevertheless, we will see that there is still much useful insight to be had about the behavior of sampling patterns in spite of these simplifications.

Our first step is to introduce the *Fourier series* representation of
functions, which we will use as the basis for analysis of sampling patterns
for the remainder of this section. The Fourier transform assumes that the
function has infinite extent, while for rendering we are generally
operating over the $[0,1)^d$ domain or on mappings from there to other finite
domains such as the unit hemisphere. While it is tempting to apply the
Fourier transform as is, defining $f$ to be zero outside the domain of
interest, doing so introduces a discontinuity in the function at the
boundaries that leads to error due to the Gibbs phenomenon in the Fourier
coefficients. Fourier series are defined over a specific finite domain and
so do not suffer from this problem.

The Fourier series represents a function $f$ using an infinite set of coefficients $f_j$ for all integer-valued $j$. (We use $j$ to index coefficients in order to avoid confusion with the use of $i$ for the unit imaginary number.) For the $[0,1)$ domain, the coefficients are given by

$$f_j = \int_0^1 f(x)\, e^{-i 2 \pi j x}\, \mathrm{d}x. \tag{8.8}$$

(Domains other than $[0,1)$ can be handled using a straightforward reparameterization.)

Expressed using the Fourier series coefficients, the original function is

$$f(x) = \sum_{j=-\infty}^{\infty} f_j\, e^{i 2 \pi j x}. \tag{8.9}$$

It can be shown that the continuous Fourier transform corresponds to the limit of taking the Fourier series over a domain of infinite extent.
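
To make these definitions concrete, here is a short Python sketch (ours, for illustration; not pbrt code) that approximates the Fourier series coefficients of a simple test function with a Riemann sum and prints each coefficient alongside its PSD value. The test function and all names here are our own.

```python
import cmath
import math

def fourier_coeff(f, j, N=256):
    # Approximate the coefficient integral over [0,1) with an N-point
    # Riemann sum; this is exact here because f is band limited with
    # frequencies far below N.
    return sum(f(k / N) * cmath.exp(-1j * 2 * math.pi * j * k / N)
               for k in range(N)) / N

f = lambda x: 0.5 + 0.25 * math.cos(2 * math.pi * x)
for j in range(-2, 3):
    c = fourier_coeff(f, j)
    # Nonzero coefficients: f_0 = 0.5 and f_{+1} = f_{-1} = 0.125;
    # the PSD value is |f_j|^2.
    print(j, round(c.real, 6), round(abs(c) ** 2, 6))
```

Reconstructing $f$ from these coefficients via the series expansion recovers the original function, and the PSD values are exactly the quantities that appear in the variance analysis later in this section.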

The PSD of a function in the Fourier series basis is given by the product of each coefficient with its complex conjugate,

$$P_{f_j} = f_j\, \bar{f}_j = \lVert f_j \rVert^2.$$

In order to analyze Monte Carlo integration in frequency space, we will start by defining the sampling function for a set of sample points $x_1, x_2, \ldots, x_n$ as the averaged sum of samples, each represented by a delta distribution,

$$s(x) = \frac{1}{n} \sum_{i=1}^{n} \delta(x - x_i).$$

Given the sampling function, it is possible to rewrite the Monte Carlo estimator as an integral:

$$\frac{1}{n} \sum_{i=1}^{n} f(x_i) = \int_0^1 f(x)\, s(x)\, \mathrm{d}x. \tag{8.10}$$

It may seem like we are moving backward: after all, the point of Monte Carlo integration is to transform integrals into sums. However, this transformation is key to being able to apply the Fourier machinery to the problem.

If we substitute the Fourier series expansion of Equation (8.9) into Equation (8.10), we can find that

$$\int_0^1 f(x)\, s(x)\, \mathrm{d}x = \sum_{j=-\infty}^{\infty} f_j\, \bar{s}_j.$$

From the definition of the Fourier series coefficients, we know that $f_0 = \int_0^1 f(x)\,\mathrm{d}x$. Furthermore, $\bar{s}_0 = 1$, from the definition of $s$ and the assumption of uniform and unweighted samples. Therefore, the error in the Monte Carlo estimate is given by

$$\int_0^1 f(x)\, s(x)\, \mathrm{d}x - \int_0^1 f(x)\, \mathrm{d}x = \sum_{j \in \mathbb{Z}^*} f_j\, \bar{s}_j, \tag{8.11}$$

where $\mathbb{Z}^*$ denotes the set of all integers except for zero.

Equation (8.11) is the key result that gives us insight about integration error. It is worth taking the time to understand and to consider the implications of it. For example, if $f$ is band limited, then $f_j = 0$ for all $|j| > j_{\max}$ for some value $j_{\max}$. In that case, if $s$'s sampling rate is at least equal to $f$'s highest frequency, then $f_j \bar{s}_j = 0$ for all $j \in \mathbb{Z}^*$ and a zero variance estimator is the result. Only half the sampling rate is necessary for perfect integration compared to what is needed for perfect reconstruction!
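
This zero-variance behavior is easy to check numerically. The following Python sketch (an illustration of the result, not pbrt code) integrates a band-limited function with equally spaced samples at a rate above its highest frequency; every random toroidal translation of the sample points produces the exact integral.

```python
import math
import random

def f(x):
    # Band-limited test function: highest frequency is 3 cycles over [0,1),
    # and its integral over [0,1) is exactly 1.
    return 1 + math.sin(2 * math.pi * 3 * x)

n = 4  # equally spaced samples; sampling rate exceeds f's highest frequency
random.seed(0)
estimates = []
for _ in range(5):
    t = random.random()  # random toroidal offset (homogenization)
    estimates.append(sum(f((i / n + t) % 1) for i in range(n)) / n)
print(estimates)  # every estimate equals 1 up to floating-point round-off
```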

Using Equation (8.11) with the definition of variance, it can be shown that the variance of the estimator is given by the sum of products of $f$'s and $s$'s PSDs:

$$V\!\left[ \int_0^1 f(x)\, s(x)\, \mathrm{d}x \right] = \sum_{j \in \mathbb{Z}^*} P_{f_j}\, P_{s_j}. \tag{8.12}$$

This gives a clear direction about how to reduce variance: it is best if the power spectrum of the sampling pattern is low where the function’s power spectrum is high. In rendering, the function is generally not available analytically, let alone in the Fourier series basis, so we follow the usual expectation that the function has most of its frequency content in lower frequencies. This assumption argues for a sampling pattern with its energy concentrated in higher frequencies and minimal energy in the lower frequencies—precisely the same blue noise criterion that we earlier saw was effective for antialiasing.

An insight that directly follows from Equation (8.12) is that with uniform random sampling (i.e., white noise), $P_{s_j}$ is the constant $1/n$, which leads to a variance of

$$\frac{1}{n} \sum_{j \in \mathbb{Z}^*} P_{f_j} = \frac{1}{n} V[f],$$

which is the same variance of the Monte Carlo estimator that was derived earlier using different means in Section 2.1.4. More generally, if the PSD of a sampling technique can be asymptotically bounded, it can be shown that the technique exhibits a higher rate of variance reduction given a suitable function being integrated. One example is that in 2D, a jittered sampling pattern can achieve $O(n^{-2})$ variance, given a smooth integrand.

Fourier analysis has also revealed that *Poisson disk* sampling
patterns have unexpectedly bad asymptotic convergence. Poisson disk point
sets are constructed such that no two points can be closer than some
minimum distance (see
Figure 8.16). For many years, they
were believed
to be superior to jittered patterns. The Poisson disk criterion is an
appealing one, as it prohibits multiple samples from clumping close together, as
is possible with adjacent jittered samples.
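
A minimal dart-throwing generator illustrates the construction (this Python sketch is ours for illustration; practical Poisson disk algorithms are far more efficient):

```python
import math
import random

def poisson_disk(r, n_candidates=4000):
    # Naive dart throwing: keep a uniformly distributed candidate point
    # only if it is at least distance r from every point kept so far.
    points = []
    for _ in range(n_candidates):
        c = (random.random(), random.random())
        if all(math.dist(c, p) >= r for p in points):
            points.append(c)
    return points

random.seed(0)
pts = poisson_disk(0.1)
d_min = min(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])
print(len(pts), d_min)  # no two points are closer than r = 0.1
```

Rejections become increasingly frequent as the domain fills, which is why production generators replace this quadratic loop with spatial acceleration structures.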

Part of the appeal of Poisson disk patterns is that initially they seem to have superior blue noise characteristics to jittered patterns, with a much larger range of frequencies around the origin where the PSD is low. Figure 8.17 shows the PSDs of 2D jittered and Poisson disk sample points. Both feature a spike at the origin, a ring of low energy around it, and then a transition to fairly equal-energy noise at higher frequencies.

Radially averaged plots of the distribution of energy in these PSDs, however, make their behavior in the low frequencies more clear; see Figure 8.18. We can see that although the Poisson disk pattern has low energy for a larger range of frequencies than the jittered pattern, its PSD retains a small amount of energy all the way until 0, while the jittered pattern does not.

Using Fourier analysis of variance, it can be shown that due to this lingering energy, the variance when using Poisson disk sampling is never any better than $O(n^{-1})$, which is worse than jittered points for some integrands. (Though remember that these are asymptotic bounds, and that for small $n$, Poisson disk–distributed points may give lower variance.) Nevertheless, the poor asymptotic convergence for what seems like it should be an effective sampling approach was a surprise, and points to the value of this form of analysis.

### 8.2.2 Low Discrepancy and Quasi Monte Carlo

Outside of Fourier analysis, another useful approach for evaluating the
quality of sample points is based on a concept called *discrepancy*.
Well-distributed sampling patterns have low discrepancy, and thus the
sample pattern generation problem can be considered to be one of finding a
suitable pattern of points with low discrepancy.

In discussing the discrepancy of sample points, we will draw a distinction
between *sample sets*, which are a specific number of points, and
*sample sequences*, which are defined by an algorithm that can
generate an arbitrary number of points. For a fixed number of samples, it
is generally possible to distribute points in a sample set slightly better
than the same number of points in a sample sequence. However, sequences
can be especially useful with adaptive sampling algorithms, thanks to their
flexibility in the number of points they generate.

The basic idea of discrepancy is that the quality of a set of points in
a $d$-dimensional space $[0,1)^d$ can be evaluated by looking at regions
of the domain, counting the number of points inside each region,
and comparing the volume of each region to the number of sample points
inside. In general, a given fraction of the volume should have roughly the
same fraction of the total number of sample points inside of it. While
it is not possible for this always to be the case, we can still try to use
patterns that minimize the maximum difference between the actual volume and
the volume estimated by the points (the *discrepancy*).
Figure 8.19 shows an example of the idea in two
dimensions.

To compute the discrepancy of a set of points, we first pick a family of shapes $B$ that are subsets of $[0,1)^d$. For example, boxes with one corner at the origin are often used. This corresponds to

$$B = \left\{ [0, v_1] \times [0, v_2] \times \cdots \times [0, v_d] \right\},$$

where $0 \le v_i \le 1$. Given a set of sample points $P = \{x_1, \ldots, x_n\}$, the discrepancy of $P$ with respect to $B$ is

$$D_n(B, P) = \sup_{b \in B} \left| \frac{\#\{x_i \in b\}}{n} - V(b) \right|, \tag{8.13}$$

where $\#\{x_i \in b\}$ is the number of points in $b$ and $V(b)$ is the volume of $b$.
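
For small point sets, the supremum in Equation (8.13) can be evaluated by brute force: for origin-anchored boxes, it suffices to check boxes whose upper corners take their coordinates from the points (or 1), counting points on the boundary as either inside or outside. A Python sketch of this idea (ours, for illustration; not pbrt code):

```python
def star_discrepancy_2d(points):
    # Brute-force star discrepancy for small 2D point sets. The supremum
    # over boxes [0,v1]x[0,v2] is attained as v1, v2 approach point
    # coordinates (or 1), so it suffices to check those corners, counting
    # boundary points as inside (closed) or outside (open).
    n = len(points)
    xs = sorted({p[0] for p in points} | {1.0})
    ys = sorted({p[1] for p in points} | {1.0})
    worst = 0.0
    for v1 in xs:
        for v2 in ys:
            vol = v1 * v2
            closed = sum(p[0] <= v1 and p[1] <= v2 for p in points)
            opened = sum(p[0] < v1 and p[1] < v2 for p in points)
            worst = max(worst, abs(closed / n - vol), abs(opened / n - vol))
    return worst

print(star_discrepancy_2d([(0.5, 0.5)]))  # prints 0.75
```

For the single point at the center of the domain, the worst box is $[0, 0.5] \times [0, 0.5]$: it contains the point (a fraction of 1 of all points) but has volume only $1/4$, giving a discrepancy of $3/4$.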

The intuition for why Equation (8.13) is a reasonable measure
of quality is that the value $\#\{x_i \in b\}/n$ is an approximation of the
volume of the box $b$ given by the particular points $P$. Therefore, the
discrepancy is the worst error over all possible boxes from this way of
approximating the volume. When the set of shapes $B$ is the set of boxes with
a corner at the origin, this value is called the *star discrepancy*,
$D_n^*(P)$. Another popular option for $B$ is the set of all axis-aligned
boxes, where the restriction that one corner be at the origin has been
removed.

For some point sets, the discrepancy can be computed analytically. For example, consider the set of points in one dimension

$$x_i = \frac{i}{n}.$$

We can see that the star discrepancy of $x_i$ is

$$D_n^*(x_1, \ldots, x_n) = \frac{1}{n}.$$

For example, take the interval $b = [0, 1/n)$. Then $V(b) = 1/n$, but $\#\{x_i \in b\} = 0$. This interval (and the intervals $[0, 2/n)$, etc.) is the interval where the largest differences between volume and fraction of points inside the volume are seen.

The star discrepancy of this point set can be improved by modifying it slightly:

$$x_i = \frac{i - \frac{1}{2}}{n}. \tag{8.14}$$

Then

$$D_n^*(x_i) = \frac{1}{2n}.$$
The bounds for the star discrepancy of a sequence of points in one dimension have been shown to be

$$D_n^*(x_1, \ldots, x_n) = \frac{1}{2n} + \max_{1 \le i \le n} \left| x_i - \frac{2i - 1}{2n} \right|.$$

Thus, the earlier set from Equation (8.14) has the lowest possible discrepancy for a sequence in 1D. In general, it is much easier to analyze and compute bounds for the discrepancy of sequences in 1D than for those in higher dimensions. When it is not possible to derive the discrepancy of a sampling technique analytically, it can be estimated numerically by constructing a large number of shapes $b$, computing their discrepancy, and reporting the maximum value found.
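
The 1D closed-form expression makes the claims above easy to verify numerically; this Python sketch (ours, for illustration) evaluates it for both of the point sets just discussed:

```python
def star_discrepancy_1d(points):
    # Closed-form 1D star discrepancy:
    # D*_n = 1/(2n) + max_i | x_i - (2i - 1)/(2n) | over the sorted x_i.
    x = sorted(points)
    n = len(x)
    return 1 / (2 * n) + max(abs(x[i] - (2 * i + 1) / (2 * n))
                             for i in range(n))

n = 8
endpoints = [(i + 1) / n for i in range(n)]    # x_i = i/n          -> D* = 1/n
midpoints = [(i + 0.5) / n for i in range(n)]  # x_i = (i - 1/2)/n  -> D* = 1/(2n)
print(star_discrepancy_1d(endpoints), star_discrepancy_1d(midpoints))
# prints 0.125 0.0625, i.e., 1/n and 1/(2n) for n = 8
```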

The astute reader will notice that according to the discrepancy measure, the uniform sequence in 1D is optimal, but Fourier analysis indicated that jittering was superior to uniform sampling. Fortunately, low-discrepancy patterns in higher dimensions are much less uniform than they are in one dimension and thus usually work reasonably well as sample patterns in practice. Nevertheless, their underlying uniformity means that low-discrepancy patterns can be more prone to visually objectionable aliasing than patterns with pseudo-random variation.

A $d$-dimensional sequence of points is said to have *low discrepancy*
if its discrepancy is of the order

$$D_n^*(P) = O\!\left( \frac{(\log n)^d}{n} \right). \tag{8.15}$$

These bounds are the best that are known for arbitrary $d$.

Low-discrepancy point sets and sequences are often generated using
deterministic algorithms; we will see a number of such algorithms in
Sections 8.6 and 8.7. Using such
points to sample functions for integration brings us to
*quasi–Monte Carlo* (QMC) methods.
Many of the techniques used in regular Monte Carlo algorithms can
be shown to work equally well with such *quasi-random* sample points.

The *Koksma–Hlawka inequality* relates the discrepancy of a set of
points $P$ used for integration to the error of an estimate of the
integral of a function $f$. It is:

$$\left| \int f(x)\, \mathrm{d}x - \frac{1}{n} \sum_{i=1}^{n} f(x_i) \right| \le D_n^*(P)\, V_f,$$

where $V_f$ is the *total variation* of the function $f$
being integrated. It is defined as

$$V_f = \sup \sum_{i=1}^{m} \left| f(y_i) - f(y_{i-1}) \right|$$

over all partitions of the domain at points $0 = y_0 < y_1 < \cdots < y_m = 1$. In essence, the total variation represents how quickly the function’s value ever changes between points, and the discrepancy represents how effective the points used for integration are at catching the function’s variation.

Given the definition of low discrepancy from Equation (8.15), we can see from the Koksma–Hlawka inequality that the integration error with low-discrepancy points is $O\!\left((\log n)^d / n\right)$, which is asymptotically much better than the $O(n^{-1/2})$ error from Monte Carlo integration (Section 2.1.4). Note also that these error bounds are asymptotic; in practice, QMC usually has an even better rate of convergence.
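
A small experiment illustrates the difference (Python, ours for illustration; pbrt's own radical-inverse machinery appears in Section 8.6). The base-2 radical inverse yields a 1D low-discrepancy sequence; integrating a smooth function with it gives much lower error than random sampling at the same sample count:

```python
import random

def radical_inverse_2(i):
    # Reflect the base-2 digits of i about the radix point:
    # e.g., 6 = 110_2 -> 0.011_2 = 0.375.
    inv, scale = 0.0, 0.5
    while i:
        if i & 1:
            inv += scale
        i >>= 1
        scale *= 0.5
    return inv

f = lambda x: x * x  # smooth integrand; exact integral is 1/3
n = 256
qmc_err = abs(sum(f(radical_inverse_2(i)) for i in range(n)) / n - 1 / 3)

random.seed(0)
mc_err = sum(abs(sum(f(random.random()) for _ in range(n)) / n - 1 / 3)
             for _ in range(20)) / 20  # average error over 20 random trials
print(qmc_err, mc_err)  # the QMC error is well below the average MC error
```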

However, because QMC integration is deterministic, it is not possible to
use variance as a measure of an estimator’s quality, though of course one
can still compute the mean squared error. Alternatively, the sample points
can be randomized using approaches that are carefully designed not to harm
their discrepancy. We will see later in the chapter that randomization can
even lead to improved rates of convergence. Such approaches are
*randomized quasi–Monte Carlo* (RQMC) methods and again allow the use
of variance. RQMC is the foundation of most of `pbrt`’s Monte Carlo
integration algorithms.

In most of this text, we have glossed over the differences between Monte
Carlo, QMC, and RQMC, and have localized the choice among them in the
`Sampler`s in this chapter. Doing so introduces the possibility of
subtle errors if a `Sampler` generates quasi-random sample points that
an `Integrator` then improperly uses as part of an implementation of an
algorithm that is not suitable for quasi–Monte Carlo, though none of the
integrators described in the text do so.