6.8 Managing Rounding Error

Thus far, we have been discussing ray–shape intersection algorithms with respect to idealized arithmetic operations based on the real numbers. This approach has gotten us far, although the fact that computers can only represent finite quantities and therefore cannot actually represent all the real numbers is important. In place of real numbers, computers use floating-point numbers, which have fixed storage requirements. However, error may be introduced each time a floating-point operation is performed, since the result may not be representable in the designated amount of memory.

The accumulation of this error has several implications for the accuracy of intersection tests. First, it is possible that it will cause valid intersections to be missed completely—for example, if a computed intersection’s t value is negative even though the precise value is positive. Furthermore, computed ray–shape intersection points may be above or below the actual surface of the shape. This leads to a problem: when new rays are traced starting from computed intersection points for shadow rays and reflection rays, if the ray origin is below the actual surface, we may find an incorrect reintersection with the surface. Conversely, if the origin is too far above the surface, shadows and reflections may appear detached. (See Figure 6.38.)

Figure 6.38: Geometric Settings for Rounding-Error Issues That Can Cause Visible Errors in Images. The incident ray on the left intersects the surface. On the left, the computed intersection point (black circle) is slightly below the surface and a too-low “epsilon” offsetting the origin of the shadow ray leads to an incorrect self-intersection, as the shadow ray origin (white circle) is still below the surface; thus the light is incorrectly determined to be occluded. On the right, a too-high “epsilon” causes a valid intersection to be missed as the ray’s origin is past the occluding surface.

Typical practice to address this issue in ray tracing is to offset spawned rays by a fixed “ray epsilon” value, ignoring any intersections along the ray normal p Subscript Baseline plus t bold d closer than some t Subscript normal m normal i normal n value. Figure 6.39 shows why this approach requires fairly high t Subscript normal m normal i normal n values to work effectively: if the spawned ray is oblique to the surface, incorrect ray intersections may occur quite some distance from the ray origin. Unfortunately, large t Subscript normal m normal i normal n values cause ray origins to be relatively far from the original intersection points, which in turn can cause valid nearby intersections to be missed, leading to loss of fine detail in shadows and reflections.

Figure 6.39: If the computed intersection point (filled circle) is below the surface and the spawned ray is oblique, incorrect reintersections may occur some distance from the ray origin (open circle). If a minimum t value along the ray is used to discard nearby intersections, a relatively large t Subscript normal m normal i normal n is needed to handle oblique rays well.

In this section, we will introduce the ideas underlying floating-point arithmetic and describe techniques for analyzing the error in floating-point computations. We will then apply these methods to the ray–shape algorithms introduced earlier in this chapter and show how to compute ray intersection points with bounded error. This will allow us to conservatively position ray origins so that incorrect self-intersections are never found, while keeping ray origins extremely close to the actual intersection point so that incorrect misses are minimized. In turn, no additional “ray epsilon” values are needed.

6.8.1 Floating-Point Arithmetic

Computation must be performed on a finite representation of numbers that fits in a finite amount of memory; the infinite set of real numbers cannot be represented on a computer. One such finite representation is fixed point, where given a 16-bit integer, for example, one might map it to positive real numbers by dividing by 256. This would allow us to represent the range left-bracket 0 comma 65535 slash 256 right-bracket equals left-bracket 0 comma 255 plus 255 slash 256 right-bracket with equal spacing of 1 slash 256 between values. Fixed-point numbers can be implemented efficiently using integer arithmetic operations (a property that made them popular on early PCs that did not support floating-point computation), but they suffer from a number of shortcomings; among them, the maximum number they can represent is limited, and they are not able to accurately represent very small numbers near zero.

An alternative representation for real numbers on computers is floating-point numbers. These are based on representing numbers with a sign, a significand, and an exponent: essentially, the same representation as scientific notation but with a fixed number of digits devoted to significand and exponent. (In the following, we will assume base-2 digits exclusively.) This representation makes it possible to represent and perform computations on numbers with a wide range of magnitudes while using a fixed amount of storage.

Programmers using floating-point arithmetic are generally aware that floating-point values may be inaccurate; this understanding sometimes leads to a belief that floating-point arithmetic is unpredictable. In this section we will see that floating-point arithmetic has a carefully designed foundation that in turn makes it possible to compute conservative bounds on the error introduced in a particular computation. For ray-tracing calculations, this error is often surprisingly small.

Modern CPUs and GPUs nearly ubiquitously implement a model of floating-point arithmetic based on a standard promulgated by the Institute of Electrical and Electronics Engineers (1985, 2008). (Henceforth when we refer to floats, we will specifically be referring to 32-bit floating-point numbers as specified by IEEE 754.) The IEEE 754 technical standard specifies the format of floating-point numbers in memory as well as specific rules for precision and rounding of floating-point computations; it is these rules that make it possible to reason rigorously about the error present in a computed floating-point value.

Floating-Point Representation

The IEEE standard specifies that 32-bit floats are represented with a sign bit, 8 bits for the exponent, and 23 bits for the significand. The exponent stored in a float ranges from 0 to 255. We will denote it by e Subscript normal b , with the subscript indicating that it is biased; the actual exponent used in computation, e , is computed as

e equals e Subscript normal b Baseline minus 127 period

The significand actually has 24 bits of precision when a normalized floating-point value is stored. When a number expressed with significand and exponent is normalized, there are no leading 0s in the significand. In binary, this means that the leading digit of the significand must be one; in turn, there is no need to store this value explicitly. Thus, the implicit leading 1 digit with the 23 digits encoding the fractional part of the significand gives a total of 24 bits of precision.

Given a sign s equals plus-or-minus 1 , significand m , and biased exponent e Subscript normal b , the corresponding floating-point value is

s times 1 period m times 2 Superscript e Super Subscript normal b Superscript minus 127 Baseline period

For example, with a normalized significand, the floating-point number 6.5 is written as 1.101 Subscript 2 Baseline times 2 squared , where the 2 subscript denotes a base-2 value. (If non-whole binary numbers are not immediately intuitive, note that the first number to the right of the radix point contributes 2 Superscript negative 1 Baseline equals 1 slash 2 , and so forth.) Thus, we have

left-parenthesis 1 times 2 Superscript 0 Baseline plus 1 times 2 Superscript negative 1 Baseline plus 0 times 2 Superscript negative 2 Baseline plus 1 times 2 Superscript negative 3 Baseline right-parenthesis times 2 squared equals 1.625 times 2 squared equals 6.5 period

e equals 2 , so e Subscript normal b Baseline equals 129 equals 10000001 Subscript 2 and m equals 10100000000000000000000 Subscript 2 .

Floats are laid out in memory with the sign bit at the most significant bit of the 32-bit value (with negative signs encoded with a 1 bit), then the exponent, and the significand. Thus, for the value 6.5 the binary in-memory representation of the value is

0 10000001 10100000000000000000000 equals 40 normal d Baseline 00000 Subscript 16 Baseline period

Similarly, the floating-point value 1.0 has m equals 0 ellipsis 0 Subscript 2 and e equals 0 , so e Subscript normal b Baseline equals 127 equals 01111111 Subscript 2 and its binary representation is:

0 01111111 00000000000000000000000 equals 3 normal f Baseline 800000 Subscript 16 Baseline period

This hexadecimal number is a value worth remembering, as it often comes up in memory dumps when debugging graphics programs.

An implication of this representation is that the spacing between representable floats between two adjacent powers of two is uniform throughout the range. (It corresponds to increments of the significand bits by one.) In a range left-bracket 2 Superscript e Baseline comma 2 Superscript e plus 1 Baseline right-parenthesis , the spacing is

2 Superscript e minus 23 Baseline period

Thus, for floating-point numbers between 1 and 2, e equals 0 , and the spacing between floating-point values is 2 Superscript negative 23 Baseline almost-equals 1.19209 ellipsis times 10 Superscript negative 7 . This spacing is also referred to as the magnitude of a unit in last place (“ulp”); note that the magnitude of an ulp is determined by the floating-point value that it is with respect to—ulps are relatively larger at numbers with larger magnitudes than they are at numbers with smaller magnitudes.

As we have described the representation so far, it is impossible to exactly represent zero as a floating-point number. This is obviously an unacceptable state of affairs, so the minimum exponent e Subscript normal b Baseline equals 0 , or e equals negative 127 , is set aside for special treatment. With this exponent, the floating-point value is interpreted as not having the implicit leading 1 bit in the significand, which means that a significand of all 0 bits results in

s times 0.0 midline-horizontal-ellipsis 0 Subscript 2 Baseline times 2 Superscript negative 127 Baseline equals 0 period

Eliminating the leading 1 significand bit also makes it possible to represent denormalized numbers: if the leading 1 was always present, then the smallest 32-bit float would be

1.0 midline-horizontal-ellipsis 0 Subscript 2 Baseline times 2 Superscript negative 127 Baseline almost-equals 5.8774718 times 10 Superscript negative 39 Baseline period

Without the leading 1 bit, the minimum value is

0.00 midline-horizontal-ellipsis 1 Subscript 2 Baseline times 2 Superscript negative 126 Baseline equals 2 Superscript negative 23 Baseline times 2 Superscript negative 126 Baseline almost-equals 1.4012985 times 10 Superscript negative 45 Baseline period

(The negative 126 exponent is used because denormalized numbers are encoded with e Subscript normal b Baseline equals 0 but are interpreted as if e Subscript normal b Baseline equals 1 so that there is no excess gap between them and the adjacent smallest regular floating-point number.) Providing some capability to represent these small values can make it possible to avoid needing to round very small values to zero.

Note that there is both a “positive” and “negative” zero value with this representation. This detail is mostly transparent to the programmer. For example, the standard guarantees that the comparison -0.0 == 0.0 evaluates to true, even though the in-memory representations of these two values are different. Conveniently, a floating-point zero value with an unset sign bit is represented by the value 0 in memory.

The maximum exponent, e Subscript normal b Baseline equals 255 , is also reserved for special treatment. Therefore, the largest regular floating-point value that can be represented has e Subscript normal b Baseline equals 254 (or e equals 127 ) and is approximately

3.402823 ellipsis times 10 Superscript 38 Baseline period

With e Subscript normal b Baseline equals 255 , if the significand bits are all 0, the value corresponds to positive or negative infinity, according to the sign bit. Infinite values result when performing computations like 1 slash 0 in floating point, for example. Arithmetic operations with infinity and a noninfinite value usually result in infinity, though dividing a finite value by infinity gives 0. For comparisons, positive infinity is larger than any noninfinite value and similarly for negative infinity.

The Infinity constant is initialized to be the “infinity” floating-point value. We make it available in a separate constant so that code that uses its value does not need to use the wordy C++ standard library call.

<<Floating-point Constants>>= 
static constexpr Float Infinity = std::numeric_limits<Float>::infinity();

With e Subscript normal b Baseline equals 255 , nonzero significand bits correspond to special “not a number” (NaN) values, which result from invalid operations like taking the square root of a negative number or trying to compute 0 slash 0 . NaNs propagate through computations: any arithmetic operation where one of the operands is a NaN itself always returns NaN. Thus, if a NaN emerges from a long chain of computations, we know that something went awry somewhere along the way. In debug builds, pbrt has many assertion statements that check for NaN values, as we almost never expect them to come up in the regular course of events. Any comparison with a NaN value returns false; thus, checking for !(x == x) serves to check if a value is not a number.

By default, the majority of floating-point computation in pbrt uses 32-bit floats. However, as discussed in Section 1.3.3, it is possible to configure it to use 64-bit double-precision values instead. In addition to the sign bit, doubles allocate 11 bits to the exponent and 52 to the significand. pbrt also supports 16-bit floats (which are known as halfs) as an in-memory representation for floating-point values stored at pixels in images. Halfs use 5 bits for the exponent and 10 for the significand. (A convenience Half class, not discussed further in the text, provides capabilities for working with halfs and converting to and from 32-bit floats.)

Arithmetic Operations

IEEE 754 provides important guarantees about the properties of floating-point arithmetic: specifically, it guarantees that addition, subtraction, multiplication, division, and square root give the same results given the same inputs and that these results are the floating-point number that is closest to the result of the underlying computation if it had been performed in infinite-precision arithmetic. It is remarkable that this is possible on finite-precision digital computers at all; one of the achievements in IEEE 754 was the demonstration that this level of accuracy is possible and can be implemented fairly efficiently in hardware.

Using circled operators to denote floating-point arithmetic operations and monospace s monospace q monospace r monospace t for floating-point square root, these accuracy guarantees can be written as:

StartLayout 1st Row 1st Column a circled-plus b 2nd Column equals normal r normal o normal u normal n normal d left-parenthesis a plus b right-parenthesis 2nd Row 1st Column a minus b 2nd Column equals normal r normal o normal u normal n normal d left-parenthesis a minus b right-parenthesis 3rd Row 1st Column a circled-times b 2nd Column equals normal r normal o normal u normal n normal d left-parenthesis a asterisk b right-parenthesis 4th Row 1st Column a circled-division-slash b 2nd Column equals normal r normal o normal u normal n normal d left-parenthesis a slash b right-parenthesis 5th Row 1st Column monospace s monospace q monospace r monospace t monospace left-parenthesis monospace a monospace right-parenthesis 2nd Column equals normal r normal o normal u normal n normal d left-parenthesis StartRoot a EndRoot right-parenthesis 6th Row 1st Column monospace upper F monospace upper M monospace upper A monospace left-parenthesis monospace a monospace comma monospace b monospace comma monospace c monospace right-parenthesis 2nd Column equals normal r normal o normal u normal n normal d left-parenthesis a asterisk b plus c right-parenthesis EndLayout

where normal r normal o normal u normal n normal d left-parenthesis x right-parenthesis indicates the result of rounding a real number to the closest floating-point value and where monospace upper F monospace upper M monospace upper A denotes the fused multiply add operation, which only rounds once. It thus gives better accuracy than computing left-parenthesis a circled-times b right-parenthesis circled-plus c .

This bound on the rounding error can also be represented with an interval of real numbers: for example, for addition, we can say that the rounded result is within an interval

StartLayout 1st Row 1st Column a circled-plus b 2nd Column equals normal r normal o normal u normal n normal d left-parenthesis a plus b right-parenthesis element-of left-parenthesis a plus b right-parenthesis left-parenthesis 1 plus-or-minus epsilon right-parenthesis 2nd Row 1st Column Blank 2nd Column equals left-bracket left-parenthesis a plus b right-parenthesis left-parenthesis 1 minus epsilon right-parenthesis comma left-parenthesis a plus b right-parenthesis left-parenthesis 1 plus epsilon right-parenthesis right-bracket EndLayout

for some epsilon . The amount of error introduced from this rounding can be no more than half the floating-point spacing at a plus b —if it was more than half the floating-point spacing, then it would be possible to round to a different floating-point number with less error (Figure 6.40).

Figure 6.40: The IEEE standard specifies that floating-point calculations must be implemented as if the calculation was performed with infinite-precision real numbers and then rounded to the nearest representable float. Here, an infinite-precision result in the real numbers is denoted by a filled dot, with the representable floats around it denoted by ticks on a number line. We can see that the error introduced by rounding to the nearest float, delta , can be no more than half the spacing between floats.

For 32-bit floats, we can bound the floating-point spacing at a plus b from above using Equation (6.18) (i.e., an ulp at that value) by left-parenthesis a plus b right-parenthesis 2 Superscript negative 23 , so half the spacing is bounded from above by left-parenthesis a plus b right-parenthesis 2 Superscript negative 24 and so StartAbsoluteValue epsilon EndAbsoluteValue less-than-or-equal-to 2 Superscript negative 24 . This bound is the machine epsilon. For 32-bit floats, epsilon Subscript normal m Baseline equals 2 Superscript negative 24 Baseline almost-equals 5.960464 ellipsis times 10 Superscript negative 8 .

<<Floating-point Constants>>+=  
static constexpr Float MachineEpsilon = std::numeric_limits<Float>::epsilon() * 0.5;

Thus, we have

StartLayout 1st Row 1st Column a circled-plus b 2nd Column equals normal r normal o normal u normal n normal d left-parenthesis a plus b right-parenthesis element-of left-parenthesis a plus b right-parenthesis left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis 2nd Row 1st Column Blank 2nd Column equals left-bracket left-parenthesis a plus b right-parenthesis left-parenthesis 1 minus epsilon Subscript normal m Baseline right-parenthesis comma left-parenthesis a plus b right-parenthesis left-parenthesis 1 plus epsilon Subscript normal m Baseline right-parenthesis right-bracket period EndLayout

Analogous relations hold for the other arithmetic operators and the square root operator.

A number of useful properties follow directly from Equation (6.19). For a floating-point number x ,

  • 1 circled-times x equals x .
  • x circled-division-slash x equals 1 .
  • x circled-plus 0 equals x .
  • x minus x equals 0 .
  • 2 circled-times x and x circled-division-slash 2 are exact; no rounding is performed to compute the final result. More generally, any multiplication by or division by a power of two gives an exact result (assuming there is no overflow or underflow).
  • x circled-division-slash 2 Superscript i Baseline equals x circled-times 2 Superscript negative i for all integer i , assuming 2 Superscript i does not overflow.

All of these properties follow from the principle that the result must be the nearest floating-point value to the actual result; when the result can be represented exactly, the exact result must be computed.

Utility Routines

A few basic utility routines will be useful in the following. First, we define our own IsNaN() function to check for NaN values. It comes with the baggage of a use of C++’s enable_if construct to declare its return type in a way that requires that this function only be called with floating-point types.

<<Floating-point Inline Functions>>= 
template <typename T> inline typename std::enable_if_t<std::is_floating_point_v<T>, bool> IsNaN(T v) { return std::isnan(v); }

We also define IsNaN() for integer-based types; it trivially returns false, since NaN is not representable in those types. One might wonder why we have bothered with enable_if and this second definition that tells us something that we already know. One motivation is the templated Tuple2 and Tuple3 classes from Section 3.2, which are used with both Float and int for their element types. Given these two functions, they can freely have assertions that their elements do not store NaN values without worrying about which particular type their elements are.

<<Floating-point Inline Functions>>+=  
template <typename T> inline typename std::enable_if_t<std::is_integral_v<T>, bool> IsNaN(T v) { return false; }

For similar motivations, we define a pair of IsInf() functions that test for infinity.

<<Floating-point Inline Functions>>+=  
template <typename T> inline typename std::enable_if_t<std::is_floating_point_v<T>, bool> IsInf(T v) { return std::isinf(v); }

Once again, because infinity is not representable with integer types, the integer variant of this function returns false.

<<Floating-point Inline Functions>>+=  
template <typename T> inline typename std::enable_if_t<std::is_integral_v<T>, bool> IsInf(T v) { return false; }

A pair of IsFinite() functions check whether a number is neither infinite or NaN.

<<Floating-point Inline Functions>>+=  
template <typename T> inline typename std::enable_if_t<std::is_floating_point_v<T>, bool> IsFinite(T v) { return std::isfinite(v); } template <typename T> inline typename std::enable_if_t<std::is_integral_v<T>, bool> IsFinite(T v) { return true; }

Although fused multiply add is available through the standard library, we also provide our own FMA() function.

<<Floating-point Inline Functions>>+=  
float FMA(float a, float b, float c) { return std::fma(a, b, c); }

A separate version for integer types allows calling FMA() from code regardless of the numeric type being used.

<<Math Inline Functions>>+=  
template <typename T> inline typename std::enable_if_t<std::is_integral_v<T>, T> FMA(T a, T b, T c) { return a * b + c; }

For certain low-level operations, it can be useful to be able to interpret a floating-point value in terms of its constituent bits and to convert the bits representing a floating-point value to an actual float or double. A natural approach to this would be to take a pointer to a value to be converted and cast it to a pointer to the other type:

float f = ...; uint32_t bits = *((uint32_t *)&f);

However, modern versions of C++ specify that it is illegal to cast a pointer of one type, float, to a different type, uint32_t. (This restriction allows the compiler to optimize more aggressively in its analysis of whether two pointers may point to the same memory location, which can inhibit storing values in registers.) Another popular alternative, using a union with elements of both types, assigning to one type and reading from the other, is also illegal: the C++ standard says that reading an element of a union different from the last one assigned to is undefined behavior.

Fortunately, as of C++20, the standard library provides a std::bit_cast function that performs such conversions. Because this version of pbrt only requires C++17, we provide an implementation in the pstd library that is used by the following conversion functions.

<<Floating-point Inline Functions>>+=  
inline uint32_t FloatToBits(float f) { return pstd::bit_cast<uint32_t>(f); }

<<Floating-point Inline Functions>>+=  
inline float BitsToFloat(uint32_t ui) { return pstd::bit_cast<float>(ui); }

(Versions of these functions that convert between double and uint64_t are also available but are similar and are therefore not included here.)

The corresponding integer type with a sufficient number of bits to store pbrt’s Float type is available through FloatBits.

<<Float Type Definitions>>+= 
#ifdef PBRT_FLOAT_AS_DOUBLE using FloatBits = uint64_t; #else using FloatBits = uint32_t; #endif // PBRT_FLOAT_AS_DOUBLE

Given the ability to extract the bits of a floating-point value and given the description of their layout in Section 6.8.1, it is easy to extract various useful quantities from a float.

<<Floating-point Inline Functions>>+=  
inline int Exponent(float v) { return (FloatToBits(v) >> 23) - 127; }

<<Floating-point Inline Functions>>+=  
inline int Significand(float v) { return FloatToBits(v) & ((1 << 23) - 1); }

<<Floating-point Inline Functions>>+=  
inline uint32_t SignBit(float v) { return FloatToBits(v) & 0x80000000; }

These conversions can be used to implement functions that bump a floating-point value up or down to the next greater or next smaller representable floating-point value. They are useful for some conservative rounding operations that we will need in code to follow. Thanks to the specifics of the in-memory representation of floats, these operations are quite efficient.

<<Floating-point Inline Functions>>+=  
inline float NextFloatUp(float v) { <<Handle infinity and negative zero for NextFloatUp()>> 
if (IsInf(v) && v > 0.f) return v; if (v == -0.f) v = 0.f;
<<Advance v to next higher float>> 
uint32_t ui = FloatToBits(v); if (v >= 0) ++ui; else --ui; return BitsToFloat(ui);
}

There are two important special cases: first, if v is positive infinity, then this function just returns v unchanged. Second, negative zero is skipped forward to positive zero before continuing on to the code that advances the significand. This step must be handled explicitly, since the bit patterns for negative 0.0 and 0.0 are not adjacent.

<<Handle infinity and negative zero for NextFloatUp()>>= 
if (IsInf(v) && v > 0.f) return v; if (v == -0.f) v = 0.f;

Conceptually, given a floating-point value, we would like to increase the significand by one, where if the result overflows, the significand is reset to zero and the exponent is increased by one. Fortuitously, adding one to the in-memory integer representation of a float achieves this: because the exponent lies at the high bits above the significand, adding one to the low bit of the significand will cause a one to be carried all the way up into the exponent if the significand is all ones and otherwise will advance to the next higher significand for the current exponent. (This is yet another example of the careful thought that was applied to the development of the IEEE floating-point specification.) Note also that when the highest representable finite floating-point value’s bit representation is incremented, the bit pattern for positive floating-point infinity is the result.

For negative values, subtracting one from the bit representation similarly advances to the next higher value.

<<Advance v to next higher float>>= 
uint32_t ui = FloatToBits(v); if (v >= 0) ++ui; else --ui; return BitsToFloat(ui);

The NextFloatDown() function, not included here, follows the same logic but effectively in reverse. pbrt also provides versions of these functions for doubles.

Error Propagation

Using the guarantees of IEEE floating-point arithmetic, it is possible to develop methods to analyze and bound the error in a given floating-point computation. For more details on this topic, see the excellent book by Higham (2002), as well as Wilkinson’s earlier classic (1994).

Two measurements of error are useful in this effort: absolute and relative. If we perform some floating-point computation and get a rounded result a overTilde , we say that the magnitude of the difference between a overTilde and the result of doing that computation in the real numbers is the absolute error, delta Subscript normal a :

delta Subscript normal a Baseline equals StartAbsoluteValue a overTilde minus a EndAbsoluteValue period

Relative error, delta Subscript normal r , is the ratio of the absolute error to the precise result:

delta Subscript normal r Baseline equals StartAbsoluteValue StartFraction a overTilde minus a Over a EndFraction EndAbsoluteValue equals StartAbsoluteValue StartFraction delta Subscript normal a Baseline Over a EndFraction EndAbsoluteValue comma

as long as a not-equals 0 . Using the definition of relative error, we can thus write the computed value a overTilde as a perturbation of the exact result a :

a overTilde element-of a plus-or-minus delta Subscript normal a Baseline equals a left-parenthesis 1 plus-or-minus delta Subscript normal r Baseline right-parenthesis period

As a first application of these ideas, consider computing the sum of four numbers, a , b , c , and d , represented as floats. If we compute this sum as r = (((a + b) + c) + d), Equation (6.20) gives us

StartLayout 1st Row 1st Column left-parenthesis left-parenthesis left-parenthesis a circled-plus b right-parenthesis circled-plus c right-parenthesis circled-plus d right-parenthesis 2nd Column element-of left-parenthesis left-parenthesis left-parenthesis left-parenthesis a plus b right-parenthesis left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis right-parenthesis plus c right-parenthesis left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis plus d right-parenthesis left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis 2nd Row 1st Column Blank 2nd Column equals left-parenthesis a plus b right-parenthesis left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis cubed plus c left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis squared plus d left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis period EndLayout

Because epsilon Subscript normal m is small, higher-order powers of epsilon Subscript normal m can be bounded by an additional epsilon Subscript normal m term, and so we can bound the left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript n terms with

left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript n Baseline less-than-or-equal-to left-parenthesis 1 plus-or-minus left-parenthesis n plus 1 right-parenthesis epsilon Subscript normal m Baseline right-parenthesis period

(As a practical matter, left-parenthesis 1 plus-or-minus n epsilon Subscript normal m Baseline right-parenthesis almost bounds these terms, since higher powers of epsilon Subscript normal m get very small very quickly, but the above is a fully conservative bound.)

This bound lets us simplify the result of the addition to:

StartLayout 1st Row 1st Column left-parenthesis a plus b right-parenthesis left-parenthesis 1 plus-or-minus 4 epsilon Subscript normal m Baseline right-parenthesis 2nd Column plus c left-parenthesis 1 plus-or-minus 3 epsilon Subscript normal m Baseline right-parenthesis plus d left-parenthesis 1 plus-or-minus 2 epsilon Subscript normal m Baseline right-parenthesis equals 2nd Row 1st Column Blank 2nd Column a plus b plus c plus d plus left-bracket plus-or-minus 4 epsilon Subscript normal m Baseline left-parenthesis a plus b right-parenthesis plus-or-minus 3 epsilon Subscript normal m Baseline c plus-or-minus 2 epsilon Subscript normal m Baseline d right-bracket period EndLayout

The term in square brackets gives the absolute error: its magnitude is bounded by

4 epsilon Subscript normal m Baseline StartAbsoluteValue a plus b EndAbsoluteValue plus 3 epsilon Subscript normal m Baseline StartAbsoluteValue c EndAbsoluteValue plus 2 epsilon Subscript normal m Baseline StartAbsoluteValue d EndAbsoluteValue period

Thus, if we add four floating-point numbers together with the above parenthesization, we can be certain that the difference between the final rounded result and the result we would get if we added them with infinite-precision real numbers is bounded by Equation (6.22); this error bound is easily computed given specific values of a , b , c , and  d .

This is a fairly interesting result; we see that the magnitude of a plus b makes a relatively large contribution to the error bound, especially compared to d . (This result gives a sense for why, if adding a large number of floating-point numbers together, sorting them from small to large magnitudes generally gives a result with a lower final error than an arbitrary ordering.)

Our analysis here has implicitly assumed that the compiler would generate instructions according to the expression used to define the sum. Compilers are required to follow the form of the given floating-point expressions in order to not break carefully crafted computations that may have been designed to minimize round-off error. Here again is a case where certain transformations that would be valid on expressions with integers cannot be safely applied when floats are involved.

What happens if we change the expression to the algebraically equivalent float r = (a + b) + (c + d)? This corresponds to the floating-point computation

left-parenthesis left-parenthesis a circled-plus b right-parenthesis circled-plus left-parenthesis c circled-plus d right-parenthesis right-parenthesis period

If we use the same process of applying Equation (6.20), expanding out terms, converting higher-order left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript n terms to left-parenthesis 1 plus-or-minus left-parenthesis n plus 1 right-parenthesis epsilon Subscript normal m Baseline right-parenthesis , we get absolute error bounds of

3 epsilon Subscript normal m Baseline StartAbsoluteValue a plus b EndAbsoluteValue plus 3 epsilon Subscript normal m Baseline StartAbsoluteValue c plus d EndAbsoluteValue comma

which are lower than the first formulation if StartAbsoluteValue a plus b EndAbsoluteValue is relatively large, but possibly higher if StartAbsoluteValue c plus d EndAbsoluteValue is relatively large.

This approach to computing error is known as forward error analysis; given inputs to a computation, we can apply a fairly mechanical process that provides conservative bounds on the error in the result. The derived bounds in the result may overstate the actual error—in practice, the signs of the error terms are often mixed, so that there is cancellation when they are added. An alternative approach is backward error analysis, which treats the computed result as exact and finds bounds on perturbations on the inputs that give the same result. This approach can be more useful when analyzing the stability of a numerical algorithm but is less applicable to deriving conservative error bounds on the geometric computations we are interested in here.

The conservative bounding of left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript n by left-parenthesis 1 plus-or-minus left-parenthesis n plus 1 right-parenthesis epsilon Subscript normal m Baseline right-parenthesis is somewhat unsatisfying since it adds a whole epsilon Subscript normal m term purely to conservatively bound the sum of various higher powers of epsilon Subscript normal m . Higham (2002, Section 3.1) gives an approach to more tightly bound products of left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis error terms. If we have left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript n , it can be shown that this value is bounded by 1 plus theta Subscript n , where

StartAbsoluteValue theta Subscript n Baseline EndAbsoluteValue less-than-or-equal-to StartFraction n epsilon Subscript normal m Baseline Over 1 minus n epsilon Subscript normal m Baseline EndFraction comma

as long as n epsilon Subscript normal m Baseline less-than 1 (which will certainly be the case for the calculations we consider). Note that the denominator of this expression will be just less than one for reasonable n values, so it just barely increases n epsilon Subscript normal m to achieve a conservative bound.

We will denote this bound by gamma Subscript n :

gamma Subscript n Baseline equals StartFraction n epsilon Subscript normal m Baseline Over 1 minus n epsilon Subscript normal m Baseline EndFraction period

The function that computes its value is declared as constexpr so that any invocations with compile-time constants will be replaced with the corresponding floating-point return value.

<<Floating-point Inline Functions>>+=  
inline constexpr Float gamma(int n) { return (n * MachineEpsilon) / (1 - n * MachineEpsilon); }

Using the gamma notation, our bound on the error of the first sum of four values is

StartAbsoluteValue a plus b EndAbsoluteValue gamma 3 plus StartAbsoluteValue c EndAbsoluteValue gamma 2 plus StartAbsoluteValue d EndAbsoluteValue gamma 1 period

An advantage of this approach is that quotients of left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript n terms can also be bounded with the gamma function. Given

StartFraction left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript m Baseline Over left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript n Baseline EndFraction comma

the interval is bounded by left-parenthesis 1 plus-or-minus gamma Subscript m plus n Baseline right-parenthesis . Thus, gamma can be used to collect epsilon Subscript normal m terms from both sides of an equality over to one side by dividing them through; this will be useful in some of the following derivations. (Note that because left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis terms represent intervals, canceling them would be incorrect:

StartFraction left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript m Baseline Over left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript n Baseline EndFraction not-equals left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis Superscript m minus n Baseline semicolon

the gamma Subscript m plus n bounds must be used instead.)

Given inputs to some computation that themselves carry some amount of error, it is instructive to see how this error is carried through various elementary arithmetic operations. Given two values, a left-parenthesis 1 plus-or-minus gamma Subscript i Baseline right-parenthesis and b left-parenthesis 1 plus-or-minus gamma Subscript j Baseline right-parenthesis , that each carry accumulated error from earlier operations, consider their product. Using the definition of circled-times , the result is in the interval:

a left-parenthesis 1 plus-or-minus gamma Subscript i Baseline right-parenthesis circled-times b left-parenthesis 1 plus-or-minus gamma Subscript j Baseline right-parenthesis element-of a b left-parenthesis 1 plus-or-minus gamma Subscript i plus j plus 1 Baseline right-parenthesis comma

where we have used the relationship left-parenthesis 1 plus-or-minus gamma Subscript i Baseline right-parenthesis left-parenthesis 1 plus-or-minus gamma Subscript j Baseline right-parenthesis element-of left-parenthesis 1 plus-or-minus gamma Subscript i plus j Baseline right-parenthesis , which follows directly from Equation (6.23).

The relative error in this result is bounded by

StartAbsoluteValue StartFraction a b gamma Subscript i plus j plus 1 Baseline Over a b EndFraction EndAbsoluteValue equals gamma Subscript i plus j plus 1 Baseline comma

and so the final error is no more than roughly left-parenthesis i plus j plus 1 right-parenthesis slash 2 ulps at the value of the product—about as good as we might hope for, given the error going into the multiplication. (The situation for division is similarly good.)

Unfortunately, with addition and subtraction, it is possible for the relative error to increase substantially. Using the same definitions of the values being operated on, consider

a left-parenthesis 1 plus-or-minus gamma Subscript i Baseline right-parenthesis circled-plus b left-parenthesis 1 plus-or-minus gamma Subscript j Baseline right-parenthesis comma

which is in the interval a left-parenthesis 1 plus-or-minus gamma Subscript i plus 1 Baseline right-parenthesis plus b left-parenthesis 1 plus-or-minus gamma Subscript j plus 1 Baseline right-parenthesis comma and so the absolute error is bounded by StartAbsoluteValue a EndAbsoluteValue gamma Subscript i plus 1 plus StartAbsoluteValue b EndAbsoluteValue gamma Subscript j plus 1 .

If the signs of a and b are the same, then the absolute error is bounded by StartAbsoluteValue a plus b EndAbsoluteValue gamma Subscript i plus j plus 1 and the relative error is approximately left-parenthesis i plus j plus 1 right-parenthesis slash 2 ulps around the computed value.

However, if the signs of a and b differ (or, equivalently, they are the same but subtraction is performed), then the relative error can be quite high. Consider the case where a almost-equals negative b : the relative error is

StartFraction StartAbsoluteValue a EndAbsoluteValue gamma Subscript i plus 1 Baseline plus StartAbsoluteValue b EndAbsoluteValue gamma Subscript j plus 1 Baseline Over a plus b EndFraction almost-equals StartFraction 2 StartAbsoluteValue a EndAbsoluteValue gamma Subscript i plus j plus 1 Baseline Over a plus b EndFraction period

The numerator’s magnitude is proportional to the original value StartAbsoluteValue a EndAbsoluteValue yet is divided by a very small number, and thus the relative error is quite high. This substantial increase in relative error is called catastrophic cancellation. Equivalently, we can have a sense of the issue from the fact that the absolute error is in terms of the magnitude of StartAbsoluteValue a EndAbsoluteValue , though it is in relation to a value much smaller than a .

Running Error Analysis

In addition to working out error bounds algebraically, we can also have the computer do this work for us as some computation is being performed. This approach is known as running error analysis. The idea behind it is simple: each time a floating-point operation is performed, we compute intervals based on Equation (6.20) that bound its true value.

The Interval class, which is defined in Section B.2.15, provides this functionality. The Interval class also tracks rounding errors in floating-point arithmetic and is useful even if none of the initial values are intervals. While computing error bounds in this way has higher runtime overhead than using derived expressions that give an error bound directly, it can be convenient when derivations become unwieldy.

6.8.2 Conservative Ray–Bounds Intersections

Floating-point round-off error can cause the ray–bounding box intersection test to miss cases where a ray actually does intersect the box. While it is acceptable to have occasional false positives from ray–box intersection tests, we would like to never miss an intersection—getting this right is important for the correctness of the BVHAggregate acceleration data structure in Section 7.3 so that valid ray–shape intersections are not missed. The ray–bounding box test introduced in Section 6.1.2 is based on computing a series of ray–slab intersections to find the parametric t Subscript normal m normal i normal n along the ray where the ray enters the bounding box and the t Subscript normal m normal a normal x where it exits. If t Subscript normal m normal i normal n Baseline less-than t Subscript normal m normal a normal x , the ray passes through the box; otherwise, it misses it. With floating-point arithmetic, there may be error in the computed t values—if the computed t Subscript normal m normal i normal n value is greater than t Subscript normal m normal a normal x purely due to round-off error, the intersection test will incorrectly return a false result.

Recall that the computation to find the t value for a ray intersection with a plane perpendicular to the x axis at a point x is t equals left-parenthesis x minus normal o Subscript x Baseline right-parenthesis slash bold d Subscript x . Expressed as a floating-point computation and applying Equation (6.19), we have

t equals left-parenthesis x minus normal o Subscript x Baseline right-parenthesis circled-times left-parenthesis 1 circled-division-slash bold d Subscript x Baseline right-parenthesis element-of StartFraction x minus normal o Subscript x Baseline Over bold d Subscript x Baseline EndFraction left-parenthesis 1 plus-or-minus epsilon right-parenthesis cubed comma

and so

StartFraction x minus normal o Subscript x Baseline Over bold d Subscript x Baseline EndFraction element-of t left-parenthesis 1 plus-or-minus gamma 3 right-parenthesis period

The difference between the computed result t and the precise result is bounded by gamma 3 StartAbsoluteValue t EndAbsoluteValue .

If we consider the intervals around the computed t values that bound the true value of t , then the case we are concerned with is when the intervals overlap; if they do not, then the comparison of computed values will give the correct result (Figure 6.41). If the intervals do overlap, it is impossible to know the true ordering of the t values. In this case, increasing t Subscript normal m normal a normal x by twice the error bound, 2 gamma 3 t Subscript normal m normal a normal x , before performing the comparison ensures that we conservatively return true in this case.

Figure 6.41: If the error bounds of the computed t Subscript normal m normal i normal n and t Subscript normal m normal a normal x values overlap, the comparison t Subscript normal m normal i normal n Baseline less-than t Subscript normal m normal a normal x may not indicate if a ray hit a bounding box. It is better to conservatively return true in this case than to miss an intersection. Extending t Subscript normal m normal a normal x by twice its error bound ensures that the comparison is conservative.

We can now define the fragment for the ray–bounding box test in Section 6.1.2 that makes this adjustment.

<<Update tFar to ensure robust ray–bounds intersection>>= 
tFar *= 1 + 2 * gamma(3);

The fragments for the Bounds3::IntersectP() method, <<Update tMax and tyMax to ensure robust bounds intersection>> and <<Update tzMax to ensure robust bounds intersection>>, are similar and therefore not included here.

6.8.3 Accurate Quadratic Discriminants

Recall from Sections 6.2.2 and 6.3.2 that intersecting a ray with a sphere or cylinder involves finding the zeros of a quadratic equation, which requires calculating its discriminant, b squared minus 4 a c . If the discriminant is computed as written, then when the sphere is far from the ray origin, b squared almost-equals 4 a c and catastrophic cancellation occurs. This issue is made worse since the magnitudes of the two terms of the discriminant are related to the squared distance between the sphere and the ray origin. Even for rays that are far from ever hitting the sphere, a discriminant may be computed that is exactly equal to zero, leading to the intersection code reporting an invalid intersection. See Figure 6.42, which shows that this error can be meaningful in practice.

Figure 6.42: The Effect of Reducing the Error in the Computation of the Discriminant for Ray–Sphere Intersection. Unit sphere, viewed using an orthographic projection with a camera 400 units away. (a) If the quadratic discriminant is computed in the usual fashion, numeric error causes intersections at the edges to be missed. In the found intersections, the inaccuracy is evident in the wobble of the textured lines. (b) With the more precise formulation described in this section, the sphere is rendered correctly. (With the improved discriminant, such a sphere can be translated as far as 7,500 or so units from an orthographic camera and still be rendered accurately.)

Algebraically rewriting the discriminant computation makes it possible to compute it with more accuracy. First, if we rewrite the quadratic discriminant as

b squared minus 4 a c equals 4 a left-parenthesis StartFraction b squared Over 4 a EndFraction minus c right-parenthesis

and then substitute in the values of a , b , and c from Equation (6.3) to the terms inside the parentheses, we have

StartLayout 1st Row 1st Column Blank 2nd Column 4 a left-parenthesis StartFraction 4 left-parenthesis bold o dot bold d right-parenthesis squared Over 4 left-parenthesis bold d dot bold d right-parenthesis EndFraction minus left-parenthesis left-parenthesis bold o dot bold o right-parenthesis minus r squared right-parenthesis right-parenthesis 2nd Row 1st Column equals 2nd Column 4 a left-parenthesis left-bracket left-parenthesis bold o dot ModifyingAbove bold d With bold caret right-parenthesis squared minus left-parenthesis bold o dot bold o right-parenthesis right-bracket plus r squared right-parenthesis comma EndLayout

where we have denoted the vector from left-parenthesis 0 comma 0 comma 0 right-parenthesis to the ray’s origin as bold o and ModifyingAbove bold d With bold caret is the ray’s normalized direction.

Now consider the decomposition of bold o into the sum of two vectors, bold d Subscript up-tack and bold d Subscript parallel-to , where bold d Subscript parallel-to is parallel to ModifyingAbove bold d With bold caret and bold d Subscript up-tack is perpendicular to it. Those vectors are given by

StartLayout 1st Row 1st Column bold d Subscript parallel-to 2nd Column equals left-parenthesis bold o dot ModifyingAbove bold d With bold caret right-parenthesis ModifyingAbove bold d With bold caret 2nd Row 1st Column bold d Subscript up-tack 2nd Column equals bold o minus bold d Subscript parallel-to Baseline equals bold o minus left-parenthesis bold o dot ModifyingAbove bold d With bold caret right-parenthesis ModifyingAbove bold d With bold caret period EndLayout

These three vectors form a right triangle, and therefore double-vertical-bar bold o double-vertical-bar squared equals double-vertical-bar bold d Subscript up-tack Baseline double-vertical-bar squared plus double-vertical-bar bold d Subscript parallel-to Baseline double-vertical-bar squared . Applying Equation (6.25),

StartLayout 1st Row 1st Column left-parenthesis bold o dot bold o right-parenthesis 2nd Column equals double-vertical-bar bold o minus bold left-parenthesis bold o dot ModifyingAbove bold d With bold caret bold right-parenthesis ModifyingAbove bold d With bold caret double-vertical-bar squared plus left-parenthesis bold o dot ModifyingAbove bold d With bold caret right-parenthesis squared double-vertical-bar ModifyingAbove bold d With bold caret double-vertical-bar squared 2nd Row 1st Column Blank 2nd Column equals double-vertical-bar bold o minus bold left-parenthesis bold o dot ModifyingAbove bold d With bold caret bold right-parenthesis ModifyingAbove bold d With bold caret double-vertical-bar squared plus left-parenthesis bold o dot ModifyingAbove bold d With bold caret right-parenthesis squared period EndLayout

Rearranging terms gives

left-parenthesis bold o dot ModifyingAbove bold d With bold caret right-parenthesis squared minus left-parenthesis bold o dot bold o right-parenthesis equals minus double-vertical-bar bold o minus bold left-parenthesis bold o dot ModifyingAbove bold d With bold caret bold right-parenthesis ModifyingAbove bold d With bold caret double-vertical-bar squared period

Expressing the right hand side in terms of the sphere quadratic coefficients from Equation (6.3) gives

left-parenthesis bold o dot ModifyingAbove bold d With bold caret right-parenthesis squared minus left-parenthesis bold o dot bold o right-parenthesis equals minus double-vertical-bar bold o minus StartFraction b Over 2 a EndFraction bold d double-vertical-bar squared period

Note that the left hand side is equal to the term in square brackets in Equation (6.24).

Computing that term in this way eliminates c from the discriminant, which is of great benefit since its magnitude is proportional to the squared distance to the origin, with accordingly limited accuracy. In the implementation below, we take advantage of the fact that the discriminant is now the difference of squared values and make use of the identity x squared minus y squared equals left-parenthesis x plus y right-parenthesis left-parenthesis x minus y right-parenthesis to reduce the magnitudes of the intermediate values, which further reduces error.

<<Compute sphere quadratic discriminant discrim>>= 
Vector3fi v(oi - b / (2 * a) * di); Interval length = Length(v); Interval discrim = 4 * a * (Interval(radius) + length) * (Interval(radius) - length); if (discrim.LowerBound() < 0) return {};

One might ask, why go through this trouble when we could use the DifferenceOfProducts() function to compute the discriminant, presumably with low error? The reason that is not an equivalent alternative is that the values a , b , and c already suffer from rounding error. In turn, a result computed by DifferenceOfProducts() will be inaccurate if its inputs already are inaccurate themselves. c equals normal o Subscript x Superscript 2 Baseline plus normal o Subscript y Superscript 2 Baseline plus normal o Subscript z Superscript 2 Baseline minus r squared is particularly problematic, since it is the difference of two positive values, so is susceptible to catastrophic cancellation.

A similar derivation gives a more accurate discriminant for the cylinder.

<<Compute cylinder quadratic discriminant discrim>>= 
Interval f = b / (2 * a); Interval vx = oi.x - f * di.x, vy = oi.y - f * di.y; Interval length = Sqrt(Sqr(vx) + Sqr(vy)); Interval discrim = 4 * a * (Interval(radius) + length) * (Interval(radius) - length); if (discrim.LowerBound() < 0) return {};

6.8.4 Robust Triangle Intersections

The details of the ray–triangle intersection algorithm described in Section 6.5.3 were carefully designed to avoid cases where rays could incorrectly pass through an edge or vertex shared by two adjacent triangles without generating an intersection. Fittingly, an intersection algorithm with this guarantee is referred to as being watertight.

Recall that the algorithm is based on transforming triangle vertices into a coordinate system with the ray’s origin at its origin and the ray’s direction aligned along the plus z axis. Although round-off error may be introduced by transforming the vertex positions to this coordinate system, this error does not affect the watertightness of the intersection test, since the same transformation is applied to all triangles. (Further, this error is quite small, so it does not significantly impact the accuracy of the computed intersection points.)

Given vertices in this coordinate system, the three edge functions defined in Equation (6.5) are evaluated at the point left-parenthesis 0 comma 0 right-parenthesis ; the corresponding expressions, Equation (6.6), are quite straightforward. The key to the robustness of the algorithm is that with floating-point arithmetic, the edge function evaluations are guaranteed to have the correct sign. In general, we have

left-parenthesis a circled-times b right-parenthesis minus left-parenthesis c circled-times d right-parenthesis period

First, note that if a b equals c d , then Equation (6.26) evaluates to exactly zero, even in floating point. We therefore just need to show that if a b greater-than c d , then left-parenthesis a circled-times b right-parenthesis minus left-parenthesis c circled-times d right-parenthesis is never negative. If a b greater-than c d , then left-parenthesis a circled-times b right-parenthesis must be greater than or equal to left-parenthesis c circled-times d right-parenthesis . In turn, their difference must be greater than or equal to zero. (These properties both follow from the fact that floating-point arithmetic operations are all rounded to the nearest representable floating-point value.)

If the value of the edge function is zero, then it is impossible to tell whether it is exactly zero or whether a small positive or negative value has rounded to zero. In this case, the fragment <<Fall back to double-precision test at triangle edges>> reevaluates the edge function with double precision; it can be shown that doubling the precision suffices to accurately distinguish these cases, given 32-bit floats as input.

The overhead caused by this additional precaution is minimal: in a benchmark with 88 million ray intersection tests, the double-precision fallback had to be used in less than 0.0000023% of the cases.

6.8.5 Bounding Intersection Point Error

We can apply the machinery introduced in this section for analyzing rounding error to derive conservative bounds on the absolute error in computed ray–shape intersection points, which allows us to construct bounding boxes that are guaranteed to include an intersection point on the actual surface (Figure 6.43). These bounding boxes provide the basis of the algorithm for generating spawned ray origins that will be introduced in Section 6.8.6.

Figure 6.43: Shape intersection algorithms in pbrt compute an intersection point, shown here in the 2D setting with a filled circle. The absolute error in this point is bounded by delta Subscript x and delta Subscript y , giving a small box around the point. Because these bounds are conservative, we know that the actual intersection point on the surface (open circle) must lie somewhere within the box.

It is illuminating to start by looking at the sources of error in conventional approaches to computing intersection points. It is common practice in ray tracing to compute 3D intersection points by first solving the parametric ray equation normal o plus t bold d for a value t Subscript normal h normal i normal t where a ray intersects a surface and then computing the hit point normal p Subscript with normal p Subscript Baseline equals normal o plus t Subscript normal h normal i normal t Baseline bold d . If t Subscript normal h normal i normal t carries some error delta Subscript t , then we can bound the error in the computed intersection point. Considering the x coordinate, for example, we have

StartLayout 1st Row 1st Column x 2nd Column equals normal o Subscript x Baseline circled-plus left-parenthesis t Subscript normal h normal i normal t Baseline plus-or-minus delta Subscript t Baseline right-parenthesis circled-times bold d Subscript x Baseline 2nd Row 1st Column Blank 2nd Column element-of normal o Subscript x circled-plus left-parenthesis t Subscript normal h normal i normal t Baseline plus-or-minus delta Subscript t Baseline right-parenthesis bold d Subscript x Baseline left-parenthesis 1 plus-or-minus gamma 1 right-parenthesis 3rd Row 1st Column Blank 2nd Column subset-of normal o Subscript x Baseline left-parenthesis 1 plus-or-minus gamma 1 right-parenthesis plus left-parenthesis t Subscript normal h normal i normal t Baseline plus-or-minus delta Subscript t Baseline right-parenthesis bold d Subscript x Baseline left-parenthesis 1 plus-or-minus gamma 2 right-parenthesis 4th Row 1st Column Blank 2nd Column equals normal o Subscript x Baseline plus t Subscript normal h normal i normal t Baseline bold d Subscript x Baseline plus left-bracket plus-or-minus normal o Subscript x Baseline gamma 1 plus-or-minus delta Subscript t Baseline bold d Subscript x Baseline plus-or-minus t Subscript normal h normal i normal t Baseline bold d Subscript x Baseline gamma 2 plus-or-minus delta Subscript t Baseline bold d Subscript x Baseline gamma 2 right-bracket period EndLayout

The error term (in square brackets) is bounded by

gamma 1 StartAbsoluteValue normal o Subscript x Baseline EndAbsoluteValue plus delta Subscript t Baseline left-parenthesis 1 plus gamma 2 right-parenthesis StartAbsoluteValue bold d Subscript x Baseline EndAbsoluteValue plus gamma 2 StartAbsoluteValue t Subscript normal h normal i normal t Baseline bold d Subscript x Baseline EndAbsoluteValue period

There are two things to see from Equation (6.27): first, the magnitudes of the terms that contribute to the error in the computed intersection point ( normal o Subscript x , bold d Subscript x , and t Subscript normal h normal i normal t Baseline bold d Subscript x ) may be quite different from the magnitude of the intersection point. Thus, there is a danger of catastrophic cancellation in the intersection point’s computed value. Second, ray intersection algorithms generally perform tens of floating-point operations to compute t values, which in turn means that we can expect delta Subscript t to be at least of magnitude gamma Subscript n Baseline t , with n in the tens (and possibly much more, due to catastrophic cancellation).

Each of these terms may introduce a significant amount of error in the computed point x . We introduce better approaches in the following.

Reprojection: Quadrics

We would like to reliably compute surface intersection points with just a few ulps of error rather than the orders of magnitude greater error that intersection points computed with the parametric ray equation may have. Previously, Woo et al. (1996) suggested using the first intersection point computed as a starting point for a second ray–plane intersection, for ray–polygon intersections. From the bounds in Equation (6.27), we can see why the second intersection point will often be much closer to the surface than the first: the t Subscript normal h normal i normal t value along the second ray will be quite close to zero, so that the magnitude of the absolute error in t Subscript normal h normal i normal t will be quite small, and thus using this value in the parametric ray equation will give a point quite close to the surface (Figure 6.44). Further, the ray origin will have similar magnitude to the intersection point, so the gamma 1 StartAbsoluteValue normal o Subscript x Baseline EndAbsoluteValue term will not introduce much additional error.

Figure 6.44: Reintersection to Improve the Accuracy of the Computed Intersection Point. Given a ray and a surface, an initial intersection point has been computed with the ray equation (filled circle). This point may be fairly inaccurate due to rounding error but can be used as the origin for a second ray–shape intersection. The intersection point computed from this second intersection (open circle) is much closer to the surface, though it may be shifted from the true intersection point due to error in the first computed intersection.

Although the second intersection point computed with this approach is much closer to the plane of the surface, it still suffers from error by being offset due to error in the first computed intersection. The farther away the ray origin is from the intersection point (and thus, the larger the absolute error is in t Subscript normal h normal i normal t ), the larger this error will be. In spite of this error, the approach has merit: we are generally better off with a computed intersection point that is quite close to the actual surface, even if offset from the most accurate possible intersection point, than we are with a point that is some distance above or below the surface (and likely also far from the most accurate intersection point).

Rather than doing a full reintersection computation, which may not only be computationally costly but also will still have error in the computed t value, an effective alternative is to refine computed intersection points by reprojecting them to the surface. The error bounds for these reprojected points are often remarkably small. (It should be noted that these reprojection error bounds do not capture tangential errors that were present in the original intersection  normal p Subscript —the main focus here is to detect errors that might cause the reprojected point normal p prime to fall below the surface.)

Consider a ray–sphere intersection: given a computed intersection point (e.g., from the ray equation) normal p Subscript with a sphere at the origin with radius r , we can reproject the point onto the surface of the sphere by scaling it with the ratio of the sphere’s radius to the computed point’s distance to the origin, computing a new point normal p prime equals left-parenthesis x prime comma y prime comma z prime right-parenthesis with

x prime equals x StartFraction r Over StartRoot x squared plus y squared plus z squared EndRoot EndFraction comma

and so forth. The floating-point computation is

StartLayout 1st Row 1st Column x prime 2nd Column equals x circled-times r circled-division-slash monospace s monospace q monospace r monospace t left-parenthesis left-parenthesis x circled-times x right-parenthesis circled-plus left-parenthesis y circled-times y right-parenthesis circled-plus left-parenthesis z circled-times z right-parenthesis right-parenthesis 2nd Row 1st Column Blank 2nd Column element-of StartFraction x r left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis squared Over StartRoot x squared left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis cubed plus y squared left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis cubed plus z squared left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis squared EndRoot left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis EndFraction 3rd Row 1st Column Blank 2nd Column subset-of StartFraction x r left-parenthesis 1 plus-or-minus gamma 2 right-parenthesis Over StartRoot x squared left-parenthesis 1 plus-or-minus gamma 3 right-parenthesis plus y squared left-parenthesis 1 plus-or-minus gamma 3 right-parenthesis plus z squared left-parenthesis 1 plus-or-minus gamma 2 right-parenthesis EndRoot left-parenthesis 1 plus-or-minus gamma 1 right-parenthesis EndFraction period EndLayout

Because x squared , y squared , and z squared are all positive, the terms in the square root can share the same gamma term, and we have

StartLayout 1st Row 1st Column x prime 2nd Column element-of StartFraction x r left-parenthesis 1 plus-or-minus gamma 2 right-parenthesis Over StartRoot left-parenthesis x squared plus y squared plus z squared right-parenthesis left-parenthesis 1 plus-or-minus gamma 4 right-parenthesis EndRoot left-parenthesis 1 plus-or-minus gamma 1 right-parenthesis EndFraction 2nd Row 1st Column Blank 2nd Column equals StartFraction x r left-parenthesis 1 plus-or-minus gamma 2 right-parenthesis Over StartRoot left-parenthesis x squared plus y squared plus z squared right-parenthesis EndRoot StartRoot left-parenthesis 1 plus-or-minus gamma 4 right-parenthesis EndRoot left-parenthesis 1 plus-or-minus gamma 1 right-parenthesis EndFraction 3rd Row 1st Column Blank 2nd Column subset-of StartFraction x r Over StartRoot left-parenthesis x squared plus y squared plus z squared right-parenthesis EndRoot EndFraction left-parenthesis 1 plus-or-minus gamma 5 right-parenthesis 4th Row 1st Column Blank 2nd Column equals x prime left-parenthesis 1 plus-or-minus gamma 5 right-parenthesis period EndLayout

Thus, the absolute error of the reprojected x coordinate is bounded by gamma 5 StartAbsoluteValue x prime EndAbsoluteValue (and similarly for y prime and z prime ) and is thus no more than 2.5 ulps in each dimension from a point on the surface of the sphere.

Here is the fragment that reprojects the intersection point for the Sphere shape.

<<Refine sphere intersection point>>= 
pHit *= radius / Distance(pHit, Point3f(0, 0, 0));

The error bounds follow from Equation (6.28).

<<Compute error bounds for sphere intersection>>= 
Vector3f pError = gamma(5) * Abs((Vector3f)pHit);

Reprojection algorithms and error bounds for other quadrics can be defined similarly: for example, for a cylinder along the z axis, only the x and y coordinates need to be reprojected, and the error bounds in x and y turn out to be only gamma 3 times their magnitudes.

<<Refine cylinder intersection point>>= 
Float hitRad = std::sqrt(Sqr(pHit.x) + Sqr(pHit.y)); pHit.x *= radius / hitRad; pHit.y *= radius / hitRad;

<<Compute error bounds for cylinder intersection>>= 
Vector3f pError = gamma(3) * Abs(Vector3f(pHit.x, pHit.y, 0));

The disk shape is particularly easy; we just need to set the z coordinate of the point to lie on the plane of the disk.

<<Refine disk intersection point>>= 
pHit.z = height;

In turn, we have a point with zero error; it lies exactly on the surface on the disk.

<<Compute error bounds for disk intersection>>= 
Vector3f pError(0, 0, 0);

The quadrics’ Sample() methods also use reprojection. For example, the Sphere’s area sampling method is based on SampleUniformSphere(), which uses std::sin() and std::cos(). Therefore, the error bounds on the computed pObj value depend on the accuracy of those functions. By reprojecting the sampled point to the sphere’s surface, the error bounds derived earlier in Equation (6.28) can be used without needing to worry about those functions’ accuracy.

<<Reproject pObj to sphere surface and compute pObjError>>= 
pObj *= radius / Distance(pObj, Point3f(0, 0, 0)); Vector3f pObjError = gamma(5) * Abs((Vector3f)pObj);

The same issue and solution apply to sampling cylinders.

<<Reproject pObj to cylinder surface and compute pObjError>>= 
Float hitRad = std::sqrt(Sqr(pObj.x) + Sqr(pObj.y)); pObj.x *= radius / hitRad; pObj.y *= radius / hitRad; Vector3f pObjError = gamma(3) * Abs(Vector3f(pObj.x, pObj.y, 0));

Parametric Evaluation: Triangles

Another effective approach to computing accurate intersection points near the surface of a shape uses the shape’s parametric representation. For example, the triangle intersection algorithm in Section 6.5.3 computes three edge function values e 0 , e 1 , and e 2 and reports an intersection if all three have the same sign. Their values can be used to find the barycentric coordinates

b Subscript i Baseline equals StartFraction e Subscript i Baseline Over e 0 plus e 1 plus e 2 EndFraction period

Attributes v Subscript i at the triangle vertices (including the vertex positions) can be interpolated across the face of the triangle by

v prime equals b 0 v 0 plus b 1 v 1 plus b 2 v 2 period

We can show that interpolating the positions of the vertices in this manner gives a point very close to the surface of the triangle. First consider precomputing the reciprocal of the sum of  e Subscript i :

StartLayout 1st Row 1st Column d 2nd Column equals 1 circled-division-slash left-parenthesis e 0 circled-plus e 1 circled-plus e 2 right-parenthesis 2nd Row 1st Column Blank 2nd Column element-of StartFraction 1 Over left-parenthesis e 0 plus e 1 right-parenthesis left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis squared plus e 2 left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis EndFraction left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis period EndLayout

Because all e Subscript i have the same sign if there is an intersection, we can collect the e Subscript i terms and conservatively bound d :

StartLayout 1st Row 1st Column d 2nd Column element-of StartFraction 1 Over left-parenthesis e 0 plus e 1 plus e 2 right-parenthesis left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis squared EndFraction left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis 2nd Row 1st Column Blank 2nd Column subset-of StartFraction 1 Over e 0 plus e 1 plus e 2 EndFraction left-parenthesis 1 plus-or-minus gamma 3 right-parenthesis period EndLayout

If we now consider interpolation of the x coordinate of the position in the triangle corresponding to the edge function values, we have

StartLayout 1st Row 1st Column x prime 2nd Column equals left-parenthesis left-parenthesis e 0 circled-times x 0 right-parenthesis circled-plus left-parenthesis e 1 circled-times x 1 right-parenthesis circled-plus left-parenthesis e 2 circled-times x 2 right-parenthesis right-parenthesis circled-times d 2nd Row 1st Column Blank 2nd Column element-of left-parenthesis e 0 x 0 left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis cubed plus e 1 x 1 left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis cubed plus e 2 x 2 left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis squared right-parenthesis d left-parenthesis 1 plus-or-minus epsilon Subscript normal m Baseline right-parenthesis 3rd Row 1st Column Blank 2nd Column subset-of left-parenthesis e 0 x 0 left-parenthesis 1 plus-or-minus gamma 4 right-parenthesis plus e 1 x 1 left-parenthesis 1 plus-or-minus gamma 4 right-parenthesis plus e 2 x 2 left-parenthesis 1 plus-or-minus gamma 3 right-parenthesis right-parenthesis d period EndLayout

Using the bounds on d ,

StartLayout 1st Row 1st Column x 2nd Column element-of StartFraction e 0 x 0 left-parenthesis 1 plus-or-minus gamma 7 right-parenthesis plus e 1 x 1 left-parenthesis 1 plus-or-minus gamma 7 right-parenthesis plus e 2 x 2 left-parenthesis 1 plus-or-minus gamma 6 right-parenthesis Over e 0 plus e 1 plus e 2 EndFraction 2nd Row 1st Column Blank 2nd Column equals b 0 x 0 left-parenthesis 1 plus-or-minus gamma 7 right-parenthesis plus b 1 x 1 left-parenthesis 1 plus-or-minus gamma 7 right-parenthesis plus b 2 x 2 left-parenthesis 1 plus-or-minus gamma 6 right-parenthesis period EndLayout

Thus, we can finally see that the absolute error in the computed x prime value is in the interval

plus-or-minus b 0 x 0 gamma 7 plus-or-minus b 1 x 1 gamma 7 plus-or-minus b 2 x 2 gamma 7 comma

which is bounded by

gamma 7 left-parenthesis StartAbsoluteValue b 0 x 0 EndAbsoluteValue plus StartAbsoluteValue b 1 x 1 EndAbsoluteValue plus StartAbsoluteValue b 2 x 2 EndAbsoluteValue right-parenthesis period

(Note that the b 2 x 2 term could have a gamma 6 factor instead of gamma 7 , but the difference between the two is very small, so we choose a slightly simpler final expression.) Equivalent bounds hold for y prime and z prime .

Equation (6.29) lets us bound the error in the interpolated point computed in Triangle::Intersect().

<<Compute error bounds pError for triangle intersection>>= 
Point3f pAbsSum = Abs(ti.b0 * p0) + Abs(ti.b1 * p1) + Abs(ti.b2 * p2); Vector3f pError = gamma(7) * Vector3f(pAbsSum);

The bounds for a sampled point on a triangle can be found in a similar manner.

<<Compute error bounds pError for sampled point on triangle>>= 
Point3f pAbsSum = Abs(b[0] * p0) + Abs(b[1] * p1) + Abs((1 - b[0] - b[1]) * p2); Vector3f pError = Vector3f(gamma(6) * pAbsSum);

Parametric Evaluation: Bilinear Patches

Bilinear patch intersection points are found by evaluating the bilinear function from Equation (6.11). The computation performed is

left-bracket left-parenthesis 1 minus u right-parenthesis circled-times left-parenthesis left-parenthesis 1 minus v right-parenthesis circled-times normal p Subscript 0 comma 0 Baseline circled-plus v circled-times normal p Subscript 0 comma 1 Baseline right-parenthesis right-bracket circled-plus left-bracket u circled-times left-parenthesis left-parenthesis 1 minus v right-parenthesis circled-times normal p Subscript 1 comma 0 Baseline circled-plus v circled-times normal p Subscript 1 comma 1 Baseline right-parenthesis right-bracket period

Considering just the x