10.4 Image Texture
The ImageTexture class stores a 2D array of point-sampled values of a texture function. It uses these samples to reconstruct a continuous image function that can be evaluated at an arbitrary position. These sample values are often called texels, since they are similar to pixels in an image but are used in the context of a texture. Image textures are the most widely used type of texture in computer graphics; digital photographs, scanned artwork, images created with image-editing programs, and images generated by renderers are all extremely useful sources of data for this particular texture representation (Figure 10.8).
The term texture map is often used to refer to this type of texture, although this usage blurs the distinction between the mapping that computes texture coordinates and the texture function itself. The implementation of this texture is in the files textures/imagemap.h and textures/imagemap.cpp.
The ImageTexture class is different from other textures in the system in that it is parameterized on both the data type of the texels it stores in memory as well as the data type of the value that it returns. Making this distinction allows us to create, for example, ImageTextures that store RGBSpectrum values in memory, but always return Spectrum values. In this way, when the system is compiled with full-spectral rendering enabled, the memory cost to store full SampledSpectrum texels doesn’t need to be paid for source images that only have RGB components.
The caller provides the ImageTexture with the filename of an image map, parameters that control the filtering of the map for antialiasing, and parameters that make it possible to scale and gamma-correct the texture values. The scale parameter will be explained later in this section, and the texture filtering parameters will be explained in Section 10.4.3. The contents of the file are used to create an instance of the MIPMap class that stores the texels in memory and handles the details of reconstruction and filtering to reduce aliasing.
For an ImageTexture that stores RGBSpectrum values in memory, its MIPMap stores the image data using three floating-point values for each sample. This is a somewhat wasteful representation, since a single image map may have millions of texels and may not need the full 32 bits of accuracy from the Floats used to store RGB values for each of them. Exercise 10.1 at the end of this chapter discusses this issue further.
10.4.1 Texture Memory Management
Each image map may require a meaningful amount of memory, and a complex scene may have thousands of image maps. Because each image texture may be reused many times within a scene, pbrt maintains a table of image maps that have been loaded so far, so that they are only loaded into memory once even if they are used in more than one ImageTexture. The ImageTexture constructor calls the static ImageTexture::GetTexture() method to get a MIPMap representation of the desired texture.
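The caching idea can be sketched as follows. MIPMapStub, the bare filename key, and the file names in the test below are hypothetical stand-ins; pbrt's actual table is keyed by a TexInfo structure that also includes the filtering parameters.

```cpp
#include <map>
#include <memory>
#include <string>

struct MIPMapStub { std::string filename; };  // stand-in for the real MIPMap

// Sketch of the idea behind ImageTexture::GetTexture(): a static cache so
// that each image file is loaded into memory only once, no matter how many
// ImageTextures reference it.
std::shared_ptr<MIPMapStub> GetTexture(const std::string &filename) {
    static std::map<std::string, std::shared_ptr<MIPMapStub>> cache;
    auto iter = cache.find(filename);
    if (iter != cache.end()) return iter->second;  // reuse the loaded map
    auto mipmap = std::make_shared<MIPMapStub>(MIPMapStub{filename});
    cache[filename] = mipmap;
    return mipmap;
}
```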
TexInfo is a simple structure that holds the image map’s filename and filtering parameters; all of these must match for a MIPMap to be reused in another ImageTexture. Its definition is straightforward (its members exactly correspond to the parameters of the GetTexture() method) and won’t be included here.
If the texture hasn’t been loaded yet, a call to ReadImage() yields the image contents.
Because ReadImage() returns an array of RGBSpectrum values for the texels, it is necessary to convert these values to the particular type Tmemory of texel that this MIPMap is storing (e.g., Float) if the type of Tmemory isn’t RGBSpectrum. The per-texel conversion is handled by the utility routine ImageTexture::convertIn().
Per-texel conversion is done using C++ function overloading. For every type to which we would like to be able to convert these values, a separate ImageTexture::convertIn() function must be provided. In the loop over texels earlier, C++’s function overloading mechanism will select the appropriate instance of ImageTexture::convertIn() based on the destination type. Unfortunately, it is not possible to return the converted value from the function, since C++ doesn’t support overloading by return type.
In addition to converting types, these functions optionally scale and gamma correct the texel values to map them to a desired range. Gamma correction is particularly important to handle carefully: computer displays generally don’t exhibit a linear relationship between the pixel values to be displayed and the radiance that they emit. Thus, an artist may create a texture map where, as seen on an LCD display, one part of the image appears twice as bright as another. However, the corresponding pixel values won’t in fact have a 2:1 relationship. (Conversely, pixels whose values have a 2:1 relationship don’t lead to a 2:1 brightness ratio.) This discrepancy is a problem for a renderer using such an image as a texture map, since the renderer usually expects a linear relationship between texel values and the quantity that they represent.
pbrt follows the sRGB standard, which prescribes a specific transfer curve matching the typical behavior of CRT displays. This standard is widely supported on 2015-era devices; other (non-CRT) devices such as LCDs or inkjet printers typically accept sRGB gamma-corrected values as input and then re-map them internally to match the device-specific behavior.
The sRGB gamma curve is a piecewise function with a linear term for low values and a power law for medium to large values.
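In code, the two directions of this mapping can be written as follows; the constants are those given by the sRGB standard, with the linear segment below 0.0031308 avoiding the infinite slope that a pure power law would have at zero:

```cpp
#include <cmath>

// Forward sRGB transfer curve: linear radiance -> display-encoded value.
float GammaCorrect(float value) {
    if (value <= 0.0031308f) return 12.92f * value;
    return 1.055f * std::pow(value, 1.f / 2.4f) - 0.055f;
}

// Inverse sRGB transfer curve: display-encoded value -> linear radiance.
float InverseGammaCorrect(float value) {
    if (value <= 0.04045f) return value / 12.92f;
    return std::pow((value + 0.055f) / 1.055f, 2.4f);
}
```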
The GammaCorrect() function will be used to write sRGB-compatible 8-bit image files in the WriteImage() function. To import textures into pbrt, we are interested in the opposite direction: removing an existing gamma correction to reestablish a linear relationship between brightness and pixel values.
Refer to the “Further Reading” section for a more detailed discussion of gamma correction.
InverseGammaCorrect() is only applied when indicated via the gamma argument of convertIn(). By default, this is the case when the input image has an 8-bit color depth.
If the texture file wasn’t found or was unreadable, an image map with a single sample with a value of one is created so that the renderer can continue to generate an image of the scene without needing to abort execution. The ReadImage() function will issue a warning message in this case.
After the image is rendered and the system is cleaning up, the ClearCache() method is called to free the memory for the entries in the texture cache.
10.4.2 ImageTexture Evaluation
The ImageTexture::Evaluate() routine does the usual texture coordinate computation and then hands the image map lookup to the MIPMap, which does the image filtering work for antialiasing. The returned value is still of type Tmemory; another conversion step similar to ImageTexture::convertIn() above converts to the returned type Treturn.
10.4.3 MIP Maps
As always, if the image function has higher frequency detail than can be represented by the texture sampling rate, aliasing will be present in the final image. Any frequencies higher than the Nyquist limit must be removed by prefiltering before the function is evaluated. Figure 10.9 shows the basic problem we face: an image texture has texels that are samples of some image function at a fixed frequency. The filter region for the lookup is given by its center point and offsets to the estimated texture coordinate locations for the adjacent image samples. Because these offsets are estimates of the texture sampling rate, we must remove any frequencies higher than twice the distance to the adjacent samples in order to satisfy the Nyquist criterion.
The texture sampling and reconstruction process has a few key differences from the image sampling process discussed in Chapter 7. These differences make it possible to address the antialiasing problem with more effective and less computationally expensive techniques. For example, here it is inexpensive to get the value of a sample—only an array lookup is necessary (as opposed to having to trace a number of rays to compute radiance). Further, because the texture image function is fully defined by the set of samples and there is no mystery about what its highest frequency could be, there is no uncertainty related to the function’s behavior between samples. These differences make it possible to remove detail from the texture before sampling, thus eliminating aliasing.
However, the texture sampling rate will typically change from pixel to pixel. The sampling rate is determined by scene geometry and its orientation, the texture coordinate mapping function, and the camera projection and image sampling rate. Because the texture sampling rate is not fixed, texture filtering algorithms need to be able to filter over arbitrary regions of texture samples efficiently.
The MIPMap class implements two methods for efficient texture filtering with spatially varying filter widths. The first, trilinear interpolation, is fast and easy to implement and was widely used for texture filtering in early graphics hardware. The second, elliptically weighted averaging, is slower and more complex but returns extremely high-quality results. Figure 10.10 compares the result of texture filtering using trilinear interpolation and the EWA algorithm.
To limit the potential number of texels that need to be accessed, both of these filtering methods use an image pyramid of increasingly lower resolution prefiltered versions of the original image to accelerate their operation. The original image texels are at the bottom level of the pyramid, and the image at each level is half the resolution of the previous level, up to the top level, which has a single texel representing the average of all of the texels in the original image. This collection of images needs at most 1/3 more memory than storing the most detailed level alone and can be used to quickly find filtered values over large regions of the original image. The basic idea behind the pyramid is that if a large area of texels needs to be filtered, a reasonable approximation is to use a higher level of the pyramid and do the filtering over the same area there, accessing many fewer texels.
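The 1/3 bound follows from the geometric series 1 + 1/4 + 1/16 + ... = 4/3: each level has a quarter as many texels as the one below it. A small sketch confirms this for a square power-of-two image (PyramidTexels is a hypothetical helper, not part of pbrt):

```cpp
#include <cstdint>

// Count the total number of texels in an image pyramid whose base level is
// res x res (res a power of two) and whose levels halve in resolution down
// to a single texel. The total is bounded by (4/3) * res * res.
int64_t PyramidTexels(int res) {
    int64_t total = 0;
    for (int r = res; r >= 1; r /= 2)
        total += int64_t(r) * int64_t(r);
    return total;
}
```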
In the constructor, the MIPMap copies the image data provided by the caller, resizes the image if necessary to ensure that its resolution is a power of two in each direction, and initializes a lookup table used by the elliptically weighted average filtering method in Section 10.4.5. It also records the desired behavior for texture coordinates that fall outside of the legal range in the wrapMode argument.
MIPMap is a template class that is parameterized by the data type of the image texels. pbrt creates MIPMaps of both RGBSpectrum and Float images; Float MIP maps are used for representing directional distributions of intensity from goniometric light sources (Section 12.3.3), for example. The MIPMap implementation requires that the type T support just a few basic operations, including addition and multiplication by a scalar.
The ImageWrap enumerant, passed to the MIPMap constructor, specifies the desired behavior when the supplied texture coordinates are not in the legal range.
Implementation of an image pyramid is somewhat easier if the resolution of the original image is an exact power of two in each direction; this ensures that there is a straightforward relationship between the level of the pyramid and the number of texels at that level. If the user has provided an image where the resolution in one or both of the dimensions is not a power of two, then the MIPMap constructor starts by resizing the image up to the next power-of-two resolution greater than the original resolution before constructing the pyramid. Exercise 10.5 at the end of the chapter describes an approach to building image pyramids with non-power-of-two resolutions.
Image resizing here involves more application of the sampling and reconstruction theory from Chapter 7: we have an image function that has been sampled at one sampling rate, and we’d like to reconstruct a continuous image function from the original samples to resample at a new set of sample positions. Because this represents an increase in the sampling rate from the original rate, we don’t have to worry about introducing aliasing due to undersampling high-frequency components in this step; we only need to reconstruct and directly resample the new function. Figure 10.11 illustrates this task in 1D.
The MIPMap uses a separable reconstruction filter for this task; recall from Section 7.8 that separable filters can be written as the product of 1D filters: f(x, y) = f(x) f(y). One advantage of using a separable filter is that if we are using one to resample an image from one resolution (s, t) to another (s', t'), then we can implement the resampling as two 1D resampling steps, first resampling in s to create an image of resolution (s', t) and then resampling that image in t to create the final image of resolution (s', t'). Resampling the image via two 1D steps in this manner simplifies implementation and makes the number of texels accessed for each texel in the final image a linear function of the filter width, rather than a quadratic one.
Reconstructing the original image function and sampling it at a new texel’s position are mathematically equivalent to centering the reconstruction filter kernel at the new texel’s position and weighting the nearby texels in the original image appropriately. Thus, each new texel is a weighted average of a small number of texels in the original image.
The MIPMap::resampleWeights() method determines which original texels contribute to each new texel and what the values are of the contribution weights for each new texel. It returns the values in an array of ResampleWeight structures for all of the texels in a 1D row or column of the image. Because this information is the same for all rows of the image when resampling in s and all columns when resampling in t, it's more efficient to compute it once for each of the two passes and then reuse it many times for each one. Given these weights, the image is first magnified in the s direction, turning the original image with resolution (resolution[0], resolution[1]) into an image with resolution (resPow2[0], resolution[1]), which is stored in resampledImage. The implementation here allocates enough space in resampledImage to hold the final zoomed image with resolution (resPow2[0], resPow2[1]), so two large allocations can be avoided.
For the reconstruction filter used here, no more than four of the original texels will contribute to each new texel after zooming, so ResampleWeight only needs to hold four weights. Because the four texels are contiguous, we only store the offset to the first one.
Just as it was important to distinguish between discrete and continuous pixel coordinates in Chapter 7, the same issues need to be addressed with texel coordinates here. We will use the same conventions as described in Section 7.1.7. For each new texel, this function starts by computing its continuous coordinates in terms of the old texel coordinates. This value is stored in center, because it is the center of the reconstruction filter for the new texel. Next, it is necessary to find the offset to the first texel that contributes to the new texel. This is a slightly tricky calculation: after subtracting the filter width to find the start of the filter's nonzero range, it is necessary to add an extra 0.5 offset to the continuous coordinate before taking the floor to find the discrete coordinate. Figure 10.12 illustrates why this offset is needed.
Starting from this first contributing texel, this function loops over four texels, computing each one’s offset to the center of the filter kernel and the corresponding filter weight.
Depending on the filter function used, the four filter weights may not sum to one. Therefore, to ensure that the resampled image won’t be any brighter or darker than the original image, the weights are normalized here.
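A simplified sketch of this weight computation follows. It uses a triangle filter of width 2 as a stand-in for the Lanczos filter that the actual implementation uses, and ResampleWeights is a freestanding hypothetical version of MIPMap::resampleWeights():

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct ResampleWeight {
    int firstTexel;   // offset of the first of the four contributing texels
    float weight[4];
};

// For each new texel, find the four old texels under the reconstruction
// filter and their normalized contribution weights.
std::vector<ResampleWeight> ResampleWeights(int oldRes, int newRes) {
    const float filterWidth = 2.f;
    std::vector<ResampleWeight> wt(newRes);
    for (int i = 0; i < newRes; ++i) {
        // Continuous coordinate of the new texel in old-texel space.
        float center = (i + .5f) * oldRes / newRes;
        // Note the extra 0.5 before the floor (see Figure 10.12).
        wt[i].firstTexel = int(std::floor(center - filterWidth + .5f));
        float sum = 0;
        for (int j = 0; j < 4; ++j) {
            float pos = wt[i].firstTexel + j + .5f;
            float d = std::abs(pos - center) / filterWidth;
            wt[i].weight[j] = std::max(0.f, 1.f - d);  // triangle filter
            sum += wt[i].weight[j];
        }
        // Normalize so the four weights sum to one.
        for (int j = 0; j < 4; ++j) wt[i].weight[j] /= sum;
    }
    return wt;
}
```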
Once the weights have been computed, it's easy to apply them to compute the zoomed texels. For each of the resolution[1] horizontal scan lines in the original image, a pass is made across the resPow2[0] texels in the s-zoomed image using the precomputed weights to compute their values. Because the computation for each texel is completely independent of the computation for every other texel, and because this computation requires a bit of processing, it's worthwhile to split the image into sections and work on them in parallel with multiple threads.
The ImageWrap parameter to the MIPMap constructor determines the convention to be used for out-of-bounds texel coordinates. It either remaps them to valid values with a modulus or clamp calculation or uses a black texel value.
The process for resampling in the t direction is almost the same as for s, so we won't include the implementation here.
Once we have an image with resolutions that are powers of two, the levels of the MIP map can be initialized, starting from the bottom (finest) level. Each higher level is found by filtering the texels from the previous level.
Because image maps use a fair amount of memory, and because 8 to 20 texels are typically used per image texture lookup to compute a filtered value, it’s worth carefully considering how the texels are laid out in memory, since reducing cache misses while accessing the texture map can noticeably improve the renderer’s performance. Because both of the texture filtering methods implemented in this section access a set of texels in a rectangular region of the image map each time a lookup is performed, the MIPMap uses the BlockedArray template class to store the 2D arrays of texel values, rather than a standard C++ array. The BlockedArray reorders the array values in memory in a way that improves cache coherence when the values are accessed with these kinds of rectangular patterns; it is described in Section A.4.4 in Appendix A.
The base level of the MIP map, which holds the original data (or the resampled data, if it didn’t originally have power-of-two resolutions), is initialized by the default BlockedArray constructor.
Before showing how the rest of the levels are initialized, we will first define a texel access function that will be used during that process. MIPMap::Texel() returns a reference to the texel value for the given discrete integer-valued texel position. As described earlier, if an out-of-range texel coordinate is passed in, then based on the value of wrapMode, this method effectively repeats the texture over the entire 2D texture coordinate domain by taking the modulus of the coordinate with respect to the texture size, clamps the coordinates to the valid range so that the border pixels are used, or returns a black texel for out-of-bounds coordinates.
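The three wrap modes can be sketched as a freestanding helper (RemapTexelCoord is hypothetical; the real logic lives inline in MIPMap::Texel()):

```cpp
#include <algorithm>

enum class ImageWrap { Repeat, Clamp, Black };

// Remap an out-of-range texel coordinate according to the wrap mode.
// Returns false for ImageWrap::Black when the coordinate is out of bounds,
// signaling that a black texel should be returned instead.
bool RemapTexelCoord(int *c, int res, ImageWrap wrapMode) {
    switch (wrapMode) {
    case ImageWrap::Repeat:
        // True mathematical modulus: -1 maps to res - 1, not to -1.
        *c = ((*c % res) + res) % res;
        return true;
    case ImageWrap::Clamp:
        *c = std::min(std::max(*c, 0), res - 1);
        return true;
    case ImageWrap::Black:
        return *c >= 0 && *c < res;
    }
    return false;
}
```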
For non-square images, the resolution in one direction must be clamped to one for the upper levels of the image pyramid, where there is still downsampling to do in the larger of the two resolutions. This is handled by the following std::max() calls:
The MIPMap uses a simple box filter to average four texels from the previous level to find the value at the current texel. Using the Lanczos filter here would give a slightly better result for this computation, although this modification is left for Exercise 10.4 at the end of the chapter. As with resampling to power-of-two resolutions, doing this downfiltering using multiple threads rather than with a regular for loop is worthwhile here.
10.4.4 Isotropic Triangle Filter
The first of the two MIPMap::Lookup() methods uses a triangle filter over the texture samples to remove high frequencies. Although this filter function does not give high-quality results, it can be implemented very efficiently. In addition to the (s, t) coordinates of the evaluation point, the caller passes this method a filter width for the lookup, giving the extent of the region of the texture to filter across. This method filters over a square region in texture space, so the width should be conservatively chosen to avoid aliasing in both the s and t directions. Filtering techniques like this one that do not support a filter extent that is non-square or non-axis-aligned are known as isotropic. The primary disadvantage of isotropic filtering algorithms is that textures viewed at an oblique angle will appear blurry, since the required sampling rate along one axis will be very different from the sampling rate along the other in this case.
Because filtering over many texels for wide filter widths would be inefficient, this method chooses a MIP map level from the pyramid such that the filter region at that level would cover four texels at that level. Figure 10.13 illustrates this idea.
Since the resolutions of the levels of the pyramid are all powers of two, the resolution of level l is 2^(nLevels - 1 - l). Therefore, to find the level with a texel spacing width w requires solving

    1 / 2^(nLevels - 1 - l) = w

for l. In general, this will be a floating-point value between two MIP map levels.
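Solving this equation gives l = nLevels - 1 + log2(w). A minimal sketch (LevelForWidth is a hypothetical helper; the same computation appears inline in MIPMap::Lookup()):

```cpp
#include <algorithm>
#include <cmath>

// Solve 1/2^(nLevels-1-l) = width for l, the fractional pyramid level at
// which the texel spacing matches the filter width.
float LevelForWidth(int nLevels, float width) {
    // Clamp the width so the logarithm stays finite for tiny filters.
    return nLevels - 1 + std::log2(std::max(width, 1e-8f));
}
```

Level nLevels - 1 (a single texel) corresponds to a filter covering the whole texture, and level 0 to a filter spanning a single texel of the base image.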
As shown by Figure 10.13, applying a triangle filter to the four texels around the sample point will either filter over too small a region or too large a region (except for very carefully selected filter widths). The implementation here applies the triangle filter at both of these levels and blends between them according to how close the computed continuous level is to each of them. This helps hide the transitions from one MIP map level to the next at nearby pixels in the final image. While applying a triangle filter to four texels at two levels in this manner doesn't give exactly the same result as applying it to the original highest resolution texels, the difference isn't too bad in practice, and the efficiency of this approach is worth this penalty. In any case, the elliptically weighted average filtering in the next section should be used when texture quality is important.
Given floating-point texture coordinates in [0, 1]^2, the MIPMap::triangle() routine uses a triangle filter to interpolate between the four texels that surround the sample point, as shown in Figure 10.14. This method first scales the coordinates by the texture resolution at the given MIP map level in each direction, turning them into continuous texel coordinates. Because these are continuous coordinates, but the texels in the image map are defined at discrete texture coordinates, it's important to carefully convert into a common representation. Here, we will do all of our work in discrete coordinates, mapping the continuous texture coordinates to discrete space.
For example, consider the 1D case with a continuous texture coordinate of 2.4: this coordinate is a distance of 0.1 below the discrete texel coordinate 2 (which corresponds to a continuous coordinate of 2.5) and is 0.9 above the discrete coordinate 1 (continuous coordinate 1.5). Thus, if we subtract 0.5 from the continuous coordinate 2.4, giving 1.9, we can correctly compute the distances to the discrete coordinates 1 and 2 by subtracting coordinates.
After computing the distances in s and t to the texel at the lower left of the given coordinates, ds and dt, MIPMap::triangle() determines weights for the four texels and computes the filtered value. Recall that the triangle filter is

    f(x, y) = (1 - |x|)(1 - |y|);
the appropriate weights follow directly. Notice the similarity between this computation and BilerpTexture::Evaluate().
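A sketch of this lookup for a single square level follows; TriangleLookup and its texel-accessor parameter are hypothetical simplifications of MIPMap::triangle(), which works on the BlockedArray for the chosen level:

```cpp
#include <cmath>

// Bilinearly interpolate the four texels surrounding (s, t) in [0,1]^2
// using the triangle filter f(x, y) = (1 - |x|)(1 - |y|). `texel` is a
// hypothetical accessor for a level with resolution res x res.
float TriangleLookup(float s, float t, int res,
                     float (*texel)(int s, int t)) {
    // Continuous texel coordinates, shifted into discrete space.
    s = s * res - 0.5f;
    t = t * res - 0.5f;
    int s0 = (int)std::floor(s), t0 = (int)std::floor(t);
    float ds = s - s0, dt = t - t0;
    return (1 - ds) * (1 - dt) * texel(s0,     t0) +
           (1 - ds) * dt       * texel(s0,     t0 + 1) +
           ds       * (1 - dt) * texel(s0 + 1, t0) +
           ds       * dt       * texel(s0 + 1, t0 + 1);
}
```

Because the four weights always sum to one, looking up a constant image returns that constant exactly.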
10.4.5 Elliptically Weighted Average
The elliptically weighted average (EWA) algorithm fits an ellipse to the two axes in texture space given by the texture coordinate differentials and then filters the texture with a Gaussian filter function (Figure 10.15). It is widely regarded as one of the best texture filtering algorithms in graphics and has been carefully derived from the basic principles of sampling theory. Unlike the triangle filter in the previous section, it can filter over arbitrarily oriented regions of the texture, with some flexibility of having different filter extents in different directions. This type of filter is known as anisotropic. This capability greatly improves the quality of its results, since it can properly adapt to different sampling rates along the two image axes.
We won’t show the full derivation of this filter here, although we do note that it is distinguished by being a unified resampling filter: it simultaneously computes the result of a Gaussian filtered texture function convolved with a Gaussian reconstruction filter in image space. This is in contrast to many other texture filtering methods that ignore the effect of the image space filter or equivalently assume that it is a box. Even if a Gaussian isn’t being used for filtering the samples for the image being rendered, taking some account of the spatial variation of the image filter improves the results, assuming that the filter being used is somewhat similar in shape to the Gaussian, as the Mitchell and windowed sinc filters are.
The screen space partial derivatives of the texture coordinates define the axes of the ellipse. The lookup method starts out by determining which of the two axes is the major axis (the longer of the two) and which is the minor, swapping them if needed so that dst0 is the major axis. The length of the minor axis will be used shortly to select a MIP map level.
Next the eccentricity of the ellipse is computed—the ratio of the length of the major axis to the length of the minor axis. A large eccentricity indicates a very long and skinny ellipse. Because this method filters texels from a MIP map level chosen based on the length of the minor axis, highly eccentric ellipses mean that a large number of texels need to be filtered. To avoid this expense (and to ensure that any EWA lookup takes a bounded amount of time), the length of the minor axis may be increased to limit the eccentricity. The result may be an increase in blurring, although this effect usually isn’t noticeable in practice.
Like the triangle filter, the EWA filter uses the image pyramid to reduce the number of texels to be filtered for a particular texture lookup, choosing a MIP map level based on the length of the minor axis. Given the limited eccentricity of the ellipse due to the clamping above, the total number of texels used is thus bounded. Given the length of the minor axis, the computation to find the appropriate pyramid level is the same as was used for the triangle filter. Similarly, the implementation here blends between the filtered results at the two levels around the computed level of detail, again to reduce artifacts from transitions from one level to another.
The MIPMap::EWA() method actually applies the filter at a particular level.
This method first converts from texture coordinates in [0, 1]^2 to coordinates and differentials in terms of the resolution of the chosen MIP map level. It also subtracts 0.5 from the continuous position coordinate to align the sample point with the discrete texel coordinates, as was done in MIPMap::triangle().
It next computes the coefficients of the implicit equation for the ellipse with axes (ds0,dt0) and (ds1,dt1) and centered at the origin. Placing the ellipse at the origin rather than at (s, t) simplifies the implicit equation and the computation of its coefficients and can be easily corrected for when the equation is evaluated later. The general form of the implicit equation for all points (s, t) inside such an ellipse is

    e(s, t) = A s^2 + B s t + C t^2 < F,

although it is more computationally efficient to divide through by F and express this as

    e(s, t) = (A/F) s^2 + (B/F) s t + (C/F) t^2 = A' s^2 + B' s t + C' t^2 < 1.
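A sketch of the coefficient computation follows; EWACoefficients and EvalEllipse are hypothetical freestanding helpers, and the +1 terms added to A and C (which slightly widen the ellipse so that it never degenerates to a filter narrower than a texel) match the approach in pbrt's implementation:

```cpp
struct EllipseCoeffs { float A, B, C; };

// Compute the normalized coefficients of e(s, t) = A s^2 + B s t + C t^2
// for the ellipse with axes (ds0, dt0) and (ds1, dt1) centered at the
// origin; after dividing by F, points inside satisfy e(s, t) < 1.
EllipseCoeffs EWACoefficients(float ds0, float dt0, float ds1, float dt1) {
    float A = dt0 * dt0 + dt1 * dt1 + 1;
    float B = -2 * (ds0 * dt0 + ds1 * dt1);
    float C = ds0 * ds0 + ds1 * ds1 + 1;
    float invF = 1 / (A * C - B * B * 0.25f);
    return { A * invF, B * invF, C * invF };
}

// Evaluate the implicit equation at a point relative to the ellipse center.
float EvalEllipse(const EllipseCoeffs &e, float s, float t) {
    return e.A * s * s + e.B * s * t + e.C * t * t;
}
```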
We will not derive the equations that give the values of the coefficients, although the interested reader can easily verify their correctness.
The next step is to find the axis-aligned bounding box in discrete integer texel coordinates of the texels that are potentially inside the ellipse. The EWA algorithm loops over all of these candidate texels, filtering the contributions of those that are in fact inside the ellipse. The bounding box is found by determining the minimum and maximum values that the ellipse takes in the s and t directions. These extrema can be calculated by finding the partial derivatives ∂e/∂s and ∂e/∂t, finding their solutions for s and t, and adding the offset to the ellipse center. For brevity, we will not include the derivation for these expressions here.
Now that the bounding box is known, the EWA algorithm loops over the texels, transforming each one to the coordinate system where the texture lookup point (s, t) is at the origin with a translation. It then evaluates the ellipse equation to see if the texel is inside the ellipse (Figure 10.16) and computes the filter weight for the texel if so. The final filtered value returned is a weighted sum over texels T(s', t') inside the ellipse, where f is the Gaussian filter function:

    (Σ f(s' - s, t' - t) T(s', t')) / (Σ f(s' - s, t' - t)).
A nice feature of the implicit equation is that its value at a particular texel is the squared ratio of the distance from the center of the ellipse to the texel to the distance from the center of the ellipse to the ellipse boundary along the line through that texel (Figure 10.16). This value can be used to index into a precomputed lookup table of Gaussian filter function values.
The lookup table is initialized the first time a MIPMap is constructed. Because it will be indexed with squared distances from the filter center r^2, each entry stores a value e^(-alpha r^2), rather than e^(-alpha r).
So that the filter function goes to zero at the end of its extent rather than having an abrupt step, std::exp(-alpha) is subtracted from the filter values here.
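A sketch of the table initialization (BuildWeightLut is a hypothetical freestanding version; in pbrt the table is a static array filled in by the MIPMap constructor):

```cpp
#include <cmath>
#include <vector>

// Build the Gaussian filter lookup table indexed by squared distance r^2
// in [0, 1]. Subtracting exp(-alpha) makes the filter fall to exactly zero
// at the edge of its extent instead of stepping down abruptly.
std::vector<float> BuildWeightLut(int size = 128, float alpha = 2.f) {
    std::vector<float> lut(size);
    for (int i = 0; i < size; ++i) {
        float r2 = float(i) / float(size - 1);
        lut[i] = std::exp(-alpha * r2) - std::exp(-alpha);
    }
    return lut;
}
```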