5.1 Camera Interface
The Camera class uses the usual TaggedPointer-based approach to dynamically dispatch interface method calls to the correct implementation based on the actual type of the camera. (As usual, we will not include the implementations of those methods in the book here.) Camera is defined in the file base/camera.h.
The first method that cameras must implement is GenerateRay(), which computes the ray corresponding to a given image sample. It is important that the direction component of the returned ray be normalized—many other parts of the system will depend on this behavior. If for some reason there is no valid ray for the given CameraSample, then the pstd::optional return value should be unset. The SampledWavelengths for the ray are passed as a non-const reference so that cameras can model dispersion in their lenses, in which case only a single wavelength of light is tracked by the ray and the GenerateRay() method will call SampledWavelengths::TerminateSecondary().
The CameraSample structure that is passed to GenerateRay() holds all the sample values needed to specify a camera ray. Its pFilm member gives the point on the film to which the generated ray should carry radiance. The point on the lens the ray passes through is in pLens (for cameras that include the notion of lenses), and time gives the time at which the ray should sample the scene. If the camera itself is in motion, the time value determines what camera position to use when generating the ray.
Finally, the filterWeight member variable is an additional scale factor that is applied when the ray’s radiance is added to the image stored by the film; it accounts for the reconstruction filter used to filter image samples at each pixel. This topic is discussed in Sections 5.4.3 and 8.8.
The CameraRay structure that is returned by GenerateRay() includes both a ray and a spectral weight associated with it. Simple camera models leave the weight at the default value of one, while more sophisticated ones like RealisticCamera return a weight that is used in modeling the radiometry of image formation. (Section 5.4.1 contains more information about how exactly this weight is computed and used in the latter case.)
Cameras must also provide an implementation of GenerateRayDifferential(), which computes a main ray like GenerateRay() but also computes the corresponding rays for points shifted one pixel in the x and y directions on the film plane. This information about how camera rays change as a function of position on the film helps give other parts of the system a notion of how much of the film area a particular camera ray’s sample represents, which is useful for antialiasing texture lookups.
Camera implementations must provide access to their Film, which allows other parts of the system to determine things such as the resolution of the output image.
Just like real-world cameras, pbrt’s camera models include the notion of a shutter that opens for a short period of time to expose the film to light. One result of this nonzero exposure time is motion blur: objects that are in motion relative to the camera during the exposure are blurred. Time is yet another thing that is amenable to point sampling and Monte Carlo integration: given an appropriate distribution of ray times between the shutter open time and the shutter close time, it is possible to compute images that exhibit motion blur.
The SampleTime() interface method should therefore map a uniform random sample u in the range [0, 1) to a time when the camera’s shutter is open. Normally, it is just used to linearly interpolate between the shutter open and close times.
The last interface method allows camera implementations to set fields in the ImageMetadata class to specify transformation matrices related to the camera. If the output image format has support for storing this sort of auxiliary information, it will be included in the final image that is written to disk.
5.1.1 Camera Coordinate Spaces
Before we start to describe the implementation of pbrt’s camera models, we will define some of the coordinate spaces that they use. In addition to world space, which was introduced in Section 3.1, we will now introduce four additional coordinate spaces: object space, camera space, camera-world space, and rendering space. In sum, we have:
- Object space: This is the coordinate system in which geometric primitives are defined. For example, spheres in pbrt are defined to be centered at the origin of their object space.
- World space: While each primitive may have its own object space, all objects in the scene are placed in relation to a single world space. A world-from-object transformation determines where each object is located in world space. World space is the standard frame that all other spaces are defined in terms of.
- Camera space: A camera is placed in the scene at some world space point with a particular viewing direction and orientation. This camera defines a new coordinate system with its origin at the camera’s location. The z axis of this coordinate system is mapped to the viewing direction, and the y axis is mapped to the up direction.
- Camera-world space: Like camera space, the origin of this coordinate system is the camera’s position, but it maintains the orientation of world space (i.e., unlike camera space, the camera is not necessarily looking down the z axis).
- Rendering space: This is the coordinate system into which the scene is transformed for the purposes of rendering. In pbrt, it may be world space, camera space, or camera-world space.
Renderers based on rasterization traditionally do most of their computations in camera space: triangle vertices are transformed all the way from object space to camera space before being projected onto the screen and rasterized. In that context, camera space is a handy space for reasoning about which objects are potentially visible to the camera. For example, if an object’s camera space bounding box is entirely behind the z = 0 plane (and the camera does not have a field of view wider than 180 degrees), the object will not be visible.
Conversely, many ray tracers (including all versions of pbrt prior to this one) render in world space. Camera implementations may start out in camera space when generating rays, but they transform those rays to world space where all subsequent ray intersection and shading calculations are performed. A problem with that approach stems from the fact that floating-point numbers have more precision close to the origin than far away from it. If the camera is placed far from the origin, there may be insufficient precision to accurately represent the part of the scene that it is looking at.
Figure 5.1 illustrates the precision problem with rendering in world space. In Figure 5.1(a), the scene is rendered with the camera and objects as they were provided in the original scene specification, which happened to span only a small range of coordinate values around the world-space origin. In Figure 5.1(b), both the camera and the scene have been translated 1,000,000 units in each dimension. In principle, both images should be the same, but much less precision is available for the second viewpoint, to the extent that the discretization of floating-point numbers is visible in the geometric model.
Rendering in camera space naturally provides the most floating-point precision for the objects closest to the camera. If the scene in Figure 5.1 is rendered in camera space, translating both the camera and the scene geometry by 1,000,000 units has no effect—the translations cancel. However, there is a problem with using camera space with ray tracing. Scenes are often modeled with major features aligned to the coordinate axes (e.g., consider an architectural model, where the floor and ceiling might be aligned with planes). Axis-aligned bounding boxes of such features are degenerate in one dimension, which reduces their surface area. Acceleration structures like the BVH that will be introduced in Chapter 7 are particularly effective with such bounding boxes. In turn, if the camera is rotated with respect to the scene, transforming the scene into camera space rotates those features away from the coordinate axes; their axis-aligned bounding boxes then fit more loosely and rendering performance suffers: for the scene in Figure 5.1, rendering time increases by 27%.
Rendering using camera-world space gives the best of both worlds: the camera is at the origin and the scene is translated accordingly. However, the rotation is not applied to the scene geometry, thus preserving good bounding boxes for the acceleration structures. With camera-world space, there is no increase in rendering time and higher precision is maintained, as is shown in Figure 5.1(c). The CameraTransform class abstracts the choice of which particular coordinate system is used for rendering by handling the details of transforming among the various spaces.
Camera implementations must make their CameraTransform available to other parts of the system, so we will add one more method to the Camera interface.
CameraTransform maintains two transformations: one from camera space to the rendering space, and one from the rendering space to world space. In pbrt, the latter transformation cannot be animated; any animation in the camera transformation is kept in the first transformation. This ensures that a moving camera does not cause static geometry in the scene to become animated, which in turn would harm performance.
The CameraTransform constructor takes the world-from-camera transformation as specified in the scene description and decomposes it into the two transformations described earlier. The default rendering space is camera-world, though this choice can be overridden using a command-line option.
For camera-space rendering, the world-from-camera transformation should be used for worldFromRender and an identity transformation for the render-from-camera transformation, since those two coordinate systems are equivalent. However, because worldFromRender cannot be animated, the implementation takes the world-from-camera transformation at the midpoint of the frame and then folds the effect of any animation in the camera transformation into renderFromCamera.
For the default case of rendering in camera-world space, the world-from-render transformation is given by translating to the camera’s position at the midpoint of the frame.
For world-space rendering, worldFromRender is the identity transformation.
Once worldFromRender has been set, whatever transformation remains in worldFromCamera is extracted and stored in renderFromCamera.
The CameraTransform class provides a variety of overloaded methods named RenderFromCamera(), CameraFromRender(), and RenderFromWorld() that transform points, vectors, normals, and rays among the coordinate systems it manages. Other methods return the corresponding transformations directly. Their straightforward implementations are not included here.
5.1.2 The CameraBase Class
All of the camera implementations in this chapter share some common functionality that we have factored into a single class, CameraBase, from which all of them inherit. CameraBase, as well as all the camera implementations, is defined in the files cameras.h and cameras.cpp.
The CameraBase constructor takes a variety of parameters that are applicable to all of pbrt’s cameras:
- One of the most important is the transformation that places the camera in the scene, which is represented by a CameraTransform and is stored in the cameraTransform member variable.
- Next is a pair of floating-point values that give the times at which the camera’s shutter opens and closes.
- A Film instance stores the final image and models the film sensor.
- Last is a Medium instance that represents the scattering medium that the camera lies in, if any (Medium is described in Section 11.4).
A small structure bundles them together and helps shorten the length of the parameter lists for Camera constructors.
We will only include the constructor’s prototype here because its implementation does no more than assign the parameters to the corresponding member variables.
CameraBase can implement a number of the methods required by the Camera interface directly, thus saving the trouble of needing to redundantly implement them in the camera implementations that inherit from it.
The SampleTime() method is implemented by linearly interpolating between the shutter open and close times using the sample u.
CameraBase provides a GenerateRayDifferential() method that computes a ray differential via multiple calls to a camera’s GenerateRay() method. One subtlety is that camera implementations that use this method must still implement their own GenerateRayDifferential() method and then call this one from it. (Note that the two methods’ signatures differ.) Cameras pass their this pointer as a Camera parameter, which allows CameraBase to call the camera’s GenerateRay() method. This additional complexity stems from our not using virtual functions for the camera interface: on its own, CameraBase has no way to call that method unless a Camera is provided to it.
The primary ray is found via a first call to GenerateRay(). If there is no valid ray for the given sample, then there can be no ray differential either.
Two attempts are made to find the ray differential: one using forward differencing and one using backward differencing by a fraction of a pixel. It is important to try both of these due to vignetting at the edges of images formed by realistic camera models—sometimes the main ray is valid but shifting in one direction moves past the image formed by the lens system. In that case, trying the other direction may successfully generate a ray.
If it was possible to generate the auxiliary ray, then the corresponding pixel-wide differential is initialized via differencing.
The implementation of the fragment <<Find camera ray after shifting one pixel in the y direction>> follows similarly and is not included here.
If a valid ray was found for both x and y, we can go ahead and set the hasDifferentials member variable to true. Otherwise, the main ray can still be traced, just without differentials available.
Finally, for the convenience of its subclasses, CameraBase provides various transformation methods that use the CameraTransform. We will only include the Ray method here; the others are analogous.